Folks, I know this is old (about 7 months), and I know gnatfnt probably won't come back to see this, but I found it moderately useful. I was in a different but similar situation: the filenames weren't even close, so I needed to take each PDF's title from its metadata to rename and sort the files, then archive them in an encrypted zip. This is what I did in my script. I added some error handling and made it a bit smarter:
```sh
#!/bin/sh
# Set a flag to track whether any files were renamed
renamed_any=0
# Loop over all PDF files in the current directory
for pdf in *.pdf; do
    # Only process regular files (also skips the literal "*.pdf" when nothing matches)
    [ -f "$pdf" ] || continue
    # Extract the title from the PDF metadata. Anchoring on ^Title: keeps titles
    # that themselves contain ": " intact; leading/trailing blanks are trimmed.
    title=$(pdfinfo "$pdf" 2>/dev/null | sed -n '/^Title:/{s/^Title:[[:space:]]*//;s/[[:space:]]*$//;p;}')
    # Check for a non-empty title
    if [ -n "$title" ]; then
        # Replace spaces with underscores, and slashes too (mv would read them as directories)
        new_name=$(printf '%s' "$title" | tr ' /' '__').pdf
        # Check for filename conflicts
        if [ "$new_name" != "$pdf" ] && [ ! -e "$new_name" ]; then
            if mv -- "$pdf" "$new_name"; then
                renamed_any=1
            else
                echo "Failed to rename '$pdf' to '$new_name'" >&2
            fi
        else
            echo "Cannot rename '$pdf': target '$new_name' exists or is the same as the source." >&2
        fi
    else
        echo "No title found for '$pdf', skipping." >&2
    fi
done
# Archive the PDF files into an encrypted zip file if any were renamed.
# Note: zip -P uses the legacy (weak) ZipCrypto scheme and puts the password on
# the command line, where other users can see it via ps; zip -e prompts for it instead.
if [ "$renamed_any" -eq 1 ]; then
    zip -P "CHANGEME" archived_pdfs.zip *.pdf || echo "Failed to create zip archive." >&2
else
    echo "No files were renamed; skipping zip creation." >&2
fi
```
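The title-to-filename step is the fragile part of the first script, so it can help to pull it into a small function and try it without any PDFs. A minimal sketch: title_to_filename is a hypothetical helper name, and the sample text only imitates the "Title:" line that pdfinfo prints. With awk -F ': ', a title such as "Annual Report: 2023/24" would be cut off at its own colon, and the "/" would make mv look for a directory; the sed/tr version keeps the whole title and replaces the slash.

```sh
#!/bin/sh
# Hypothetical helper: read pdfinfo-style output on stdin, print a safe filename.
title_to_filename() {
    # Keep everything after "Title:", trimming surrounding blanks
    title=$(sed -n '/^Title:/{s/^Title:[[:space:]]*//;s/[[:space:]]*$//;p;}')
    [ -n "$title" ] || return 1   # no title: let the caller skip the file
    # Spaces and slashes become underscores
    printf '%s.pdf\n' "$(printf '%s' "$title" | tr ' /' '__')"
}

# Made-up sample imitating pdfinfo output
sample='Title:          Annual Report: 2023/24
Author:         Jane Doe
Pages:          12'

new_name=$(printf '%s\n' "$sample" | title_to_filename)
echo "$new_name"
```

Because the function returns non-zero when there is no title, a caller can write `name=$(pdfinfo "$pdf" | title_to_filename) || continue`.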
As for my feedback for the OP:
I did make some changes to what you did. They're minor, and I'll explain them after the script:
```sh
#!/bin/sh
# Space-separated keyword list (POSIX sh has no arrays); keywords must not contain spaces
keywords="keyword1 keyword2 keyword3 keyword4 keyword5 keyword6 keyword7 keyword8 keyword9 keyword10"
for file in *.pdf; do
    [ -e "$file" ] || continue # Skip if no PDF files exist
    echo "Processing file: $file"
    # Extract dates from the PDF using pdfgrep. With grep -E, alternation is a
    # bare "|"; "\|" would match a literal pipe character and find nothing.
    dates=$(pdfgrep -A 3 "abc\|abcd\|abcd;" "$file" | \
        grep -oE '202[0-9]-[0-9][0-9]-[0-9][0-9]|[0-9][0-9]-[0-9][0-9]-202[0-9]|[0-9][0-9]-[0-9][0-9]-2[0-9]|[0-9][0-9]\.[0-9][0-9]\.2[0-9]')
    # Check for valid date presence
    if [ -n "$dates" ]; then
        # sort compares text: the last line is the latest date only for YYYY-MM-DD dates
        latest_date=$(printf '%s\n' "$dates" | sort -u | tail -n 1)
    else
        echo "No valid dates found, skipping..."
        continue
    fi
    echo "Latest date: $latest_date"
    keyword=""
    # Loop over keywords to build the keyword string
    for key in $keywords; do
        # Match case-insensitively and keep only the first hit, so a keyword that
        # appears several times doesn't put newlines into the filename
        match=$(pdfgrep -i "$key" "$file" | grep -oi -- "$key" | head -n 1)
        if [ -n "$match" ]; then
            keyword="${keyword}${match}_"
        fi
    done
    # Remove trailing underscore if any keywords were found
    keyword=${keyword%_}
    echo "Keywords found: $keyword"
    # Determine new filename based on dates and keywords
    if [ -n "$latest_date" ]; then
        new_filename="${latest_date}${keyword:+_$keyword}.pdf"
        # Handle filename collisions
        counter=1
        while [ -e "$new_filename" ]; do
            new_filename="${latest_date}${keyword:+_$keyword}_$counter.pdf"
            counter=$((counter + 1))
        done
        mv -- "$file" "$new_filename" && echo "Renamed: $file to $new_filename"
    fi
done
```
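So I am a bit pedantic. I like scripts to be AS PORTABLE as possible, which means full POSIX compliance. It's often overlooked. What did I do to achieve POSIX compliance? I changed #!/bin/bash to #!/bin/sh so the script runs under any POSIX-compliant shell. The shebang alone doesn't make a script portable, though: the script also has to avoid bash-only syntax (see the keyword change below), and the external tools matter too, since pdfgrep isn't part of POSIX and neither is grep -o. If you want to be extra pedantic…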
I also changed how the keyword handling works. Instead of bash-specific array syntax, I went with a space-separated list, which works with sh. The catch is that the list relies on word splitting, so an individual keyword can't contain a space.
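Here is a small sketch of both options, with made-up keywords. The space-separated string is what the script uses; the positional parameters (set --) are the one real list POSIX sh has, and they keep multi-word entries intact. Keep in mind that set -- replaces the script's own arguments, and that an unquoted $keywords is also subject to globbing if a keyword contains * or ?.

```sh
#!/bin/sh
# Option 1: space-separated string; word splitting yields one keyword per word
keywords="invoice receipt contract"
count=0
for key in $keywords; do
    count=$((count + 1))
done
echo "String list: $count keywords"

# Option 2: positional parameters; "$@" keeps "purchase order" as one item.
# This overwrites the script's arguments, so save them first if you need them.
set -- "invoice" "purchase order" "contract"
for key in "$@"; do
    echo "Keyword: $key"
done
echo "Positional list: $# keywords"
```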
I think you shouldn't skip error handling if you ever intend to reuse this script. So I added a check for missing files: [ -e "$file" ] || continue skips the loop iteration when no PDF files exist. Without it, sh leaves an unmatched *.pdf pattern unexpanded, so the loop would run once with the literal string *.pdf as the filename.
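To see the unmatched-glob behavior the guard protects against, this sketch runs the same loop in a fresh, empty scratch directory (the glob_demo path under $TMPDIR or /tmp is just a hypothetical location for the demo):

```sh
#!/bin/sh
# Create an empty scratch directory (hypothetical path) and work inside it
demo_dir=${TMPDIR:-/tmp}/glob_demo.$$
mkdir "$demo_dir" && cd "$demo_dir" || exit 1

iterations=0
for file in *.pdf; do
    # Nothing matches, so sh hands the loop the literal pattern "*.pdf"
    echo "Loop body saw: $file"
    [ -e "$file" ] || continue   # the guard skips that bogus iteration
    iterations=$((iterations + 1))
done
echo "Files processed: $iterations"

# Clean up the scratch directory
cd / && rmdir "$demo_dir"
```

The loop body runs once with "*.pdf", and the guard keeps the count at zero.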
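In addition to the above changes, I feel you should pipe pdfgrep's output straight into the grep call for better readability. Just a little creature comfort here. Additionally, sort -u removes duplicate dates before tail -n 1 picks the last one. Keep in mind that sort compares text, not dates, so "last" only means "latest" when every date is in YYYY-MM-DD form; a DD-MM-YYYY or DD.MM.YY date gets sorted by its day instead.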
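One way to make the "latest date" step hold for the mixed formats the grep pattern accepts is to normalize every date to YYYY-MM-DD before sorting. This is a sketch under loudly stated assumptions: the non-ISO dates are day-first (DD-MM-YYYY, DD-MM-YY, DD.MM.YY), two-digit years mean 20YY, and the sample text and the normalize_dates name are made up for the demo. If your PDFs use month-first dates, swap the fields accordingly.

```sh
#!/bin/sh
# Made-up text standing in for pdfgrep's output
text='Issued 15.03.24, ref 2023-12-01
Due 01-02-2024 (see 2023-12-01)'

# Same date pattern as the script, with "|" for alternation under -E
dates=$(printf '%s\n' "$text" |
    grep -oE '202[0-9]-[0-9][0-9]-[0-9][0-9]|[0-9][0-9]-[0-9][0-9]-202[0-9]|[0-9][0-9]-[0-9][0-9]-2[0-9]|[0-9][0-9]\.[0-9][0-9]\.2[0-9]')

# Plain text sort: picks 2023-12-01, which is not the latest date
naive=$(printf '%s\n' "$dates" | sort -u | tail -n 1)

# Hypothetical helper: rewrite each date as YYYY-MM-DD (ASSUMES day-first, 20YY)
normalize_dates() {
    awk -F '[-.]' '
        length($1) == 4 { print $1 "-" $2 "-" $3; next }  # already YYYY-MM-DD
        length($3) == 4 { print $3 "-" $2 "-" $1; next }  # DD-MM-YYYY
        length($3) == 2 { print "20" $3 "-" $2 "-" $1 }   # DD-MM-YY or DD.MM.YY
    '
}

latest_date=$(printf '%s\n' "$dates" | normalize_dates | sort -u | tail -n 1)
echo "Naive: $naive"
echo "Normalized: $latest_date"
```

After normalization, plain text order and date order agree, so the existing sort -u | tail -n 1 works unchanged.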
The next change I made was to your keyword building. Instead of concatenating strings with _ and checking for null (empty) values, I appended each match directly with keyword="${keyword}${match}_" while keeping the existing checks, then dropped the final underscore with ${keyword%_}. You might find this useful for future scripts.
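The keyword loop can be tried on plain text. In this sketch the keywords and the text are made up, and printf stands in for pdfgrep. grep -oi prints one line per occurrence, so head -n 1 keeps just the first hit; without it, a keyword that appears twice would put a newline into the filename.

```sh
#!/bin/sh
# Made-up stand-ins; the real script searches the PDF with pdfgrep instead
keywords="invoice receipt contract"
text="Contract and INVOICE attached; see invoice 2 and contract terms"

keyword=""
for key in $keywords; do
    # Case-insensitive search, keeping only the first occurrence
    match=$(printf '%s\n' "$text" | grep -oi -- "$key" | head -n 1)
    if [ -n "$match" ]; then
        keyword="${keyword}${match}_"
    fi
done
keyword=${keyword%_}   # strip the trailing underscore
echo "Keywords found: $keyword"
```

The match keeps the casing found in the text, so this prints "Keywords found: INVOICE_Contract".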
Now, moving on to the next thing. Something I make a habit of in my scripts, and that you may find helpful in later script writing, is building new filenames carefully. I used the conditional expansion ${keyword:+_$keyword}, which expands to an underscore plus the keywords when keyword is non-empty and to nothing otherwise. That compacts the condition and removes a redundant if check that you had in your code.
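A quick illustration of ${keyword:+_$keyword} with an empty and a non-empty keyword (the date and keyword values are made up):

```sh
#!/bin/sh
latest_date="2024-03-15"

keyword=""
# keyword is empty, so ${keyword:+...} expands to nothing
without="${latest_date}${keyword:+_$keyword}.pdf"

keyword="invoice_contract"
# keyword is non-empty, so "_invoice_contract" is inserted
with="${latest_date}${keyword:+_$keyword}.pdf"

echo "$without"   # 2024-03-15.pdf
echo "$with"      # 2024-03-15_invoice_contract.pdf
```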
The last bit of error handling is for your move operation: the mv command is now followed by &&, so the rename message is echoed only if the move succeeded (a failed mv still prints its own error). In my opinion that's just another way to make sure the script handles everything appropriately.
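The same idea can cover the failure case too, together with the collision loop from the script. This is a sketch built around a hypothetical safe_rename helper (not part of either script). It picks the first free name, reports success on stdout and failure on stderr, and returns non-zero so a caller could count failures. The scratch directory path is also just for the demo.

```sh
#!/bin/sh
# Hypothetical helper: move "$1" to "$2.pdf", or "$2_1.pdf", "$2_2.pdf", ... if taken.
# POSIX sh has no "local", so these variables are global.
safe_rename() {
    src=$1
    base=$2
    dest="$base.pdf"
    counter=1
    while [ -e "$dest" ]; do
        dest="${base}_$counter.pdf"
        counter=$((counter + 1))
    done
    if mv -- "$src" "$dest"; then
        echo "Renamed: $src to $dest"
    else
        echo "Failed to rename '$src' to '$dest'" >&2
        return 1
    fi
}

# Demo in a scratch directory (hypothetical path)
demo_dir=${TMPDIR:-/tmp}/rename_demo.$$
mkdir "$demo_dir" && cd "$demo_dir" || exit 1
touch a.pdf b.pdf
safe_rename a.pdf 2024-03-15   # -> 2024-03-15.pdf
safe_rename b.pdf 2024-03-15   # name taken, so -> 2024-03-15_1.pdf
```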
If you never come back, @gnatfnt, hopefully this email notification finds you well.