(mis)adventures in software development...

29 March 2012

Scripts for batch renaming of files to add a date prefix

Category: Programming

A financial institution of which I am a customer (which shall remain nameless) frequently (like almost every weekday!) emails me PDF statements. I rarely need to refer to said statements, so I could just configure my email client to filter them automatically into some folder and never give them the slightest consideration. But occasionally I need to check something in one of these statements, usually around Tax Return time, and usually something from many months or years past. Sifting through PDFs stored as email attachments is not ideal. So I download the statements and store them in a Dropbox folder for safekeeping, in case The Tax Man ever enquires about their numerical specifics. This makes things easier to find when I need to look something up. Or it would, if the financial institution in question had shown more foresight in the naming convention of said statements. They have filenames like:

29 March 2012 Statement.pdf

I probably shouldn’t complain about this. At least there is an unambiguous and — more importantly — consistent date component as part of the filename. Including the year. That’s certainly better than having each and every statement file called “Statement.pdf”.

Still, ideally I’d want the date at the start of the filename in YYYY-mm-dd format. Names in that form sort alphabetically in date order, so a statement for a particular date is much easier to find in a directory full of these files.
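As a quick illustration of why the YYYY-mm-dd (ISO 8601) form helps, here is a small sketch that sorts a few sample statement names both ways. It assumes GNU date from coreutils, whose --date option can parse strings such as "29 March 2012"; the dates themselves are made-up sample values.

```shell
#!/bin/bash
# Sketch: compare sort order of the bank's naming style vs an ISO date prefix.
# Assumes GNU date (coreutils); the dates below are sample values only.
set -euo pipefail

dates=("5 April 2011" "29 March 2012" "1 December 2011")

echo "Sorted as the bank names them:"
printf '%s Statement.pdf\n' "${dates[@]}" | sort

echo "Sorted with a YYYY-mm-dd prefix:"
for d in "${dates[@]}"; do
    # Convert e.g. "29 March 2012" to "2012-03-29"
    printf '%s Statement.pdf\n' "$(date --date="$d" +%Y-%m-%d)"
done | sort
```

The first listing comes out as 1 December 2011, 29 March 2012, 5 April 2011 (ordered by the leading digit of the day), while the second comes out as 2011-04-05, 2011-12-01, 2012-03-29, which is chronological.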

But I’m certainly not going to bother renaming each file manually, especially since I’m a coder. I’ll write a script to do it instead. And if I’m going to write a script, I might as well also automate the step of moving these PDF files from the “Downloads” folder, where the attachments initially end up, to their ultimate destination in my Dropbox.

So here’s an implementation of this as a Bash script (it relies on the --date option of GNU date, as found on Linux, to parse the date):



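#!/bin/bash
FROM_DIR="${HOME}/Downloads"          # where the email attachments end up
TO_DIR="${HOME}/Dropbox/Statements"   # hypothetical path; point at your own Dropbox folder
shopt -s nullglob   # if nothing matches, the loop body never runs
for f in "${FROM_DIR}"/*Statement.pdf; do
    # "29 March 2012 Statement.pdf" -> "29 March 2012"
    date_part=$(basename "${f}" | sed -e 's/ *[Ss]tatement.*pdf//')
    # "29 March 2012" -> "2012-03-29"; skip the file if GNU date can't parse it
    new_date=$(date --date="${date_part}" "+%Y-%m-%d") || continue
    suffix=$(basename "${f}" | sed -e 's/ //g')
    new_filename="${new_date}_${suffix}"
    if [ -e "${TO_DIR}/${new_filename}" ]; then
        echo "$(basename "${f}") not moved. File already exists: ${new_filename}"
    else
        echo "Moving $(basename "${f}") to ${new_filename}"
        mv -i "${f}" "${TO_DIR}/${new_filename}"
    fi
done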
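To check what names the script would produce before letting it move anything, the renaming logic can be pulled out into a function and called on its own. This is a dry-run sketch rather than part of the original script: new_name_for is a hypothetical helper name, and GNU date is assumed, as in the script.

```shell
#!/bin/bash
# Dry-run sketch: print the new name for a statement file without moving it.
# new_name_for is a hypothetical helper; assumes GNU date, as the script does.

new_name_for() {
    local name date_part new_date
    name=$(basename "$1")
    # Strip the trailing " Statement.pdf" to leave e.g. "29 March 2012".
    date_part=$(printf '%s\n' "$name" | sed -e 's/ *[Ss]tatement.*pdf$//')
    # An empty date part would make GNU date return today's date, so refuse it.
    [ -n "$date_part" ] || return 1
    # Fail (non-zero status) if GNU date cannot parse what is left.
    new_date=$(date --date="$date_part" +%Y-%m-%d) || return 1
    # ISO date, an underscore, then the original name with spaces removed.
    printf '%s_%s\n' "$new_date" "${name// /}"
}

new_name_for "29 March 2012 Statement.pdf"
# prints 2012-03-29_29March2012Statement.pdf
```

Running the function over the contents of the Downloads folder gives a preview of the whole batch, which is a cheap sanity check before the real mv calls.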

And here’s the same thing as a PowerShell script, so that I can download these statements from whatever machine I’m currently working on, be it Linux or Windows:


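$from_dir = "$env:USERPROFILE\Downloads"          # where the email attachments end up
$to_dir = "$env:USERPROFILE\Dropbox\Statements"   # hypothetical path; use your own Dropbox folder
$files = get-childitem $from_dir\* -include *Statement.pdf
foreach ($file in $files) {
    # e.g. "29 March 2012 Statement.pdf" (-match is case-insensitive)
    if ($file.name -match "^\d{1,2} [A-Z]+ \d{4} Statement\.pdf$") {
        $date_part = [regex]::Replace($file.name,(" *Statement.*\.pdf"),"")
        $suffix = $file.name -replace " ", ""
        # $? must be checked straight after get-date so it reports the date parse
        $new_date = get-date $date_part -format yyyy-MM-dd
        if ($?) {
            $new_file = $new_date + "_" + $suffix
            $new_path = $to_dir + "\" + $new_file
            if (Test-Path ($new_path) -PathType leaf) {
                "File already exists in destination!!! " + $new_file
            }
            else {
                "Moving " + $file.name + " to " + $new_file
                move-item $file.FullName $new_path
            }
        }
        else {
            "Error processing " + $file.name
        }
    }
    else {
        "Skipping " + $file.name
    }
}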