Mass Archived File Renaming

I’ve been experimenting with data hoarding/archival as a hobby and learning exercise for a few years. Back when I was still learning the syntax and usage of a lot of tools, I used a different naming scheme for my folder structure.

When I downloaded material from a website using an archiver tool, I usually had it download into some top-level directory like “Download”.
Inside of Download, every user had a folder that would use their username followed by a user ID in brackets.
Inside of the user folder, the files I wanted to be archived would be saved, again usually with the video/post title, followed by the upload/page/video ID in brackets. Sometimes, for playlists or multi-file posts, there would be an index number immediately after the post ID.
After I learned how to use the download tools better, I started to save a metadata folder or two inside the user’s folder, along with the files.
Inside the metadata folders, there are .txt or .json files, one for each downloaded video/picture/post/file, with the same name (numerical ID).ext format.

As time has gone on I have moved towards preferring strictly numerical folders and filenames (since the metadata files contain all relevant information for each file anyway).
I’ve also gotten fed up with channels/users changing their username, causing archive tools to redownload ALL of a user’s past uploads under a slightly different folder name.
I’ve also caught the same thing happening for individual posts that get renamed, e.g. YouTube videos that get renamed years later (but obviously keep the same ID).

When I eventually decided to move to more numerical file & folder naming, I picked up all of my “old” archived stuff and essentially moved it to “Downloads (old)” or something along those lines, hoping that one day I’d learn how to efficiently go through and prune the unnecessary characters from the file tree.

That’s what I’ve been trying to do for the last few days. I’ve been trying to come up with a command that will recursively go through an entire file tree like this:

Downloads/user’s F@ncyN4mE (12345)/cool tutorial on how to draw things (67890) p0.jpg
Downloads/user’s F@ncyN4mE (12345)/cool tutorial on how to draw things (67890) p1.jpg
Downloads/user’s F@ncyN4mE (12345)/metadata/cool tutorial on how to draw things (67890).txt
Downloads/user’s F@ncyN4mE (12345)/banner (20230101).jpg
Downloads/user’s F@ncyN4mE (12345)/profilePic (20230101).jpg
Downloads/user’s F@ncyN4mE (12345)/profilePic (20230102).jpg
Downloads/user’s F@ncyN4mE (12345)/profilePic (20230203).jpg
Downloads/professional artist with 日本語 name (11223344)/クリップスタジオシェーディング (55678) p0.jpg
Downloads/professional artist with 日本語 name (11223344)/クリップスタジオシェーディング (55678) p1.jpg
Downloads/professional artist with 日本語 name (11223344)/クリップスタジオシェーディング (55678) p2.jpg
Downloads/professional artist with 日本語 name (11223344)/クリップスタジオシェーディング (55678) p3.jpg
Downloads/professional artist with 日本語 name (11223344)/metadata/クリップスタジオシェーディング (55678).json

And rename everything like this:

Downloads/12345/67890 p0.jpg
Downloads/12345/67890 p1.jpg
Downloads/12345/metadata/67890.txt
Downloads/12345/banner (20230101).jpg
Downloads/12345/profilePic (20230101).jpg
Downloads/12345/profilePic (20230102).jpg
Downloads/12345/profilePic (20230203).jpg
Downloads/11223344/55678 p0.jpg
Downloads/11223344/55678 p1.jpg
Downloads/11223344/55678 p2.jpg
Downloads/11223344/55678 p3.jpg
Downloads/11223344/metadata/55678.json

All while preferably preserving folder timestamps (less important) & file timestamps (very important, MUST be left alone, as I sometimes reference these in my different archive datasets)

All of this stuff is now located on my Unraid-based NAS, and I’ve been looking at ways that I could mass-rename files on a Linux machine using rules.
I’ve been looking into regex for the past few days, but since I’m still just learning the ropes for this type of stuff, I thought I should ask for help.

I figured out a way to use regex to select all of the text in the file names up to and including the opening bracket before the post ID, but I have no idea how to add multiple rules or exceptions (in case I want to keep a channel’s banner image, or user’s avatar/profile picture).
I’m usually pretty good at reverse-engineering things once I have them; usually if I just find a few snippets of code for something, I can repurpose it to do what I need, but this sort of mass, recursive, rule-based file operation is too many components for me to start from scratch with.
That’s why I thought I’d come to Level1; I was hoping someone here would be well-versed enough in Linux to show me how to do this.

Check this demonstration, maybe it gives some kind of inspiration:
(timestamp 16:40)

I’d ask ChatGPT.
What’s the worst that could happen?

This

I can provide you with a basic Python script that uses regex to rename your files as you’ve described. This script will traverse directories, identify patterns in filenames, and rename them as needed.

Here’s a script to achieve this:

import os
import re

def rename_files(path):
    for root, dirs, files in os.walk(path, topdown=False):  # topdown=False ensures we rename files before renaming their parent directories
        for filename in files:
            new_filename = re.sub(r'^.*?\((\d+)\)', r'\1', filename)
            if new_filename != filename:
                src = os.path.join(root, filename)
                dst = os.path.join(root, new_filename)
                os.rename(src, dst)
                print(f"Renamed: {src} -> {dst}")

        for dirname in dirs:
            new_dirname = re.sub(r'^.*?\((\d+)\)', r'\1', dirname)
            if new_dirname != dirname:
                src = os.path.join(root, dirname)
                dst = os.path.join(root, new_dirname)
                os.rename(src, dst)
                print(f"Renamed: {src} -> {dst}")

# Use the script
path = "/path/to/Downloads"  # replace this with the path to your Downloads directory
rename_files(path)

Here’s what the script does:

  1. Uses os.walk to traverse through directories and get filenames.
  2. Uses re.sub with a regex pattern to find filenames with your specified structure and rename them based on the pattern.
  3. Does the same for directory names.
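
To see what that regex actually does before letting it loose on real files, it can be run against plain strings. The following is only a sketch: the sample names are taken from the first post, and `ID_PATTERN` and `strip_to_id` are names made up here for illustration.

```python
import re

# Same pattern as the script above: lazily match everything up to the first
# "(digits)" group and keep only the digits.
ID_PATTERN = re.compile(r'^.*?\((\d+)\)')

def strip_to_id(name):
    """Return name with the leading title and brackets replaced by the ID."""
    return ID_PATTERN.sub(r'\1', name)

samples = [
    "cool tutorial on how to draw things (67890) p0.jpg",
    "クリップスタジオシェーディング (55678).json",
    "user's F@ncyN4mE (12345)",
    "banner (20230101).jpg",   # also matches, so it would be rewritten too
    "notes.txt",               # no bracketed ID, so it is left unchanged
]

for name in samples:
    print(f"{name!r} -> {strip_to_id(name)!r}")
```

Two things worth noticing: names like "banner (20230101).jpg" match the pattern as well, so they need an explicit exception if they should keep their names, and because the match is lazy, a title that itself contains a bracketed number would be cut at that first number rather than at the real ID.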

What you should do:

  1. Backup your data. Whenever you’re running scripts that modify files/folders, there’s always a chance something could go wrong.
  2. Replace the path variable value with the path to your “Downloads” directory.
  3. Run the script.

This script should take care of the renaming as you’ve described. The Python os.rename function should not alter timestamps. However, I strongly recommend testing the script on a small subset of your data before applying it to everything, just to be safe.
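
On the timestamp point: renaming a file leaves that file's modification time alone, but adding or removing an entry in a folder updates the folder's own modification time. If folder timestamps matter, one possible approach is to record them before the rename and put them back afterwards. This is a minimal sketch, assuming source and destination sit in the same folder (as in the script above); `rename_keeping_parent_times` is a hypothetical helper, not part of the original script.

```python
import os
import tempfile

def rename_keeping_parent_times(src, dst):
    """Rename src to dst, then restore the parent directory's atime/mtime.

    Assumes src and dst are in the same directory, as in the script above.
    """
    parent = os.path.dirname(src) or "."
    before = os.stat(parent)
    os.rename(src, dst)
    # os.rename leaves the file's own mtime alone, but changing a directory
    # entry updates the parent's mtime, so put the old values back.
    os.utime(parent, ns=(before.st_atime_ns, before.st_mtime_ns))

# Small self-check in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    old = os.path.join(tmp, "title (67890).txt")
    new = os.path.join(tmp, "67890.txt")
    with open(old, "w") as f:
        f.write("x")
    os.utime(old, (1_000_000_000, 1_000_000_000))   # pretend the file is old
    os.utime(tmp, (1_100_000_000, 1_100_000_000))   # and so is the folder
    rename_keeping_parent_times(old, new)
    file_mtime = int(os.stat(new).st_mtime)
    dir_mtime = int(os.stat(tmp).st_mtime)
    print(file_mtime, dir_mtime)
```

This only covers access and modification times, which are the ones `os.utime` can set.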

I asked a few more questions

Can it report the errors and files so I can deal with them myself?

ChatGPT

Certainly! We can modify the script to log the errors to a file so that you can review them later and address any issues manually. Let’s add logging capabilities to the script:

import os
import re
import logging

# Set up logging
logging.basicConfig(filename='rename_errors.log', level=logging.ERROR, format='%(asctime)s - %(message)s')

def rename_files(path):
    for root, dirs, files in os.walk(path, topdown=False):  # topdown=False ensures we rename files before renaming their parent directories
        for filename in files:
            new_filename = re.sub(r'^.*?\((\d+)\)', r'\1', filename)
            if new_filename != filename:
                src = os.path.join(root, filename)
                dst = os.path.join(root, new_filename)
                try:
                    os.rename(src, dst)
                    print(f"Renamed: {src} -> {dst}")
                except Exception as e:
                    error_message = f"Error renaming file: {src}. Error: {e}"
                    print(error_message)
                    logging.error(error_message)

        for dirname in dirs:
            new_dirname = re.sub(r'^.*?\((\d+)\)', r'\1', dirname)
            if new_dirname != dirname:
                src = os.path.join(root, dirname)
                dst = os.path.join(root, new_dirname)
                try:
                    os.rename(src, dst)
                    print(f"Renamed: {src} -> {dst}")
                except Exception as e:
                    error_message = f"Error renaming directory: {src}. Error: {e}"
                    print(error_message)
                    logging.error(error_message)

# Use the script
path = "/path/to/Downloads"  # replace this with the path to your Downloads directory
rename_files(path)

This script initializes a logger that writes error messages to rename_errors.log. Whenever there’s an error renaming a file or directory, the script will log the error and the path of the file/directory that caused the issue. After running the script, you can review rename_errors.log for any issues.

@Baz Thanks; this is exactly the type of material I thought I’d be looking for when the time came to deal with this. I got the other solution posted to work, but I will still look into this (I want to brush up on my Unix knowledge anyway).

@Marten Thanks, I had considered ChatGPT but upon being hit with a login requirement, I immediately dismissed it and came here hoping for pointers to material like what @Baz suggested. Instead of just having the machine do all the work for me, I wanted the chance to actually learn something, too.

But I still had a look at the code anyway and modified it myself to work a little more flexibly with my file structure. The current script I’ve got is:

import os
import re
import logging
import fnmatch

# Set up logging
logging.basicConfig(filename='rename_errors.log', level=logging.ERROR, format='%(asctime)s - %(message)s')

def rename_files(path):
    for root, dirs, files in os.walk(path, topdown=False):  # topdown=False ensures we rename files before renaming their parent directories
        for filename in files:
            if fnmatch.fnmatch(filename, '*avatar*'):
                src = os.path.join(root, filename)
                print(f"Skipping file: {src}. Avatar file.")
            elif fnmatch.fnmatch(filename, '*background*'):
                src = os.path.join(root, filename)
                print(f"Skipping file: {src}. Background File.")
            else:
                new_filename = re.sub(r'^.*?\((\d+)\)', r'\1', filename)
                if new_filename != filename:
                    src = os.path.join(root, filename)
                    dst = os.path.join(root, new_filename)
                    try:
                        os.rename(src, dst)
                        print(f"Renamed: {src} -> {dst}")
                    except Exception as e:
                        error_message = f"Error renaming file: {src}. Error: {e}"
                        print(error_message)
                        logging.error(error_message)

        for dirname in dirs:
            new_dirname = re.sub(r'^.*?\((\d+)\)', r'\1', dirname)

            if new_dirname != dirname: # if directory gets renamed
                isExist = os.path.exists(os.path.join(root, new_dirname)) # check whether the target already exists next to the source directory
                if isExist: # if directory DOES exist
                    ExistMsg = f"Skipping directory {dirname}. {new_dirname} already exists."
                    print(ExistMsg)
                    logging.error(ExistMsg)
                else: # if directory does NOT exist
                    src = os.path.join(root, dirname)
                    dst = os.path.join(root, new_dirname)
                    try:
                        os.rename(src, dst)
                        print(f"Renamed: {src} -> {dst}")
                    except Exception as e:
                        error_message = f"Error renaming directory: {src}. Error: {e}"
                        print(error_message)
                        logging.error(error_message)

# Use the script
path = "."  # replace this with the path to your Downloads directory. "." means the current working directory (wherever the script is run from).
rename_files(path)

I’m NOT particularly skilled with Python; my additions could be improved upon further, and some of the code was just me playing & experimenting with the language.
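
One direction such an improvement could take is sketched below, purely as an illustration: the skip rules move into a list, files get the same "already exists" check as directories, and a dry-run mode prints what would happen without touching anything. `SKIP_PATTERNS`, `plan_rename`, `rename_tree` and the `dry_run` flag are all hypothetical names introduced for this sketch.

```python
import fnmatch
import os
import re

# Hypothetical configuration: glob patterns for files to leave untouched.
SKIP_PATTERNS = ["*avatar*", "*background*"]
ID_PATTERN = re.compile(r'^.*?\((\d+)\)')

def plan_rename(directory, name, skip_patterns=()):
    """Return the new path for name, or None if it should be left alone."""
    if any(fnmatch.fnmatch(name, pat) for pat in skip_patterns):
        return None                      # protected file
    new_name = ID_PATTERN.sub(r'\1', name)
    if new_name == name:
        return None                      # nothing to strip
    dst = os.path.join(directory, new_name)
    if os.path.exists(dst):
        return None                      # would clobber an existing entry
    return dst

def _apply(root, name, skip_patterns, dry_run):
    """Rename one entry, or only report it when dry_run is set."""
    dst = plan_rename(root, name, skip_patterns)
    if dst is None:
        return
    src = os.path.join(root, name)
    print(f"{'Would rename' if dry_run else 'Renamed'}: {src} -> {dst}")
    if not dry_run:
        os.rename(src, dst)

def rename_tree(path, dry_run=True):
    """Walk bottom-up so files are handled before their parent directories."""
    for root, dirs, files in os.walk(path, topdown=False):
        for name in files:
            _apply(root, name, SKIP_PATTERNS, dry_run)
        for name in dirs:
            _apply(root, name, (), dry_run)   # skip rules apply to files only
```

Calling `rename_tree(".")` only prints the planned renames; `rename_tree(".", dry_run=False)` performs them. Error logging from the script above is left out here to keep the sketch short.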

I was thinking of adding a bit more functionality or configuration, but I’m temporarily restricted to using a retro X58 system and my Razer keyboard’s software (Synapse3) seems to hate the ICH10.

I’ve used the script on some copied-over test data (on a Windows system) and my modified version seems to work perfectly (and much quicker than I expected for so many small files).

Thanks L1.