I’ve been experimenting with data hoarding/archival as a hobby and learning exercise for a few years. Back when I was still learning the syntax of a lot of tools and how to use them, I used a different naming scheme for my folder structure.
When I downloaded material from a website using an archiver tool, I usually had it download into some top-level directory like “Download”.
Inside of Download, every user had a folder that would use their username followed by a user ID in brackets.
Inside of the user folder, the files I wanted to be archived would be saved, again usually with the video/post title, followed by the upload/page/video ID in brackets. Sometimes, for playlists or multi-file posts, there would be an index number immediately after the post ID.
After I learned how to use the download tools better, I started to save a metadata folder or two inside the user’s folder, along with the files.
Inside the metadata folders, there are .txt or .json files, one for each downloaded video/picture/post/file, named in the same “name (numerical ID).ext” format.
As time has gone on, I have moved towards preferring strictly numerical folder and file names (since the metadata files contain all relevant information for each file anyway).
I’ve also gotten fed up with channels/users changing their username, causing archive tools to redownload ALL of a user’s past uploads under a slightly different folder name.
I’ve also caught the same thing happening for individual posts that get renamed, e.g. YouTube videos that are retitled years later (but obviously keep the same ID).
When I eventually decided to move to more numerical file & folder naming, I picked up all of my “old” archived stuff and essentially moved it to “Downloads (old)” or something along those lines, hoping that one day I’d learn how to efficiently go through and prune the unnecessary characters from the file tree.
That’s what I’ve been trying to do for the last few days. I’ve been trying to come up with a command that will recursively go through an entire file tree like this:
Downloads/user’s F@ncyN4mE (12345)/cool tutorial on how to draw things (67890) p0.jpg
Downloads/user’s F@ncyN4mE (12345)/cool tutorial on how to draw things (67890) p1.jpg
Downloads/user’s F@ncyN4mE (12345)/metadata/cool tutorial on how to draw things (67890).txt
Downloads/user’s F@ncyN4mE (12345)/banner (20230101).jpg
Downloads/user’s F@ncyN4mE (12345)/profilePic (20230101).jpg
Downloads/user’s F@ncyN4mE (12345)/profilePic (20230102).jpg
Downloads/user’s F@ncyN4mE (12345)/profilePic (20230203).jpg
Downloads/professional artist with 日本語 name (11223344)/クリップスタジオシェーディング (55678) p0.jpg
Downloads/professional artist with 日本語 name (11223344)/クリップスタジオシェーディング (55678) p1.jpg
Downloads/professional artist with 日本語 name (11223344)/クリップスタジオシェーディング (55678) p2.jpg
Downloads/professional artist with 日本語 name (11223344)/クリップスタジオシェーディング (55678) p3.jpg
Downloads/professional artist with 日本語 name (11223344)/metadata/クリップスタジオシェーディング (55678).json
And rename everything like this:
Downloads/12345/67890 p0.jpg
Downloads/12345/67890 p1.jpg
Downloads/12345/metadata/67890.txt
Downloads/12345/banner (20230101).jpg
Downloads/12345/profilePic (20230101).jpg
Downloads/12345/profilePic (20230102).jpg
Downloads/12345/profilePic (20230203).jpg
Downloads/11223344/55678 p0.jpg
Downloads/11223344/55678 p1.jpg
Downloads/11223344/55678 p2.jpg
Downloads/11223344/55678 p3.jpg
Downloads/11223344/metadata/55678.json
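For anyone attempting the same thing, here is a minimal Python sketch of the recursive part, assuming Python 3 is available on the machine. It only builds a list of planned renames (a dry run) and changes nothing on disk. The function name `plan_renames` and the `rule` parameter are hypothetical; `rule` is any function that takes an old file or folder name and returns the new one.

```python
import os
from typing import Callable, Iterator, Tuple


def plan_renames(root: str, rule: Callable[[str], str]) -> Iterator[Tuple[str, str]]:
    """Yield (old_path, new_path) pairs, deepest entries first.

    Nothing is renamed here; this is a dry run. The root folder itself
    is never included in the plan.
    """
    # topdown=False visits a folder's contents before the folder itself,
    # so every path in the plan is still valid when it is applied in order.
    for parent, dirnames, filenames in os.walk(root, topdown=False):
        for name in sorted(filenames) + sorted(dirnames):
            new_name = rule(name)
            if new_name != name:
                yield os.path.join(parent, name), os.path.join(parent, new_name)
```

Printing the pairs first makes it possible to check the whole plan against the expected tree before touching anything. It is also worth checking the plan for two old names that map to the same new name, since a plain rename on Linux will silently replace an existing file.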
All while preferably preserving folder timestamps (less important) and file timestamps (very important, MUST be left alone, as I sometimes reference these in my different archive datasets).
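On the timestamp requirement: as far as I understand it, renaming a file in place on Linux leaves the file's own modification time alone, but it does update the modification time of the folder that contains it. A minimal Python sketch of one way to put the folder's times back afterwards follows. The helper name `rename_keeping_parent_times` is hypothetical, and it assumes the old and new paths sit in the same parent folder.

```python
import os


def rename_keeping_parent_times(old_path: str, new_path: str) -> None:
    """Rename one entry, then restore the parent folder's timestamps.

    Assumes old_path and new_path share the same parent directory.
    """
    parent = os.path.dirname(old_path) or "."
    before = os.stat(parent)  # remember atime/mtime before the rename

    # Refuse to overwrite: os.rename would replace an existing file silently.
    if os.path.lexists(new_path):
        raise FileExistsError(new_path)

    os.rename(old_path, new_path)

    # The rename rewrote the parent's directory listing and bumped its
    # mtime, so write the saved access and modification times back.
    os.utime(parent, ns=(before.st_atime_ns, before.st_mtime_ns))
```

This only covers access and modification times. The change time (ctime) is maintained by the kernel and cannot be set this way, so it will still reflect the rename.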
All of this stuff is now located on my Unraid-based NAS, and I’ve been looking at ways that I could mass-rename files on a Linux machine using rules.
I’ve been looking into regex for the past few days, but since I’m still just learning the ropes for this type of stuff, I thought I should ask for help.
I figured out a way to use regex to select all of the text in the file names up to and including the opening bracket before the post ID, but I have no idea how to add multiple rules or exceptions (in case I want to keep a channel’s banner image, or user’s avatar/profile picture).
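To illustrate what I mean by a rule plus exceptions, here is a minimal Python sketch, not a finished tool. The names `KEEP_PREFIXES`, `ID_PATTERN` and `strip_title` are hypothetical, and the pattern assumes every ID is a run of digits inside the last pair of brackets in the name, as in the examples above.

```python
import re

# Exception rule (assumption): names starting with these are left untouched.
KEEP_PREFIXES = ("banner", "profilePic")

# Greedy ".*" swallows the title, so the LAST " (digits)" group is the ID.
# Group 1 is the ID; group 2 is whatever follows, e.g. " p0.jpg" or ".txt".
ID_PATTERN = re.compile(r"^.* \((\d+)\)(.*)$")


def strip_title(name: str) -> str:
    """Return the numeric form of a name, or the name unchanged."""
    if name.startswith(KEEP_PREFIXES):
        return name  # banner / avatar: keep the descriptive name
    match = ID_PATTERN.match(name)
    if match is None:
        return name  # no "(digits)" group, e.g. the "metadata" folder
    return match.group(1) + match.group(2)
```

With these rules, `cool tutorial on how to draw things (67890) p0.jpg` becomes `67890 p0.jpg`, `user’s F@ncyN4mE (12345)` becomes `12345`, and `banner (20230101).jpg` is left as it is. One caveat: the prefix check would also skip a user folder whose username happens to start with "banner" or "profilePic", so a real version would probably apply that exception to files only.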
I’m usually pretty good at reverse-engineering things once I have them; usually if I just find a few snippets of code for something, I can repurpose it to do what I need, but this sort of mass, recursive, rule-based file operation is too many components for me to start from scratch with.
That’s why I thought I’d come to Level1; I was hoping someone here would be well-versed enough in Linux to show me how to do this.