I found some strange issues in one folder (subfolders are fine) of a ZFS dataset. I haven’t found other issues like this, but I haven’t been looking. I’m not sure if metadata got jacked or what, but notice these files?
The extensions are wrong. The JPGs are not 400MB, they’re MP4 files:
After renaming them, all the files work correctly, but then the EXIF data doesn’t match the filenames.
Correct EXIF filename
Incorrect EXIF filename
Filename | EXIF Date Taken |
---|---|
20220710_132126 | 7/10/2022 - 1:21:14 PM |
20220710_132129 | 7/10/2022 - 1:21:21 PM |
20220710_150750 | 7/10/2022 - 1:21:25 PM |
20220710_150755 | 7/10/2022 - 1:21:29 PM |
20220710_150804 | 7/10/2022 - 3:07:50 PM |
Looking at these four files and the incorrect extensions in the screenshot above, it’s clear to me that all filenames are off-by-2.
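One way to sanity-check the off-by-2 claim is to slide the filename timestamps against the EXIF timestamps and count how many rows line up at each shift. This is a rough sketch, not a finished tool: the rows are copied from the table above, while `best_shift`, `max_shift`, and the 2-second tolerance are names and values I made up for illustration.

```python
from datetime import datetime

# (filename stem, EXIF "Date Taken") pairs copied from the table above.
ROWS = [
    ("20220710_132126", "7/10/2022 - 1:21:14 PM"),
    ("20220710_132129", "7/10/2022 - 1:21:21 PM"),
    ("20220710_150750", "7/10/2022 - 1:21:25 PM"),
    ("20220710_150755", "7/10/2022 - 1:21:29 PM"),
    ("20220710_150804", "7/10/2022 - 3:07:50 PM"),
]


def name_time(stem):
    """Parse the timestamp encoded in a phone-style filename stem."""
    return datetime.strptime(stem, "%Y%m%d_%H%M%S")


def exif_time(text):
    """Parse the date format used in the table (an assumption about the viewer)."""
    return datetime.strptime(text, "%m/%d/%Y - %I:%M:%S %p")


def best_shift(rows, max_shift=6, tolerance_s=2):
    """Return (shift, hits): the shift where most EXIF times match a filename
    that sits `shift` rows earlier in the sorted listing."""
    names = [name_time(name) for name, _ in rows]
    exifs = [exif_time(exif) for _, exif in rows]
    best = (0, 0)
    for shift in range(max_shift + 1):
        hits = sum(
            1
            for i in range(shift, len(rows))
            if abs((exifs[i] - names[i - shift]).total_seconds()) <= tolerance_s
        )
        if hits > best[1]:
            best = (shift, hits)
    return best


print(best_shift(ROWS))
```

On the five rows above, this reports a shift of 2 with 3 matching rows, and no other shift matches anything within the tolerance. Run over a whole directory listing in windows, the same idea could show where the shift changes.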
I found a folder called “possibly dupes/”, and it contains a file called 20221021_111031.mp4. Today, that file is named 20221021_111334.jpg. That’s off-by-2. So it’s true that everything’s been pushed out.
What happened? Not sure. Everything is off-by-2 in here after July 10th, 2022.
But that’s wrong because I looked back further, and some others are messed up there too. It’s inconsistent. More below.
I looked at my oldest snapshot for this data, and it’s all messed up there too, but they’re off-by-6!
It’s weird because I have both fewer and more photos. There are 2 extra photos in the oldest snapshot and 4 extra in the latest. 2 + 4 = 6. Maybe it’s related? But if you base it off filenames, there’s an extra filename in the current data versus the snapshot.
If it’s related to the off-by-2 and off-by-6 naming, then that’d be an easy thing to verify. If I go back to older photos, I’d see those missing ones, but they’re not there! Those older photos from July 9th are all named correctly.
I haven’t renamed these files at all. They all came straight off my phone or my wife’s phone onto my Windows box, and then eventually copied onto the NAS using Robocopy or into a separate dataset using rsync. Not sure why they’re all messed up like this, but I really wanna fix them.
Other incorrect filenames I found
Notice here, these three MP4s are actually JPGs:
EXIF data shows they were taken the day before at 11am. That means something is definitely messed up here.
I looked at 36 files around these, and none of them are wrong. So how did these 3 images get improperly named as videos with the wrong date? That makes me think those videos are gone.
In the oldest snapshot, these images are also incorrectly named. In this case, it’s consistent between the two.
Looking further out, I found 3 JPGs that are actually MP4. They’re also incorrectly named, but they’re at least 26 files apart.
As a reference, the other images are correct, but this one is wrong: 20220614_133529. It’s also from June 9th like the other 3 incorrectly named MP4s.
Looking at ones a month later, sure, they’re incorrectly named here too, but the naming isn’t off-by-2:
And cross-checking these against my oldest snapshot, the names actually line up this time. I’m not sure what the pattern is for these being out-of-sync and having the wrong filenames.
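To find the pattern, it might help to match files between the snapshot and the live directory by content instead of by name. The sketch below pairs files by size and SHA-256 and lists every case where the same bytes carry different names. The function names are hypothetical, and the directory paths are whatever you pass in (ZFS exposes snapshots under the dataset's `.zfs/snapshot/` directory).

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """SHA-256 of a file, read in chunks so 400MB videos don't fill RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_content(directory, sizes_of_interest=None):
    """Map (size, sha256) -> list of filenames in one directory (not recursive)."""
    index = defaultdict(list)
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        size = path.stat().st_size
        # Skip hashing files whose size cannot match anything on the other side.
        if sizes_of_interest is not None and size not in sizes_of_interest:
            continue
        index[(size, file_digest(path))].append(path.name)
    return index


def find_renames(snapshot_dir, current_dir):
    """Return (old_names, new_names) pairs where identical content is named differently."""
    current_sizes = {
        p.stat().st_size for p in Path(current_dir).iterdir() if p.is_file()
    }
    old = index_by_content(snapshot_dir, current_sizes)
    new = index_by_content(current_dir, {size for size, _ in old})
    renames = []
    for key, old_names in sorted(old.items()):
        new_names = new.get(key, [])
        if new_names and sorted(old_names) != sorted(new_names):
            renames.append((old_names, new_names))
    return renames
```

Files that exist on only one side never show up in the result, so anything missing from the output is either named identically in both places or has no byte-for-byte twin.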
20kB images?
I dunno why some images are <50kB. I looked through all of them and while the filenames are wrong, I found a number of them under 50kB:
These all look like photos I’d sent over Signal. The aspect ratio is 4:3 (same as I took them), but the photos are all sized down and the EXIF data is missing. This is really distressing to me because photos I send over Signal are ones that I value more than others, and it’s not like I have another copy of these.
Stress
At this point, I’m freaking out because it’s possible I’ve lost quite a few photos. I already know I lost some because there are <50kB photos that should be 3-8MB in size. That’s really bothering me.
The earliest snapshot is from Jan 8th. The next snapshot is from Jan 15th, and other than new images, its files match the current filenames.
I don’t have a backup of this directory anymore in Windows. I have 2 other datasets where I replicate this data and both have snapshots, but both snapshots are from February, long after the initial copy.
How to fix?
First, I put a hold on these snapshots as they’d normally be deleted in a few weeks.
I also need to figure out where image names aren’t lining up. There are a few ways to figure this out:
- Compare the current data to the oldest snapshot and see where filenames don’t match filesizes in this directory.
- Figure out which JPGs have EXIF data, and which of their filenames don’t match the “Date Taken” field in that data.
- Figure out which JPG files are actually MP4 and which MP4 are actually JPG by looking at filetype metadata.
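The last check in that list doesn't need any special tooling, because both formats are recognizable from their first few bytes. Here's a minimal sketch, assuming a flat directory of `.jpg` and `.mp4` files. `sniff_type` and `find_mismatched` are names I made up, and the `ftyp` test also matches other ISO base media files such as MOV or HEIC, so treat "mp4" as "some MP4-style container".

```python
from pathlib import Path


def sniff_type(path):
    """Guess the real file type from its leading bytes, ignoring the extension."""
    with open(path, "rb") as fh:
        head = fh.read(12)
    if head.startswith(b"\xff\xd8\xff"):
        return "jpg"  # JPEG start-of-image marker
    if head[4:8] == b"ftyp":
        return "mp4"  # ISO base media "ftyp" box (MP4, but also MOV/HEIC)
    return None  # unknown or unreadable header


def find_mismatched(directory):
    """Return (filename, claimed_type, actual_type) for files whose extension lies."""
    mismatches = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        claimed = path.suffix.lower().lstrip(".")
        if claimed == "jpeg":
            claimed = "jpg"
        actual = sniff_type(path)
        if claimed in ("jpg", "mp4") and actual is not None and actual != claimed:
            mismatches.append((path.name, claimed, actual))
    return mismatches
```

Calling `find_mismatched()` on the photo folder and again on the same folder inside the oldest snapshot would give two lists to compare. Files with an unrecognized header are skipped, not reported.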
I’d also like to know why the oldest snapshot has different filenames. That makes no sense! It seems really odd to have one set of data with incorrect names and a later update of that same data with completely different names!
This makes me think ZFS’s metadata is corrupt for this directory. I do have a special device, so maybe that’s what broke it? Only for this folder in all my datasets?