Data Corruption Question

Today I had a program freeze on my computer, and it was so locked up that I ended up just killing the power and rebooting. I have heard that doing this can cause data corruption, but I have never experienced it firsthand. It did get me thinking, though. If something were wrong, the PC would tell me, right? Most operating systems have some sort of volume checker that scans the drive and fixes errors.

I decided to do a quick test. I took a 2GB zip file and copied it to a spare external hard drive. Halfway through the copy I unplugged it. I plugged it back in and saw the file sitting on the external hard drive, reporting the correct size. I then ran the disk checker in Windows 10. It scanned the drive and reported back no errors, everything fine. But the file definitely didn't copy over 100%: it fails to unzip and its hash doesn't match the original.

Is there really nothing in place that can catch something like this? My desktop is one thing, but my NAS reads and writes many files a day.

What kind of NAS are you using?

I know it's a dumb test. I expected the file to be corrupted. I just figured maybe I would get a notification or something: "Hey buddy, this file is messed up. Thought you should know."

The NAS is a WDMyCloud 4TB. Also, on the topic of improper shutdowns: due to a firmware issue it has no shutdown option in the web GUI, so if you want to turn it off, all you can do is unplug it and hope for the best. I have seen a few pages online that talk about connecting via SSH and sending a shutdown command, but I haven't tried it yet.

Actually, there is really no way to implement this robustly and automatically unless the file system supports it. The metadata reporting the "correct" size just says the file starts at x and goes until y. There is no programmatic dependency between that and the validity of the file's contents as the user would judge it. Corruption in the metadata is simple enough to check, but the only way to check for corruption in the contents would be to run everything through a parser specific to that file type, which would still necessarily miss many errors.
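For formats that carry their own internal checksums, like zip, a format-specific check is at least possible. A minimal Python sketch along those lines, with 'copy.zip' standing in for the half-copied archive from the original post:

```python
import zipfile

def check_zip(path):
    # Read the archive with a format-aware parser. This only catches errors
    # the zip format itself can detect (bad member CRCs, truncated data);
    # it is not a general-purpose corruption check.
    try:
        with zipfile.ZipFile(path) as zf:
            bad = zf.testzip()  # first member with a bad CRC, or None
            return f"corrupt member: {bad}" if bad else "archive reads back OK"
    except zipfile.BadZipFile as exc:
        return f"not a readable zip: {exc}"

print(check_zip("copy.zip"))  # 'copy.zip' is just a placeholder name
```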

Since computers can't really do it automatically, the practical way to minimize this is either 1) using an uninterruptible power supply (UPS), which gives the user a chance to shut the system down safely and so minimizes file system corruption.

Or the other way is to 2) check individual files for corruption using hashing. The idea is to embed a fingerprint of the contents in the file's metadata. This can be done by appending the fingerprint to the end of the filename before a transfer. Then, on the destination end, the file contents can be checked against the fingerprint in the filename; a non-match indicates a transmission error. Doing this at the file-system level would probably roughly double the time it takes to perform I/O (every file write would need an accompanying read). So who wants a new computer that runs at half the speed of their old one?

People normally use RapidCRC and embed the CRC32 in the filename like so:
myfile_[123456].doc
Microsoft, on the other hand, uses SHA-1, which is better when you also need to detect tampering, but it takes longer to calculate.
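If anyone wants to script the same idea instead of using RapidCRC, here's a rough Python sketch of tagging a file with its CRC32 and verifying it later. The helper names and the 8-hex-digit tag format are my own assumptions, not anything RapidCRC prescribes:

```python
import re
import zlib
from pathlib import Path

def crc32_of(path, chunk=1 << 20):
    # Stream the file in chunks so large files don't have to fit in memory.
    crc = 0
    with open(path, "rb") as f:
        while block := f.read(chunk):
            crc = zlib.crc32(block, crc)
    return crc & 0xFFFFFFFF

def tag_filename(path):
    # myfile.doc -> myfile_[1A2B3C4D].doc (CRC32 as 8 hex digits)
    p = Path(path)
    tagged = p.with_name(f"{p.stem}_[{crc32_of(p):08X}]{p.suffix}")
    p.rename(tagged)
    return tagged

def verify_tag(path):
    # True/False if the name carries a CRC tag, None if there is no tag.
    m = re.search(r"\[([0-9A-Fa-f]{8})\]", Path(path).name)
    if not m:
        return None
    return int(m.group(1), 16) == crc32_of(path)
```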

You are getting into the realm of how file systems operate. Checking drive integrity is different from checking file corruption. The Linux forensic tool kits have plenty of programs that run different checks on drive sectors, file headers and the like. I have looked into most of this because of a secure file deletion program, and let me say there is a ton of stuff that isn't easily understood.

The actual hard drive is fine in most cases, but continual improper shutdowns can cause unnecessary wear on the motor. The file header, usually at the front of the file's read space, went through just fine and reported the file's projected size to the file system so it could allocate space accordingly. But the file's contents were corrupted, and you can prove that by taking the SHA-256 or MD5 sum of the file and seeing that it doesn't match the original's.

You can get the file's MD5 sum and run a check to verify its integrity... I think there are system tools for that, or you can write a small Python program to do it for you...
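Something along those lines; a small sketch using Python's standard hashlib, with placeholder file names:

```python
import hashlib

def file_hash(path, algo="md5", chunk=1 << 20):
    # Hash in chunks so a 2 GB file doesn't have to be read into memory at once.
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# 'original.zip' and 'copy.zip' are placeholders for the real paths.
if file_hash("original.zip") == file_hash("copy.zip"):
    print("hashes match, the copy looks intact")
else:
    print("hash mismatch, the copy is corrupt")
```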

This is all in plain speak; there are much more precise interpretations of what I said, and variations depending on the FS, in case anyone chimes in.

Sorry for the late reply. It's been a busy week over here. Great replies. I actually never knew about RapidCRC. I have noticed the CRC at the end of video file names in the past but didn't know there was a tool to set and check them quickly. I can actually see it helping me out a lot. Do you know if there is anything similar for MacOS?

The "error" happens because the filesystem writes the inodes before the content of the files, so essentially the start and end locations of the file before it's content is written.
This is incidentially why when you move files from location A to B, the file source files content isn't deleted before done copying, so that if the move is interupted you still have the full source file.
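That ordering is also why you can do a cautious move by hand: copy, verify, and only then delete the source. A rough Python sketch of the idea (the function name and paths are just illustrative):

```python
import filecmp
import os
import shutil

def safe_move(src, dst):
    # Copy first, verify the copy byte-for-byte, and only then delete the
    # source, so an interrupted move never leaves you with only a partial file.
    shutil.copy2(src, dst)
    if not filecmp.cmp(src, dst, shallow=False):
        os.remove(dst)
        raise IOError(f"verification failed: {src} -> {dst}")
    os.remove(src)
```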