ZFS vs EXT

Nah, as in most types of files are larger than a block in general practice. Say you have a 4 GB file: you're talking a modest ~1 million hashes given a 4 KiB (4096-byte) block.
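For illustration, a rough Python sketch of the two approaches being argued about here: one digest per 4 KiB block versus one digest for the whole file (the SHA-256 choice and function names are just mine to make it concrete):

```python
import hashlib

BLOCK_SIZE = 4096  # 4 KiB, matching the block size in the example above

def whole_file_hash(path):
    """File-level approach: a single digest for the entire file."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def per_block_hashes(path):
    """Block-level approach: one digest per 4 KiB block."""
    hashes = []
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(BLOCK_SIZE), b""):
            hashes.append(hashlib.sha256(block).hexdigest())
    return hashes  # roughly a million entries for a 4 GB file
```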

I mean I can compare the hash for every single file on my hard drive(s) to verify a snapshot, then verify an older snapshot, then compare the two snapshots, faster than you could compare a single 1080p MP4 movie.

Are you sure you know how hashing works?

Say I add a duplicate drive to my system, so I copy all the files from an existing drive I want to duplicate onto the new drive.

After that I have a hash table for the new drive (the actual written files, not what I intended to write). If I want to compare those files to the files on the other drive by comparing the tables, I have many orders of magnitude fewer hashes to compare than if I used ZFS. Or if I have a backup solution using a cold-storage drive and I want to scrub that drive, I could generate new hashes, the same as I would have to on ZFS, but the comparison would be many orders of magnitude faster and still reliable.
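A minimal sketch of what such a per-volume table and comparison could look like (the paths, hash choice, and function names are assumptions of mine, not anything from this setup):

```python
import hashlib
from pathlib import Path

def hash_file(path, chunk=1024 * 1024):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for piece in iter(lambda: f.read(chunk), b""):
            h.update(piece)
    return h.hexdigest()

def build_table(root):
    """One entry per file: relative path -> digest of the whole file."""
    root = Path(root)
    return {str(p.relative_to(root)): hash_file(p)
            for p in root.rglob("*") if p.is_file()}

def diff_tables(a, b):
    """Return mismatched digests plus files present on only one side."""
    mismatched = {p for p in a.keys() & b.keys() if a[p] != b[p]}
    return mismatched, a.keys() - b.keys(), b.keys() - a.keys()

# e.g. diff_tables(build_table("/mnt/original"), build_table("/mnt/duplicate"))
```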

Let's say 4 TB of data, or about 1,000,000,000 block hashes, vs say 150,000 to 200,000 max at my current rate, probably. I mean, let's say you have a SQL database you are trying to query: if one is 5000x larger, would that affect the operation at all? Maybe how many resources it takes, or how many cycles?

If your hash tables don’t match, how do you know which one is wrong?

Like if I have a corrupt file from bit rot, so the hash no longer matches the one from the other table?

Currently it will just print what it compared, and then at my leisure I could just rsync the entire, say, 8 GB video file; probably still faster than the ZFS format/method could finish deciding which blocks were bad.

But how do you know you’re not rsyncing the bad copy and overwriting the good one?

Could compare against historical data using a snapshot, but in general (laziness) majority rules / you can just test it if you want. E.g. if you have 5 known-good copies of the same file and then one becomes different, and you don't remember changing it without updating the others or something, it's probably the odd man out that's bad.
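A rough sketch of that majority-rules check, assuming you already have one whole-file digest per copy (the names and digests here are made up):

```python
from collections import Counter

def odd_one_out(digests):
    """Given {copy_name: digest} for several copies of the same file,
    return the copies whose digest disagrees with the majority."""
    counts = Counter(digests.values())
    majority_digest, votes = counts.most_common(1)[0]
    if votes <= len(digests) / 2:
        return None  # no clear majority, so don't decide automatically
    return [name for name, d in digests.items() if d != majority_digest]

# e.g. odd_one_out({"drive1": "abc...", "drive2": "abc...", "drive3": "abc...",
#                   "backup1": "abc...", "backup2": "def..."}) -> ["backup2"]
```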

So hash all the blocks for 5 files instead of 1 because that’s faster? I think you lost me

Nah, the setup has one hash table per volume if I run the thing to scrub them. But as in: generate one hash for the 8 GB video file instead of ~2 million block hashes.

It's primarily for comparing against a duplicate volume, as in a RAID 1 (whether actual RAID or not), a backup, a remote backup, whatever.

Could just compare the table against an older table if I want, I guess.

Every time you write to a file you have to make 5 copies of it (since you need 5 duplicates or whatever) and generate the hashes?

And since you’re worried about files being contiguous (because performance??), you have to overwrite files in place, or worse, if they change size, copy the whole file to free space that you hopefully have in a big enough contiguous chunk? Times 5 copies? How long does it take to find free space on a disk anyway?

If I wanted 5 volumes, sure; I can generate 5 hash tables at the same time, in about the same amount of time it would take to generate one (it's mostly a storage-bound operation and they are separate drives). But it would take even longer in ZFS to entirely regenerate hash tables for every single block of 5 pools, given they require a hash per block, so there are quite a few more entries to create and logically group together in a way that lets you associate those blocks with the same file, etc.
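Something like this, reusing the build_table sketch from earlier; the mount points are made up, and the "same wall-clock time" claim assumes the drives really are independent and the hashing is disk-bound:

```python
from concurrent.futures import ThreadPoolExecutor

def build_all_tables(volumes):
    """Hash each volume in its own thread (build_table as in the earlier sketch).
    With separate, independent drives the work is disk-bound, so the wall-clock
    time stays close to that of hashing a single volume."""
    with ThreadPoolExecutor(max_workers=len(volumes)) as pool:
        return dict(zip(volumes, pool.map(build_table, volumes)))

# e.g. build_all_tables(["/mnt/vol1", "/mnt/vol2", "/mnt/vol3", "/mnt/vol4", "/mnt/vol5"])
```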

Maybe this will help you:

I am talking about a specific ZFS feature. Say you have multiple versions of a similar file: you are editing a massive word-processor document, like a novel, and you save each revision to its own file. With deduplication ZFS could remove the blocks which are the same and rebuild the file using the blocks from the other files, resulting in the files being just vomited around the volume instead of specifically striped/contiguous or whatever, to get max read performance if desired.
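To make the dedup idea concrete, a toy sketch that counts how many fixed-size blocks two revisions of a file share; real ZFS dedup operates on its own records and checksums, this is just the concept (filenames and numbers are invented):

```python
import hashlib

BLOCK_SIZE = 4096

def block_digests(path):
    with open(path, "rb") as f:
        return [hashlib.sha256(b).hexdigest()
                for b in iter(lambda: f.read(BLOCK_SIZE), b"")]

def shared_blocks(path_a, path_b):
    """How many blocks of path_b already exist somewhere in path_a.
    Dedup-style storage would keep one physical copy of each such block."""
    seen = set(block_digests(path_a))
    digests_b = block_digests(path_b)
    return sum(1 for d in digests_b if d in seen), len(digests_b)

# e.g. shared_blocks("novel_rev1.txt", "novel_rev2.txt") -> (900, 1024)
# i.e. most of rev2 could point at rev1's blocks instead of storing new ones.
```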

You really need to go and learn how ZFS works and what it does before you try commenting on it any further.

On a multi-tasking multi-user OS, intelligent caching and write serialisation (both of which ZFS does) are far more important than attempting to maintain contiguous files.

Your thinking is stuck in the DOS days of single user single tasking machines, if you think what ZFS is doing is less important than “BUT MUH FRAGMETZZZ!”.

Your machine, that has many threads with open files on it, is already dealing with the disk heads skipping all over the place to some degree for both reads and writes for multiple applications at the same time. Heads moving all over the disk are inevitable. ZFS does full-stripe writes every time (traditional RAID does NOT) and uses caches to alleviate this.

Dunno meng, haven't seen very many good benchmarks for ZFS with, say, files larger than the cache size.

Except when it doesn't, if you are using dedupe and have files which it can actually reduce in size.

edit: for an example if you want


Stuff like this does not make me want to recommend ZFS for many uses. Say you want a high-performance Optane/NVMe scratch drive: I'd probably need to do some testing, but I sorta doubt ZFS is even in the running.

Yet here you are saying ZFS is all sunshine and daisies, just perfect in every possible way, and it's foolish to even think another filesystem could come close to matching it.

ZFS actually is capable of quite impressive throughput in a lot of situations due to the built in compression. Reading a small amount of compressed data off disk and decompressing it on CPU is a lot faster than reading the same amount of uncompressed data directly from disk. Modern CPUs are several orders of magnitude faster than even SSDs, so avoiding trips to external storage as much as possible makes a huge difference. This is also why ZFS heavily utilizes RAM as a cache. RAM is orders of magnitude faster than disks. In fact the value of RAM for caching is so high that ZFS even leaves blocks compressed in the cache (same as they are on disk) to get that much better cache hit ratio. The decompression algorithms are transparently fast (intentionally), so it’s a net gain, and it means you can fit an even bigger working set in memory.
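As a rough illustration of the compression point (zlib from the standard library stands in for ZFS's LZ4/ZSTD here, and the sample data is made up):

```python
import zlib

def compression_savings(data, level=3):
    """How much less would have to be read from disk (or held in cache)
    if the blocks were stored compressed."""
    compressed = zlib.compress(data, level)
    return len(data), len(compressed), len(compressed) / len(data)

# Fairly compressible data (logs, text, database pages) as a made-up sample:
sample = b"2024-01-01 INFO request handled in 12ms\n" * 100_000
raw, packed, ratio = compression_savings(sample)
print(f"{raw} bytes raw -> {packed} bytes stored ({ratio:.0%} of original)")
```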

The author of that comparison admits that they did not put ZFS in its best light; they are only showing the difference if you make no attempt to optimize ZFS for a database workload.

You realize he literally never makes a good-faith attempt at discussion, right?

Of course, but this is better than playing video games

Don't see how, but live your best life, man.

it’s like going to the nerd gym

More like cranking out a nerd 6-roper, but I get why it's fun.