ZFS vs EXT

I was referring to compressed files rather than compression in ZFS. For ZFS:

When a file is written, the data is compressed, encrypted, and the checksum is verified. Then, the data is deduplicated, if possible.

https://docs.oracle.com/cd/E36784_01/html/E36835/gkknx.html
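Roughly, the per-record order of operations described there looks like the following. This is a simplified Python sketch of the ordering only, not the actual ZFS code path; the compress and encrypt helpers are placeholders (real ZFS uses lz4/gzip and AES, not zlib and XOR):

```python
import hashlib
import zlib


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Toy placeholder cipher so the sketch runs; NOT real ZFS crypto."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def write_record(data: bytes, key: bytes, dedup_table: dict) -> bytes:
    """Sketch of the per-record write order from the Oracle docs:
    compress, encrypt, checksum, then attempt dedup."""
    compressed = zlib.compress(data)            # stand-in for lz4/gzip/zle
    stored = toy_encrypt(compressed, key)       # stand-in for AES
    checksum = hashlib.sha256(stored).digest()  # checksum of the block as stored

    # Dedup comes last, keyed on that same checksum.
    if checksum in dedup_table:
        return checksum                         # reference the existing block
    dedup_table[checksum] = stored              # otherwise store a new block
    return checksum
```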

1 Like

Got it. I thought it was the other way for some reason. Thanks for clarifying.

I suspected this based on my testing so far.

My next test was to turn off compression… And repeat.

I’m betting I’ll see a 2-3x dedup ratio vs. the 1.11x I’ve seen so far.

While ZFS is amazing… it’s better on multi-disk systems than single-disk ones. It’s what it was designed for.

1 Like

I’ve heard of people getting dedupe wins with VDI and end-user filesystem data. People tend to save copies of the same stuff in their home folders across many users.

But yes, dedupe is very much an edge case and unless you know you’re likely to have a lot of duplicate data (e.g., in my case many copies of Windows VMs for testing with) you’re likely better off without it.

Even in my case, it’s going to be an experiment rather than 100% expecting a win.

In most cases, unless you’re running de-dupe on dedicated storage hardware, it’s probably a mistake due to the RAM consumption. But in my (edge) case, fairly small storage plus a large amount of RAM = it might be a win. TBC…

Imagine a file system that almost no one on the planet would agree is reasonable, and then also deliberately reducing the speed of reading files from your disk by creating fragmented files?

And no, I’m not talking about NTFS. Gotta save that last 4 KB.

also @thro

Say I want to try ZFS, so I go to the website for FreeBSD, FreeNAS, illumos, etc., which actually ship in a configuration that supports it.

Where can I find this hash table to verify every single block of the ISO? And then the hash to verify that hash table, which would be like 50-100 MB?

The ZFS filesystem code does this transparently. YOU don’t verify anything; it’s built into the filesystem.

Every block on a ZFS filesystem has a block hash, which is recorded when the block is written and read back when the block is read to verify it is intact. Because these hashes are SHA-256, they are effectively guaranteed to be unique, so any two blocks with the same hash can be considered duplicates. ZFS therefore uses the same block hashes it already maintains for integrity verification as its indication of duplicate blocks.

If you read an ISO from ZFS, you can be guaranteed it will either read correctly (as it was written) or it will not read at all (if it is damaged and sufficient replicas in the filesystem do not exist). I.e., you either get the correct data back, or a hard error.

Again, these are not file hashes. But every single block (including the uberblock, of which there are multiple copies) is hashed.
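As a rough mental model, the same hash serves both purposes: verify on read, spot duplicates on write. A toy Python sketch under that assumption (the BlockStore class is illustrative only, not how ZFS actually lays things out on disk):

```python
import hashlib


class BlockStore:
    """Toy content-addressed block store: the SHA-256 of each block is both
    its integrity check and its dedup key."""

    def __init__(self):
        self.blocks = {}    # hash -> block data (the "disk")
        self.refcount = {}  # hash -> number of references

    def write(self, block: bytes) -> bytes:
        digest = hashlib.sha256(block).digest()
        if digest in self.blocks:          # already stored: dedup hit
            self.refcount[digest] += 1
        else:                              # new block: store it
            self.blocks[digest] = block
            self.refcount[digest] = 1
        return digest                      # caller keeps the "block pointer"

    def read(self, digest: bytes) -> bytes:
        block = self.blocks[digest]
        if hashlib.sha256(block).digest() != digest:
            # Hard error rather than silently returning damaged data.
            raise IOError("checksum mismatch: block is damaged")
        return block
```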

If you’re talking about verifying the downloaded ISO, then the download site will have the appropriate checksum/hash file for you to verify against.

The reason the block hash table needs so much RAM in ZFS (IF you turn on de-duplication) is that ZFS does de-duplication in-line, at the time the block is written, rather than as a scheduled task. The reason for this is that ZFS is intended for 24/7 performance; it’s a given that there is no “out of business hours” window for a scheduled de-duplication run, like on, say, a NetApp or whatever.

So, at write time, every block written needs to be looked up in the hash table to see if it already exists. If this table is not in RAM, it has to be looked up from disk, and as you can imagine, a disk seek to look up the hash every time you write a block will tank performance.
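Back-of-envelope numbers for why that table gets big. The ~320 bytes per dedup table entry is a commonly quoted ballpark rather than something I’ve measured, and I’m assuming 128 KiB records:

```python
# Rough dedup table (DDT) sizing for a pool like the one discussed here.
pool_bytes = 4 * 2**40        # 4 TiB of unique data
recordsize = 128 * 2**10      # assumed 128 KiB records
bytes_per_entry = 320         # commonly quoted per-entry RAM cost (ballpark)

entries = pool_bytes // recordsize
ddt_ram = entries * bytes_per_entry
print(f"{entries:,} entries -> ~{ddt_ram / 2**30:.1f} GiB of RAM for the DDT")
# ~33.6 million entries -> ~10.0 GiB; smaller records make it much worse.
```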

4 Likes

Consider using clones. You can set up a Windows VM, snapshot the disk, then clone that as many times as you like for free.

1 Like

Already using VMware Workstation thin clones from (sysprepped) templates; however, they tend to diverge as Windows updates are applied, and additionally there is likely a heap of duplication between, say, Windows 2008 R2/7, Windows 2012 R2, Windows 2016/Win10, etc.
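That divergence is exactly what you’d expect from copy-on-write clones: the clone shares every block with its parent until something overwrites it. A toy Python sketch of the idea (the Clone class is illustrative only, not ZFS’s actual snapshot/clone machinery):

```python
class Clone:
    """Toy copy-on-write clone: shares the parent's blocks until a write
    touches them, which is why clones are 'free' at creation and grow as
    updates rewrite blocks."""

    def __init__(self, base_blocks: dict):
        self.base = base_blocks   # shared, read-only snapshot blocks
        self.delta = {}           # blocks this clone has overwritten

    def read(self, block_no: int) -> bytes:
        return self.delta.get(block_no, self.base[block_no])

    def write(self, block_no: int, data: bytes) -> None:
        self.delta[block_no] = data   # only now does the clone consume space

    def unique_bytes(self) -> int:
        return sum(len(b) for b in self.delta.values())


# Ten clones of the same template cost nothing up front...
template = {i: b"\x00" * 4096 for i in range(1000)}
clones = [Clone(template) for _ in range(10)]
# ...but each one grows as "updates" rewrite blocks inside it.
clones[0].write(42, b"\xff" * 4096)
```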

Again, experiment time. If it works, great! If it doesn’t work… it was an experiment :slight_smile:

2 Likes

I tried dedupe on VMs and didn’t have a good experience, even with a large amount of RAM (192 GB). Lots of variables though, so maybe it can work. It’s certainly a good case for duplicate blocks, so long as the VMs aren’t encrypted and there are a lot of them.

I think most likely it’s a case that may be viable in the future when some of the kinks are worked out of dedupe.

1 Like

You can actually choose from several checksumming algorithms. The default is a faster, non-cryptographic hash (fletcher4), though SHA-256 is an option.
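The trade-off is easy to feel in a quick test. In this sketch, zlib.crc32 is only a stand-in for a fast non-cryptographic checksum (fletcher4 isn’t in the Python standard library), timed against SHA-256:

```python
import hashlib
import os
import time
import zlib

# Compare a fast non-cryptographic checksum against SHA-256 on ~256 MiB
# of random data in 128 KiB records. crc32 is a stand-in for fletcher4.
records = [os.urandom(128 * 1024) for _ in range(2048)]

start = time.perf_counter()
for r in records:
    zlib.crc32(r)
crc_time = time.perf_counter() - start

start = time.perf_counter()
for r in records:
    hashlib.sha256(r).digest()
sha_time = time.perf_counter() - start

print(f"crc32: {crc_time:.2f}s   sha256: {sha_time:.2f}s")
# The fast hash wins on throughput, but only a collision-resistant hash
# like SHA-256 is safe to treat as a "these blocks are identical" test
# for dedup.
```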

1 Like

They will diverge over time, yeah. I figured test VMs would be pretty short-lived.

One of the practical considerations for VMs is that unless the filesystems inside the VMs are tuned to align with the underlying zvol block size (not a typical default), the same data might not necessarily be written in a way that produces matching blocks in the eyes of ZFS.
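A quick way to see that effect: the same payload written with a small offset shift inside the guest produces records that hash completely differently at the ZFS record size. Toy Python sketch, assuming 128 KiB records and synthetic data:

```python
import hashlib
import os

RECORD = 128 * 1024  # assumed ZFS record/volblock size for this sketch


def record_hashes(image: bytes) -> set:
    """Hash an 'image' in fixed-size records, the way ZFS would see it."""
    return {hashlib.sha256(image[i:i + RECORD]).digest()
            for i in range(0, len(image), RECORD)}


payload = os.urandom(16 * RECORD)      # identical data in both "VMs"

vm_a = payload                         # written starting at offset 0
vm_b = os.urandom(4096) + payload      # same data shifted by 4 KiB

common = record_hashes(vm_a) & record_hashes(vm_b)
print(f"records in common: {len(common)}")  # almost certainly 0
# Identical guest data, but because the record boundaries don't line up,
# ZFS sees no duplicate blocks and dedup gains nothing.
```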

2 Likes

Mostly yeah, but I am pretty slack with updating my sysprep templates, so inevitably I end up with X clones carrying Y months of updates.

Like I said, it will be an experiment. I may end up trying it at home first later this week, as I think my home PC currently has a spare 500 GB SSD in it (lol, how times change - I can’t remember for sure how many SSDs I have) that used to be my Windows 10 install. If not, I know I have a spare 240 GB SSD in a USB caddy I can commandeer for testing.

At work I only have the two SSDs, so I’d need to wipe out my test environment to convert the second SSD to ZFS. :slight_smile:

So far… things look much better with compression=off

[screenshot of pool stats]

Try using some more disk space :wink:

2 Likes

like

10^5 * 129G

at the very least.

It’s a 4 TB pool; I assume one would like to be able to take advantage of at least 1 TB of that.

1 Like

Are you talking about just making sure the zvol block size is in line with the block size of the filesystem in the VM (or vice versa)?

It amounts to the same thing either way. The block size used by the filesystem in the VM should not be smaller than the block size of the underlying zvol, just as a filesystem on a real disk should use an appropriate block size. The motivations are different, but that’s a fairly basic rule of thumb.
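One way to picture why: any guest write smaller than, or misaligned to, the zvol block still touches whole zvol blocks underneath, which means read-modify-write. A rough Python sketch with an assumed 8 KiB volblocksize (the helper is hypothetical, just for illustration):

```python
VOLBLOCK = 8 * 1024  # assumed zvol block size for this sketch


def zvol_blocks_touched(offset: int, length: int) -> range:
    """Which underlying zvol blocks a guest write lands on."""
    first = offset // VOLBLOCK
    last = (offset + length - 1) // VOLBLOCK
    return range(first, last + 1)


# A 4 KiB guest-filesystem write only fills half of an 8 KiB zvol block,
# so the rest of that block has to be read back in before it is rewritten.
print(list(zvol_blocks_touched(offset=4096, length=4096)))   # [0]
# A misaligned 8 KiB write straddles two zvol blocks: two read-modify-writes.
print(list(zvol_blocks_touched(offset=4096, length=8192)))   # [0, 1]
```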

1 Like