Not going to watch an 18-minute video at work.
If it’s just “performance is less,” then that’s not “broken”…
Well, the first 2-3 minutes explain why it’s broken, and the rest are ideas on how to fix it. By broken I mean “you can lose your data”: your pool can become effectively unusable because the performance really is that bad. Technically you would be able to recover your data if you added resources or waited long enough, but realistically you would not.
I have lost a pool to dedup. Only once.
Q: “Do people want to turn off dedup?”
A: “Yes. A big percentage of people that turn on dedup, then want to turn it off. And, you can turn it off, but you still have a giant dedup table and whenever you free something that has the dedup bit set, you have to go look in the table and decrement the ref count. So people would love to have a seamless way to not just turn off dedup but eradicate dedup, and nobody’s implemented that.”
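The refcount problem described in that answer can be illustrated with a toy table. This is a hypothetical model, not ZFS’s actual dedup table (DDT) code: the class `ToyDedupTable`, its methods, and the use of SHA-256 as the block checksum are assumptions made for illustration. It shows why switching dedup off does not make the table go away: new writes skip it, but every free of a block written while dedup was on still has to find its entry and decrement the refcount.

```python
import hashlib


class ToyDedupTable:
    """Hypothetical toy model of a block-level dedup table (not ZFS's DDT code).

    entries: checksum -> [block_id, refcount]; storage: block_id -> bytes on disk.
    """

    def __init__(self):
        self.entries = {}
        self.storage = {}
        self.next_id = 0
        self.dedup_enabled = True

    def _store(self, data):
        block_id = self.next_id
        self.next_id += 1
        self.storage[block_id] = data
        return block_id

    def write(self, data):
        """Write one block and return a pointer: (block_id, checksum or None)."""
        if not self.dedup_enabled:
            # Dedup off: new writes bypass the table and carry no dedup "bit".
            return (self._store(data), None)
        key = hashlib.sha256(data).digest()
        entry = self.entries.get(key)
        if entry is not None:
            entry[1] += 1  # duplicate block: bump the refcount, store nothing new
            return (entry[0], key)
        block_id = self._store(data)
        self.entries[key] = [block_id, 1]
        return (block_id, key)

    def free(self, pointer):
        block_id, key = pointer
        if key is None:
            del self.storage[block_id]  # plain block: just release it
            return
        # Dedup bit set: look up the table and decrement, even if dedup is now off.
        entry = self.entries[key]
        entry[1] -= 1
        if entry[1] == 0:
            del self.entries[key]
            del self.storage[block_id]

    def dedup_ratio(self):
        """Logical references per physical block, over deduped blocks only."""
        if not self.entries:
            return 1.0
        return sum(e[1] for e in self.entries.values()) / len(self.entries)


ddt = ToyDedupTable()
a = ddt.write(b"same VM block")
b = ddt.write(b"same VM block")  # deduplicated: same block_id, refcount 2
ddt.dedup_enabled = False        # "turning dedup off"
c = ddt.write(b"same VM block")  # stored again, outside the table
ddt.free(a)                      # still has to consult and update the table
print(len(ddt.entries), ddt.entries[a[1]][1])  # prints: 1 1
```

In this model the only way to empty the table is to free (or rewrite) every block that was written with the dedup bit set, which matches the “eradicate dedup” wish in the answer.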
Thanks for the TLDW.
In my case I think the risk is not significant… small datastore (500 GB), not really important data (essentially many copies of disposable Windows VMs for lab stuff) on an all-SSD backing store. I’ll have plenty of RAM (up to 64 GB for a 500 GB disk) for deduplication if needed.
I think I’ll still give it a go, but again, thanks for the heads-up.
If I have to turn it off, I’ll simply back up to an external drive, destroy the pool and restore, or just delete it and start over.
If I can get a 2-3:1 deduplication ratio or better (on a heap of Windows 10/Server 2016 VMs), that will be a big win. Not really performance-critical.
Don’t get me wrong, I agree that in most cases you shouldn’t use it. But my situation is a bit of a non-mission-critical edge case, with abundant system resources to throw at it…
It is when, even if you supply exponentially more RAM for something as basic as a file system, it still performs worse every time.
You can’t use 1,000,000x the RAM, still be slower, and not be bloated.
That was directed at sgtbruh, lol.
The original premise was aimed at a home user on, say, a workstation/gaming machine, or maybe a NAS, with probably 1/10 to 1/100 of the workload of Linus Media Group (which obviously still isn’t that intensive as far as file servers go).
I mean, we’re talking about adding a Blu-ray rip to a media pile a couple of times a month or something, and backing up some photos.
So now that we’re on the same page that you don’t need 1GB RAM per 1TB disk for a home NAS scenario, what is the issue?
Have you run zfs in any capacity?
Since this post does not seem to be very serious: either @Token or @sgtawesomesauce gave me some facts about btrfs in the original post that I hadn’t known, despite having put quite a bit of time into reading up on it.
And I believe you both have some btrfs experience?
Could you, or anyone else, give me some more technical details on btrfs’s issues or advantages?
It can be a bit hard to research a fast-developing fs when most of the posts are warnings against RAID5/6.
I’m not read up on the technical explanation behind the RAID5 issue, but when I ran Rockstor at RAID5 I lost the whole array. I then spun up FreeNAS and loaded the backup of the data there; no issues after that.
It was unfortunate, because the Rockstor GUI was great and the plugins (Rock-ons) simply just worked. I don’t recall how long it lasted, maybe a few months. I’ve since visited the Rockstor site, and it seems dead over there.
This was years ago, I’m sure lots of dev has since been done to btrfs.
The basic issue is that there is a RAM recommendation for a filesystem at all. For example, if I wanted it to not be slower and so used UFS, there would be nothing to read about tuning my filesystem to increase performance or reduce resource usage; nobody recommends I have extra RAM beyond what I need for the normal system use case.
Why do you even need a utility for tuning the filesystem if it’s already inherently faster and more efficient than any other filesystem? It’s there to turn down the features people don’t need or want in their particular situation, to get back some of the resources and speed they would otherwise have lost.
I have used Open Slowlaris. But for most home use, ZFS will just be bloated and generally offer no benefits. For example, if people cared about backing up their data, they might actually do backups, and lots of people don’t. Most would never try to restore from a snapshot or something, as they probably wouldn’t even know it can do things like that, but they would still get reduced speed and excess resource usage, even if it were preconfigured not to do most of that stuff.
Ty. The only thing I read properly about Rockstor was how the GUI would let you break your RAID during repairs or replacements.
Supposedly the manual would lead you to victory if you followed it fanatically.
@sanfordvdev I tried ZFS on Arch for a few months, on root, for moving media files around. It is the fastest thing I have ever tried.
Yes. It was pretty good, except for the one time I lost 5 TB of data. (Had backups, tho.)
BTRFS calculates parity incorrectly (last I read about it, anyway), so if you have data loss, it will repair it incorrectly. I’m not sure about much past that, but essentially, raid56 is fubar.
It also doesn’t handle lots of snapshots very well. I’m not sure how ZFS handles that, though, so I don’t know if that’s just something we have to deal with.
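For context on why bad parity is so dangerous: RAID5-style parity is a byte-wise XOR across the data blocks in a stripe, and a missing block is rebuilt by XORing the parity with the surviving blocks. This is a minimal sketch of that generic arithmetic, not btrfs’s raid56 code; the function names are made up for illustration. It shows that a rebuild from corrupted parity still “succeeds”, it just hands back wrong data.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (RAID5-style parity)."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def rebuild_missing(surviving, parity):
    """Rebuild the single missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving) + [parity])


data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Lose the middle disk: rebuilding from good parity returns the right data.
print(rebuild_missing([data[0], data[2]], parity) == data[1])  # True

# Flip one bit of parity: the rebuild completes but returns wrong bytes.
bad_parity = bytes([parity[0] ^ 0x01]) + parity[1:]
print(rebuild_missing([data[0], data[2]], bad_parity) == data[1])  # False
```

Nothing in the XOR itself can tell good parity from bad; that has to come from separate checksums on the data, which is why incorrect parity handling turns a single disk failure into silent corruption.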
The RAM is only needed for inline deduplication, which no other platform does. If the live dedupe hash table is not in RAM, performance will tank. If any other platform implemented live dedupe at block level (i.e., at write time, not as a scheduled job that tanks array performance when it runs outside business hours), it would have the same requirement.
There’s no way around storing the hash table in RAM without tanking performance. Period.
ZFS is the only platform that has the option for live dedupe, and that’s the only unreasonable memory requirement.
You clearly do not know what you’re talking about.
So let’s say we have 1 TB. For a hash table to take up 1 GB, it would need to cover a pretty substantial number of files. A SHA-1 hash is only 20 bytes; if we allotted another 1000 bytes on average for storing the exact path/file name, we’re talking a massive 1020 bytes per entry. Doing the basic math, 1,000,000,000 / 1020 = 980,392.16, so the table covers roughly 980,000 files. Dividing 1,000,000,000,000 by that 980,392.16 gives an average file size of 1,020,000 bytes, or about 1 MB.
Whereas if they were to, say, hash the filename, or assign a number or something to identify the file instead of a possibly long full path (given a fairly standard 255-character file name limit), e.g. a 64-bit/8-byte number, then together with the 20-byte SHA-1 hash that would significantly shrink the part of the table you would actually benefit from keeping fast (knowing the full path/name only matters once you have already found a match and want to remove one or whatever). At 28 bytes per entry, it would take some 35 million files to fill one gigabyte, which for 1 TB of data means a massive average of 28 kilobytes per file.
I don’t know about you, but I’ve never known anyone to have such a vast collection of text files or very low-resolution JPGs or something, let alone consider such a thing normal enough to warrant a recommendation. Or maybe they aren’t doing it nearly as compactly as they could.
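The arithmetic in the two posts above can be reproduced in a few lines. This follows the posts’ own one-entry-per-file model (a 20-byte SHA-1 plus either a ~1000-byte path or an 8-byte file ID), uses decimal units as the posts do, and the function name is just for illustration; a block-level table, as described earlier in the thread, would have more entries for the same data.

```python
TABLE_BYTES = 10**9   # 1 GB hash table (decimal units, as in the posts)
DATA_BYTES = 10**12   # 1 TB of data


def per_file_table(entry_bytes):
    """Return (files covered by a 1 GB table, average file size in bytes)."""
    files = TABLE_BYTES / entry_bytes
    avg_file_bytes = DATA_BYTES * entry_bytes / TABLE_BYTES
    return files, avg_file_bytes


files_a, avg_a = per_file_table(20 + 1000)  # SHA-1 + ~1000-byte path
files_b, avg_b = per_file_table(20 + 8)     # SHA-1 + 8-byte file ID

print(f"{files_a:,.0f} files averaging {avg_a:,.0f} bytes")
# 980,392 files averaging 1,020,000 bytes
print(f"{files_b:,.0f} files averaging {avg_b:,.0f} bytes")
# 35,714,286 files averaging 28,000 bytes
```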
Have you ever looked at filesystem code? If not, this will be a good exercise. Let’s compare some code.
Has anyone used dm-dedup?
I’m so down
You seem to not understand the use-case behind ZFS.
Don’t get me wrong, EXT4 isn’t bad, it’s just not as good for the use case I want. I want checksumming, volume management, copy-on-write snapshots, and encryption.
ZFS was never designed for normies. It was designed to hold multiple PB of data reliably. And to that end, I’d say that it does a good job.
I haven’t. Never heard of it. Sounds interesting though, now that I’ve looked at it.
Everyone craps on NTFS, and maybe rightfully so, but Server 2016 has deduplication, and Volume Shadow Copy is actually nice.
This is very nice.
I don’t know about dedupe on NTFS, so I can’t speak to it, but I know that shadow copies have saved my bacon a couple times in the past.
One way or another, though, I’m not sure I fully trust NTFS.