I am considering replacing my home server; getting a ZFS array running is my main motivation. If I'm going as far as replacing my other hardware, I'd like 5TB+ of unformatted, redundant storage. Before I run out and spend a grand on drives and hardware, I'd like to know how ZFS is an improvement over popping in a 5TB or 6TB drive and calling it a day.
What happens when a drive dies in ZFS? If a drive has an I/O error or fails a checksum, will it be dropped from the array, leaving the array broken? Will the I/O transaction actually complete, or will the array be "frozen" until I run out and buy a new HDD? I assume that if I were in the middle of writing a file, in most cases I'd be boned.
In my ideal world, when a drive has been deemed "bad" by ZFS, it would automatically be removed from the array, the array would continue operating normally, and I'd get an email or something. From there I'd frantically find a replacement drive, but not experience any downtime beyond a moment or two when the I/O error happened. After ZFS poops itself, the I/O transaction would be completed on what's left of the array, almost as if a drive hadn't just died a minute ago.
Is this a pipe dream, or will ZFS do something very similar to this with the right configuration? Thanks!
That's pretty much how it works. As long as you have at least one redundant disk, a disk can fail or a read error can occur and you can still access and use the array. Performance will take a hit, but it will still function. When you replace the disk, the array will be rebuilt and everything will return to normal. It should be easy enough to set up email notifications for disk errors; I'm sure FreeNAS has that functionality built in, but it would be easy to set up on another OS too.
You can still work on a degraded pool. The pool will, however, stay degraded until you replace the drive and let it resilver, and the more you write to a degraded pool, the longer the resilver will take. If a second drive dies while the pool is degraded or during the resilver (which is still a degraded state), you will lose everything. This is for a RAIDz1 setup.
The key being "in the right configuration." If the hard drive fails and it's your only drive, you're boned. If you only have single-drive vdevs, you're boned. Only when you have some redundancy to absorb the failed drive will you be able to continue operation, in a "degraded" state, until the drive is replaced and the new drive is "resilvered" (repopulated with data).
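For what it's worth, here's roughly what that looks like from the command line. This is just a sketch; "tank" and the /dev/sdX names are placeholders for your own pool and disks:

    # check pool health; a failed disk leaves the pool in a DEGRADED state
    zpool status tank

    # swap the dead disk for a new one and let ZFS resilver onto it
    zpool replace tank /dev/sdc /dev/sde

    # watch the resilver progress
    zpool status -v tank

The pool stays usable the whole time, just degraded until the resilver finishes.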
The short answer is yes, if you set it up that way.
When you set up ZFS, you also set it up to scrub the data periodically. I have mine set up to scrub once a week. During that scrub, it should catch any drives that have gone bad. If you've set up your system to be able to send email, you'll get an email with the results of the scrub. You can also check the status of your RAID at any time.
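If it helps, on a plain Linux/ZoL box the weekly scrub is just a cron job (FreeNAS exposes the same thing in its GUI). A minimal sketch, with "tank" as a placeholder pool name:

    # /etc/cron.d/zfs-scrub - scrub the pool every Sunday at 2am
    0 2 * * 0  root  /sbin/zpool scrub tank

    # check the result (and overall pool health) whenever you like
    zpool status tank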
Doing a RAID, whether it's a ZFS RAID or otherwise, is an improvement over using a single large drive. You get redundancy - one drive can fail in RAIDz1, or two drives can fail in RAIDz2, and it will still operate. You'll also get better read and write speeds than you would from a single drive. You can refer to the following website for ZFS RAID speed comparisons.
For speed, keep in mind that your bottleneck will likely be the network connection, since you're probably not running more than 1000Mb/s (about 125MB/s) on your network interface cards or cables. So the RAID will make read and write speeds faster for internal processes, but not for accessing or writing data from another computer on the network.
I also found the following website to be really useful and informative about how ZFS works and how to set it up on Linux. If you're setting up ZFS on FreeNAS then the process may be simpler, but the parts about what ZFS can do should be the same.
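To give you an idea of the setup side, creating the pool itself is a one-liner. This is just a sketch with made-up device names - on Linux you'd normally point it at /dev/disk/by-id/ paths so the disks can't get shuffled around:

    # three-disk RAIDz1 pool named "tank" (single-drive redundancy)
    zpool create tank raidz1 /dev/sda /dev/sdb /dev/sdc

    # or, with four disks, RAIDz2 for two-drive redundancy
    zpool create tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd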
With SSD caches, what happens when they malfunction? I have a 64GB Vertex SSD collecting dust, but if it could lead to data loss I'll happily wait for the HDDs.
3x4TB Hitachi Deskstar NAS drives seem like an economical and speedy solution - is that a reasonable setup? The plan was Debian Jessie + zfsonlinux with a Core i3-2120 and 6-8GB RAM. It's a NAS running a Minecraft server; how much RAM and CPU time will ZFS gobble without deduplication, compression, or snapshots enabled?
I can probably get that Core i3 system for $100 plus the cost of drives, but those HDDs are already a little more expensive than I'd like. If I need >8GB or ECC RAM I can't justify spending that much money on a slightly nicer NAS.
Sorry if I'm sidetracking the thread, if there is a better way to ask my less applicable questions I'm open to it.
EDIT: The Hitachi HDDs are simply too expensive for my needs. 3x Toshiba DT01ACA300 3TB drives are $386 CAD shipped and look like they're built on the tech from Toshiba's Hitachi acquisition (is the reliability decent?). The 2TB space loss isn't a big deal; if you haven't heard horror stories, I think I'll get three of these.
SSDs are ideal for caches. When one fails, you just replace it. You may have slower write performance until you replace it, but you won't have data loss.
ZFS will allocate half of your RAM for its own use, so if you have 8GB of RAM, you can expect your OS and other applications to run as they would on 4GB on a non-ZFS system. In other words, for ZFS (without deduplication), get twice as much RAM as you would normally get.
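If that half-of-RAM default is too greedy, the ARC (ZFS's cache) can be capped. On ZFS on Linux that's the zfs_arc_max module parameter; on FreeBSD/FreeNAS it's the vfs.zfs.arc_max loader tunable. A minimal sketch - the 4GB figure is just an example value:

    # /etc/modprobe.d/zfs.conf (ZFS on Linux) - cap the ARC at 4GB, value in bytes
    options zfs zfs_arc_max=4294967296

    # FreeBSD/FreeNAS equivalent, in /boot/loader.conf:
    # vfs.zfs.arc_max="4G"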
The CPU requirement for ZFS is negligible. Your other applications on the server will determine the kind of CPU you need.
I don't know about the Toshiba drives, but the "I" in RAID stands for "Inexpensive". I used cheap 6-year-old refurbished enterprise-grade drives in my build. I figure that when I need to expand in a few years, prices will have come down. I also have the whole server backed up to a separate disk, though.
Good to know, thanks for your help. Since my current setup is adequate for now, I think I'll put off a proper server as long as I can and do it right. If I was to configure a ZFS-based NAS now, this would be my setup:
Core i3-4160 (or something to that effect)
2x8GB ECC DIMMs (overkill, but the longevity will be great)
Asrock Q87WS-DL (server grade, 8 SATA, Intel NIC, ECC support, USB 3.0)
3x3TB Toshiba DT01ACA300 HDDs in RAIDz1
I'd probably toss in whatever SSD(s) I have floating around, an 80+ Bronze or higher PSU, and lots of filtered fans. Debian + ZFSonLinux would be my first choice of OS. The cost would be ~$800 CAD for the parts listed, which is too much for a preemptive upgrade imo.
Still, if I bought all that, would I have a rock-solid server? I would post in the FreeNAS forums, since the guys there are REALLY specific about hardware choices, especially ECC RAM. Thanks for reading!
With ZFS, there's no such thing as overkill when it comes to RAM. The more RAM you have, the faster your filesystem will be. Before bothering with SSD log devices, always max out the RAM first. As for ECC, it's only as important as you regard your data integrity: if you don't care about the possibility of data corruption, ECC doesn't matter. Same as with any server build. People tend to build systems not knowing this and then complain when luck is against them, so the FreeNAS people just love to make sure everyone is fully aware of the obvious.
If you don't like FreeNAS (I'm not a fan of wanky web GUIs myself) I highly recommend you play around with vanilla FreeBSD in a VM or on some spare hardware for a while. There is also the option of Debian/kFreeBSD, if you are truly stubborn. ZoL is still immature and lags in features. FreeBSD is a very nice OS to work with and isn't that hard to get the hang of coming from Linux. You'll get a much more mature implementation of ZFS using FreeBSD, so again I recommend at least giving it a few weeks to feel it out. The FreeBSD Handbook is a great reference to start with.
I really, really like my Debian, and am not able to set anything else up without directly following a wiki yet. I'll mess with kFreeBSD or Vanilla BSD in a VM soon, but I would really like to use Debian if at all possible.
In what way is ZoL immature? From what I can tell I won't be doing anything fancy with ZFS, so as long as ZoL is stable I won't miss any new features ZFS on FreeBSD has.
I need Samba, Minecraft/Java, and something akin to the AUR or Debian repos. The first two shouldn't be terribly hard, but I use Windows on all my regular machines, so when I need Linux for something, my Debian box is what I go to first. My other server is a Raspberry Pi, so I need my x86 box to do anything and everything I like, without too much hassle.
I guess I am stubborn, heh.
EDIT: No word on (or obvious download source for) kFreeBSD on the Debian site. Debian may have dropped it, or I'm looking in the wrong place.
ZFSGuru might help, although it's FreeBSD. Actually, it IS FreeBSD, with a pre-packaged web front end for creating ZFS pools and such, along with built-in Samba and a drag-and-drop interface for it. It also has some pre-packaged services. That said, it's still in development and not progressing that fast.
This article goes into a lot of the differences between using ZFS on Debian Linux vs. FreeBSD, and how ZoL is immature compared to the ZFS implementation on FreeBSD.
Personally, I chose Linux simply because I'm still learning Linux and didn't want to throw another OS into the mix right now. It seems to have performed well so far. I had an issue with destroying zpools when I was testing different configurations, but that's it (it turns out you have to remove anything still connected to the zpool before you can destroy it).
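For anyone who hits the same thing: a pool counts as busy while a dataset is mounted or something has files open on it, so you either stop whatever is using it or force the destroy. Rough sketch with a placeholder pool name:

    # see which datasets from the pool are mounted
    zfs list -r tank

    # unmount all ZFS filesystems, then destroy the pool
    zfs unmount -a
    zpool destroy tank

    # or, as a blunter instrument, force it
    zpool destroy -f tank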
That's a really well-written article, I'll hang onto that. The more I look into it, the more FreeBSD seems like way more effort to get the same functionality as your favourite Linux distro. ZFS is the clear exception, but I use lots of closed-source software like BTSync, which is prone to terrible Linux support, let alone FreeBSD support.
One bombshell in that article is the lack of TRIM in ZoL, which makes SSD caches a temporary solution at best. Still, it looks like FreeBSD requires way more fiddling than I'm able to do, which is a shame because that could compromise my data integrity. It seems that ZoL's main issues are being outdated compared to FreeBSD's ZFS (not too bad if you don't need the new stuff... like TRIM) and having crap default memory management. The RAM settings look fairly easy to configure if you have the time to tinker.
PS: Avoid Asrock server boards like the one I suggested above. A similar LGA1150 model specifically supports 1.35V RAM, but generates thousands of IPMI errors per second when the voltage is below 1.38V. That's easily fixable (or already fixed) with a BIOS update, but I'm not about to drop $1000+ on a platform that is so poorly tested. From what they've said, the IPMI itself is also trash compared to SuperMicro's. Link
Depends on what you use. I don't find a fileserver to be suited for stuff like BTSync to begin with; it's easier to hang a simple computer in front of it. I use mine purely as a fileserver and a DC client for friends, since OpenVPN keeps failing for me.
The rest is just a Linux machine that does everything else. The big plus is that when I change stuff on my fileserver, the rest basically just keeps on trucking without me having to touch anything at all, but that's probably just how I roll. Most people want the fileserver to do everything and handle all their torrents, NZBs, BTSync, and ownCloud stuff, but I cannot find a way to make that neat enough to manage easily. Therefore my solution is a download/junk machine for all of that: rename the files and folders (mostly done automatically), do one manual check, and move them to a sync dir.
Probably not the most elegant solution, but it works for me.
There are other differences as well, but for basic usage it shouldn't matter.
You most likely don't need an SSD cache. There are two separate ways to use an SSD for caching: as a write cache and as a read cache.
Write caches should be mirrored and don't need to be very large - maybe about as much space as you have RAM. Any larger would just be a waste of space, because the data only stays on the SSD until the write commits to disk. The only time these will benefit you is when doing synchronous writes - so what's the point?
Read caches can be larger and don't need any redundancy (if there's a data error, the data is still on disk). But they don't stay hot: the cache has to be refilled after each reboot, and it's best filled by normal use over an extended period of time. Furthermore, the read cache doesn't deal with large streaming reads; only smaller reads are cached, and those are best cached in RAM anyway. With some fine tuning, you can use the read cache to store only metadata, and potentially gain some performance that way.
Officially these cache devices are called the ZFS Intent Log (ZIL) for the write cache and the Level 2 Adaptive Replacement Cache (L2ARC) for reads. As I've mentioned before, you should always max out the RAM available to ZFS before even considering throwing an SSD at it for cache.
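For reference, if you do end up adding either one, they're both just a zpool add away (pool and device names here are placeholders):

    # mirrored log device (SLOG, backing the ZIL) - only helps synchronous writes
    zpool add tank log mirror /dev/sdf /dev/sdg

    # single L2ARC read cache device - no redundancy needed
    zpool add tank cache /dev/sdh

    # optionally restrict the L2ARC to caching metadata only
    zfs set secondarycache=metadata tank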
You should basically always enable compression. LZ4 compression is available in every implementation and not only reduces the amount of space used on disk, but can actually improve IO performance as a result of transferring less data. And it doesn't waste time trying to compress incompressible data, so there's not really a downside to having it enabled.
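Turning it on is a single property on the pool's top-level dataset, and you can check afterwards how much it's actually saving. Sketch with a placeholder pool name:

    # enable LZ4 for the pool (applies to new writes; existing data stays as-is)
    zfs set compression=lz4 tank

    # see the achieved compression ratio later
    zfs get compressratio tank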
I was planning on running LZ4, since at the very least I'll have a Haswell i3 in the system, which should be more than capable of handling gigabit speeds. I was also planning on getting a couple of hot-swap bays for easy cloning and disk imaging. With a compressed ZFS pool, the data would always be compressed whether I remember to do it or not, which is a huge convenience. How bad is the CPU requirement for LZ4 at gigabit speeds?
What's the deal with ZFS and low-memory problems? Besides crappy performance, I've heard of pools refusing to mount if the system doesn't have "enough" RAM. Deduplication is usually involved, but for a pool with only LZ4 compression, how much RAM can I expect to be using with ~6TB of data in the pool? Ultimately I'll have to find out the hard way, but the "buy ALL THE RAM" advice people give is not really helpful. I think the system Wendell was using to show off FreeNAS had 2GB in it, and it worked for light-duty fileserver applications. That sharply conflicts with the "OH DEAR GOD, YOU'RE USING LESS THAN 8GB, YOU MIGHT DIE" mentality the FreeNAS folks have. It's being cautious, just like having ECC RAM, but numbers are more convincing to me.
@Hakker The "clean" and "dirty" machine approach is the most sane from a data-integrity perspective, but I don't want to add a second server to the mix. At that point I'd have created a small compute cluster for the purpose of storing a movie collection and hosting Minecraft... not on a student's salary! I like BTSync because it's the best cross-platform solution I've found for managing "hot" data over the WAN. If I'm at school, I can be fairly confident that the document I just saved is now stored on my servers back home, no matter what device I wrote it on. On top of that, I can work on a report before bed on my desktop, and when I'm on campus in the morning it magically appears on my laptop. It's not bulletproof, but it got me through first year with no major disruptions.
EDIT: How much of the RAM overhead in ZFS is from prefetching and caching file data in RAM (not just metadata)? My plan is to use EXT4 for the root filesystem and have ZFS mounted under /mnt. I have no applications that would access the ZFS pool frequently enough for preemptive caching to be helpful.
The i3 will handle LZ4 fine, and frankly I've never heard of memory issues. 8GB for a server isn't much, and it will work fine with less too. I run 16GB of memory on a 42TB data server. Perfect? No, but it works well. It's just how ZFS works: it will use all the RAM it has available, hence the general rule of feeding it as much as you can. ECC isn't a necessity either; it's just about eliminating every possible point of error. ECC isn't a ZFS-specific thing, it's something to consider for any data source if the option is there. It's just that ZFS already eliminates most errors, which is why you end up at points like using ECC RAM. What most of the people who scream "you need ECC" don't consider is that the source computer would need it as well, since the same thing could happen there - the endless spiral of last bits of security measures ;)
For comparison, ZFS is by default more secure than EXT3/4 and NTFS, and I've never had real filesystem issues with those.
The big memory issue is just with deduplication. Dedup in ZFS keeps a table in memory with the hash of each unique on-disk block in order to know when new blocks are duplicates, among other things. You can probably imagine that ends up taking quite a lot of RAM. It is really only beneficial in exceedingly rare edge cases.
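As a very rough rule of thumb (ballpark numbers, not gospel): each unique block costs on the order of 320 bytes of dedup table, so ~6TB of data at the default 128K recordsize is somewhere around 45-50 million blocks, i.e. roughly 15GB of table if every block were unique - and considerably more if your data ends up in smaller blocks. ZFS can also estimate it for you before you commit (pool name is a placeholder):

    # simulate dedup on an existing pool and print the would-be DDT histogram
    zdb -S tank

    # if dedup is already enabled, show the actual dedup table statistics
    zpool status -D tank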