What's the best way to do RAID1 with large HDDs?

Simple question. What’s the best way to do RAID1 with large HDDs? Most simple, most reliable, and least likely to ever cause issues.

I’m currently using Windows Software RAID (a mirrored volume), but I feel like that’s not really ideal, and there has to be a better way. My biggest issue with it is that any time the PC shuts down uncleanly (a crash or hard reset), it does a full re-sync of the volumes, and a full resync is slow on HDDs: with 12 TB drives it takes around 30 hours, and whenever you restart the PC, the resync starts from zero again. Since my PC runs at most 15 hours a day (I shut it down at night), it can never actually finish, so I hear my HDDs working all the time without ever getting done. That’s obviously not ideal. It feels like Windows Software RAID1 simply isn’t made for large HDDs that take a long time to read in full, or for PCs that don’t run 24/7.
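To put numbers on it: a full resync has to read the entire disk at roughly the drive’s sequential throughput. A rough sketch (the ~110 MB/s whole-platter average is an assumption for a typical 12 TB HDD, not a measured figure):

```python
# Rough estimate of a full RAID1 resync: the array must read/compare the
# entire disk at roughly its average sequential throughput.
def resync_hours(capacity_tb: float, throughput_mb_s: float) -> float:
    total_mb = capacity_tb * 1_000_000  # 1 TB = 1,000,000 MB (decimal, as drives are sold)
    return total_mb / throughput_mb_s / 3600

# A 12 TB HDD averaging ~110 MB/s across the whole platter:
hours = resync_hours(12, 110)
print(f"{hours:.0f} hours")  # ~30 hours, matching the observed resync time
```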

Another big issue with Windows Software RAID is that, as far as I can tell, there’s no way to keep it working when buying a new mainboard or reinstalling Windows. Last time I bought a new mainboard, I kept the Windows install, hoping my RAID1 would keep working, but Windows showed all the mirrored volumes as “Missing” in Disk Management, probably because it hardcoded the mirrored volume config to some SATA ID from the old mainboard. So I had to disconnect one of the mirrored HDDs, wipe it, and set up the RAID1 again from the beginning. Not great. I want my RAID1 to survive a mainboard swap.

Ideally, I’d also like to not be forced to only use Windows with my mirrored HDDs. I’d like to be able to read and write from the mirrored HDDs from both Windows and Linux.

So what’s the better way to do it? I’ve often heard that hardware RAID isn’t a good solution either, but it feels like I wouldn’t have these issues with some PCIe RAID card. What is a good solution? The goal is just to be safe in case one HDD dies. I have a backup of everything in the cloud, but simply swapping out one HDD is a lot more convenient than having to download 12 TB from the cloud backup when a drive dies. So I want that local redundancy.

ZFS. I’m not sure it’s the most stable on Windows, but it does appear to be possible: GitHub - openzfsonwindows/ZFSin: OpenZFS on Windows port


Every RAID1 implementation needs to do this, both software RAID and hardware RAID without a battery backup. You can instead run hardware RAID with a backup battery, or use a UPS and avoid unclean shutdowns on your machine.

However, ZFS and BTRFS don’t need to do a full rescan after an unclean shutdown. Why? Because ZFS mirrors and BTRFS “RAID1” aren’t technically RAID1. The difference is that mirroring across the two drives is part of the filesystem, so the filesystem can replay the incomplete transaction and ensure both drives are up to date. A normal RAID1 has no idea what the filesystem was up to, so it has to scan both drives in full to ensure they match.
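A toy sketch of that difference (purely illustrative, not how either system is actually implemented):

```python
# Toy model of why a filesystem-integrated mirror (ZFS/BTRFS style) recovers
# from a crash without a full resync, while a plain block-level RAID1 cannot.

DISK_BLOCKS = 1000

def crash_recovery_blocklevel_raid1():
    # A block-level mirror has no record of which writes were in flight,
    # so after a crash it must compare every block on both drives.
    return DISK_BLOCKS  # blocks that must be scanned

def crash_recovery_cow_mirror(intent_log):
    # The filesystem logged which blocks it was writing before the crash,
    # so it only replays/checks those.
    return len(intent_log)

in_flight = {17, 401, 402}  # blocks being written when power was lost
print(crash_recovery_blocklevel_raid1())     # 1000 -> full scan
print(crash_recovery_cow_mirror(in_flight))  # 3 -> near-instant recovery
```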

Software and Hardware RAID aren’t “bad”, they’re just not as good as ZFS when it comes to data integrity. In either case you want to keep backups to prevent data loss.

As for your situation I’d suggest getting an external NAS. That way you don’t need to shut it down all the time, and you can run ZFS while keeping your main PC as Windows.


You say simple so that disqualifies zfs in my mind.

I’d just use a Windows Storage Spaces mirrored volume; this is distinct from Windows RAID1. It checks most of your boxes, and it’s portable among different Windows installations (even if the HDDs are on a different SATA controller or plugged in via a USB HDD dock).
You shouldn’t have to resync a Storage Spaces volume after a crash unless something very strange happened. If you do expect many crashes, I’d format it with ReFS so that any possible data corruption is handled better.


> Every RAID1 implementation needs to do this, both software RAID and hardware RAID without a battery backup. You can instead run hardware RAID with a backup battery, or use a UPS and avoid unclean shutdowns on your machine.

I am using a UPS, so I never have a real power loss at my PC. But as far as I can see, there is no difference between an actual power loss and me pressing the reset button because I managed to freeze my PC somehow. I like to play around with overclocking, especially the GPU, and I’m writing my own software that surely manages to freeze the PC sometimes too, so a hard reset is something that happens “often” for me. So a UPS doesn’t help in any way, right? Windows treats an actual power failure and a manual hard reset of the PC the same.

What I don’t understand is: why can a software RAID implementation not just continue the full resync where it stopped when I restart the PC? I don’t mind the full rescan taking 30 hours. If it does 15 hours, I restart the PC, and it resumes at 50% and runs another 15 hours until it’s finished, that would be perfectly fine. I’m sure it’s just Windows being stupid in that it can’t remember where it stopped across a restart, right? And a different software RAID solution could surely do better?

> As for your situation I’d suggest getting an external NAS. That way you don’t need to shut it down all the time, and you can run ZFS while keeping your main PC as Windows.

I have thought about a NAS, but I only ever need to access the data from this one PC, and I don’t really like the extra space requirement of having a NAS somewhere. It seems like unnecessary extra complexity, considering I have a good PC that should be able to do everything I need without external hardware.


> You say simple so that disqualifies zfs in my mind.

Yeah it does not seem simple on Windows.

> I’d just use a windows storage spaces mirrored volume; this is distinct from windows raid 1. It checks most of your boxes, it’s portable among different windows installations (even if the hdds are on a different sata controller or plugged in via a usb hdd dock).

How does such a Storage Spaces volume deal with a hard reset? Does it not attempt some kind of full re-sync then? That it keeps working across different SATA controllers does sound nice; that’s one improvement over the “old-school” Windows RAID1.

But what about accessing such a storage space from Linux? Is that possible in any way?

> if you do plan to have many crashes I’d format it with ReFS so that any possible data corruption is dealt with better.

Is ReFS advisable on Windows by now? I always had the impression that it’s still somewhat “new” and unfinished. And again the Linux question: how well can I access a ReFS volume from Linux?


That’s because the RAID would need to “remember” where it was up to, which means it would need checkpoints and to store progress on disk. (ZFS and BTRFS actually do this.) The problem is that most RAID implementations are kept simple; complexity leads to bugs, which leads to data loss.

Most RAIDs are also designed for a specific scenario: a server that never gets turned off. You’re using it for something it wasn’t designed for, hence you’re hitting rough edges that normally aren’t exposed.
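For what it’s worth, the “remember where it was” idea does exist in some implementations; Linux md calls it a write-intent bitmap, where a coarse on-disk bitmap marks which regions are still dirty so a resync can resume after a reboot. A purely illustrative sketch of the concept:

```python
# Sketch of resumable-resync via a persisted dirty-region bitmap
# (the concept behind Linux md's write-intent bitmap; illustration only).

REGIONS = 8  # each bit covers a large slice of the disk

def start_resync():
    return [True] * REGIONS  # everything dirty: full resync needed

def resync_step(bitmap):
    # Sync one dirty region, then persist the bitmap (the "checkpoint").
    for i, dirty in enumerate(bitmap):
        if dirty:
            bitmap[i] = False  # region now in sync; would be flushed to disk here
            return bitmap
    return bitmap

bitmap = start_resync()
for _ in range(4):            # machine runs long enough to sync half the disk...
    bitmap = resync_step(bitmap)
# ...then reboots. The bitmap was persisted, so only the rest remains:
print(sum(bitmap))  # 4 regions left to sync, not 8
```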

Personally I much prefer having a separate NAS. It means I can shut down my main machine whenever I want, and I can do maintenance on each one without turning the other off. It also means you can get away with a smaller case and may end up saving space.

It also solves your Linux/Windows access issue as both support SMB and NFS.


> Personally I much prefer having a separate NAS. It means I can shut down my main machine whenever I want, and I can do maintenance on each one without turning the other off. It also means you can get away with a smaller case and may end up saving space.
> It also solves your Linux/Windows access issue as both support SMB and NFS.

I can see that a NAS would have some advantages, yeah. But every time I’ve looked at NAS boxes, they seemed really complex, with a ton of their own software that I don’t really want to use… I just want a partition with my data; I don’t need media streaming or all those features. NAS boxes are never advertised as an external RAID1 solution; they’re advertised as media streaming servers, backup solutions, and so on, and I need none of that.

Also, in total I use 2x12 TB HDDs, 2x6 TB HDDs, and 2x2 TB SATA SSDs, each pair running in RAID1. So I would need a NAS with at least 6 slots, and those seem to be expensive as far as I can see. Which one would you recommend?

And wouldn’t I lose quite a bit of speed using a NAS instead of accessing the drives directly in the PC? Especially with the 2x2 TB SATA SSDs. Would a NAS still give me the full ~1200 MB/s read speed that I should get from a RAID1 SATA SSD config?


I’ve never been able to get into a resync situation after a hard reset on Storage Spaces… and I was having a lot of those lately due to Windows automatically installing a bugged southbridge driver that would cause the HDDs to drop out and Windows to freeze up periodically.
I think when formatted NTFS, data integrity is handled the same as on any other NTFS volume, so it’s advisable to run chkdsk after an event that could have caused trouble. ReFS would be another story, since it has checksums to verify integrity at the filesystem level.
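To illustrate why filesystem-level checksums matter on a mirror: when the two copies of a block disagree, a checksum stored at write time tells you which copy is good, while a plain mirror only knows that they differ. A toy sketch (not how ReFS or ZFS is actually implemented):

```python
# Toy model: a mirrored block with a stored checksum can self-heal from
# silent corruption on one copy; a checksum-less mirror cannot tell which
# of two differing copies is correct.
import hashlib

def write_block(data: bytes):
    checksum = hashlib.sha256(data).hexdigest()
    return data, data, checksum      # copy A, copy B, stored checksum

def read_block(copy_a, copy_b, checksum):
    for copy in (copy_a, copy_b):
        if hashlib.sha256(copy).hexdigest() == checksum:
            return copy              # first copy that verifies wins
    raise IOError("both copies corrupt")

a, b, cksum = write_block(b"important data")
b = b"imp0rtant data"                # simulate bit rot on the second drive
print(read_block(a, b, cksum))       # the intact copy A is returned
```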

If I had to have a ‘separate’ Linux system access the drives, I would just set them up as a shared drive on Windows and have the Linux box/VM access them over the network; Samba is the backbone of this method.
If you are talking about dual-booting the same machine with Linux, then there isn’t a very straightforward way to access a Windows Storage Spaces volume.

ReFS has come a long way; I would consider it reliable at this point. The only knock against it is that there aren’t as many filesystem recovery utilities for it if I ever had to reconstruct data at a low level after something catastrophic, if for example I accidentally formatted the HDDs and then decided I wanted my old data back.

Supposedly Paragon makes a ReFS driver so that Linux can natively use ReFS volumes… but it’s probably easier to just access the volumes over the network, hosted by a Windows PC, and let Samba handle the translation.


I am talking about dual booting, yes. I only have one PC. And long term I’d probably prefer to move away from Windows completely, even if so far it’s my main OS.

So no way to access those Storage Spaces from Linux is a bit of an issue then. The regular software RAID1 from Windows can be accessed from Linux, though it might be read-only; I’m not quite sure.

> ReFS has come a long way; I would consider it reliable at this point.

Ok, nice to hear.

> but it’s probably easier to just access the volumes over the network hosted by a Windows PC and let Samba handle the translation.

Well I only have one PC…

I hear you on that, things are only getting worse on that front.

I was browsing the Paragon site, and apparently their driver lets Linux access NTFS Storage Spaces volumes as well as ReFS ones, read and write. I might just give it a try if I can figure out how their stupid website works.


If your machine isn’t stable, then there is very likely silent data corruption happening already, which RAID1 does nothing about. Which raises the question: what problem do you want RAID1 to solve?

Any OS can be a “NAS”. I use a very standard install of FreeBSD (it defaults to ZFS) on an 8-year-old Xeon I bought for $50, added a Mellanox 2x10GbE NIC, and it easily gets 2 GB/s transfers.

Nearly every NAS box can simply be a network storage device doing RAID1; they just bundle extra features on top. A NAS can be anything from a Synology unit running its own software to an old PC running TrueNAS.

From a software perspective you’d set up a RAID1, enable disk failure notifications, enable SMB/NFS, and you’re done. From there Windows can access it over the network and you won’t notice the difference. (Admittedly you’d be limited to ~100 MB/s with gigabit networking.)

Moving just the 2x12TB HDDs should solve your current rebuild time issues. No need for six drive slots. As mentioned above you can reuse an old PC as a NAS.

You can leave your SATA SSDs and 6 TB drives in your current PC. The main point is fixing the RAID1 rebuild times you’re suffering from.

I’ll also mention an alternative idea. If you have two GPUs, you can run Linux as your primary OS and run Windows as a VM. Then you can dedicate one of the GPUs to the Windows VM so the host never sees it. From there you switch inputs on your screen, use a KVM, or use virtual KVM software like Looking Glass.

Running Linux as the host OS means you can use ZFS or BTRFS for the RAID while still giving Windows access to the data through SMB or NFS (VM networking can operate at the speed of the disk). This solves the crash issue, and means you can use Windows and Linux at the same time.

Has anyone used Drive Bender by any chance?
It looks like it costs about the same as a RAID card: 40 bucks.

I never used it, but I’ve seen it on Unky Joes Playhouse

Bung the drives in a NAS and connect to your PC with the fastest cable you can.

If you are using Linux and Windows, then there is no solution other than a NAS with ZFS if you are interested in data security, especially if your PC is not stable!
ZFS for Windows is not stable, and ReFS and NTFS are not really what you want for Linux.
Get a second system with 32 GB RAM and a 10 Gb/s NIC, install TrueNAS SCALE, and activate periodic snapshots; the day will come when you will be grateful for it.
Trust me, you’re risking your data with your experiments right now.


I get over 2.5 GB/s read/write via iSCSI and SMB with TrueNAS directly connected to my workstation. Yes, it costs more than direct-attached storage, but you can’t have everything at the same time.

If you want a direct replacement for Windows Disk Management RAID, I think chipset RAID is the only option that makes sense. I have used chipset RAID0 (on Z77 Windows boot SSDs) and recall Debian stable being able to mount it without a fuss. If you have a couple of spare drives floating around to test with, you could try that as a direct replacement for the simpler Windows Disk Management RAID.

However, given that you have multiple many-TB drives in use, I think you’re due for a NAS-type device. The benefits it would provide include:

  • If something catastrophic happens to your desktop, you have a chance of not losing all your stuff. My uncle-in-law spilled soda in his desktop, and while fortunately all he killed was the PSU, he could easily have killed every SSD and HDD hooked up to the system at once. Physical separation of your redundant copies of key data is good to have. Don’t just move all your data over to the NAS either, as one stray soft drink can still murder all of it.
    • Consider a NAS that runs syncthing (or equivalent) which will actively sync folders on your desktop with your NAS. This is how I ensure that if one device (regardless of what it is) vanishes, I still have my data somewhere and it’s easy to get to. This approach also reduces the need for >1Gbit/s networking because the NAS doesn’t have to keep up in real time.
  • Advanced filesystems like zfs (I can really only recommend zfs) give you a lot of nice features:
    • Ransomware and accidental delete/overwrite protection if you enable automatic snapshots. Syncthing can also provide this functionality, but only for data you sync via syncthing.
    • Failing disk replacement can be easy. In a couple commands on the cli, zfs can replace the failing drive if you already have the replacement HDD handy. If you use a RAID10-like configuration, zfs can even be asked to kick out a pair of drives (e.g. going from RAID10 down to RAID1) if you have enough free space but don’t have the new HDD here yet. I’ve done this before and didn’t lose data, nor was it nearly as stressful as it could have been with other RAID solutions.
    • With zfs it is very easy to replace HDDs with bigger ones, or expand a RAID1/RAID10 like configuration with new pairs of drives. You don’t have to capacity match them either, but they should be a fairly similar capacity for optimal perf and wear sharing.
    • ZFS compression is really good if you have compressible data, and it’s enabled in the background with no tangible performance loss. NTFS disk compression is awful by comparison.
  • If you have a trustworthy RAID setup in your NAS and syncthing ensuring the data is on both your desktop and NAS, then you can avoid RAID entirely on your desktop imo. I only run redundant storage in my NAS.
  • NAS systems shouldn’t have to be complicated internet-connected devices like you’re concerned about. Any respectable NAS OS will be perfectly happy with no internet access or port forwards, although you should log into them periodically to update them and configure some way for it to email or otherwise alert you when a drive or software fails. TrueNAS is configured via a web interface like a typical router, and doesn’t require (or really encourage) command line usage because the webui is supposed to have all the features you need.
  • I don’t think you need as high-end specs as other users are suggesting if you’re building a custom NAS to act as a backup appliance. 8 GB RAM on a late DDR3 platform (e.g. LGA1155) or newer with gigabit networking should be perfectly satisfactory if its primary job is to run syncthing and maybe some NFS/SMB shares to keep a safe copy of your data. Just do a quick Google search and confirm that your platform is well supported in TrueNAS; you should be fine if you don’t see anybody complaining. ECC RAM is always nice to have, but if you don’t have it on your Windows box you are not any worse off. I’m not current with TrueNAS, but the only thing you might want to buy is an Intel NIC (can be gigabit, just needs to be Intel) if your mobo doesn’t have one, as those tend to get better support in FreeBSD.
  • Like others have mentioned, if your Windows machine BSODs or hard resets regularly, you will lose data. At least if your NAS is stable, you have a better chance of your OC tinkering not murdering the data you bought that UPS and redundant SSD/HDDs to protect.
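On the compression point above, a quick illustration of why inline compression is close to free on compressible data and a no-op on incompressible data (zlib here as a stand-in for zfs’s lz4/zstd):

```python
# Compression ratio on highly compressible vs. high-entropy data.
import os
import zlib

text = b"the same log line repeated over and over\n" * 1000
random_data = os.urandom(40_000)  # already high-entropy, like media files

print(len(zlib.compress(text)) / len(text))                # tiny ratio: big space win
print(len(zlib.compress(random_data)) / len(random_data))  # ~1.0: nothing gained
```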

I also have some cautionary notes:

  • as others mentioned, zfs on windows seems very proof-of-concept, according to this video by a zfs-on-windows expert. It seems to still be at an alpha or beta level of maturity. It’s also worth noting that zfs pools need to be imported (and ideally exported) to change hosts, so if you dual boot I don’t think zfs will do what you want even if it’s stable in Windows.
  • Some games and other applications don’t like ReFS. See here. If you’re just using this as data storage this might not be an issue for you, just be aware it could cause headache.
  • I wouldn’t buy a RAID card for this if your chipset supports RAID1. RAID cards have the same fundamental data integrity limitations as chipset RAID (e.g. they don’t know which drive is lying if they return different data).
  • Don’t buy a crappy NAS. Crappy NAS can be crappy hardware, a crappy vendor, or both. If you can build a PC, building a TrueNAS box is well within your abilities and it’s excellent NAS software.

Good luck

well, the sync approach is quite expensive in terms of storage usage.
It depends on how much data he already has and how fast it grows. You then have the data locally, plus the NAS with redundancy.
Depending on what kind of data it is, it can make sense to edit the data directly on the NAS.
And yes, you should have a second copy of your data, but that is not the same as local in my system plus a NAS to which I replicate.
My local storage is my first tier in terms of performance: expensive, super fast, small, and silent. My second tier is still fast, big, and not so silent, hence I don’t want it nearby.
The third tier is large, slow, and as cheap as possible.

If you want a fast NAS where, for example, you can run your Steam library directly from it, look on eBay for Emulex or Mellanox cards; 10 Gb/s Ethernet is quite cheap, and even 40 Gb/s can be found for $50.
