RAID: Tech in Transition | Level1techs

Great video. I have been working on getting a video together of the 28TB RAID-Z setup I did on my Ubuntu server. It has 6 3TB drives and 6 4TB drives arranged as 4 pools of 3 disks in raidz1 (RAID 5-like), so I can lose 1 disk in each pool without losing data. This may sound like overkill at first, until you realize I started with 3 3TB drives and upgraded my server 3 disks at a time, because ZFS can do that. I also have 2 Intel 120GB SSDs in a RAID 1 to act as a cache, which works very nicely. I have been using it to run my Plex server, ownCloud, and web server. ownCloud and the web server run as virtual machines.

Motherboard: ASRock 990FX Extreme3
CPU: AMD FX(tm)-6350 Six-Core Processor
RAM: 3x8GB DDR3-1600 = 24GB
OS: Ubuntu 14.04
OS Drive: WD 320GB 7200RPM
HDD Config
4 drives on board SATA
8 drives on LSI LSI00301 (9207-8i) PCI-Express 3.0 x8 Low Profile SATA / SAS Host Controller Card
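
For anyone curious, here is a rough sketch of how a layout like this could be built with ZFS on Linux. The pool name "tank" and the device names are placeholders, and I'm assuming the "4 pools of 3 disks" maps to one pool with four three-disk raidz1 vdevs, grown three disks at a time:

```bash
# Start with a single three-disk raidz1 vdev:
zpool create tank raidz1 /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2 /dev/disk/by-id/ata-DISK3

# Grow the pool later by adding another three-disk raidz1 vdev (repeat as drives are added):
zpool add tank raidz1 /dev/disk/by-id/ata-DISK4 /dev/disk/by-id/ata-DISK5 /dev/disk/by-id/ata-DISK6

# Two SSDs mirrored as a separate log device (ZIL/SLOG) -- one reading of
# "two SSDs in RAID 1 acting as a cache":
zpool add tank log mirror /dev/disk/by-id/ata-SSD1 /dev/disk/by-id/ata-SSD2

# An L2ARC read cache would instead be added like this (cache devices stripe; they can't be mirrored):
zpool add tank cache /dev/disk/by-id/ata-SSD1 /dev/disk/by-id/ata-SSD2
```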

Neat. Can you do some benchmarks on it? Use at least 10-gigabyte file sizes. Raw data would be nice; I may include it in an upcoming video. Do you have the SSDs as a ZIL cache, or as both ZIL + L2ARC?
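
In case it helps, here is one way to get those numbers with fio; the directory, job names, and sizes are just placeholders for a dataset on the pool:

```bash
# Sequential write and read at 10GB; random buffer contents so ZFS compression
# doesn't inflate the numbers. Going larger than RAM reduces ARC caching effects.
fio --name=seqwrite --directory=/tank/bench --size=10G \
    --bs=1M --rw=write --ioengine=libaio --iodepth=8 \
    --refill_buffers --end_fsync=1

fio --name=seqread --directory=/tank/bench --size=10G \
    --bs=1M --rw=read --ioengine=libaio --iodepth=8
```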

Thank you very much. I am testing with BTRFS. Would it be less... crazy, or would you recommend going back to ext4?

@wendell please shower us in more of your glorious knowledge you wonderful beautiful man.

On a slightly more serious note, I found the video really informative and interesting. I will definitely be referring back to it whenever I need info on storage systems. I am very much looking forward to the next video in this series.

I must say I am a little jealous; you get to work with all these huge amounts of data and really interesting hardware. Although the hardware I use is interesting, it is hard to find anyone who knows about it, so I can never geek out over it with anyone, and a lot of the time I only get to work with files about 10MB in size.

Keep it up, and I look forward to similar style videos on different topics. I wish I had you teaching my courses.

@wendell is there any reason why you didn't talk about ZFS on Linux? We use this + GlusterFS in our infrastructure and it is really solid! It has been stable for a year at least.

Being able to change directory into a snapshot of the data volumes is a wonder!
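
For anyone who hasn't seen it, that works through the hidden `.zfs` directory at the root of each dataset; the dataset and snapshot names below are just examples:

```bash
# Take a snapshot, browse it read-only, and copy a single file back:
zfs snapshot tank/data@before-upgrade
ls /tank/data/.zfs/snapshot/before-upgrade/
cp /tank/data/.zfs/snapshot/before-upgrade/some-file.ods /tank/data/
```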

It's got its own video coming.

Hi Wendell,

Thanks for this series... I'm an engineer/analyst by trade, so I rely on my computer to keep my pretty numbers whole, but never knew much about the effectiveness of redundant systems.

Do you have any relative risk percentages for these different systems (failure/evilness of a stand-alone disk being 1 in 1, RAID 1 being, I dunno... a 1-in-1.5 chance of getting an unrecoverable vortex of death and lies)?

Thanks!
cycle

@Wendell:

Do you know a way to add automatic bit-rot detection & correction to a WHS 2011 server running a PERC 6/i RAID-6 setup?
I would love to have that additional data protection on my little server.

My question would be: what if you just went in and gave the PCB of the corrupted drive, which isn't failing in the eyes of the RAID, a good ol' kick, a slice with a knife, or just a drop of solder? Wouldn't that give the expected error? Would it even be possible without the PSU failing? I'm sure the server things can do that? That was my first thought, and my last, when I watched.

I'm probably a noob who did not have an optimal setup, but I recently lost my BTRFS filesystem after a system crash.

It just hung indefinitely trying to mount the file system. I tried a few times and left it for about half a day, but it never did mount. It was a very simple single-disk setup, too. I could not find any errors in syslog, dmesg, messages, etc. I gave up eventually and formatted it.

Not too happy with it so far.
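
For what it's worth, there are a few read-only recovery steps worth trying before formatting in that situation; no guarantee they would have helped here, and `/dev/sdX` and the mount point are placeholders:

```bash
# Try mounting read-only from an older tree root
# (the option is "recovery" instead of "usebackuproot" on older kernels):
mount -o ro,usebackuproot /dev/sdX /mnt/rescue

# Check the filesystem without writing anything:
btrfs check --readonly /dev/sdX

# Last resort: copy files off the unmountable volume to another disk:
btrfs restore /dev/sdX /path/to/recovery/dir
```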

Hey Wendell, long-time viewer, first time logging into this site; love the show!

Anyway, I just wanted to chime in after watching the RAID segment you put on YouTube. I ran btrfs for about 2 years on my main work machine and couldn't have been happier! I had auto snapshots being created using snapper, and it saved my butt on more than one occasion :). I even had some HDD problems, and thanks to btrfs raid-0-ing my data/metadata I recovered everything.
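
For anyone who wants to try the same thing, the snapper workflow is roughly this; the config name "root" and the snapshot numbers are just examples:

```bash
snapper -c root create --description "before dist-upgrade"
snapper -c root list
# Show what changed in a file between two snapshots, then revert it:
snapper -c root diff 41..42 /etc/fstab
snapper -c root undochange 41..42 /etc/fstab
```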

I also was super frustrated by df "lying" about how much space I had available, but after a little while I just adjusted how I looked at it, and it isn't an issue. Check out this post in their FAQ for more details.
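
For reference, the btrfs-native commands give a more honest picture than plain df, since they understand the data/metadata split and the RAID/DUP profiles (`usage` needs a reasonably recent btrfs-progs):

```bash
btrfs filesystem df /
btrfs filesystem usage /
```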

Keep up the amazing work!

IDK, are we supposed to be trusting a 'Red Shirt'? :-p

Correct me if I'm wrong, but as far as I know an SSD automatically detects failing cells, so hardware RAID should be good enough if you're only using SSDs.

Ehh, theoretically mechanical hard drives have this as well, but in practice it can be squirrelly.

We were able to induce SSD corruption on a Samsung EVO by putting it on the same cable as a CD-ROM drive and ejecting and inserting the CD-ROM a bunch while the SSD was writing data.

We've got an upcoming video that shows you where/how to solder capacitors onto the power rails to give the SSD time to write everything out properly (Intel 750 SSDs have this built in, so it's not needed there; Samsung EVOs... not so much).


Also, why aren't there any RAID controllers that do integrity checking just like ZFS does? Since ZFS has a high overhead, wouldn't that be better?
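
(For context, the ZFS-style checking being referenced looks roughly like this; "tank" is a placeholder pool name. A conventional RAID controller has no per-block checksum to compare against, so it can't tell which copy of a block is the bad one.)

```bash
# Re-read and verify every block against its checksum, repairing from
# redundancy where possible, then show per-device error counts:
zpool scrub tank
zpool status -v tank
# Look at the CKSUM column and the "scan: scrub repaired ..." line.
```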

Servers have optical drives? Really? I thought the OS is installed over the network or from a USB drive. That's the only thing that comes to mind when I think about optical drives. I'm not familiar with enterprise equipment.

No, the point is not about optical drives per se. The point is that even super minor power blips caused "only" by a CD-ROM eject button can lead to SSD data corruption. The eject button on a CD-ROM drive was a convenient way to turn a motor on and off to cause a power drain on the same line as the SSD.

It could just as easily be a mechanical hard drive spinning up, or a network card going from sleep to wake very fast. Or a power failure at just the wrong moment.

But if the power problem is fixed, can you then use hardware RAID with only SSDs and not have data corruption?

Not necessarily; for example, EVOs have exhibited other forms of data corruption from which they do not recover.
http://www.anandtech.com/show/8617/samsung-releases-firmware-update-to-fix-the-ssd-840-evo-read-performance-bug

This "read performance bug" are the cells being re-read to try to get at the data. The fix? Rewriting data to those cells. Unplugged SSDs slowly "leak" the data as well.

That has been debunked: http://www.pcworld.com/article/2925173/debunked-your-ssd-wont-lose-data-if-left-unplugged-after-all.html