How does GRAID work?

It’ll be fascinating to watch this one unfold and see what comes of that video. That’s not a negligible amount of resources for them to write off by making it private on the main channel, given the recording/editing time, upload slot, etc. I’m sure it used.

Wonder if anything will be mentioned on the WAN Show this week, or maybe next, depending on who actually put the hammer to it… whether they realized something was presented wrong, or new information came up, or whatever.

3 Likes

I was stunned as soon as I saw the T1000 (= no ECC for parity data) and the claims of more than 16 GB/s over PCIe Gen3 x16, which is roughly that link’s theoretical maximum. The data the GPU is calculating has to get out of it somehow.

These two points should have been immediate red flags.

3 Likes

Agreed, something isn’t right here - I’m just curious to see what exactly went wrong at this point.

1 Like

ecc for parity data

very overrated.

Video’s back up for some reason.

:thonk:

?

Can you explain further so I might learn something?

If GRAID offers RAID5 (as tested in the LTT video), so the usable storage is the number of drives minus one (n-1), wouldn’t corrupted parity data be catastrophic when a drive fails completely and the rebuild has to rely on the (potentially faulty) parity data stored on the remaining drives?
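To make the concern concrete, here’s a minimal sketch of plain single-parity XOR math as in textbook RAID5 (GRAID’s actual implementation isn’t public, so this is just the textbook version): one bit flipped in the parity block, say by a memory error while it was being computed, flips the same bit in any block later rebuilt from that parity, with nothing to flag it.

```c
/* Minimal sketch of single-parity (RAID5-style) XOR math, not GRAID's
 * actual implementation. Shows how one bit flipped in the parity block
 * (e.g. by a memory error during the original parity computation)
 * silently flips the same bit in a block rebuilt from that parity. */
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t d0 = 0xA5, d1 = 0x3C, d2 = 0x0F;   /* data blocks (1 byte each for brevity) */
    uint8_t p  = d0 ^ d1 ^ d2;                 /* parity as it should be written */
    uint8_t p_bad = p ^ 0x01;                  /* same parity with one bit flipped */

    /* drive holding d1 dies; rebuild it from the survivors plus parity */
    uint8_t rebuilt_ok  = d0 ^ d2 ^ p;
    uint8_t rebuilt_bad = d0 ^ d2 ^ p_bad;

    printf("original d1           = 0x%02X\n", d1);
    printf("rebuild (good parity) = 0x%02X\n", rebuilt_ok);   /* matches d1 */
    printf("rebuild (bad parity)  = 0x%02X\n", rebuilt_bad);  /* silently wrong */
    return 0;
}
```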

…and all the comments have mysteriously vanished :wink:

2 Likes

This article explains why non-ECC RAM for parity calculations is really not a big deal.

https://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-your-data/

The long and short of it is that the likelihood of a memory error flipping a bit in a way that still matches a valid checksum, thereby letting bad data be returned as good, is astronomically small.
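A toy illustration of that argument (my own sketch using a 64-bit FNV-1a hash, not ZFS’s actual fletcher4/SHA-256 code): flip one bit in a checksummed block and the checksum stops matching, so the corruption is caught rather than silently returned. A random flip only slips through if the damaged block still hashes to the stored value, roughly a 1-in-2^64 chance for a 64-bit checksum.

```c
/* Toy illustration only (not ZFS's checksum code): a single flipped bit
 * makes the block fail its stored checksum, so the read is caught as bad
 * instead of being returned as good data. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

static uint64_t fnv1a(const uint8_t *buf, size_t len) {
    uint64_t h = 0xcbf29ce484222325ULL;           /* FNV-1a offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 0x100000001b3ULL;                    /* FNV-1a prime */
    }
    return h;
}

int main(void) {
    uint8_t block[4096];
    memset(block, 0x5A, sizeof block);

    uint64_t stored = fnv1a(block, sizeof block); /* checksum written with the data */

    block[1234] ^= 0x04;                          /* one bit flips in RAM */
    uint64_t now = fnv1a(block, sizeof block);

    printf("stored  : %016llx\n", (unsigned long long)stored);
    printf("re-read : %016llx -> %s\n", (unsigned long long)now,
           now == stored ? "MATCH (undetected!)" : "mismatch, corruption detected");
    return 0;
}
```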

Yeah, that’s really sketch.

1 Like

But isn’t GRAID strictly block-based storage, with no additional checksum features like good ol’ ZFS has?

1 Like

hmmm, that’s a really good point.

If it were doing bit-by-bit parity validation (which it apparently isn’t; there’s no read validation at all), ECC memory would be a requirement.

3 Likes

The SnapRAID sources here (snapraid/raid.c at master · amadvance/snapraid · GitHub) cite speeds on the order of tens of gigabytes per second using a CPU; that implementation might be more efficient than what’s in ZFS.

I don’t know how complicated or efficient reconstruction would be.

When it comes to sampling parity data on reads: reading the whole stripe just to check parity wastes I/O; it’s more efficient, I/O-wise, to compute and compare a checksum. (For an 8+2 raid6, verifying a single block via parity means reading essentially the whole stripe, the other data blocks plus parity, versus reading just the one block and comparing it against a small stored checksum.)

I guess using an old GPU for RAID computation in a computer with an old CPU makes sense.

Given previous interactions, I feel okay posting this

[image]

how does GRAID work
It doesn’t

5 Likes

How much you wanna bet the hot take on the WAN Show today is gonna be Linus defending GRAID?

2 Likes

ehh, I’m just gonna wait and see.

2 Likes

Looks like the comments are slowly coming back, curious what’s going on there.

2 Likes

Maybe YouTube gave it a strike for some reason?

2 Likes

I retract my statement; this makes no sense whatsoever. The card looks like just a licensing tool.

The ZFS raidz/raidz2 code may not be as speedy as the btrfs/SnapRAID RAID code… but that should hardly be a bottleneck.

In a world where you get 100+ threads per system, spending a couple of them on RAID calculation if/when needed is a much better tradeoff than spending PCIe lanes that could otherwise be used to connect storage or network to the system.


When it comes to reading raid6: you only need to read the data, not the parity; you checksum the data and compare it to the expected checksum you have stored in the filesystem.

Drives will have checksums internally, but for safety you really want end-to-end checksums… e.g. ideally ZFS records would fit into your CPU cache and you wouldn’t have to trust RAM/controllers/PCIe/drive checksums or parity.

When writing, obviously you need to compute parity from the data, which is more CPU-intensive than not dealing with parity at all… and there’s various accounting that needs to happen to ensure you can detect/fix half-written data in the future in case of power loss (metadata journaling).
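For anyone curious what that parity computation actually looks like, here’s a rough byte-at-a-time sketch of the standard P+Q double-parity math (the scheme the Linux kernel’s raid6 code uses, with P as plain XOR and Q as a Reed-Solomon syndrome over GF(2^8), polynomial 0x11d). This is my own toy version; real implementations vectorize it with SSE/AVX, which is where the tens-of-GB/s CPU numbers come from.

```c
/* Toy P+Q double-parity generation, raid6-style: P is plain XOR, Q is a
 * Reed-Solomon syndrome over GF(2^8) with generator 2 and polynomial 0x11d.
 * Byte-at-a-time for clarity; real code vectorizes this. */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* multiply two GF(2^8) elements, reducing by x^8+x^4+x^3+x^2+1 (0x11d) */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
        b >>= 1;
    }
    return r;
}

/* compute P and Q for 'ndisks' data blocks of 'len' bytes each */
static void pq_gen(size_t ndisks, size_t len,
                   const uint8_t *const data[], uint8_t *p, uint8_t *q) {
    for (size_t b = 0; b < len; b++) {
        uint8_t pp = 0, qq = 0;
        /* walk the disks high-to-low so Q = sum over d of g^d * data[d][b] */
        for (size_t d = ndisks; d-- > 0; ) {
            pp ^= data[d][b];
            qq = (uint8_t)(gf_mul(qq, 2) ^ data[d][b]);
        }
        p[b] = pp;
        q[b] = qq;
    }
}

int main(void) {
    uint8_t d0[4] = {1, 2, 3, 4}, d1[4] = {5, 6, 7, 8}, d2[4] = {9, 10, 11, 12};
    const uint8_t *const data[] = {d0, d1, d2};
    uint8_t p[4], q[4];
    pq_gen(3, 4, data, p, q);
    for (int i = 0; i < 4; i++) printf("P[%d]=%u Q[%d]=%u\n", i, p[i], i, q[i]);
    return 0;
}
```

With P and Q stored, any two missing blocks in the stripe can be solved for; that reconstruction is the expensive case.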

When reading, if the checksum for a data block doesn’t match, you pretend the block isn’t there and reconstruct it from parity.
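As a sketch of that self-healing read path (my own simplification with single XOR parity and a trivial stand-in checksum, not actual ZFS code): read the block, verify it against the stored checksum, and only on a mismatch rebuild it from the other members plus parity and re-verify.

```c
/* Sketch of the self-healing read path: verify the block's stored checksum,
 * and only if that fails fall back to rebuilding it from the other members
 * plus parity. Single XOR parity and a toy checksum for brevity. */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NDATA 3
#define BLK   8

static uint32_t cksum(const uint8_t *b, size_t n) {   /* stand-in for fletcher4/sha256 */
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++) s = s * 31 + b[i];
    return s;
}

int main(void) {
    uint8_t disk[NDATA][BLK] = {{1,2,3,4,5,6,7,8},{8,7,6,5,4,3,2,1},{9,9,9,9,9,9,9,9}};
    uint8_t parity[BLK];
    uint32_t stored[NDATA];

    /* write path: record parity and per-block checksums */
    for (int b = 0; b < BLK; b++)
        parity[b] = disk[0][b] ^ disk[1][b] ^ disk[2][b];
    for (int d = 0; d < NDATA; d++)
        stored[d] = cksum(disk[d], BLK);

    disk[1][3] ^= 0x40;                                /* silent corruption on disk 1 */

    /* read path for disk 1's block */
    uint8_t buf[BLK];
    memcpy(buf, disk[1], BLK);
    if (cksum(buf, BLK) != stored[1]) {                /* checksum mismatch: heal it */
        for (int b = 0; b < BLK; b++)
            buf[b] = disk[0][b] ^ disk[2][b] ^ parity[b];
        printf("reconstructed block %s\n",
               cksum(buf, BLK) == stored[1] ? "verifies" : "is still bad");
    }
    return 0;
}
```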

When doing a scrub, you read all the data and parity blocks and compare each block’s checksum to what’s expected.

This is where a big deal with btrfs was a few years back: btrfs wasn’t checksumming the parity, and when scrubbing it was recomputing the RAID and relying on the data checksums for verification. Since most of the time everything matches, that’s an OK optimization. But when things don’t match, you need to try different combinations of what might be missing and cross your fingers to recover the data… this has since been fixed.


@wendell since you have the shiny 7773X and other similar fun gear, any chance you could run raid/test/speedtest.c from snapraid/raid.c at master · amadvance/snapraid · GitHub and send a PR to add new numbers :slight_smile:

It might also be an interesting example of a workload where huge amounts of L3 cache make a difference, if you were to run it multi-threaded.

edit: actually, anyone with a Zen 3, or a recent (10th/11th/12th) gen Intel, could probably help refresh these.

2 Likes

how do you easily run the speed test?

/home/w/.cache/kcbench/linux-5.15/drivers/mtd/tests

It comes with the kernel modules under drivers, but it doesn’t have a main() function and seems not to be a stand-alone module.

It used to be that just insmod-ing raid5 you’d get the output here from dmesg, but it doesn’t do that automatically anymore, it seems.

2 Likes