Fast scratch space for data processing (NVMe RAID?)

hsnyder · May 3, 2021, 9:38pm

Hi all,

I have a 2x Epyc Rome Linux workstation that I use to process electron microscope images for a structural biology application. The datasets are too large to fit in RAM, and many of our algorithms need to access random subsets of the data over many iterations. The individual reads aren’t that big (maybe 1-4 MB per chunk, where the chunks are randomly selected and usually not sequential), but we might need a few thousand of those chunks at a time.
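To make the access pattern concrete, here is a minimal sketch of a random-chunk reader that could double as a crude benchmark on a candidate scratch drive. It assumes, purely for illustration, that a dataset is stored as one file of fixed-size, contiguous chunks; the function name and the file layout are hypothetical, not how the actual processing software stores its data.

```python
import os
import random
import time


def read_random_chunks(path, chunk_size, n_chunks, seed=None):
    """Read n_chunks randomly chosen, chunk-aligned blocks from one file.

    Hypothetical layout: the file is a sequence of fixed-size chunks.
    Returns (list_of_chunks, elapsed_seconds).
    """
    rng = random.Random(seed)
    n_slots = os.path.getsize(path) // chunk_size  # whole chunks in the file
    if n_slots == 0:
        raise ValueError("file is smaller than one chunk")
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        chunks = []
        for _ in range(n_chunks):
            # Pick a random chunk index; consecutive picks are usually not adjacent.
            offset = rng.randrange(n_slots) * chunk_size
            # pread reads at an absolute offset without moving the file position.
            chunks.append(os.pread(fd, chunk_size, offset))
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return chunks, elapsed
```

For example, `read_random_chunks("/scratch/dataset.bin", 4 * 1024 * 1024, 2000)` (hypothetical path) followed by `2000 * 4 * 1024 * 1024 / elapsed` gives a rough bytes-per-second figure. Repeated runs will be served partly from the page cache, so for real drive comparisons a dedicated tool such as fio is the better choice.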

I’d like to add a fast cache/scratch drive to this system to accelerate that random access pattern. For those who have experience with Linux NVMe RAID: do you think a RAID 0 of NVMe drives would be a good fit for this application? (I know @wendell has lots of experience with NVMe RAID…) If not, what would be a good direction to go in? I should mention that the motherboard only has PCIe 3.0…
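To see why RAID 0 can suit reads of this size, here is a small sketch of how a striped array maps one read onto its member drives. It assumes the classic round-robin RAID 0 layout with equal-sized members, and the 512 KiB chunk (stripe unit) size in the example is an illustrative value, not a recommendation; the function name is hypothetical, so check your own array’s actual chunk size.

```python
def members_touched(offset, length, n_drives, chunk_size):
    """Return the set of member-drive indices that a read hits in a simple RAID 0.

    Assumes round-robin striping: logical chunk k lives on drive k % n_drives.
    """
    if length <= 0:
        return set()
    first = offset // chunk_size                 # first logical chunk touched
    last = (offset + length - 1) // chunk_size   # last logical chunk touched
    # Once a read spans at least n_drives chunks, every member is involved.
    if last - first + 1 >= n_drives:
        return set(range(n_drives))
    return {k % n_drives for k in range(first, last + 1)}


KIB = 1024
MIB = 1024 * KIB
print(members_touched(0, 2 * MIB, 4, 512 * KIB))        # all four drives
print(members_touched(3 * MIB, 1 * MIB, 4, 512 * KIB))  # drives 2 and 3
print(members_touched(0, 64 * KIB, 4, 512 * KIB))       # a single drive
```

Under these assumptions a 1-4 MB read is split across several drives, and a few thousand such reads in flight tend to keep every member busy, which is roughly the case where striping pays off.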

Thanks!

Log · May 3, 2021, 9:50pm

I don’t really have the expertise to help, but a few questions I would ask are:

1. How quickly does the data turn over? Data that gets overwritten multiple times a day needs very different endurance than data that sits there for a week or a month and is mostly just added to.
2. (edit: whoops, never mind, MB-sized blocks, which is honestly fairly sequential as far as hard drives go.) How big are your writes, and what’s the block size of your database or whatever you are storing the data in? A random 4K mix of reads and writes needs a very different drive than just doing large sequential reads.
3. Are these sync writes? Are they latency-sensitive?

Another thing to keep in mind: pay attention to whether the drive can still sustain its expected performance when it’s 50%, 75%, or more full. (edit: looks like it’s write-once, read-heavy, so this is much less of an issue)
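The point above about random 4K I/O versus large reads comes down to simple arithmetic: throughput = IOPS × block size. The sketch below uses an assumed 3000 MB/s target, picked only as an illustrative figure in the range of a fast PCIe 3.0 x4 NVMe drive; the helper names are hypothetical.

```python
def throughput_mb_s(iops, block_bytes):
    """Throughput implied by an IOPS figure at a given block size (1 MB = 1e6 bytes)."""
    return iops * block_bytes / 1e6


def iops_needed(target_mb_s, block_bytes):
    """Operations per second needed to reach target_mb_s at a given block size."""
    return target_mb_s * 1e6 / block_bytes


# Same assumed bandwidth, two very different workloads:
print(iops_needed(3000, 4096))       # 4 KiB random reads: ~732k IOPS
print(iops_needed(3000, 4_000_000))  # 4 MB chunks: 750 IOPS
```

At MB-sized chunks the drive only has to sustain hundreds of operations per second to be bandwidth-bound, so sequential-read bandwidth (and the PCIe link) matters more than small-block IOPS for this workload.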

hsnyder · May 3, 2021, 9:56pm

re: 1, that’s a good point. I don’t think the turnover is that high. The workflow would be: copy the whole dataset to the cache drive, then hit it with a ton of random reads for hours. Datasets would change about once every couple of days on average (and it’s safe to assume a dataset is around half a terabyte).

re: 3, writes aren’t performance-sensitive at all: the data is copied in and then read repeatedly as described above. It’s being read in from HDDs or over the network, so write speed isn’t really a big consideration.

Log · May 3, 2021, 10:00pm

Nice, that definitely loosens the requirements and therefore the cost. It also sounds like a drive could fail suddenly without any actual data loss; at worst you’d just need to restart a job.

hsnyder · May 3, 2021, 10:04pm

Exactly - that said, I will definitely look at the endurance of any drives I might choose to make sure I’m not going to kill them too fast…
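To put that endurance check in numbers, here is a rough sketch using the workflow described earlier in the thread (about 0.5 TB copied in every couple of days). The 600 TBW rating and the write-amplification factor are hypothetical inputs, not figures for any particular drive, and any scratch output the jobs write would add to the total.

```python
def annual_writes_tb(dataset_tb, days_between_swaps):
    """Host writes per year if a full dataset is copied in every N days."""
    return dataset_tb * 365 / days_between_swaps


def years_of_rated_endurance(tbw_rating, dataset_tb, days_between_swaps, write_amp=1.0):
    """Years until the rated TBW is reached.

    write_amp is an assumed factor for extra internal writes by the drive;
    1.0 means 'host writes only'.
    """
    return tbw_rating / (annual_writes_tb(dataset_tb, days_between_swaps) * write_amp)


print(annual_writes_tb(0.5, 2))                  # 91.25 TB/year
print(years_of_rated_endurance(600, 0.5, 2))     # ~6.6 years on a hypothetical 600 TBW drive
print(years_of_rated_endurance(600, 0.5, 2, 2))  # ~3.3 years if internal writes double it
```

Since the write volume here scales directly with how often datasets change, it is worth re-running the numbers if the turnover increases.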

wendell · May 4, 2021, 12:31am

I have always used Optane in this scenario. The one time I didn’t, the NAND was worn out within a few years. I had set up performance counters initially and was satisfied that the writes were much less than petabytes, but in practice, not so much.
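One way to keep an eye on write volume on Linux (not necessarily what was used above) is the kernel’s per-device counters in /sys/block/<device>/stat, whose seventh field counts sectors written in 512-byte units. The device name `nvme0n1` in the usage note is only an example, and the helper names are hypothetical.

```python
from pathlib import Path

SECTOR_BYTES = 512  # /sys/block/*/stat counts sectors in 512-byte units


def bytes_written_from_stat(stat_line):
    """Parse one /sys/block/<dev>/stat line and return bytes written.

    Field order: read I/Os, read merges, read sectors, read ticks,
    write I/Os, write merges, write sectors, ... (index 6 = sectors written).
    """
    fields = stat_line.split()
    return int(fields[6]) * SECTOR_BYTES


def bytes_written(device):
    """Host-side bytes written to a block device since boot."""
    return bytes_written_from_stat(Path(f"/sys/block/{device}/stat").read_text())
```

For example, sampling `bytes_written("nvme0n1")` a day apart and taking the difference gives a daily write rate. These counters reset at reboot and only count what the host sends, not extra writes the SSD does internally, so the drive’s own NVMe health log (e.g. its "Percentage Used" estimate) is a better long-term wear reference.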

If you keep your eyes peeled, you can get a good deal on 300 GB to 1.6 TB previous-gen Optane. You’ll pay about $1000 per TB, but IMHO it’s worth it for the ‘set it and forget it’ aspect.

hsnyder · May 4, 2021, 1:09am

I’ll admit I’ve been very curious about Optane, just scared off by the price tag. I would need about 1 TB. I’ll look around and see if I can find some good deals, though I might need to get something fairly soon, and I definitely can’t pay full new price for, e.g., a 1 TB 905P…

wendell · May 4, 2021, 1:13am

If you could store everything on NAND, that could work too, because it removes the need to re-write anything to a cache. It’s just always on NVMe.

I’ve seen the 1 TB 905P “gamer” variety pop up on eBay for <$1k. TBH, at this point that’s an instabuy. Optane has basically gone up in value… :-\

hsnyder · May 4, 2021, 1:19am

That’s a good point, though the motherboard in question is the Supermicro H11DSI, which has a rather pitiful array of PCIe slots. Getting enough flash into this box, plus the GPU, would be a challenge. The dual-socket workstation motherboard situation for Epyc is (as you’ve noted in a few videos) rather on the poor side. Still, when you need the CPU memory bandwidth, nothing else will do, and ultimately that was the top priority…

system · February 1, 2022, 7:19pm

This topic was automatically closed 273 days after the last reply. New replies are no longer allowed.