Optane 900P in a PCIe 3.0x1 slot for ZFS cache/special VDEV?

Hi! New here, so hopefully this is the right place to ask-

Context:
I’m figuring out plans for a budget, semi-janky, used-parts NAS to run a TrueNAS Scale/ZFS home server on. Right now I have 5 Skyhawk 10TB drives I got for a killer price that I plan to put into a RAIDZ1 pool- that is to say, the performance won’t be insane.

I’m curious about Optane and using it maybe for an L2ARC, or maybe for a special vdev, because of its insane random IOPS. But if I used it for a metadata vdev, I’d want mirrors. I’d want 2-3 drives. That becomes a problem when the ones I’m looking at are PCIe 3.0x4 devices, and I’m having trouble finding motherboards with 4-5 PCIe slots that go past x1. (I need 1-2 other slots for a networking card and a misc slot)

Question:
Would an Optane 900P work well as a cache device running on a single PCIe lane? Is it even possible to run an x4 device on a single lane?

I don’t think the bandwidth would create a problem. It’s the IOPS that matter anyway right? There are plenty of motherboards that have like two x16 slots with three x1 slots, and my idea is to slot the Optane into the x1 slots.
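Back-of-the-envelope, if I have the numbers right:

PCIe 3.0 = 8 GT/s per lane with 128b/130b encoding ≈ 985 MB/s per lane
900P sequential read spec ≈ 2,500 MB/s → capped to roughly 40% in an x1 slot

So sequential transfers would definitely be capped, but at the queue depths a home NAS actually sees I doubt random 4K reads would come close to saturating a single lane.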

Followup question if that’s a bad idea- could I mix media for the special vdev? Could I use one Optane 900P, a SATA SSD, and an NVMe drive all mirrored? Would I get the full benefit of the Optane, while having the safety of multiple backups of my metadata?

This feels like a wacky and tedious question, but I ask because

  1. there’s a Facebook Marketplace deal for some Intel 900P drives lol
  2. I think I’ll learn a little more about ZFS in the process? And maybe a bit about PCIe connections…

Thanks!

They look like SMR drives. Return them! SMR drives won’t work with ZFS.

They do, but be prepared for some looooong resilver times and horrible random I/O.

Otherwise, a special vdev doesn’t need bandwidth, so even SATA is fine. What matters is good random I/O, so any modern flash will do.
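If you do go that route, adding one later is a one-liner (pool name and device paths below are placeholders). One caveat: as far as I know you can’t remove a special vdev again from a pool that contains raidz vdevs, so decide before you fill the pool, and it only affects metadata written after you add it.

sudo zpool add tank special mirror /dev/disk/by-id/nvme-drive1 /dev/disk/by-id/nvme-drive2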

According to the link below they’re CMR not SMR.

https://www.seagate.com/products/cmr-smr-list/

They’re CMR. I checked the data sheet https://www.seagate.com/content/dam/seagate/migrated-assets/www-content/product-content/skyhawk/en-us/docs/100804012f.pdf

It’s the Optane I’m more interested in, and the challenge of connecting many PCIe cards/devices

Try
zpool trim [pool] [device]
If it doesn’t give you an error, you are screwed.

If they are not SMR, you should get something like this.

~$ sudo zpool trim tank ata-WDC_WD80EDAZ-11TA3A0_VGH5MZPG
cannot trim 'ata-WDC_WD80EDAZ-11TA3A0_VGH5MZPG': trim operations are not supported by this device
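Another quick check, assuming Linux with SATA drives and hdparm installed: drive-managed SMR disks usually advertise TRIM in their identify data, while CMR HDDs usually don’t. No TRIM line in the output is a good sign for CMR.

~$ sudo hdparm -I /dev/sda | grep -i trim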

If I remember correctly, a PCIe x1 slot can supply up to 10 W at 12 V. And the 900P, I believe, is rated for 14 W, but I think it actually draws around 8 W or so. I think it should be fine at x1.

Oh man, see this is the kind of consideration that wasn’t even on my radar! Thanks, I will double check those specs before I make any purchases.

Other than power, do my assumptions about bandwidth sound about right? 3.0x1, or even 2.0x1 should be plenty of bandwidth?

Not sure if cache bandwidth would affect network transfers. I would probably try to stick to at least PCIe 3.0 x1 if possible; PCIe 2.0 x1 is getting pretty close to SATA speeds.

It looks like PCIe power limits are tied to the form factor of the card, not the number of lanes supported by the slot. If you look at the pinout, no power comes over anything after the first 10 pins. Someone please jump in if I’m wrong, but I think a full-size x16 card plugged into an x1 electrical slot could still pull 75 watts through the slot. Any miners around who can confirm that a GPU can still pull 75 W through a single-lane slot?

PCIe 3.0x1 is what… 1GB/s real world throughput? I think you’ll be fine.

Your 10 HDDs’ combined throughput exceeds this, but in my mind the prospect of satisfying a portion of your read I/Os with sub-millisecond latency (and keeping them away from the spinners) is worth the occasional potential bottleneck in throughput.
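If you do end up running one in an x1 slot, it’s easy to confirm what link it actually negotiated once installed (the 03:00.0 address is just an example; find yours with plain lspci first). LnkCap is what the card supports, LnkSta is what it’s actually running at:

~$ sudo lspci -vv -s 03:00.0 | grep -E 'LnkCap|LnkSta'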

I don’t think you mentioned the speed of your network? 10Gb throughput is in that same ballpark (of 1GB/sec) anyway.

Edit: just realized you have 5x HDDs; not 10. I think you’ll be even more OK. Carry on…

It’s about the special vdev, so we’re talking about single/double-digit MB/s. Most of my metadata is cached in RAM anyway, so I get like 10-20 MB/s if I really have a lot of unusual small stuff moving.

Somewhere in OpenZFS world (I forget where exactly) there was some mention of altering zfs_l2arc_mfuonly to allow for MFU plus metadata. If metadata could be thoroughly populated into L2ARC and kept there indefinitely I reckon it’d quell much of the hand-wringing that goes on regarding special vdevs.
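For anyone who wants to poke at it: on Linux I believe the current setting shows up under the ZFS module parameters as l2arc_mfuonly (0 meaning off); whether a metadata-inclusive mode exists depends on your OpenZFS version.

~$ cat /sys/module/zfs/parameters/l2arc_mfuonly
0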

I’m personally interested because I regard specials as one more thing that can wreck my day… I’ve avoided them altogether.

I prefer L2ARC over a special device. A special device is for tackling very specific use cases; you have to know exactly what your problems are. In most cases, with special devices, you just waste resources such as drives and PCIe lanes and don’t get any benefits over a simple L2ARC.

By default, L2ARC can cache everything. Metadata is part of that and takes priority over normal data.
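If metadata is what you care about most, you can also bias the L2ARC per dataset with the secondarycache property (dataset name here is just an example):

~$ sudo zfs set secondarycache=metadata tank/photos
~$ zfs get secondarycache tank/photos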


IMO I’m going to suggest something different: for the price of the 900P you could just get a battery backup on the box and possibly add more RAM. PCIe 3.0 x1 is about 1 GB/s and those 900Ps do 2.5 GB/s, so they’re seriously hamstrung at that point and may even hinder performance. Plus I would be worried about the x1 slot providing enough power (but not THAT worried).

RAM > Optane

I went v4 Xeon for my ZFS backup box: 10x 14TB disks with 256GB DDR4 RAM in RAIDz6 on a $120 Supermicro X10SRL and an $8 E5-2660 v4. Fourteen cores, 256GB RAM, and it runs great. I went heavier on the RAM because this does a good bit more than just backup.

If I were doing it again and needed to get it done on the super cheap, I’d look at v2 Xeons with 64GB sticks of DDR3 at $15 apiece, if electricity cost isn’t too big of an issue (though I don’t think v2 Xeons eat THAT much more power). Also remember that Intel IPC didn’t really improve all that much for a long time after Ivy Bridge/Haswell, so IMO Ivy Bridge/v2 Xeon can still be relevant.

If I needed some kind of special processing capabilities (i.e. for video, AI etc) I’d probably just get a GPU and stick it in there.

L2ARC, in my opinion, is for when you have a high-capacity storage array (i.e. spinners) serving mission-critical or similar use cases. In that instance an Optane L2ARC (or even one of those crazy Radian cards) is appropriate because it ensures that whatever you intend on writing to that disk is most likely going to get written.

For most mere mortals I think more RAM and a battery backup is fine. Enterprise RAID controllers come with a BBU (Battery Backup Unit) that is tiny. This is to battery-back the onboard cache in case of power failure to ensure that cached writes make it to disk. A $50 APC battery backup on your box is effectively performing the same task. More RAM is just feeding ZFS what it needs.

Just my $0.02… I’m still fairly new to ZFS myself but these have been my observations thus far so anyone/everyone feel free to point out where I’m wrong.


If I remember this right, there are some start-up and sense mechanics in play, so x1, x4, and x16 slots have different power envelopes somehow.

Sorry to respond a bit late, got busy. This is exactly what I would love, and I’m surprised it hasn’t been implemented or talked about more.

I will soon have 128GB of RAM, or even 256GB. I’d rather go huge on RAM than on Optane (or NVMe drives!) really. Not having to have individual devices for a special vdev means fewer PCIe slot requirements, less cost, and more flexibility. At least if you want the special vdev triple-mirrored.

It shocks me that a special vdev was thought of, but a cached version of it in RAM wasn’t.

I bought a Dell Precision 5820 and am filling it with RAM.

But it doesn’t solve my desire, I don’t think.

It’s not about bandwidth, it’s about random I/O, and it’s about not having to “retrain” the ARC cache if I restart the system, or not having to experience the latency if I want to access some collection of rarely-accessed files that all end up being cache misses. The ARC won’t hold all my file metadata- it only holds what it deems useful according to its algorithm (which I hear is a good algo!).
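For what it’s worth, arc_summary ships with OpenZFS and gives a rough idea of how much metadata the ARC is actually holding at any moment (exact field names vary between versions):

~$ arc_summary | grep -i metadata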

Files are not being stored in RAM, aside from the 5-second (or whatever you set it to) transaction group, so I don’t see how a battery backup matters any more or less whether I’m using Optane or more RAM. It’s a good baseline precaution that you’d be equally irresponsible to skip no matter your configuration, I think? Well, perhaps the irresponsibility rises if you increase the TXG time amount.
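If I understand it right, that interval is the zfs_txg_timeout tunable, which defaults to 5 seconds; on Linux it shows up here:

~$ cat /sys/module/zfs/parameters/zfs_txg_timeout
5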

I could be wrong though. I’m new to this obviously.

love it

You can find a used Dell Precision, HP Z series, or Lenovo Z tower for sub-$100.
It’ll have 4+ x16 slots and several x8s.

Depends on your power budget and CPU choice.

I’ve seen everything from three x16 slots fully wired for a single CPU to eight x16s on dual-socket systems in the sub-$100 range.