General discussion of NVMe Performance

Context:

Discussion continued from:

https://forum.level1techs.com/t/fixing-slow-nvme-raid-performance-on-epyc/151909/17?u=adubs

It is probably viable until you account for the heat the drive would generate, and then multiply that by 24. NVMe drives already run hot and throttle. In fact, in my experience most drive issues are caused by inadequate cooling. But Samsung is already using octa-core controllers on its NVMe drives while most others use quad-core chips. They would simply need a little ECC RAM and larger NAND for more complex firmware.

I don’t think heat is really a problem in rack-mount servers, as the airflow is quite high and should be enough even for that density. I have not worked on many larger systems with NVMe, but the ones I have worked on have not had any heat problems that I could see.

@OrbitaLinx
@thexder1

The tangent you two seem to be going off on isn’t really applicable here… did you two read the GitHub issue tracker thread?

https://github.com/zfsonlinux/zfs/issues/8381

According to testing done by GamersNexus, the new PCIe 4.0 NVMe drives overheat and come with a giant heatsink for a reason. They tested the drives with and without the heatsink, and the performance improvement with the heatsink was dramatic. Even with the heatsink on, performance still degrades under load. They also tested PCIe 3.0 NVMe drives, and most had minimal degradation without a heatsink, though still some performance degradation under load. Most showed some slight improvement with better cooling, and a few PCIe 3.0 NVMe drives do have heat problems if I remember correctly. But yes, servers have plenty of cooling in any case. @nx2l I wasn’t saying Linus’ problem was cooling; I just said that most of the drive problems I have seen were related to inadequate cooling.

I believe the drives Linus is using are the 4TB Intel NVMe drives from eBay that Facebook was unloading. I don’t think they are new enough to have PCIe 4.0.

He bought these https://www.ebay.com/itm/Intel-DC-P4500-4TB-2-5-SSDPE2KX040T7-Solid-State-Drive-NVME-SSD/113910953799?epid=13034411922&hash=item1a859f3747:g:xI0AAOSwknpdlSsX after a recommendation from Wendell.

Right, in any case I assume the Linus server and the Intel drives have proper cooling.

It looks like using direct I/O, disabling caches, and finding a way to stop the data duplication are the fixes on the filesystem side, according to the thread posted by @nx2l.
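
Not authoritative, but a quick way to see whether direct I/O is actually doing anything on a given dataset: the sketch below (Python, Linux-only) writes the same 1 GiB with and without O_DIRECT. The /tank/bench path is just a placeholder for a dataset mountpoint, and on ZFS builds without direct I/O support the O_DIRECT open may fail or silently fall back to the buffered/ARC path, so treat the output as a rough comparison rather than a proper benchmark.

```python
import os
import mmap
import time

PATH = "/tank/bench/testfile"   # placeholder: mountpoint of the dataset under test
TOTAL = 1 << 30                 # write 1 GiB in total
BLOCK = 1 << 20                 # 1 MiB per write

def bench(extra_flags: int, label: str) -> None:
    fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | extra_flags, 0o644)
    buf = mmap.mmap(-1, BLOCK)          # anonymous mmap is page-aligned, as O_DIRECT requires
    buf.write(b"\xab" * BLOCK)
    start = time.monotonic()
    for _ in range(TOTAL // BLOCK):
        os.write(fd, buf)
    os.fsync(fd)                        # make the buffered run pay for its writeback too
    os.close(fd)
    secs = time.monotonic() - start
    print(f"{label}: {TOTAL / secs / 2**20:.0f} MiB/s")

bench(0, "buffered (through the ARC)")
bench(os.O_DIRECT, "O_DIRECT (bypassing the caches where the ZFS build supports it)")
```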

It is too bad ECC RAM can’t be overclocked on a server motherboard. Faster RAM, on top of the aforementioned fixes, seems like the best solution until they build a ZFS for modern hardware… Like I said, I would like to see two versions of ZFS: a common-hardware version and a performance-hardware version.

I would say the best solution currently is to either keep playing with it and/or wait for ZFS to be fixed, or upgrade to Rome and use 3200 MHz RAM with the current ZFS implementation. Knowing Linus, he might want PCIe 4.0 NVMe storage by the time he is trading in his new server for another one. And then Wendell will still have to implement the optimizations he has found to make it work right.

Yes, from what I have seen the PCIe 4.0 NVMe drives do run hot and come with large heatsinks, but under normal conditions with those heatsinks they do not overheat, or at least not noticeably. In any case, I already knew Linus was not using PCIe 4.0 NVMe. Also, the airflow in rack-mount cases would likely still be enough, as it is far more than any normal desktop case would have, though I am not sure anyone is even putting PCIe 4.0 NVMe in normal drive bays yet, so that may not really apply at this time.

I will also point out that I have yet to see any drive issues that were directly related to drive temperature even in pretty extreme situations. I know it can happen, but in many years of working with desktops, laptops, and servers even in data centers I have not seen it.

I do kind of wish I still had the dedicated servers that likely had this same issue that Linus was facing to try out the fixes on them.

Well, the moment I got a case that put airflow directly over the drives, I never had a hard drive die again. I have 11-year-old Seagate hard drives that I ran in a RAID for 10 years that haven’t died, and those are the drives with the highest failure rate of any drives ever on the market. They were also the cheapest drives ever sold, and they basically run 24/7 with no power-saving features enabled. Before that I lost drives every year or two. I upgraded to an SSD, so now they are used as bulk storage.

I haven’t been able to commit to the years of testing necessary to prove anything, but in my experience, ever since I realized that a cool drive is a happy drive, I have noticed that every dead or failing drive I have come across has had less-than-adequate airflow, if any at all. This frame of mind has led me to look at and physically test any fan providing airflow to a dying drive. It turns out they were dead, dying, or the airflow was blocked by dust or inadequate ventilation, if there was any at all, especially in laptops and external enclosures. Laptops and enclosures without active cooling are generally like ovens for storage. A hot drive will definitely run slow, and a hot drive will definitely corrupt data and fail file transfers once extended exposure to heat begins to kill the chips. The damaged chips will overheat more easily and lock up, and that can kill the storage media, because sometimes they get stuck reading or writing the same sectors until you notice an hour or so later, or they simply start reading and writing nonsense.

When I switch to NVMe storage I will only use fan-cooled PCIe cards, like the ones out of Dell and HP servers, or fan-cooled drive bays for NVMe sleds. I will not bury an NVMe drive under a GPU on a motherboard even if there is a heatsink, and I wish motherboard manufacturers would just dedicate those PCIe lanes to a normal PCIe slot. But this is off topic at this point, so that’s all I will say about that. Maybe I’ll start a thread about this at some point.

I have almost never had airflow across my drives and have had very few drives die on me. Several years ago I also read a study that went very in-depth on failure rates and causes of failure on hard drives at Google (several hundred thousand drives involved), and it found that temperature had almost no effect on failure rate or performance. This was quite a long time ago, but I doubt much has changed in this regard for magnetic drives.

I believe the Seagate drives you are referring to are the same ones I have in my old storage server. Out of 6 drives I had 1 die over the course of about 10 years of running 24x7 with zero airflow, many times in 90+°F ambient temperatures. My father also had 12 of these drives and only in the past couple of years has gotten a couple of failures. Again, little to no airflow and running 24x7 for 10+ years.

I have seen similar in hundreds of desktops, where manufacturers like Dell, HP, and Lenovo never seem to bother to put much, if any, airflow over the drives, but failure rates are still very low and drives commonly seem to last 10+ years.

For SSDs you want to keep the controller cool, but you want to keep the flash fairly warm, so there is a balance there. Generally the right temperature for the flash is not too hot for the controller, and if the controller does get too hot it will downclock to keep from overheating. That would normally slow the drive down, but I would point to the video Linus did on water cooling a PCIe 4.0 NVMe drive, where he got no performance improvement out of it, as evidence that the heatsinks on those drives are good enough for what they are designed for. I definitely would not run one of those without a decent heatsink, but being concerned about heat beyond that does not seem to make any difference.
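
On the throttling point, one way to sanity-check whether a controller is actually getting near its thermal limits is to read the temperatures the nvme driver exposes through the standard Linux hwmon sysfs tree (roughly kernel 5.5 and newer; on older kernels smartctl or nvme-cli would be the fallback). Rough sketch below; the exact sensor labels vary by drive.

```python
import glob
import os

# Walk the standard Linux hwmon sysfs tree and print temperature sensors
# registered by the nvme driver. Values in temp*_input are millidegrees C.
for hwmon in glob.glob("/sys/class/hwmon/hwmon*"):
    try:
        with open(os.path.join(hwmon, "name")) as f:
            name = f.read().strip()
    except OSError:
        continue
    if name != "nvme":
        continue
    for temp_file in sorted(glob.glob(os.path.join(hwmon, "temp*_input"))):
        label_file = temp_file.replace("_input", "_label")
        label = "temperature"
        if os.path.exists(label_file):
            with open(label_file) as f:
                label = f.read().strip()
        with open(temp_file) as f:
            millic = int(f.read().strip())
        print(f"{hwmon}: {label}: {millic / 1000:.1f} °C")
```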

All of this is moot anyway, as rack-mount servers have fans that move 10x or more the air you would find in any desktop, and that air is all pulled in the front over the drives, so it is very unlikely that the drives were much over ambient temperature. If the ambient temperature were high enough to cause problems, there would be a lot of other issues as well.

Edit: Also of note, if you look at the drive stats over time from Backblaze, you will see that drives have gotten more reliable over time, which could explain what you have seen as far as airflow goes, if you saw drives failing regularly and then switched to having fans blowing over the drives sometime later.

Interesting. Most research I had read stated the opposite, but I haven’t read the Google research. In any case, I have replaced my consumer fans with workstation fans over time, because I got tired of spending a lot of money on fans that barely push air at full speed and break often, when I can buy used workstation fans for less money that quietly push twice as much air at half speed and can simply be turned up.

All this talk of drive temperatures… could that be split off into a different thread/topic?

Sure, heat can definitely kill drives, but the information I have seen indicates that heat high enough to kill the drives would also likely prevent the system from booting, because the CPU would be too hot.

The fans in most desktops are pretty bad, and getting better fans does help many things, but you would not use server fans in any desktop unless you really wanted it to sound like a jet engine and require earplugs whenever you are near it for any length of time. They are several steps above the workstation fans you are referring to in terms of airflow.

I will add that before that research paper investigating the drives at Google, it was generally assumed that drives needed cooling, and there were even some papers written about it, though I do not remember how scientific any of it was; I do know the sample sizes were generally quite small.

If all they are using this for is to host their 8K RED RAW video files, I don’t know why they would bother with ZFS on Linux rather than ditching the Linux part altogether (if they really want to run ZFS) and running ZFS on Solaris 11.3 instead?

It should be a far more stable ZFS implementation and platform, even with NVMe devices, no?

It should probably be a different thread/topic. I just would prefer that people don’t get the wrong impression from seeing a thread that only mentions potential temperature issues.

I would agree if Solaris were not proprietary, expensive, and something very different from Linux or even other Unix systems, which would likely necessitate additional training or hiring to manage it.

They want a simple, fast, and free (aside from the cost of hardware) storage solution. They chose ZFS because this is a business and the storage still needs to be redundant and reliable.

Guys, this is not relevant or helpful to the wiki. Please refrain.

Agreed.