ASRock Rack has created the first AM4 socket server boards, X470D4U, X470D4U2-2T

While that does make sense, I believe the board is throttling and logging high temperature conditions based on this offset temperature measurement. If that had been AMD’s goal they just would have set the thermal limit to like 75°C instead of 95°C.
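For anyone following the offset discussion, the arithmetic is simple: on Ryzen models that carry a Tctl offset, the reported control temperature is the die temperature plus a fixed amount. Here is a minimal sketch, assuming the +20°C offset AMD published for the 1800X and +10°C for the 2700X. The table and the helper name `tdie_from_tctl` are made up for illustration and are not anything the board or BMC exposes.

```python
# Published Tctl offsets in degrees C for two Ryzen models (assumption:
# these are the only models we care about here; extend as needed).
TCTL_OFFSET_C = {
    "Ryzen 7 1800X": 20.0,
    "Ryzen 7 2700X": 10.0,
}


def tdie_from_tctl(tctl_c: float, model: str) -> float:
    """Convert a Tctl reading to the actual die temperature (Tdie)."""
    return tctl_c - TCTL_OFFSET_C[model]


# A 95 degree Tctl reading on an 1800X corresponds to a 75 degree die.
print(tdie_from_tctl(95.0, "Ryzen 7 1800X"))  # 75.0
```

If the board compares the raw Tctl value against a limit meant for Tdie, it would flag a hot condition 20°C too early on an 1800X, which matches the 75°C versus 95°C gap mentioned above.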

So it seems that our ECC DRAM issue is fixed now. We replaced the Kingston ECC memory with a new module and we haven’t seen the error since.

I also ran a stress test for about 5 hours. The highest temperature reading (Tdie) was 75°C and we don’t get any CPU_PROCHOT events with the Seasonic Focus Gold 450W.

What we found out:
When we put the memory module in A1 or B1 everything works absolutely fine, but in A2 or B2 we only get a 0D error, which indicates missing memory. With a non-ECC Crucial DIMM in A2 or B2 only, the board boots without any issues. That’s very strange and we can’t explain it. Is there a difference between Kingston and Samsung ECC memory that would explain this result?

@Waishon

Generally speaking I’ve had far “worse” experiences with Kingston/Crucial/Micron memory modules than Samsung’s, even with Ryzen 3000 where the IMC is said to be far less picky than with Ryzen 1000/2000.

I combine Kingston/Crucial/Micron since it is very likely that Kingston uses Micron DRAM on their modules.

Kingston bins the DRAM chips they buy in bulk from Micron since they also offer higher priced non-ECC overclocking memory modules. That means that the chips that barely pass JEDEC standards get used for standard frequency memory modules - like ECC ones, for example.

Samsung as another DRAM manufacturer does not need to bin the DRAM they use for their own memory modules, meaning there is a chance that you get really high-quality DRAM chips on an ordinary DDR4-2666 ECC memory module.

I might have been lucky with the four Samsung 32 GB DDR4-2666 ECC UDIMMs I got for testing: I can run them on the X470D4U and the ASRock X570 Taichi at DDR4-3200 with stock 1.20 V. Haven’t had the time yet to try tighter timings.

Also got two Crucial 16 GB DDR4-2666 ECC UDIMMs (Crucial is Micron’s brand for DIMMs for end customers so there should be the same chance that they don’t bin the DRAM chips) - but no chance to even get stable DDR4-2933 with loose Auto timings out of them on the same motherboards with the same CPUs.

Regarding the issues with your A2/B2 slots, that sounds to me like a signaling issue.

That may be caused by:

  • Damaged/Bent CPU pins
  • Dirt in CPU socket or DIMM slots
  • By chance the traces in the motherboard that lead to these slots are somehow negatively impacted (faulty production, mechanical/thermal stress during shipping et al.)

Can you test a different CPU?


Hey… we also do “one like one prayer”.


Could test another PSU that doesn’t trigger the CPU_PROCHOT bug at all:

TESTED OK - Seasonic Prime Titanium 750W ATX 2.4 (SSR-750TD)


A general detail regarding CPU temperature with BIOS 3.10/BMC 1.60:

With a 3700X the sensor that is listed in the IPMI for the CPU is accurate, meaning it shows the same temperature (rounded) as the Ryzen Master tool.

With the SSR-750TD as a PSU I haven’t had any issues yet. If ASRock Rack fixes the manual DRAM timing settings and restores proper SPD readings to the OS I would be quite happy with the X470D4U now.


Good to know. It is looking like I’ll be happy with it too, particularly after another BIOS update or two. No CPU_PROCHOT issues since changing the power supply a few days ago.

unRAID did end up crashing at 18 hours 48 minutes uptime. Nothing about the crash was in the log of course, but unRAID’s Dashboard was showing very atypical CPU usage:

[Image: Last CPU Usage reading]

I doubt the crashes are the motherboard’s fault at all, because people have been seeing Ryzen/Linux stability problems since Ryzen 1st gen. Lots of systems become stable after changing Power Supply Idle Control to Typical Current Idle or by disabling global C-states for the CPU. I’ve resumed testing the first of these options again, and unRAID uptime is up to 2 days 16 hours 6 minutes. If it crashes again, it had better be soon because I do NOT want to deal with a system that crashes every few weeks or months. Those are hell to troubleshoot because by the time you’ve got any confidence in their stability, the hardware is antiquated. No, I’ll switch to Hyper-V on Windows before I run all my virtual machines on an OS that isn’t stable on my hardware.

FYI: VMware ESXi 6.7 U3 has been released, anyone wanna try it with the X470D4U?

Should be optimized for Zen 2…


unRAID appears to be stable on my system now, with the X470D4U and Ryzen 7 1800X CPU and OCZ700MXSP power supply, but only after changing Power Supply Idle Control to Typical Current Idle in the UEFI options. It is at 8 days 16 hours 51 minutes uptime. If it reaches 30 days, I’ll consider it good enough to start migrating VMs from my old ESXi box.

CPU_PROCHOT has not come back either. In fact I have no IPMI log entries at all for the last 9 days :slight_smile:
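If anyone wants to check their own event log for this, here is a rough Python sketch. It assumes `ipmitool` is installed and can reach the BMC, and it simply filters the text from `ipmitool sel list` for a keyword. The function names and the sample log lines are hypothetical; I don't know exactly how this BMC words its CPU_PROCHOT entries, so adjust the keyword to whatever your log shows.

```python
import subprocess
from typing import List


def read_sel() -> str:
    """Return the raw System Event Log text from `ipmitool sel list`."""
    result = subprocess.run(
        ["ipmitool", "sel", "list"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def matching_events(sel_text: str, needle: str = "PROCHOT") -> List[str]:
    """Return the log lines that contain `needle` (case-insensitive)."""
    return [
        line.strip()
        for line in sel_text.splitlines()
        if needle.lower() in line.lower()
    ]


# Hypothetical sample text, NOT real output from this board.
SAMPLE = """\
 1 | 09/01/2019 | 10:15:02 | Processor CPU_PROCHOT | State Asserted
 2 | 09/01/2019 | 10:15:09 | Fan FAN1 | Lower Critical going low
"""

print(len(matching_events(SAMPLE)))  # 1
```

On a live system you would call `matching_events(read_sel())` instead of using the sample text; an empty list means no matching entries.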


Just noticed that ASRock Rack’s site says Max 32 GB for RAM at the top, but under specifications it says: Support up to 128GB DDR4 ECC/UDIMM.

It’d be because the 3000 series can support 128GB with double stacked RAM.

Every AM4 Ryzen supports 128 GB total memory with four 32 GB UDIMMs. The only reason that this was not stated in the past is that this requires DRAM chips with 16 Gb dies that only have been available for about a year.

I have been running 128 GB for a little while and haven’t had any stability issues yet.
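To show why the 16 Gb dies matter, here is the capacity arithmetic as a small Python sketch. It assumes a dual-rank UDIMM built from x8 chips with one die per package, so eight data chips per rank; the extra ECC chip per rank adds no usable capacity and is left out. The function name `udimm_capacity_gb` is just for illustration.

```python
def udimm_capacity_gb(die_gbit: int, ranks: int = 2,
                      data_chips_per_rank: int = 8) -> int:
    """Usable capacity in GB of a UDIMM (ECC chips not counted)."""
    total_gbit = die_gbit * data_chips_per_rank * ranks
    return total_gbit // 8  # 8 bits per byte


for die in (8, 16):
    per_dimm = udimm_capacity_gb(die)
    print(f"{die} Gb dies: {per_dimm} GB per DIMM, "
          f"{4 * per_dimm} GB across four slots")
# 8 Gb dies: 16 GB per DIMM, 64 GB across four slots
# 16 Gb dies: 32 GB per DIMM, 128 GB across four slots
```

Under those assumptions, 8 Gb dies top out at 16 GB per module, which is why 64 GB used to be the practical limit on four-slot AM4 boards.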


Are those the Corsair LPX 32GB DIMMs?

Nope, these are Samsung DIMM 32GB, DDR4-2666, CL19-19-19, ECC (M391A4G43MB1-CTD).

(I have an ECC fetish)


So UDIMMs and RDIMMs have the same physical slot compatibility?

These are UDIMMs, not RDIMMs. As mentioned 32 GB per module (without going RDIMM) has only been possible for about a year.

ECC does not automatically mean RDIMM although presently there are no RDIMMs without ECC functionality. Which makes sense since all platforms that handle RDIMMs usually use ECC (And since Intel is sooo concerned about not confusing customers they cut RDIMM support from some socket 2066 non-Xeon CPUs where running RDIMMs had been possible with ECC disabled).

No Ryzen boots with RDIMMs since AMD disabled that feature in the memory controller of that product line to differentiate it from Epyc.

i gotcha… Some of the google results were showing reg dimm… (i assumed rdimm)… So i am an ass now


Yeah, Samsung could use model numbers that are a little easier to differentiate :wink:

Yeah, the only way to get those is to order NM-AM4-UxS. Just got my NH-U9S and it only has the long ones for AM3/AM4.


Can any of you install ESXi 6.7 U3 without getting a PSOD on the X470D4U?

[Image: 60_psod]

Update: BIOS version 3.11, which still seems to be the latest internal lab version, doesn’t fix that behavior either.
