So around Black Friday I built my new computer with a Ryzen 5 1600 and an ASRock AB350M Pro4. This is a VFIO build with an RX 580 for the host and a GTX 1070 for the guest. Back then I posted on here about some bizarre issues where the system would black-screen crash in certain circumstances when IOMMU was enabled in the kernel flags. Examples of things that would cause a crash:
Calling lspci after the system had warmed up
Starting up the computer after it had warmed up
Taking a load off the system (stopping a stress test or game)
Note that all of this would stop happening with IOMMU=off. After a bunch of head scratching and testing I found out I had a defective CPU that was also failing kill-ryzen. Once I received my warranty replacement CPU from AMD, all of my issues disappeared. Cut to two weeks ago: I purchased an R7 2700X as an upgrade. It seemed fine at first, but I was having stability issues that I couldn’t seem to resolve, and then I realized that I was getting nearly the exact same black screen issue I had with my original 1600. Knowing that, I tried starting up my system with IOMMU disabled in the kernel flags and, lo and behold, the system starts up fine. Did I really manage to get another defective CPU, on a new model that is supposed to have this stuff fixed?
Note I haven’t been able to test kill-ryzen yet, but I doubt it’s going to fail that. Is it really another defective CPU, or somehow something else?
Ok, so an update. I let the system rest so I could boot with IOMMU enabled. I ran kill-ryzen once with IOMMU on and once with it off. It passed, running for over an hour both times. However, I noticed that the system runs 10-15 degrees hotter with IOMMU enabled (85~92C on, 70~82C off), which to my knowledge isn’t supposed to happen. I don’t know if this is enough for me to get an RMA approved. I could really use some advice.
That’s way too high. You don’t want it to be higher than 70°C under any circumstances. I had the black screen problem as well, every time I quit a game or the system load dropped to minimum. In my case the problem was a bad GPU driver. A newer driver doesn’t always mean a better one.
Keep in mind those are the reported temps before adjusting for the offset, since I don’t know for sure what the offset is. Also, it only happens when IOMMU is enabled in Linux. If it isn’t, then I don’t get the black screen crash.
It’s a 2700X and apparently it’s supposed to have a 10C offset.
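For reference, here is a rough sketch of what the numbers above look like once the offset is taken out. It assumes the reported readings include the full 10C offset, so the corrected value is simply reported minus 10; the names OFFSET_C and adjust are made up for illustration.

```python
# Assumption: reported readings include the 10C offset mentioned above,
# so the corrected temperature is (reported - offset).
OFFSET_C = 10

# Reported ranges from the kill-ryzen runs, in degrees C.
reported = {
    "IOMMU on": (85, 92),
    "IOMMU off": (70, 82),
}


def adjust(temp_range, offset=OFFSET_C):
    """Subtract the offset from both ends of a (low, high) range."""
    low, high = temp_range
    return (low - offset, high - offset)


adjusted = {label: adjust(rng) for label, rng in reported.items()}

for label, (low, high) in adjusted.items():
    print(f"{label}: {low}-{high} C after offset")
```

Under that assumption the IOMMU-on run lands at 75-82C and the IOMMU-off run at 60-72C. The offset shifts both runs equally, so the 10-15 degree gap between them stays the same either way.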
I let my system rest and found a reliable test method to quickly cause the startup crash to occur.
(Everything in the BIOS is at stock settings except that I enabled virtualization. I am toggling IOMMU with kernel flags.)
With IOMMU enabled I ran Valley in my VM (could probably have run it on the host too, but w/e) for about 10 minutes. Once it was good and warmed up I shut down the VM and restarted the system with IOMMU enabled again; between the boot manager and my desktop loading, the screen goes black and goes to sleep with the HDD status stuck on (crash). I immediately restart again and boot with IOMMU off; the system boots normally. I restart one more time and boot with IOMMU enabled again just to confirm; same result as the first time.
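For anyone repeating the test above, it can help to confirm which IOMMU-related flags a given boot actually used. Below is a minimal sketch that pulls those flags out of a kernel command line string. The helper name iommu_flags and the sample command line are hypothetical, and it simply reports any parameter with "iommu" in its name without interpreting it.

```python
def iommu_flags(cmdline: str) -> dict:
    """Return every kernel parameter whose name contains 'iommu'."""
    flags = {}
    for token in cmdline.split():
        # "key=value" -> ("key", "=", "value"); bare flags get an empty value.
        key, _, value = token.partition("=")
        if "iommu" in key.lower():
            flags[key] = value
    return flags


# Hypothetical command line, for illustration only.
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda2 ro quiet iommu=off"
print(iommu_flags(sample))
```

On a running Linux host, the string to pass in would be the contents of /proc/cmdline.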
I have now also done this test with my known-good 1600 from the AMD RMA. Following the exact same steps, the system boots as normal under all tests. Just to push it, I even tried it under an OC of 3.9 GHz at 1.4 V, and it still didn’t have any issues starting up after stress testing. So at this point I am left to conclude it is in fact the CPU. I’ve messaged AMD support, so we’ll see how that goes.
That could also be the cause of your problem, especially with a 2700X.
I’m definitely not surprised that you ran into stability issues,
because the VRM of that motherboard really isn’t decent enough for a 2700X.
It will work, but the VRM will also run hot since it’s being pushed.
The VRM on your particular board is perfectly capable of handling an overclocked 1600.
But a 2700X is a different story though.
The VRM will be stressed a lot more with the 2700X,
and if it’s not cooled properly it will run hot like crazy,
because the VRM itself really isn’t the strongest on your particular board.
According to these benchmarks my 1600 is probably pulling exactly the same amount of power, especially considering it’s limited by temperature, not voltage, when on the stock cooler.
Not sure what they tested exactly.
But a 2700X at full load will draw a lot more power than a 1600X would.
Or let’s say it will stress and heat up the VRM a lot more.
The VRM on your particular board is okay for a Ryzen 5 SKU.
But a 2700X at its max potential will really push your board.
I personally wouldn’t be surprised if the board is just throttling.
If you have a fan lying around, you might put it over the VRM area and see if that helps.
Of course I’m not saying that your particular CPUs couldn’t be defective.
I’m just saying that the board could also be the culprit.
The VRM on your particular board is a 3+3 phase VRM.
It looks like a 6+3 phase VRM but it truly isn’t,
because the ISL95712 PWM controller used to control the VRM on your board is a 4+3 phase controller, but only includes 2+1 integrated gate drivers.
So that means that you cannot properly double the phase count of this particular PWM controller.
So it’s not possible to create a 6+3 phase design out of this controller.
If you look closely at the VRM you will see 1 additional driver on the Vcore VRM and 2 additional drivers on the SoC VRM.
So this means that your board is basically just a 3+3 phase.
What ASRock has basically done is double up the components on each phase.
The MOSFETs on your particular board are 2x Sinopower SM4337 for the high side and 2x SM4336 for the low side per phase.
So this basically means that the board has a max current capability of about 156A-ish.
For a 2700X at its max potential we’re talking about 125A-ish of current draw,
and under normal usage about 110A-ish, give or take.
So that is kinda pushing it, and it could definitely use some active airflow over the VRM area.
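To put the rough figures above side by side, here is a small back-of-the-envelope sketch. It takes the "-ish" numbers from this post at face value (156A capability, 125A and 110A draw), and the names VRM_MAX_A and headroom are made up for illustration. Real limits depend heavily on VRM temperature and cooling, so treat it as arithmetic, not a rating.

```python
# Approximate figures quoted in the post above; all are rough estimates.
VRM_MAX_A = 156.0
loads_a = {
    "max potential": 125.0,
    "normal usage": 110.0,
}


def headroom(load_a, limit_a=VRM_MAX_A):
    """Return how much of the VRM's estimated capability a load uses."""
    return {
        "utilization_pct": round(100 * load_a / limit_a, 1),
        "margin_a": limit_a - load_a,
    }


results = {name: headroom(amps) for name, amps in loads_a.items()}

for name, r in results.items():
    print(f"{name}: {r['utilization_pct']}% of capability, "
          f"{r['margin_a']:.0f}A of margin")
```

With those inputs the 2700X sits at roughly 80% of the estimated capability at its max and about 70% under normal usage, which is the "kinda pushing it" margin described above.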
Of course the system won’t be under full stress all day.
So if you don’t go too crazy on your overclocks it should kinda work,
if you cool the VRM properly, but the heatsinks on that board aren’t great either.
So yeah, I personally would not really feel comfortable with it.
I would definitely recommend a decent X370 or X470 board for a Ryzen 8-core SKU, especially a 2700X.
Still, I’m not saying that your CPU couldn’t be defective,
or that your particular issue is definitely related to this.
I’m just trying to point out that your board, the ASRock AB350M Pro4, paired with a 2700X
isn’t the greatest combination, so keep that in mind as well.
You could always try updating your BIOS to the newest version if you haven’t done that already.
I don’t remember if he discussed current draw, but The Stilt had a post about how the 2700X draws more power than its TDP rating would imply. The whole post is interesting, but here’s a sample:
At stock, the CPU is allowed to consume >= 141.75W of power and more importantly, that is a sustainable limit and not a short-term, burst-like limit as on Intel CPUs (PL1 vs. PL2).
Personally, I think that AMD should have rated these CPUs for 140W TDP instead of the 105W rating they ended up with. The R7 2700X is the first AMD CPU I’ve ever witnessed to consume more power than its advertised power rating. And honestly, I don’t like the fact one bit. Similar practices are being exercised on Ryzen Mobile line-up, however with one major difference: The higher than advertised (e.g. 25W boost on 15W rated SKUs) power limit is not sustainable, but instead a short-term limit like on Intel CPUs. The way I see it, either these CPUs should have been rated for 140W from the get-go, or alternatively the 141.75W power limit should have been a short-term one and the advertised 105W figure a sustained one.
[…]
Since 105W TDP rated Pinnacle Ridge CPUs are allowed to sustain >= 141.75W of power draw, and more importantly because at stock they do consume significantly more than the rated 105W even in real world multithreaded workloads, their advertised power rating in my opinion is not entirely fair and might end up misleading the consumers. The measured sustained power consumption for a stock 2700X was 127.64W (132W peak) during X264 encoding and 142.52W (146.5W peak) during Prime95 28.10.
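To see how far the figures in that quote sit above the advertised rating, here is a quick sketch of the arithmetic. The wattages are taken from the quoted post; the helper name percent_over is made up for illustration.

```python
TDP_W = 105.0  # advertised rating from the quote above

# Power figures from the quoted post, in watts.
figures_w = {
    "stock power limit": 141.75,
    "x264 sustained": 127.64,
    "Prime95 sustained": 142.52,
}


def percent_over(power_w, rating_w=TDP_W):
    """Percentage by which power_w exceeds rating_w, to one decimal."""
    return round((power_w / rating_w - 1) * 100, 1)


over = {name: percent_over(watts) for name, watts in figures_w.items()}

for name, pct in over.items():
    print(f"{name}: {pct}% over the {TDP_W:.0f}W rating")
```

By these numbers the stock limit is 35% above the 105W rating, with the measured sustained draw about 21.6% over in x264 and about 35.7% over in Prime95.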
Yeah, the problem is that TDP (thermal design power) isn’t the same thing as TPD (total power draw).
Although they are somewhat related, they’re definitely not the same.
Those terms get mixed up a lot.
Yeah, that is a point of confusion. I suspect The Stilt knows what he’s talking about, but it doesn’t help to mix terms. He later mentions the Prism heatsink too, so… I dunno.
I would think all of the power drawn by the CPU would eventually be dissipated as heat. The TDP seems to be a rating for an averaged thermal dissipation needed on a given processor based on some “realistic” usage pattern.
There doesn’t seem to be a standardized rule for assessing TDP. So my guess is that by “wiggling” this rating, it can make devices seem more or less efficient, impacting consumer choices.