I’ve just built a test machine with an MSI Z590 Unify and an Intel i7-11700K. The parts were pretty cheap and I loves me a Unify board, so why not? It’s intended as a test machine that uses only the iGPU. Plus: homogeneous cores for the Linux win! (Alder Lake is working fine with Linux as far as I can tell, so this is actually a bit moot.)
So I put it all together: MSI motherboard, Intel CPU, 32GB G.Skill RipJaws 2133/3200 XMP DDR4 RAM (2 x 16), WD SN850X M.2 drives, Arctic Liquid Freezer II 280mm AIO (because Intel), Corsair RM750e PSU, and a Fractal Design case (no glass, no ARGB, no muss, no fuss). Then I loaded Windows (11; I couldn’t get 10 going for reasons that will soon become clear) and, as is my wont, a couple of flavors of Linux, but mostly Debian 12: the machine is pretty much set up to be a testbench for that.
I had quite a few problems getting OSes going: I’d get most or all of the way through an installation, only for the screen to blank out or the monitor to lose the HDMI signal from the onboard HDMI port and iGPU. Hmm… I swapped monitors (from UHD to QHD) and even the HDMI cable. The blanking behavior was eye-dentical (you have to imagine the lawyer from My Cousin Vinny saying that).
I pulled the RTX 3060 out of my i9-12900 build, stuck it in the Z590 motherboard, plugged the HDMI cable into it, and everything ran fine: hours of Cinebench, Prime95, FurMark, CPU-Z, etc. stress testing, with the Dell 4K monitor behaving perfectly.
But if I plug the HDMI cable back into the onboard HDMI port, the screen blanking and loss of HDMI signal return.
So it’s a bad board or a bad CPU, obviously, right?
I got on the virtual blower to MSI tech support and went through several rounds, with videos, showing what was going on. The eventual diagnosis: CPUs don’t usually fail that way, yeah it’s probably the board, try returning it to the retailer, and later: make a warranty request for the board (the latter after I’d already ordered another).
I ordered a replacement MSI Z590 Unify motherboard from one’s favorite online retail monopoly (one doesn’t have much choice, as with most things in the land of the free) and it arrived yesterday morning.
I put the CPU and cooler on the new motherboard, inserted one stick of RAM and hung a Samsung SSD off a SATA port: the most minimal bootable build I could make, with the fewest changes to the board. (The system wouldn’t even POST without a stick of RAM because of the iGPU: it failed at the VGA stage.) I then proceeded to boot the machine and thence to install an operating system.
(I have to say: building a PC in a case that’s fairly tight on space can be a pain in the derrière, but deconstructing said PC is a whole new realm of annoyance and aggravation. And swearing.)
And… it’s better, but the problem still persists! There’s no regular period or duration to the screen blanking: it seems entirely random. The machine behaved itself for so long when I first installed Windows 10 (I first tried Debian 12, but it wouldn’t even boot) that I thought the new motherboard had indeed fixed the problem. But nay. Nay! Thrice nay. (BTW, Debian 12 wouldn’t boot because of an infinite litany of Intel hardware incompatibility errors, and I have to figure out what’s going on there; Fedora and Ubuntu Live installs would boot. Debian 12 had actually installed on the SSD; it just wouldn’t boot. I then deleted it and installed Windows.)
I booted the machine from cold last night and immediately the screen was blanking out. I looked at the motherboard in its benchtest configuration and noticed that the Debug LEDs, which double as a system temperature read-out once booted, were showing 18. Celsius, that is: the sensible (and economical: two seven-segment LEDs instead of three!) temperature scale. Ok, it’s pretty chilly in the crib, but the Arctic Liquid Freezer 280mm AIO does a fantastic job: even under stress I’m not getting above 81C in HWiNFO64. 81C for an Intel 11th Gen K SKU CPU! A minor miracle.
A random neutrino passed through my brain and a thought was released: the display misbehavior seems to manifest more when the system is idle and not under load! In the occasional windows of clarity between amazingly annoying screen blank-outs (like trying to catch a glimpse of someone on the other side of a passing freight train, but with less regular gaps), I managed to start Prime95. As soon as it was underway the system stabilized. I could surf the web on the machine at the same time, or just let it run for hours, without the screen blanking or losing the HDMI signal. Hallelujah? Possibly.
Then I had another brainwave (farts can be waves too, right?). Could the RM750e PSU be the culprit? I chose it specifically because this was meant to be a relatively low-power system: an Intel 11th Gen K SKU (at least it’s not the i9), but intended to run without a graphics card. Could it be that it wasn’t up to the task of supplying the right current or voltage to the CPU at low load to keep the iGPU running properly? So I cannibalized my poor i9-12900 build even more by excavating its Corsair RMX850 power supply (it’s going to be a royal bugger getting that PSU back in the case, especially given the excellent cable management job I’d done, even if I say so myself: that shiz was tight!).
Long story short: although seemingly improved, the problem still persisted.
(This is the point that I could use a virtual Buildzoid with a digital oscilloscope.)
So, over to your good selves; over to the wisdom of the Level1Techs crowd: what do you think is going on?
It seems like the problem is the CPU, right? We’ve definitively eliminated the MSI Unify motherboard from suspicion. Both Windows (10 and 11) and various flavors of Linux see 2 GPUs when the 3060 is installed, and one when only the CPU is present (obvs). Then again, MSI tech support did say CPUs don’t typically fail in this way.
Have you seen or heard of this kind of behavior before? I searched the interwebs, naturally, but nada. It seems that this problem is sui generis. I’ve replaced everything except for a stick of RAM, and I doubt that’s causing the problem. I could pull apart my 5950X machine and take a stick out of that, but will putting an EXPO stick in an Intel board negatively affect it?! (I’m pretty sure it won’t.) I’m certain it’s not the RAM, although the scientific method says I should remove that variable too.
It looks like I’m going to have to try to return the CPU and get another. Apart from this very individualistic behavior, the CPU is absolutely fine. I’m just worried that if something is wrong with it, the problems may metastasize, specifically once it’s out of the return or warranty windows! What could possibly be going on, and is there anything else I could do to further investigate or eliminate possible causes of the behavior? In particular, is there anything in the BIOS settings that I could change? I mean, obviously, I shouldn’t have to: I’ve run through ALL the iGPU-related settings without materially changing the situation. The only things left are the CPU OC settings (I’ve run tests with the memory in both stock JEDEC and overclocked XMP states), and I doubt there’s anything there that could help. But who knows? (Not me, that’s for sure.) Maybe changing a CPU voltage or timing might suddenly make the iGPU work as well as the rest of the CPU, and I wouldn’t have to go through the hassle of trying to justify to the seller why I think they need to take back the CPU.
Thanks for any help! (And for reading my spiel.)