Yeah, definitely keep your BIOS updated for the latest microcode patches (more may continue to come out).
If you haven’t booted yet, I'm pretty sure your mobo supports BIOS flashback, which lets you update the BIOS before ever fully booting. Usually you format a USB drive as FAT32, rename the BIOS file to msi.rom or whatever the manual says, then push the magic button.
In theory the new Intel Recommended Defaults should no longer cause degradation due to overvoltage. So yeah, just set up XMP for your RAM and you should be fine.
Some folks seem to undervolt just a touch as well, since honestly the chip runs hot (my old one would cap out at 100 °C and throttle regularly with a 240mm AIO).
When doing bursty compiles in Linux, e.g. make -j$(nproc), a degraded chip can lock up or throw compiler segfaults. Compiling llama.cpp would fail almost a quarter of the time on my degraded chip; oddly not while all cores were at 100%, but when most cores were idle and a couple of cores would boost up in the middle of the compilation…
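If you want a rough number for how flaky your own builds are, something like this could work. It's just a sketch: `count_failures` is a name I made up, and it simply reruns whatever command you give it and counts the non-zero exits (a compiler segfault makes the build exit non-zero, so it gets counted).

```python
import subprocess

def count_failures(cmd, runs=20):
    """Run cmd `runs` times and return how many runs exited non-zero.

    Hypothetical helper for illustration; cmd is an argument list as
    accepted by subprocess.run.
    """
    failures = 0
    for _ in range(runs):
        # Output is discarded; only the exit status matters here.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # A process killed by a signal has a negative return code,
        # which also counts as a failure.
        if result.returncode != 0:
            failures += 1
    return failures
```

For a real build you'd probably pass something like `["sh", "-c", "make clean && make -j$(nproc)"]`, assuming your project has a `clean` target so every run actually recompiles. A healthy chip should give you 0 failures.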
If something seems fishy, e.g. code that used to compile no longer compiles, check your kernel logs with sudo dmesg -T:
[Sat Jul 6 18:41:44 2024] mce: [Hardware Error]: CPU 9: Machine Check: 0 Bank 0: 8000004000050005
[Sat Jul 6 18:41:44 2024] mce: [Hardware Error]: TSC 578e290d131
[Sat Jul 6 18:41:44 2024] mce: [Hardware Error]: PROCESSOR 0:b0671 TIME 1720305704 SOCKET 0 APIC 21 microcode 123
[Sat Jul 6 18:41:44 2024] mce: [Hardware Error]: Machine check events logged
You might also see errors like these while running LLM inference:
[Fri Jul 26 12:44:08 2024] llama-server[13422]: segfault at 55 ip 00007bc3ad0b7d55 sp 00007ffc1ab1ffa0 error 4 in libc.so.6[7bc3ad038000+16c000] likely on CPU 4 (core 8, socket 0)
[Fri Jul 26 12:44:08 2024] Code: e8 00 0d f9 ff f3 0f 1e fa 48 85 ff 0f 84 d3 00 00 00 55 48 89 e5 41 55 4c 8d 6f f0 41 54 53 48 83 ec 18 48 8b 1d bb df 13 00 <48> 8b 47 f8 64 44 8b 23 a8 02 75 5f 48 8b 15 38 df 13 00 64 48 83
[Fri Jul 26 12:44:18 2024] llama-server[13495]: segfault at 55 ip 000076091928ad55 sp 00007ffdef195540 error 4 in libc.so.6[76091920b000+16c000] likely on CPU 31 (core 47, socket 0)
[Fri Jul 26 12:44:18 2024] Code: e8 00 0d f9 ff f3 0f 1e fa 48 85 ff 0f 84 d3 00 00 00 55 48 89 e5 41 55 4c 8d 6f f0 41 54 53 48 83 ec 18 48 8b 1d bb df 13 00 <48> 8b 47 f8 64 44 8b 23 a8 02 75 5f 48 8b 15 38 df 13 00 64 48 83
This would be a canary in the coal mine for degradation. As xyz says though, they should RMA under extended warranty if the new microcode doesn’t fix it. Time will tell!
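If you'd rather not eyeball the kernel log every time, here's a minimal sketch that tallies those two kinds of lines per CPU. The regexes are only based on the log lines quoted in this post, so treat them as an assumption; other kernel versions may word things differently. `scan_kernel_log` is a made-up name.

```python
import re
from collections import Counter

# Patterns modeled on the dmesg lines quoted in this post (an assumption;
# other kernels may format these messages differently).
MCE_RE = re.compile(r"mce: \[Hardware Error\]: CPU (\d+): Machine Check")
SEGV_RE = re.compile(r"\[\d+\]: segfault at .* likely on CPU (\d+)")

def scan_kernel_log(text):
    """Return (mce_by_cpu, segfaults_by_cpu) Counters for a dmesg dump."""
    mce = Counter()
    segv = Counter()
    for line in text.splitlines():
        m = MCE_RE.search(line)
        if m:
            mce[int(m.group(1))] += 1
            continue
        s = SEGV_RE.search(line)
        if s:
            segv[int(s.group(1))] += 1
    return mce, segv
```

Feed it the text output of sudo dmesg -T (saved to a file or captured however you like). If the same few CPU numbers keep showing up, that points at specific cores rather than a software bug.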
Enjoy the blazing fast low core count performance though!