Issues with Gigabyte B550M with Linux (Ubuntu 20.04)

Is there any way to check this other than sensors? My guess is that if it were overheating at this point, everything would be hot, especially the air flowing through the fans. That does not happen and, as I showed, looking at sensors everything seems fine.

I will leave it up and running for a bit and see whether, with the new BIOS version, it lasts more than 3 days. After that I’ll run a memtest just to rule out faulty RAM before trying to change the motherboard. Thanks for the suggestions.

I don’t know… I agree with you but I can’t help but think it could be a mysterious incompatibility issue between anything and the OS (as 2005 as it sounds lol).
The vendors allegedly tested it for a couple of days before sending it to me and said everything was working fine, but then of course they only use Windows for that :roll_eyes:

You’re missing a sensor in sensors. I also run a B550M, which uses the NCT6687D, and that chip isn’t really supported yet in the kernels Ubuntu/Debian ship. I’ve been able to add the kernel module on this workstation while distro hopping, and it’s worked pretty well. The module is in newer kernels, but “hurl a new kernel at it” is kinda like going fly hunting with an excavator.

Missing sensors means your fancontrol config isn’t working because your pwmconfig isn’t reading the right sensors. Thermal lockup is extremely likely here.

I’ve got it working right now on my debian machine. Here’s the module:

$ lsmod | grep nct
nct6687                28672  0

And here’s sensors:

$ sensors
nct6687-isa-0a20
Adapter: ISA adapter
+12V:           12.12 V  (min = +12.12 V, max = +12.12 V)
+5V:             5.02 V  (min =  +5.02 V, max =  +5.02 V)
+3.3V:           3.38 V  (min =  +0.00 V, max =  +3.38 V)
CPU Soc:         1.03 V  (min =  +1.03 V, max =  +1.03 V)
CPU Vcore:       1.40 V  (min =  +1.39 V, max =  +1.40 V)
CPU 1P8:         1.85 V  (min =  +1.85 V, max =  +1.85 V)
CPU VDDP:        0.00 V  (min =  +0.00 V, max =  +0.00 V)
DRAM:            1.22 V  (min =  +1.21 V, max =  +1.22 V)
Chipset:         1.00 V  (min =  +1.00 V, max =  +1.01 V)
CPU Fan:        696 RPM  (min =  696 RPM, max =  882 RPM)
Pump Fan:      1706 RPM  (min = 1678 RPM, max = 1706 RPM)
System Fan #1: 1100 RPM  (min = 1099 RPM, max = 1105 RPM)
System Fan #2: 1100 RPM  (min = 1100 RPM, max = 1117 RPM)
System Fan #3: 1151 RPM  (min = 1151 RPM, max = 1158 RPM)
System Fan #4:    0 RPM  (min =    0 RPM, max =    0 RPM)
System Fan #5:    0 RPM  (min =    0 RPM, max =    0 RPM)
System Fan #6:    0 RPM  (min =    0 RPM, max =    0 RPM)
CPU:            +47.0°C  (low  = +40.0°C, high = +48.0°C)
System:         +30.0°C  (low  = +30.0°C, high = +30.0°C)
VRM MOS:        +28.0°C  (low  = +28.0°C, high = +28.0°C)
PCH:            +32.0°C  (low  = +32.0°C, high = +32.0°C)
CPU Socket:     +27.0°C  (low  = +27.0°C, high = +27.0°C)
PCIe x1:        +26.0°C  (low  = +26.0°C, high = +26.0°C)
M2_1:            +0.0°C  (low  =  +0.0°C, high =  +0.0°C)

nvme-pci-2100
Adapter: PCI adapter
Composite:    +27.9°C  (low  = -60.1°C, high = +89.8°C)
                       (crit = +94.8°C)

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +46.6°C  
Tdie:         +46.6°C  
Tccd1:        +33.0°C  
Tccd2:        +32.2°C 

The install instructions on the GitHub link are pretty good. I’ve hit the few snags you can run into, so let me know how it goes.

Edit: Also, once you get the right sensors reporting in, it’s worthwhile to find a guide on configuring your fans to properly cool the system. The Arch Wiki guide is really good, but assumes a little bit of knowledge.


Okay, I will try this just to be sure about the temperature. But I am getting this error during installation: “modprobe: ERROR: could not insert ‘nct6687’: No such device”
Has this happened to you? Sorry, maybe it is a stupid mistake on my side here lol

I am also curious - so you have the same series B550M; which Linux distribution/version are you using?
Have you had this same or a similar issue before (random crashes/freezes)? Or has your machine always worked fine and you were worried about sensors for some other reason?

Hm, no such device usually means the hardware isn’t there for the kernel module to talk to. I don’t think we need to keep pursuing this module.
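
For a quick look at which sensor drivers the kernel actually registered, here is a minimal sketch in Python (standard library only). It assumes the usual sysfs layout where each device under /sys/class/hwmon has a `name` file; `list_hwmon_chips` is just a name made up for this example. If only k10temp and nvme show up, no Super I/O driver is bound.

```python
from pathlib import Path


def list_hwmon_chips(root="/sys/class/hwmon"):
    """Return {hwmon directory name: driver name} for each registered device.

    Assumes the standard hwmon sysfs layout (a `name` file per device).
    """
    chips = {}
    for dev in sorted(Path(root).glob("hwmon*")):
        name_file = dev / "name"
        if name_file.is_file():
            chips[dev.name] = name_file.read_text().strip()
    return chips


# Example use on a live system:
#   for dev, name in list_hwmon_chips().items():
#       print(f"{dev}: {name}")
```

The optional `root` argument is only there so the function can be pointed at a copy of the tree for testing.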

I’m using the MSI Mag B550M Mortar Wifi. I ran it for a few months without the module, but since installing it the system has been more stable and quieter.

From your mainboard’s spec page, you’re sporting the iTE® I/O Controller Chip. I’d expect to see something for that listed in sensors. Does it show up if you run sensors-detect? Let’s see if that can find it before we go module hunting.

this one

Trying family `ITE'... Yes

and:

Driver `k10temp' (autoloaded):

  • Chip `AMD Family 17h thermal sensors' (confidence: 9)

Mine also finds that one:

AMD Family 17h thermal sensors...                           Success!
    (driver `k10temp')

Unfortunately that’s not the I/O controller we’re looking for here. I found a GitHub repo (a1wong/it87) that claims to have a driver for the IT87 family of iTE I/O controllers, but it says it has been unsupported since 2018.

It might be worthwhile watching or logging the temps you’re seeing on k10temp around the time of your machine locking up. More logging and info will help figure out why it’s behaving that way.
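
One way to do that logging, as a rough sketch rather than a finished tool: the Python below polls every `temp*_input` file under /sys/class/hwmon (values are in millidegrees Celsius) and appends a timestamped line to a log file, syncing to disk after each write so the last readings survive a hard freeze. The function names, the log format and the 5-second default interval are all assumptions made for this example.

```python
import os
import time
from pathlib import Path


def read_temps(root="/sys/class/hwmon"):
    """Return {"chip/label": degrees C} for every temp*_input under root."""
    temps = {}
    for dev in sorted(Path(root).glob("hwmon*")):
        name_file = dev / "name"
        chip = name_file.read_text().strip() if name_file.is_file() else dev.name
        for inp in sorted(dev.glob("temp*_input")):
            label_file = inp.with_name(inp.name.replace("_input", "_label"))
            if label_file.is_file():
                label = label_file.read_text().strip()
            else:
                label = inp.name[: -len("_input")]  # e.g. "temp3"
            try:
                temps[f"{chip}/{label}"] = int(inp.read_text()) / 1000.0
            except (OSError, ValueError):
                continue  # sensor unreadable right now, skip it
    return temps


def log_temps(logfile, interval_s=5.0, samples=None, root="/sys/class/hwmon"):
    """Append one timestamped line per sample; samples=None runs until killed."""
    taken = 0
    with open(logfile, "a") as f:
        while samples is None or taken < samples:
            temps = read_temps(root)
            fields = " ".join(f"{k}={v:.1f}" for k, v in sorted(temps.items()))
            f.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {fields}\n")
            f.flush()
            os.fsync(f.fileno())  # force the line to disk before a possible freeze
            taken += 1
            time.sleep(interval_s)
```

Calling `log_temps("temps.log")` in a terminal or a tmux session before starting the workload and reading the tail of the file after the next lockup should show whether Tctl/Tdie were climbing right before it. It only sees sensors that a loaded driver exposes, so on this board it will not include VRM readings.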


Well, a 5950X with a lower-end board like the Gigabyte B550M DS3H isn’t a particularly great combo, I suppose.
The VRM on that particular board isn’t really great.

I would advise monitoring the motherboard VRM temperatures,
just to check whether the board is thermal throttling under heavier workloads.

Yeah, the files aren’t there for download anymore, it seems.
How could you see all your temps as you’ve shown?
Were you using this one before they deleted it?

Can you recommend a way to do this?
I am familiar with lm-sensors, but it does not seem to show all the temperatures I should be seeing for this board…

edit: glances and hardinfo don’t show any other temp either.

Do you have another suggestion?
I mean, I could try to change it as a last resort or something, I am just not so sure about what I could find here.
But I have seen people using this very same set up with no problem.

Does it have to be a micro ATX board?
Or does your particular case also fit a normal ATX size board?

Quick update:

  • I noticed that after the BIOS updates and all, the machine would simply enter power save at random when using some specific libraries (I run some simulations on it). The power save option was off everywhere possible (OS configs, BIOS). This always happened somewhere around 2h into a run. For the first time, something popped up in the journal (syslog), and it just looked like a general failure of everything at once. Other than that, it managed to stay on for ~14 days without any problem of the sort.

  • A bunch of mem tests were performed and no issue appeared. RAM sticks were also removed, cleaned, and installed back.

  • I managed to test a Ryzen 5 with the same board instead. Using fewer cores but with the same libraries and stuff, it worked fine. (We wanted to rule out the possibility of a faulty CPU.)

  • I’ve also opened it up and let it run for a while with the side panels off. This let me see (qualitatively speaking, I could not come up with any means to measure the temperature) that the VRM area indeed gets a bit hotter with the Ryzen 9 than with the Ryzen 5, even when using only 8 cores on both. I haven’t yet checked it right before a crash when using all 16 cores at once.

Anyway, all things considered, I am now leaning towards believing that, as some have said, this board cannot handle a stressed Ryzen 9 (or any heavy workload beyond basic OS stuff).
