Linux stability issues

Morning,

I’ve been using Linux daily for half a year now. So far, it has been a painless experience. However, for about a month it has been unusable. My OS boots correctly, but after 5 to 10 minutes of use the screen freezes for about 10 to 15 seconds and then the computer reboots.

At first, I thought it was a hardware-related issue, but I have a dual boot with Windows 11 and it works flawlessly.

So far, I haven’t tried many things because to be honest, I don’t really know what to look for.

  • I ran memtest to ensure that the issue wasn’t related to the RAM. No issue on that side.
  • I checked the kernel logs and the dmesg output and didn’t see anything conclusive. I can attach the logs if you want to take a look at them.
  • I tried updating my GPU drivers, but I kept having the same issue.
  • I updated my BIOS.

My config is the following:

  • CPU: Ryzen 5950X
  • GPU: GTX 1060 6GB
  • RAM: 2 x 16 GB
  • PSU:
  • Motherboard: ASUS TUF GAMING B550-PLUS WIFI II

I tried two flavors of Ubuntu with the same results:

Xubuntu 22.04, kernel 5.15.0, nvidia driver 525
Kubuntu 22.04, kernel 5.15.0, nvidia driver 535.54.03

On the Windows side (which works): Windows 11 22H2, build 22621.1992, and Nvidia driver version 528.02

I hope someone can help me!
Thanks :slight_smile:

Have you tried using an older kernel version or an older nvidia driver version? Do you have btrfs/timeshift set up? Can you roll back to a previous known-good date and see if that fixes the problem?

…or just try removing the nVidia card? You will likely benefit from a newer kernel though.

Can you post kernel logs from the previous boot that crashed? On a systemd system, journalctl --list-boots to get the boot offset, then journalctl -k -b <boot_offset> should do. You may want to see if one of them contains the RIP: line near the end of the log.
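
The two journalctl steps above can be combined with a filter so that only crash-related lines are shown. This is a minimal sketch: find_crash_markers is a hypothetical helper name, and the list of markers it greps for (RIP:, BUG:, Oops, Call Trace, Kernel panic, NVRM: Xid, Hardware Error) is an assumption about what a crash usually leaves behind, not an exhaustive set.

```shell
#!/bin/bash
# Read a kernel log on stdin and print only the lines that usually
# accompany a crash. The marker list is an illustrative assumption.
find_crash_markers() {
    grep -E 'RIP:|BUG:|Oops|Call Trace|Kernel panic|NVRM: Xid|Hardware Error'
}

# Usage (not run automatically):
#   journalctl --list-boots                    # find the offset of the crashed boot
#   journalctl -k -b -1 | find_crash_markers   # -1 is the previous boot
```

If the filter prints nothing, the kernel most likely never got the chance to log the fault before the reboot, which points away from a software oops and toward a hard reset.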

After posting my message, I tried xserver-xorg-video-nvidia and I was able to watch a YouTube video for 2 hours without any stability issue. However, when I use this driver, my whole interface feels laggy and I’m not willing to keep using it.

After that, I tried switching back to nvidia-driver-535 and I could not even reach my desktop. The login screen would show up, but as soon as I logged in, it crashed. I had to switch to the console and go back to nvidia-driver-390 to be able to log in.
However, I still have the same stability issue and the system crashed after a little while.

What do you mean? Correct me if I’m wrong, but my CPU doesn’t have an IGP, so it can’t work!?

Ahh, you’re correct! I read that as 7950X, I apologize for the noise.

Does a livecd environment work with nvidia drivers? I believe Manjaro has the option to boot nonfree drivers in the livecd.

Here are the logs for the 2 last crashes:

Crash 1

(Boot log shortened to fit the message character limit.)

juil. 30 08:51:05 REDACTED kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp7s0: link becomes ready
juil. 30 08:51:08 REDACTED kernel: kauditd_printk_skb: 91 callbacks suppressed
juil. 30 08:51:08 REDACTED kernel: audit: type=1400 audit(1690699868.246:103): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/2592/cmdline” pid=2074 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 30 08:51:08 REDACTED kernel: audit: type=1400 audit(1690699868.246:104): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/2593/cmdline” pid=2074 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 30 08:51:08 REDACTED kernel: audit: type=1400 audit(1690699868.246:105): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/2597/cmdline” pid=2074 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 30 08:51:08 REDACTED kernel: audit: type=1400 audit(1690699868.250:106): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/2591/cmdline” pid=2074 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 30 08:51:08 REDACTED kernel: audit: type=1400 audit(1690699868.250:107): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/2603/cmdline” pid=2074 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 30 08:51:08 REDACTED kernel: audit: type=1400 audit(1690699868.290:108): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/2590/cmdline” pid=2074 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 30 08:51:08 REDACTED kernel: audit: type=1400 audit(1690699868.326:109): apparmor=“STATUS” operation=“profile_load” profile=“unconfined” name=“docker-default” pid=2682 comm=“apparmor_parser”
juil. 30 08:51:08 REDACTED kernel: NFSD: Using nfsdcld client tracking operations.
juil. 30 08:51:08 REDACTED kernel: NFSD: no clients to reclaim, skipping NFSv4 grace period (net f0000000)
juil. 30 08:51:08 REDACTED kernel: Bridge firewalling registered
juil. 30 08:51:08 REDACTED kernel: Initializing XFRM netlink socket
juil. 30 08:51:14 REDACTED kernel: audit: type=1400 audit(1690699874.078:110): apparmor=“DENIED” operation=“capable” profile=“/snap/snapd/19457/usr/lib/snapd/snap-confine” pid=1854 comm=“snap-confine” capability=12 capname=“net_admin”
juil. 30 08:51:14 REDACTED kernel: audit: type=1400 audit(1690699874.078:111): apparmor=“DENIED” operation=“capable” profile=“/snap/snapd/19457/usr/lib/snapd/snap-confine” pid=1854 comm=“snap-confine” capability=38 capname=“perfmon”
juil. 30 08:51:17 REDACTED kernel: loop28: detected capacity change from 0 to 8
juil. 30 08:51:19 REDACTED kernel: audit: type=1400 audit(1690699879.766:112): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/3317/cmdline” pid=2074 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 30 08:51:19 REDACTED kernel: audit: type=1400 audit(1690699879.766:113): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/3317/cmdline” pid=2076 comm=“sssd_pam” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 30 08:51:19 REDACTED kernel: audit: type=1400 audit(1690699879.822:114): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/3319/cmdline” pid=2074 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 30 08:51:20 REDACTED kernel: audit: type=1400 audit(1690699879.998:115): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/3332/cmdline” pid=2074 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=1000
juil. 30 08:51:20 REDACTED kernel: audit: type=1400 audit(1690699880.018:116): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/3336/cmdline” pid=2074 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 30 08:51:21 REDACTED kernel: audit: type=1400 audit(1690699881.118:117): apparmor=“DENIED” operation=“capable” profile=“/usr/sbin/cups-browsed” pid=2589 comm=“cups-browsed” capability=23 capname=“sys_nice”
juil. 30 08:51:21 REDACTED kernel: audit: type=1400 audit(1690699881.266:118): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/3672/cmdline” pid=2074 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 30 08:51:21 REDACTED kernel: audit: type=1400 audit(1690699881.274:119): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/3679/cmdline” pid=2074 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 30 08:51:21 REDACTED kernel: audit: type=1400 audit(1690699881.286:120): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/3684/cmdline” pid=2074 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 30 08:51:21 REDACTED kernel: audit: type=1400 audit(1690699881.338:121): apparmor=“DENIED” operation=“capable” profile=“/snap/snapd/19457/usr/lib/snapd/snap-confine” pid=3695 comm=“snap-confine” capability=12 capname=“net_admin”
juil. 30 08:51:22 REDACTED kernel: ntfs3: Max link count 4000
juil. 30 08:51:22 REDACTED kernel: ntfs3: Enabled Linux POSIX ACLs support
juil. 30 08:51:22 REDACTED kernel: ntfs3: Read-only LZX/Xpress compression included
juil. 30 08:51:22 REDACTED kernel: ntfs3: Unknown parameter ‘windows_names’
juil. 30 08:51:22 REDACTED kernel: ntfs3: Unknown parameter ‘windows_names’
juil. 30 08:51:22 REDACTED kernel: ISO 9660 Extensions: Microsoft Joliet Level 3
juil. 30 08:51:22 REDACTED kernel: ISO 9660 Extensions: Microsoft Joliet Level 3
juil. 30 08:51:22 REDACTED kernel: ISO 9660 Extensions: RRIP_1991A
juil. 30 08:51:22 REDACTED kernel: ntfs3: Unknown parameter ‘windows_names’
juil. 30 08:51:30 REDACTED kernel: kauditd_printk_skb: 9 callbacks suppressed
juil. 30 08:51:30 REDACTED kernel: audit: type=1107 audit(1690699890.126:130): pid=1782 uid=102 auid=4294967295 ses=4294967295 subj=unconfined msg=‘apparmor=“DENIED” operation=“dbus_signal” bus=“system” path=“/org/freedesktop/login1” interface=“org.freedesktop.login1.Manager” member=“UserRemoved” name=“:1.34” mask=“receive” pid=3695 label=“snap.firefox.firefox” peer_pid=2125 peer_label=“unconfined”
exe=“/usr/bin/dbus-daemon” sauid=102 hostname=? addr=? terminal=?’
juil. 30 08:51:35 REDACTED kernel: usb 3-4.1: new full-speed USB device number 11 using xhci_hcd
juil. 30 08:51:35 REDACTED kernel: usb 3-4.1: New USB device found, idVendor=046d, idProduct=c07e, bcdDevice=90.03
juil. 30 08:51:35 REDACTED kernel: usb 3-4.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
juil. 30 08:51:35 REDACTED kernel: usb 3-4.1: Product: Gaming Mouse G402
juil. 30 08:51:35 REDACTED kernel: usb 3-4.1: Manufacturer: Logitech
juil. 30 08:51:35 REDACTED kernel: usb 3-4.1: SerialNumber: 499038683237
juil. 30 08:51:35 REDACTED kernel: input: Logitech Gaming Mouse G402 as /devices/pci0000:00/0000:00:08.1/0000:0a:00.3/usb3/3-4/3-4.1/3-4.1:1.0/0003:046D:C07E.0006/input/input25
juil. 30 08:51:35 REDACTED kernel: hid-generic 0003:046D:C07E.0006: input,hidraw5: USB HID v1.11 Mouse [Logitech Gaming Mouse G402] on usb-0000:0a:00.3-4.1/input0
juil. 30 08:51:35 REDACTED kernel: input: Logitech Gaming Mouse G402 Keyboard as /devices/pci0000:00/0000:00:08.1/0000:0a:00.3/usb3/3-4/3-4.1/3-4.1:1.1/0003:046D:C07E.0007/input/input26
juil. 30 08:51:35 REDACTED kernel: hid-generic 0003:046D:C07E.0007: input,hiddev3,hidraw6: USB HID v1.11 Keyboard [Logitech Gaming Mouse G402] on usb-0000:0a:00.3-4.1/input1
juil. 30 08:56:19 REDACTED kernel: audit: type=1400 audit(1690700179.972:131): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/1794/cmdline” pid=2074 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 30 09:09:07 REDACTED kernel: audit: type=1326 audit(1690700947.939:132): auid=1000 uid=1000 gid=1000 ses=3 subj=snap.firefox.firefox pid=7421 comm=“firefox” exe=“/snap/firefox/2760/usr/lib/firefox/firefox” sig=0 arch=c000003e syscall=314 compat=0 ip=0x7f822029673d code=0x50000
juil. 30 09:09:08 REDACTED kernel: audit: type=1400 audit(1690700948.391:133): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/1794/cmdline” pid=2074 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 30 09:11:18 REDACTED kernel: audit: type=1400 audit(1690701078.330:134): apparmor=“DENIED” operation=“capable” profile=“/snap/snapd/19457/usr/lib/snapd/snap-confine” pid=8447 comm=“snap-confine” capability=12 capname=“net_admin”
juil. 30 09:11:18 REDACTED kernel: audit: type=1400 audit(1690701078.330:135): apparmor=“DENIED” operation=“capable” profile=“/snap/snapd/19457/usr/lib/snapd/snap-confine” pid=8447 comm=“snap-confine” capability=38 capname=“perfmon”
juil. 30 09:12:36 REDACTED kernel: audit: type=1400 audit(1690701156.273:136): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/8589/cmdline” pid=2074 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=1000

Crash 2

(Boot log shortened to fit the message character limit.)

juil. 29 15:15:16 REDACTED kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp7s0: link becomes ready
juil. 29 15:15:16 REDACTED kernel: NFSD: Using nfsdcld client tracking operations.
juil. 29 15:15:16 REDACTED kernel: NFSD: no clients to reclaim, skipping NFSv4 grace period (net f0000000)
juil. 29 15:15:17 REDACTED kernel: Bridge firewalling registered
juil. 29 15:15:17 REDACTED kernel: Initializing XFRM netlink socket
juil. 29 15:15:24 REDACTED kernel: kauditd_printk_skb: 98 callbacks suppressed
juil. 29 15:15:24 REDACTED kernel: audit: type=1400 audit(1690636524.811:110): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/3137/cmdline” pid=2106 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 29 15:15:24 REDACTED kernel: audit: type=1400 audit(1690636524.811:111): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/3137/cmdline” pid=2107 comm=“sssd_pam” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 29 15:15:24 REDACTED kernel: audit: type=1400 audit(1690636524.867:112): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/3139/cmdline” pid=2106 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 29 15:15:25 REDACTED kernel: audit: type=1400 audit(1690636525.043:113): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/3152/cmdline” pid=2106 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=1000
juil. 29 15:15:25 REDACTED kernel: audit: type=1400 audit(1690636525.067:114): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/3156/cmdline” pid=2106 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 29 15:15:25 REDACTED kernel: audit: type=1400 audit(1690636525.639:115): apparmor=“DENIED” operation=“capable” profile=“/snap/snapd/19457/usr/lib/snapd/snap-confine” pid=1893 comm=“snap-confine” capability=12 capname=“net_admin”
juil. 29 15:15:25 REDACTED kernel: audit: type=1400 audit(1690636525.639:116): apparmor=“DENIED” operation=“capable” profile=“/snap/snapd/19457/usr/lib/snapd/snap-confine” pid=1893 comm=“snap-confine” capability=38 capname=“perfmon”
juil. 29 15:15:26 REDACTED kernel: audit: type=1400 audit(1690636526.375:117): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/3502/cmdline” pid=2106 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 29 15:15:26 REDACTED kernel: audit: type=1400 audit(1690636526.383:118): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/3509/cmdline” pid=2106 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 29 15:15:26 REDACTED kernel: audit: type=1400 audit(1690636526.387:119): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/3514/cmdline” pid=2106 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 29 15:15:27 REDACTED kernel: ntfs3: Max link count 4000
juil. 29 15:15:27 REDACTED kernel: ntfs3: Enabled Linux POSIX ACLs support
juil. 29 15:15:27 REDACTED kernel: ntfs3: Read-only LZX/Xpress compression included
juil. 29 15:15:27 REDACTED kernel: ntfs3: Unknown parameter ‘windows_names’
juil. 29 15:15:27 REDACTED kernel: ntfs3: Unknown parameter ‘windows_names’
juil. 29 15:15:27 REDACTED kernel: ISO 9660 Extensions: Microsoft Joliet Level 3
juil. 29 15:15:27 REDACTED kernel: ISO 9660 Extensions: Microsoft Joliet Level 3
juil. 29 15:15:27 REDACTED kernel: ISO 9660 Extensions: RRIP_1991A
juil. 29 15:15:27 REDACTED kernel: ntfs3: Unknown parameter ‘windows_names’
juil. 29 15:15:28 REDACTED kernel: loop28: detected capacity change from 0 to 8
juil. 29 15:15:31 REDACTED kernel: kauditd_printk_skb: 9 callbacks suppressed
juil. 29 15:15:31 REDACTED kernel: audit: type=1326 audit(1690636531.387:129): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=snap.cups.cupsd pid=3899 comm=“cups-proxyd” exe=“/snap/cups/980/sbin/cups-proxyd” sig=0 arch=c000003e syscall=314 compat=0 ip=0x7ff2258ada3d code=0x50000
juil. 29 15:15:31 REDACTED kernel: cups-proxyd[3899]: segfault at 18 ip 0000563c50fdbd75 sp 00007ffc92c6d240 error 4 in cups-proxyd[563c50fd8000+7000]
juil. 29 15:15:31 REDACTED kernel: Code: 83 3d ee b2 00 00 00 41 54 55 48 89 fd 53 0f 85 f4 00 00 00 48 8d 1d 69 3d 00 00 48 63 45 1c 48 89 df 48 c1 e0 05 48 03 45 08 <48> 8b 50 18 8b 70 14 e8 0f d0 ff ff 44 8b 65 18 48 89 c7 45 85 e4
juil. 29 15:15:35 REDACTED kernel: audit: type=1107 audit(1690636535.219:130): pid=1809 uid=102 auid=4294967295 ses=4294967295 subj=unconfined msg=‘apparmor=“DENIED” operation=“dbus_signal” bus=“system” path=“/org/freedesktop/login1” interface=“org.freedesktop.login1.Manager” member=“UserRemoved” name=“:1.32” mask=“receive” pid=3525 label=“snap.firefox.firefox” peer_pid=2159 peer_label=“unconfined”
exe=“/usr/bin/dbus-daemon” sauid=102 hostname=? addr=? terminal=?’
juil. 29 15:15:52 REDACTED kernel: audit: type=1400 audit(1690636552.839:131): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/4609/cmdline” pid=2106 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 29 15:16:15 REDACTED kernel: audit: type=1400 audit(1690636575.159:132): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/4634/cmdline” pid=2106 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 29 15:17:01 REDACTED kernel: audit: type=1400 audit(1690636621.723:133): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/4650/cmdline” pid=2106 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 29 15:17:01 REDACTED kernel: audit: type=1400 audit(1690636621.723:134): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/4651/cmdline” pid=2106 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 29 15:17:03 REDACTED kernel: audit: type=1400 audit(1690636623.351:135): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/4657/cmdline” pid=2106 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 29 15:17:25 REDACTED kernel: audit: type=1400 audit(1690636645.479:136): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/4668/cmdline” pid=2106 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 29 15:17:48 REDACTED kernel: audit: type=1400 audit(1690636668.503:137): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/4688/cmdline” pid=2106 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 29 15:17:56 REDACTED kernel: audit: type=1400 audit(1690636676.663:138): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/4696/cmdline” pid=2106 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 29 15:19:06 REDACTED kernel: audit: type=1400 audit(1690636746.295:139): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/4713/cmdline” pid=2106 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 29 15:19:59 REDACTED kernel: audit: type=1400 audit(1690636799.495:140): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/1821/cmdline” pid=2106 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0
juil. 29 15:23:03 REDACTED kernel: audit: type=1400 audit(1690636983.986:141): apparmor=“ALLOWED” operation=“open” profile=“/usr/sbin/sssd” name=“/proc/1821/cmdline” pid=2106 comm=“sssd_nss” requested_mask=“r” denied_mask=“r” fsuid=0 ouid=0

I can’t find any trace of the RIP: message

There really doesn’t seem to be anything here. If it were a GPU driver-related crash, I would expect something GPU-related to show up in the log, but nothing does. On the Windows side, do you see anything in the system event logs in the Event Viewer? (Event Viewer → Windows Logs → System).

Also, it might be worth monitoring the temperature of the CPU and GPU.
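
Because the machine reboots on its own, a spot check can miss a spike; logging the temperatures to a file lets you read the last values after the crash. The following is a rough sketch under a few assumptions: the function names are made up for illustration, the GPU reading relies on nvidia-smi from the proprietary driver, and the CPU reading assumes the k10temp kernel driver exposes the Ryzen sensor under /sys/class/hwmon.

```shell
#!/bin/bash
# Format one log line: timestamp, GPU temperature, CPU temperature.
format_reading() {
    printf '%s gpu=%s cpu=%s\n' "$1" "$2" "$3"
}

# GPU temperature in degrees C via nvidia-smi, or "n/a" if unavailable.
gpu_temp() {
    local t=""
    if command -v nvidia-smi >/dev/null 2>&1; then
        t=$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits 2>/dev/null | head -n 1) || true
    fi
    echo "${t:-n/a}"
}

# CPU temperature in degrees C from the k10temp hwmon device
# (assumption: AMD CPU), or "n/a" if no such device is found.
cpu_temp() {
    local dir
    for dir in /sys/class/hwmon/hwmon*; do
        if [ "$(cat "$dir/name" 2>/dev/null)" = "k10temp" ] && [ -r "$dir/temp1_input" ]; then
            # The kernel reports millidegrees.
            echo $(( $(cat "$dir/temp1_input") / 1000 ))
            return
        fi
    done
    echo "n/a"
}

# Append a reading to the file $1 every $2 seconds (default 5) and
# flush to disk so the last lines survive a sudden reboot.
log_temps() {
    while true; do
        format_reading "$(date '+%F %T')" "$(gpu_temp)" "$(cpu_temp)" >> "$1"
        sync
        sleep "${2:-5}"
    done
}

# Usage (not run automatically):
#   log_temps "$HOME/temps.log" 5 &
```

After the next crash, the tail of the log file shows the temperatures in the last few seconds before the reboot.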

I’ll check the Windows side.

I’ve checked the temperatures and they stay within a reasonable range, roughly 60 to 70 °C.

What is weird to me is that with xserver-xorg-video-nvidia, even though the interface was laggy, it remained stable for multiple hours.

IIRC, that’s just a generic VESA driver, and it cannot actually interact with the card properly. It’s stuck at the lowest clocks and has no hardware acceleration; it just carries a frame to the display for you. Basically like running Windows without a GPU driver.

Thanks for the explanation, I didn’t take the time to check what it was exactly.
However, given that the system is stable with it, doesn’t it mean that the Nvidia driver is likely to be the culprit?

It implies that the nvidia driver having full feature access to the GPU is causing a problem somewhere in the chain. But, it doesn’t really narrow it down much.
It could be partial GPU failure, it could be a bad GPU driver from nvidia, it could be a kernel module incompatible with nvidia’s driver, it could be a kernel bug, it could be a power issue…
We know it happens in two Debian-based installs with two versions of the nvidia driver and not much else.
Does it happen on any Arch based distros?
Does it happen on a fresh install?
Does it happen in Nouveau? Does it happen in Nouveau when you reclock it to use 3D clocks?
Are you using X11 or Wayland based DEs? Can you try another desktop environment that uses the other?

I’ve tried a fresh install of Fedora 36 and I had the same issue. So it isn’t related to Debian nor to the fact that they aren’t fresh installs.

I’ve tried KDE, XFCE and Gnome without any difference
I will verify, but I think they were all using Wayland.
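
Checking which display server a session uses is quick: the login manager exports XDG_SESSION_TYPE for graphical sessions. A small sketch (session_kind is just an illustrative name):

```shell
#!/bin/bash
# Report the display server of the current login session based on
# the XDG_SESSION_TYPE environment variable.
session_kind() {
    case "${XDG_SESSION_TYPE:-}" in
        wayland) echo "Wayland" ;;
        x11)     echo "X11" ;;
        *)       echo "unknown (${XDG_SESSION_TYPE:-unset})" ;;
    esac
}

# Usage, from a terminal inside the desktop session:
#   session_kind
```

Run it from a terminal opened inside each desktop environment; over SSH or on a text console the variable will not describe the graphical session.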

I also have to verify, but if I remember correctly, my PSU is a 750W unit from a reputable brand. Is there a way to test whether the problem is the PSU?

For the GPU, my girlfriend also has a GTX 1060, so I could try to swap them to see if it’s hardware related. However, I don’t understand why Windows works flawlessly.

In case it is related to the drivers, should I try to contact Nvidia?

I didn’t try an Arch-based distro. The issue is that even if it works, it won’t solve my problem, because I use Yocto, which isn’t tested on Arch-based distros.

What is the Nouveau you are talking about?
Do you have a link about reclocking that I could dig into?

I don’t think XFCE supports Wayland yet, so it sounds like it’s… possibly a hardware failure that the Windows driver is masking?

Nouveau is the open source nVidia driver. It barely works and for most GPUs, it requires manually setting the power state to be non-idle, I’d be surprised if it didn’t crash. I’m not sure Pascal even supports reclocking at all, it needs to be done manually via the terminal for Kepler.
In terms of testing Arch, it’s just another step to seeing what works and what doesn’t. It probably won’t work, but if it did, theoretically, it would mean that there’s some problem that isn’t just that nvidia’s linux drivers have suddenly become completely broken, such as the distros not properly blacklisting nouveau, or the kernel not loading nvidia drivers properly on boot.
It would be even more useful to have a known-good/working/legacy install to see if the problem persists with old software. If old was-working software is no longer working, that implies some change to the hardware causing it to not work. If you have any old disk images of a working linux install, it would be good to test those. If not, a live environment that has baked-in nvidia drivers from last year sometime could also do.

Swapping your girlfriend’s GPU in and seeing if that works would also be good to try, as would booting a liveimage in her computer to see if it has the same problems.

I have no idea what’s actually wrong here or where to look, so I’m just throwing out anything I can think of that might produce some kind of information to go off.

You’ll probably need to start from scratch to even use Nouveau, but it should work in a liveimage, since it’s the open source driver and plays nicely with the rest of linux, though not with the nVidia GPUs it’s supposed to be working with.
It doesn’t look like pascal supports reclocking, though, so it may not be a very useful test.

Doesn’t help troubleshoot his problem with the gpu but the nvidia drivers do in fact allow reclocking of both memory and core clocks if you set it up with what is known as the “coolbits tweak”

It works on every generation of Nvidia card back to Maxwell. A single command-line statement enables it; after a reboot you can either set your desired overclock from the terminal with nvidia-settings or adjust the clocks manually in the Nvidia X Server Settings application that gets installed by default with the proprietary drivers.

Just an FYI. I run all my Nvidia cards with an appropriate overclock to get them back to the equivalent of the P0 power state after the driver imposes a downclock penalty for running compute loads on consumer cards.

This is the coolbits tweak.

sudo nvidia-xconfig --thermal-configuration-check --cool-bits=28 --enable-all-gpus

This is my bash script I run on my cards.

#!/bin/bash

# Enable persistence mode (keeps the driver loaded between clients).
/usr/bin/nvidia-smi -pm 1

# Set the power limit of each card to 200 W.
/usr/bin/nvidia-smi -i 0 -pl 200
/usr/bin/nvidia-smi -i 1 -pl 200
/usr/bin/nvidia-smi -i 2 -pl 200

# Set PowerMizer to "Prefer Maximum Performance" on each card.
/usr/bin/nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"
/usr/bin/nvidia-settings -a "[gpu:1]/GPUPowerMizerMode=1"
/usr/bin/nvidia-settings -a "[gpu:2]/GPUPowerMizerMode=1"

# Take manual control of the fans and set target speeds (percent).
/usr/bin/nvidia-settings -a "[gpu:0]/GPUFanControlState=1"
/usr/bin/nvidia-settings -a "[fan:0]/GPUTargetFanSpeed=85"
/usr/bin/nvidia-settings -a "[fan:1]/GPUTargetFanSpeed=90"
/usr/bin/nvidia-settings -a "[gpu:1]/GPUFanControlState=1"
/usr/bin/nvidia-settings -a "[fan:2]/GPUTargetFanSpeed=85"
/usr/bin/nvidia-settings -a "[fan:3]/GPUTargetFanSpeed=90"
/usr/bin/nvidia-settings -a "[gpu:2]/GPUFanControlState=1"
/usr/bin/nvidia-settings -a "[fan:4]/GPUTargetFanSpeed=85"
/usr/bin/nvidia-settings -a "[fan:5]/GPUTargetFanSpeed=90"

# Apply memory and core clock offsets to performance level 4.
/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[4]=800" -a "[gpu:0]/GPUGraphicsClockOffset[4]=60"
/usr/bin/nvidia-settings -a "[gpu:1]/GPUMemoryTransferRateOffset[4]=800" -a "[gpu:1]/GPUGraphicsClockOffset[4]=60"
/usr/bin/nvidia-settings -a "[gpu:2]/GPUMemoryTransferRateOffset[4]=800" -a "[gpu:2]/GPUGraphicsClockOffset[4]=60"

This enables overclocking, sets the power limit on each card, sets the fan speeds, and sets the core and memory clock offsets for each card.

I would also recommend not using Wayland. The Nvidia drivers are more stable with the X server. 22.04 defaults to Wayland now, I believe. You can revert to X by uncommenting the:

WaylandEnable=false

statement in the /etc/gdm3/custom.conf file (remove the leading #), then save the file and reboot to come up on an X11 DE.
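
For GDM to force X11, the WaylandEnable=false line in /etc/gdm3/custom.conf has to be active, that is, without a leading #. A minimal sketch of that edit follows; force_x11 is a hypothetical helper name, and it only matters when GDM is the display manager (LightDM and SDDM do not read this file).

```shell
#!/bin/bash
# Uncomment "WaylandEnable=false" in a GDM config file so that GDM
# starts X11 sessions. A backup is kept next to the file as <file>.bak.
# $1 = path to the config file (on Ubuntu: /etc/gdm3/custom.conf).
force_x11() {
    sed -i.bak -E 's/^[[:space:]]*#[[:space:]]*(WaylandEnable=false)/\1/' "$1"
}

# Usage (needs root, then a reboot):
#   sudo sed -i.bak -E 's/^[[:space:]]*#[[:space:]]*(WaylandEnable=false)/\1/' /etc/gdm3/custom.conf
```

To go back to Wayland later, put the # back in front of the line or restore the .bak copy.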

Ubuntu MATE definitely uses X11. Did I miss the PSU wattage and model? IIRC, people have mentioned that PSU capacity declines over time.

Might want to try the 400-era drivers with that card.

A 1060 6GB card uses maybe 120W. Any power supply greater than 400W would be sufficient.

Since the card works in Windows, I doubt the power supply is the issue.
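
The power estimate above can be written out as a rough budget. In this sketch the 120 W GPU figure and the 750 W PSU rating come from the thread; the 142 W CPU figure (the stock package power limit of a 5950X) and the 75 W allowance for the rest of the system are assumptions added for illustration, and psu_headroom is a made-up helper name.

```shell
#!/bin/bash
# Sum per-component power estimates (watts) and report the headroom
# against a PSU rating.
# $1 = PSU rating in watts, remaining arguments = component estimates.
psu_headroom() {
    local psu=$1 total=0 w
    shift
    for w in "$@"; do
        total=$(( total + w ))
    done
    echo "load=${total}W headroom=$(( psu - total ))W"
}

# Usage: 750 W PSU, 120 W GPU, 142 W CPU (assumed), 75 W for the rest (assumed)
#   psu_headroom 750 120 142 75    # prints: load=337W headroom=413W
```

Even with generous estimates the load stays well under half of the rated capacity, which is consistent with the PSU being an unlikely culprit unless it is failing.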