(kernel experts needed) ACPI errors causing Linux to not boot

Before you ask: yes, I compiled the kernel with a custom boot logo.
I have a Dell Precision M6800 with BIOS version A25, running Linux 5.2.
I just upgraded the WWAN card from the Sierra Wireless MC7355 to the Sierra Wireless MC7455, with the intention of getting LTE WWAN, as the MC7355 doesn't support LTE.
My system now will not boot. It runs abhorrently slowly for a while, then overheats and shuts down.

I managed to use my smartphone to take some pictures of the kernel messages.
These are from multiple boot attempts.
[UPDATE]: Here is a kernel log:
https://pastebin.com/b20QEqPi

Please help; it is terribly inconvenient not being able to use my laptop.

This won’t be of much help, but it seems like a memory/CPU leak at some point.

Did you load the drivers for the new card into this kernel?

How’s the stock kernel’s compatibility with it?

Are you trying to get into the system because you need important files, or do you just want your system back?


I did not load any drivers. I just upgraded the WWAN card and booted.
I assumed I could install any needed drivers once it booted up.
I want to be able to use my laptop as is.
I have access to all its storage devices and can use another machine to build a fixed kernel if I need to.

You might try live-booting into it to get the important stuff, then recompile the kernel, taking the old driver out and installing yours.

I don’t mess with kernels and stuff, at least not yet… but something tied to the boot sequence might have broken it. For example, it might be trying to load the modules and failing without the error being handled (not sure that’s the right word, but basically a try{} with no catch).


UPDATE: I plugged the SSD into another machine and grabbed the kernel logs.
https://pastebin.com/b20QEqPi

I’m fairly curious to understand it all for academic purposes.

So let’s go:

Are the fans spinning normally? Do they ramp up?

Have you heard of any other Linux users complaining about that wi-fi card model?

Can you pop the old card back in and see if you can boot?

This part:

 acpi device:41: Cannot transition to power state D3hot for parent in (unknown)

This seems to be common among Dell laptops. I saw some posts in other forums but couldn’t quite understand them. It might be some CPU leak (given you hit around 80 ºC at some point), but I’m not entirely sure. Is this part of the kernel (the ACPI code) proprietary?
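If you want to pull the ACPI power-state complaints out of a long log like the pastebin one, a small filter can help. This is a minimal sketch, assuming the log has been copied off the SSD as a plain-text file; the match patterns are only the strings seen in this thread, not a complete list of ACPI error messages, and the script name is up to you.

```python
import re
import sys

# Patterns taken from messages quoted in this thread; illustrative only,
# not a complete catalogue of ACPI error strings.
PATTERNS = [
    re.compile(r"Cannot transition to power state D3hot"),
    re.compile(r"\bACPI\b.*\b[Ee]rror\b"),
]


def acpi_lines(lines):
    """Yield (line_number, line) for every line matching one of PATTERNS."""
    for number, line in enumerate(lines, start=1):
        if any(p.search(line) for p in PATTERNS):
            yield number, line.rstrip("\n")


def main(path):
    # errors="replace" keeps going if the log contains stray non-UTF-8 bytes.
    with open(path, encoding="utf-8", errors="replace") as log:
        for number, line in acpi_lines(log):
            print(f"{number}: {line}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it with the log file's path as the only argument; the line numbers make it easier to jump back to the surrounding context in the full log.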


What distro are you running?

Have you tried to boot using another kernel?

Regarding the fans mentioned by another poster, TLP and i8kutils generally deal with this: https://www.cyberciti.biz/faq/controlling-dell-fan-speeds-temperature-on-ubuntu-debian-linux/

https://wiki.archlinux.org/index.php/Fan_speed_control#Dell_laptops

I’ll read your logs and get back with some more answers.

—— updates as I check through log ——

  1. The CRDA error suggests you haven’t got country details set up, or your wireless card driver is causing interference: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=871643
  2. The ACPI error is non-specific, but does your BIOS contain options to disable your dedicated GPU? I believe Optimus issues can create similar ACPI errors, as the system tries to switch cards without knowing their states.

I am using Debian GNU/Linux 10.
I have tried other kernels, with the same results.

When I undo my wireless card upgrade, the system behaves perfectly fine, apart from the fact that my old wireless card is 3G only, and my use case needs at least LTE speeds.

My fans function perfectly and do ramp up to full speed; it’s just that the cooling on this laptop has never been able to keep up under full load. My normal solution is simply to avoid high loads.

My BIOS does not have a setting to disable the dedicated GPU.
There is a feature called “switchable graphics”, which I have disabled, as Linux would never boot with it enabled back when my laptop worked.

What is Optimus? I don’t recall ever seeing any reference to a feature of that name anywhere in the BIOS.

I wasn’t aware WWAN cards needed country details; I thought they only needed to support the bands of one’s country.

That was my bad, I didn’t look up the wireless card you referenced, and made a poor assumption it was wlan not wwan.

Optimus is the name of a switchable graphics solution offered in select systems with Nvidia and Intel graphics. What confuses me is why your system is trying to load AMD GPU drivers, given I have not seen any reference to a Dell Precision M6800 with both AMD and Intel graphics.

AMD:

Sep 23 12:08:54 localhost kernel: [  129.507271] [drm] amdgpu kernel modesetting enabled.
Sep 23 12:08:54 localhost kernel: [  129.513642] amdgpu 0000:01:00.0: remove_conflicting_pci_framebuffers: bar 0: 0xe0000000 -> 0xefffffff
Sep 23 12:08:54 localhost kernel: [  129.513644] amdgpu 0000:01:00.0: remove_conflicting_pci_framebuffers: bar 2: 0xf0000000 -> 0xf07fffff
Sep 23 12:08:54 localhost kernel: [  129.513645] amdgpu 0000:01:00.0: remove_conflicting_pci_framebuffers: bar 5: 0xf7e00000 -> 0xf7e3ffff
Sep 23 12:08:54 localhost kernel: [  129.513648] fb0: switching to amdgpudrmfb from EFI VGA

My laptop has a dedicated AMD Saturn XT GPU, the AMD FirePro M6100 2GB.
It came with the laptop and worked flawlessly before I upgraded my WWAN card.

After several days of googling, it seems I am the first one to hit this bug; I can’t find any reports of a similar situation.

From the information I could find, I believe this issue is entirely an ACPI problem with my kernel.
I’ll look at my kernel’s config and see if I can find anything out of place relating to ACPI (not that I know where to look).

UPDATE: I could find nothing wrong in my kernel config.
Also, I tried disabling ACPI completely, which caused other errors that also prevent me from using the system.

The kernel parameter acpi_sci=low lets the system boot up just fine.
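To make that workaround persistent on Debian, the usual route is to add the parameter to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and then run update-grub. The helper below is a minimal sketch of that edit, assuming the variable is assigned on a single line with double quotes (Debian’s default layout); it works on a string so nothing is written to disk, and the function name add_kernel_param is hypothetical.

```python
import re

# Matches e.g.  GRUB_CMDLINE_LINUX_DEFAULT="quiet"
# Assumes a single-line, double-quoted assignment.
LINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)


def add_kernel_param(grub_text, param="acpi_sci=low"):
    """Return grub_text with param appended to GRUB_CMDLINE_LINUX_DEFAULT.

    Leaves the text unchanged if the parameter is already present.
    Raises ValueError if the variable is not found.
    """
    match = LINE_RE.search(grub_text)
    if match is None:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT not found")
    params = match.group(2).split()
    if param in params:
        return grub_text
    params.append(param)
    new_line = match.group(1) + " ".join(params) + match.group(3)
    return grub_text[:match.start()] + new_line + grub_text[match.end():]
```

After writing the result back to /etc/default/grub as root, run update-grub so the parameter lands in grub.cfg; after a reboot, cat /proc/cmdline should list acpi_sci=low.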
