@wendell Today, after updating Fedora 33 (no other changes), F33 got stuck during boot. I eventually found that it now hangs consistently and repeatably, but only if my L1T DisplayPort EDID box is installed. I know - sounds crazy.
This is on a multi-boot computer with a secondary monitor connected via a DP EDID between computer and an L1T KVM. The setup has been working well for about a year with Fedora 31, 32, & 33; it still works with Windows 10 and Pop!_OS 20.04.
I’ve tried various combinations of gear, as well as a deep powerdown and a reset of the DP EDID. In particular, I followed your advice in another topic: connect just the computer, EDID, & monitor; power cycle all; reset the EDID. It still doesn’t work, but only with F33. As best I can figure, F33 is now using the DP in a way that doesn’t work with the EDID. I’m hoping a future update will restore proper operation.
As for how it gets stuck - during boot, the primary monitor shows a “splash screen” with a circular spinning… worm? comet? thingamajig. Normally, when the spinner disappears, the boot completes in a few seconds. Now, it disappears and then nothing ever changes (the splash screen “fedora” text remains on screen). This is the time at which the login screen would normally be displayed (on the secondary monitor in my system - the UEFI and early boot display on the primary monitor).
This is just an FYI; I don’t ask or expect an investigation or solution. I have a workaround for now, and may change distros if F33 doesn’t clear this up. Just wanted you & forum members to have the information.
P.S. I last updated F33 on March 31, so the issue would have been introduced since then.
At the boot menu, hit E to edit, then erase the “rhgb” and “quiet” words. Then boot with that, and you’ll have some actual information to work with instead of a spinning circle.
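To make the edit above concrete, here is a minimal sketch of what removing those two words does to a kernel command line. The function name `strip_boot_args` and the example line are made up for illustration; your real line will have different root and volume arguments.

```python
def strip_boot_args(cmdline: str, drop=("rhgb", "quiet")) -> str:
    """Return the kernel command line with the given bare words removed."""
    kept = [word for word in cmdline.split() if word not in drop]
    return " ".join(kept)


if __name__ == "__main__":
    # Hypothetical command line, for illustration only.
    example = "ro root=/dev/sda1 rhgb quiet"
    print(strip_boot_args(example))
```

Editing at the boot menu only affects that one boot, so nothing is changed permanently.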
@rcxb What a good idea! Unfortunately, when I did as you suggested, the screen went black at the critical moment, and any important text was on screen for only an instant.
But you inspired me to spend some time studying the system logs for successful and failed boots. With the EDID Repeater installed, there is a kernel null pointer dereference. This takes place while gdm is being set up; the system is querying the graphics card and monitors for their capabilities. The bug immediately follows an entry “Only one GPU will be used for this X screen.” A Call Trace (stack trace) shows the failure occurring several calls deep beneath function nv_open_device() in a module (?) named “[nvidia]”. I’m not sure if this means the issue is within the NVIDIA binary blob.
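For anyone who wants to repeat that log search, here is a rough sketch that pulls the lines around a null pointer dereference out of a saved log. It assumes the log was saved to a text file first; the function name, the context sizes, and the case-insensitive match on "null pointer dereference" are my own choices, not anything the kernel or ABRT defines.

```python
import re
import sys

MARKER = re.compile(r"null pointer dereference", re.IGNORECASE)


def find_oops_context(lines, before=5, after=30):
    """Yield (line_number, excerpt) for each line matching MARKER.

    The excerpt runs from `before` lines ahead of the match to `after`
    lines past it, which is usually enough to include the Call Trace.
    """
    lines = list(lines)
    for i, line in enumerate(lines):
        if MARKER.search(line):
            start = max(0, i - before)
            yield i + 1, lines[start:i + after + 1]


# Only runs when a log file is named on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as handle:
        text = handle.read().splitlines()
    for lineno, excerpt in find_oops_context(text):
        print(f"--- match at line {lineno} ---")
        print("\n".join(excerpt))
```

One way to get the input is `journalctl -k -b -1 > prev-boot.log` (kernel messages from the previous boot), then pass `prev-boot.log` to the script. That only works if the journal is kept across reboots.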
This begins to make sense, as the EDID Repeater is partially responsible for supplying the type of information the system is gathering.
While the ABRT says there is too little info for it to make an automated report, I now have enough scraps to make a manual bug report, if I can figure out how to submit it.
Many thanks for your response; I had about given up on investigating further.
@wendell After a little investigation and reflection, I would like to point out this is an opportunity to make the EDID Repeater even better than it already is. The null pointer dereference is the responsibility of NVIDIA or the kernel maintainers, but there must be some difference when the repeater is installed that makes the OS act differently.
The simplest possible setup with a repeater - just the computer, EDID Repeater, and monitor (always connected) - activates this bug. Remove the EDID Repeater and the bug lies dormant.
So perhaps you might organize an effort to find out what is different when the repeater is installed? That could lead to an improved repeater that is even more “transparent”, so the computer is not affected by its presence.
If you disable gdm (or set your default target to multi-user instead of graphical), then boot up with the DP box connected, you should avoid the crash and can then dump and decode the EDID data to see if anything is obviously wrong (e.g. blank fields, zeros, whatever).
Compare that to the EDID data without the DP box connected.
And once you’ve booted, try startx to see if you can get a GUI without the crash. If so, and the bug is only triggered by gdm, you could always switch to lightdm or something else like good old simple xdm.
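As a rough sketch of the "dump and decode" step above: the script below reads whatever `edid` files exist under `/sys/class/drm` and checks a few fixed fields of the 128-byte base block (header, checksum, vendor ID, product code, extension count). It assumes the graphics driver exposes connectors there, which not every driver does. `summarize_edid` is a name I made up, and this is nowhere near a full decoder.

```python
import glob

EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])


def summarize_edid(blob: bytes) -> dict:
    """Decode a few fixed fields of the 128-byte EDID base block."""
    if len(blob) < 128:
        # Empty or truncated: typically no monitor seen on that connector.
        return {"present": False, "length": len(blob)}
    base = blob[:128]
    # Vendor ID: three 5-bit letters packed big-endian into bytes 8-9.
    packed = (base[8] << 8) | base[9]
    letters = [(packed >> shift) & 0x1F for shift in (10, 5, 0)]
    vendor = "".join(chr(ord("A") + n - 1) for n in letters)
    return {
        "present": True,
        "length": len(blob),
        "header_ok": base[:8] == EDID_HEADER,
        "checksum_ok": sum(base) % 256 == 0,  # all 128 bytes sum to 0 mod 256
        "vendor": vendor,
        "product_code": base[10] | (base[11] << 8),  # little-endian
        "extension_blocks": base[126],
    }


if __name__ == "__main__":
    for path in sorted(glob.glob("/sys/class/drm/card*-*/edid")):
        try:
            with open(path, "rb") as handle:
                print(path, summarize_edid(handle.read()))
        except OSError as err:
            print(path, "unreadable:", err)
```

A bad header, a failed checksum, or an all-zero vendor field (which prints as "@@@") would be the kind of "obviously wrong" result worth noting. If the loop prints nothing at all, the driver is not exposing EDID at that location.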
Wow, I’m dealing with a wizard! Is that you, Gandalf?
In Fedora 33, I can’t find the edid file. The find command finds nothing.
I can follow a similar path below /sys/devices/pci… down to …/drm/card0/ but there is no card0-DP-? there, nor any edid below that point.
But when I boot Pop!_OS instead, the /sys/ tree looks much as you say. I haven’t installed the latest NVIDIA driver in Pop!_OS, so it still boots fine with the repeater, which is convenient.
And to my surprise, the edid file is identical with & without the EDID box installed! I checked both DP-1 & DP-2 in case I was confused about which is which; both are identical with & without the box.
Color me baffled. Whimper…
I’m going to investigate the EDID handling in Fedora a little more and try to repeat the comparison there. But I’m starting to wonder about some sort of timing variation or race condition.
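For repeating the with/without comparison on another system, a byte-for-byte diff of two saved EDID dumps could look like the sketch below. The function name and the file arguments are placeholders. Keep in mind that a dump only records the bytes the kernel ended up with, so a timing difference would not show up here.

```python
import sys


def diff_bytes(a: bytes, b: bytes):
    """Return a list of (offset, byte_a, byte_b) where the dumps differ.

    If one dump is shorter, its missing bytes are reported as None.
    """
    diffs = []
    for offset in range(max(len(a), len(b))):
        x = a[offset] if offset < len(a) else None
        y = b[offset] if offset < len(b) else None
        if x != y:
            diffs.append((offset, x, y))
    return diffs


# Only runs when two dump files are named on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as f1, open(sys.argv[2], "rb") as f2:
        result = diff_bytes(f1.read(), f2.read())
    if not result:
        print("dumps are identical")
    for offset, x, y in result:
        print(f"offset {offset}: {x!r} vs {y!r}")
```

An empty result matches what I saw in Pop!_OS. Any differences would come with their offsets, which can be looked up in the EDID layout.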
I will note the EDID repeater strips Freesync bits, so the display does not show up as a Freesync monitor even if you have Freesync turned on in the monitor settings. I also own one, and it helps when I switch sources on my monitor.
It takes the important sections of the EDID and copies them. It does not copy the entire EDID table. My monitor, which is especially sensitive to EDID changes at 4K over HDMI, would not show a picture when adapting from DP to HDMI over the adapter. DP with a modified EDID is fine on this monitor though.