At this point I’ll try anything haha. You got it, I’ll isolate down to one GPU in an x8 slot and report back. I did find a 0902 beta UEFI update that I wasn’t aware existed, and I’m flashing it now. I doubt I’ll see any change from it, but it’s worth a shot.
The groupings all looked good. Each GPU and its audio device were assigned to the same group, isolated from everything else. I’ll post them in just a moment, once the machine comes back up from the UEFI update.
Looks like it’s IOMMU groups 26 and 27, and groups 14 and 15. It’s odd that it split one device across two different groups, right? I’m guessing that won’t be a problem. I do have ACS enabled, as well as PCIe ARI Support. Those shouldn’t have any adverse effects, will they?
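The group numbers above come from sysfs. Here is a minimal sketch for dumping every IOMMU group together with its devices. It assumes sysfs is mounted at /sys and, optionally, that pciutils (`lspci`) is installed for readable device names. The function name `list_iommu_groups` is hypothetical.

```bash
#!/usr/bin/env bash
# List every IOMMU group and the PCI devices inside it.
# Assumes sysfs at /sys; lspci (pciutils) is optional and only adds a
# human-readable description next to each device address.

list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}"
    local group dev addr desc
    # Groups come out in glob (lexical) order, e.g. 1, 10, 11, 2, ...
    for group in "$root"/*; do
        [[ -d "$group" ]] || continue   # no match: the glob stays literal
        echo "IOMMU Group ${group##*/}:"
        for dev in "$group"/devices/*; do
            [[ -e "$dev" ]] || continue
            addr="${dev##*/}"
            # -nn adds [vendor:device] IDs, -s limits output to this address;
            # fall back to the bare address if lspci is missing or prints nothing
            desc="$(lspci -nns "$addr" 2>/dev/null)"
            echo "    ${desc:-$addr}"
        done
    done
}

# Run only when executed directly, not when sourced
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    list_iommu_groups "$@"
fi
```

Reading the groups does not need root. A device whose functions were split across two groups, like 26 and 27 above, shows up as two separate group entries.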
I’ve cut tubes for both setups, so I’m pretty well covered. It’s just a pain to drain and refill, but I’ve gotten pretty quick at it on this build. I have a Vega 64 in 3 of the 4 slots, so I could always just disconnect or reconnect power to the ones I want to test. I’d assume that doesn’t damage the card or the board XD.
You’re definitely right. I wish I had some soft tubing to use in the meantime, though!
It’s just ACS enabled in the BIOS, that’s all. Ah, I’m a dummy: I was wondering whether ACS was causing that split, but I didn’t mind it much as long as they were isolated from everything else. I’ll go ahead and disable it and test, just in case. Will post the results shortly!
I think ACS needs to be enabled. I’ve never seen an option for that, so I can’t say for sure. I’ll defer to Wendell on that one. I was talking more specifically about the kernel patch that overrides the ACS configuration that the UEFI sets up.
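To tell the two ACS cases apart (the UEFI setting versus the kernel override patch), you can check whether the running kernel was booted with the override. This is a minimal sketch that assumes the out-of-tree ACS override patch and its `pcie_acs_override=` boot parameter, which is often set to `downstream,multifunction`. The helper name `acs_override_setting` is hypothetical, and the parameter has no effect on a kernel built without the patch.

```bash
#!/usr/bin/env bash
# Print the value of pcie_acs_override= from a kernel command line, if present.
# Assumption: the kernel carries the out-of-tree ACS override patch; on a
# stock kernel the parameter may appear on the command line but does nothing.

acs_override_setting() {
    local cmdline_file="${1:-/proc/cmdline}"
    local -a args
    local arg
    # /proc/cmdline is a single line of space-separated boot parameters
    read -ra args < "$cmdline_file"
    for arg in "${args[@]}"; do
        case "$arg" in
            pcie_acs_override=*)
                echo "${arg#pcie_acs_override=}"
                return 0
                ;;
        esac
    done
    return 1  # parameter not set
}

# Run only when executed directly, not when sourced
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    if setting="$(acs_override_setting "$@")"; then
        echo "ACS override requested: $setting"
    else
        echo "No pcie_acs_override= on the kernel command line"
    fi
fi
```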
20:17 System encountered a non-fatal error in i2c_dw_init_master()
20:17 sp5100_tco: I/O address 0x0cd6 already in use kernel
Reboot
20:10 System encountered a non-fatal error in i2c_dw_init_master()
20:10 sp5100_tco: I/O address 0x0cd6 already in use kernel
Reboot
19:45 System encountered a non-fatal error in i2c_dw_init_master()
19:45 imjournal: ignoring invalid state file /var/lib/rsyslog/imjournal.state [v8.32.0] rsyslogd
19:45 imjournal: fscanf on state file `/var/lib/rsyslog/imjournal.state' failed [v8.32.0 try http://www.rsyslog.com/e/2027 ] rsyslogd
19:45 sp5100_tco: I/O address 0x0cd6 already in use kernel
Reboot
19:28 System encountered a non-fatal error in i2c_dw_init_master()
19:28 sp5100_tco: I/O address 0x0cd6 already in use kernel
Reboot
18:58 System encountered a non-fatal error in i2c_dw_init_master()
18:58 sp5100_tco: I/O address 0x0cd6 already in use kernel
18:58 could not read from '/sys/module/pcc_cpufreq/initstate': No such device systemd-udevd 2
Reboot
18:46 System encountered a non-fatal error in i2c_dw_init_master()
18:46 sp5100_tco: I/O address 0x0cd6 already in use kernel
18:46 could not read from '/sys/module/pcc_cpufreq/initstate': No such device systemd-udevd
Reboot
18:08 error: process_write: write failed sftp-server 2
08:48 wil6210 0000:04:00.0 wlp4s0: wil_halp_vote: HALP vote timed out kernel 2
Not a lot to go off of. My thoughts were that there’s some obscure feature on the board that’s messing things up, or that I did something wrong when compiling the kernel.
No luck, seeing the same issue.
It leads me to think that either I compiled the kernel wrong, or there’s a feature of the board causing the issue. Just a guess, really.
I have this board too, and I’m using it successfully with two NVIDIA 980 Tis on BIOS 0804. Perhaps try that BIOS to narrow it down?
There is a second IOMMU option buried in the AMD PBS/CBS menus (one of those), in addition to the option you’ve already found. I have both enabled; I think the buried option is disabled by default…
Hey, new here. I have the same motherboard and wanted to chime in and contribute to the pool of knowledge:
I finally got my setup just about perfect. On the ROG ZENITH there may be a bug of sorts in the UEFI/BIOS settings.
Counterintuitively, I have had the best success with compatibility mode disabled in the UEFI, using UEFI version 902.
I use an RX 550 for the host. I didn’t want to waste an x16 PCIe slot on it, so I put it in the second x8 slot. Initially, even choosing it as the primary GPU in the UEFI under ‘Tools’ did not let the system POST through the RX 550. I found a post suggesting disabling compatibility mode, and it worked!
Coincidentally, this also let the Threadripper Reset Patch finally work with my board.
It took a lot of hours I should have spent studying… but I finally have my setup just about right! RX 550 for the Fedora host, Vega for a ROCm Ubuntu VM, and a GTX 1050 Ti for another Ubuntu VM. It’s nice to be able to do deep learning in VMs. The software configs are so picky that it’s nice to get a VM working and then simply back it up :). It’s great to be able to roll back to a working state with the bleeding-edge stuff.
I’m running Manjaro Linux, and to get rid of this error I had to make sure the sp5100_tco module was loaded before the i2c module.
To do this, I edited /etc/mkinitcpio.conf and added the module to the following section (right at the top):
# MODULES
# The following modules are loaded before any boot hooks are
# run. Advanced users may wish to specify all system modules
# in this array. For instance:
# MODULES=(piix ide_disk reiserfs)
MODULES=(sp5100_tco)
Then I had to rebuild the initial ramfs (using mkinitcpio).
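Before rebuilding, it can help to confirm that the edit actually registered. This is a minimal sketch that assumes bash and the standard /etc/mkinitcpio.conf location, and it reads the file the same way mkinitcpio does, by sourcing it. The helper name `module_in_mkinitcpio_conf` is hypothetical. Once the check passes, `sudo mkinitcpio -P` regenerates the initramfs for every installed kernel preset.

```bash
#!/usr/bin/env bash
# Check whether a module name appears in mkinitcpio's MODULES setting.
# mkinitcpio loads its config by sourcing it as bash; this does the same
# inside a subshell so the config's variables don't leak into the caller.

module_in_mkinitcpio_conf() {
    local module="$1" conf="${2:-/etc/mkinitcpio.conf}"
    (
        unset MODULES                  # ignore any value inherited from the caller
        source "$conf" || exit 2       # unreadable config
        # ${MODULES[*]-} covers the array form MODULES=(a b), the older
        # string form MODULES="a b", and expands to nothing if unset
        for m in ${MODULES[*]-}; do
            [[ "$m" == "$module" ]] && exit 0
        done
        exit 1
    )
}

# Run only when executed directly, not when sourced
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    if module_in_mkinitcpio_conf sp5100_tco; then
        echo "sp5100_tco is in MODULES; rebuild with: sudo mkinitcpio -P"
    else
        echo "sp5100_tco is not in MODULES (or the config could not be read)" >&2
    fi
fi
```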
Once that’s done, you should see the following in the boot log (at least, this is what shows up when I boot):
[ 1.166511] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver
[ 1.166587] sp5100-tco sp5100-tco: Using 0xfed80b00 for watchdog MMIO address
[ 1.166596] sp5100-tco sp5100-tco: Watchdog hardware is disabled
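To confirm the fix after rebooting, you can scan the kernel log for the watchdog driver's lines. This is a minimal sketch. The function name `check_sp5100_log` and its return codes (0 = driver lines with no conflict, 1 = the "already in use" error is still present, 2 = no sp5100 lines at all) are hypothetical choices made for illustration.

```bash
#!/usr/bin/env bash
# Scan kernel log lines on stdin for sp5100_tco / sp5100-tco messages,
# echo them, and report whether the I/O port conflict is still present.

check_sp5100_log() {
    local line found=0 conflict=0
    # The extra -n test keeps a final line that lacks a trailing newline
    while IFS= read -r line || [[ -n "$line" ]]; do
        if [[ "$line" == *sp5100* ]]; then
            found=1
            printf '%s\n' "$line"
            if [[ "$line" == *"already in use"* ]]; then
                conflict=1
            fi
        fi
    done
    if (( conflict )); then
        return 1   # e.g. "I/O address 0x0cd6 already in use"
    elif (( found )); then
        return 0   # driver loaded without the conflict
    fi
    return 2       # no sp5100 messages found
}
```

A typical call would be `journalctl -k -b | check_sp5100_log`, which reads kernel messages from the current boot, or `dmesg | check_sp5100_log`.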