Threadripper Reset Fixes

Does this Beta BIOS also solve the issue with KVM_AMD not being available?

I have no issues with that; I checked lsmod and kvm_amd is loaded. I am also running the stock Ubuntu kernel, which is 4.15.
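One caveat when checking with lsmod: if kvm_amd is built into the kernel (CONFIG_KVM_AMD=y, as suggested further down the thread), it will not appear in lsmod at all. Below is a minimal sketch of a slightly broader check. The function name and the optional root prefix are made up for illustration, and the /sys/module branch assumes (as far as I know) that built-in code with module parameters, such as kvm_amd, still gets a /sys/module entry.

```bash
#!/usr/bin/env bash
# Sketch: report whether kvm_amd is usable without relying on lsmod alone.
# The optional first argument is a hypothetical root prefix (empty by default)
# so the checks can be tried against a fake directory tree.
kvm_amd_status() {
  local root="${1:-}"
  # /proc/modules is what lsmod reads; it only lists loadable modules.
  if grep -q '^kvm_amd ' "$root/proc/modules" 2>/dev/null; then
    echo "kvm_amd: loaded as a module"
  # Built-in kvm_amd never shows up in lsmod, but should still have a
  # /sys/module entry because it has module parameters (assumption).
  elif [[ -d "$root/sys/module/kvm_amd" ]]; then
    echo "kvm_amd: built into the kernel"
  else
    echo "kvm_amd: not present"
  fi
  # QEMU needs /dev/kvm for KVM acceleration.
  if [[ -e "$root/dev/kvm" ]]; then
    echo "/dev/kvm: present"
  else
    echo "/dev/kvm: missing"
  fi
}
```

Run it as `kvm_amd_status` on the host. If kvm_amd shows as present but /dev/kvm is missing, the module probably failed to initialise.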

Hey all, this udev issue is also present on the ASUS Zenith Extreme if you upgrade to the latest 1402 BIOS. Presumably the only option at the moment is to downgrade the BIOS, rebuild the kernel with CONFIG_CRYPTO_DEV_SP_PSP=n, and then upgrade the BIOS again, and it should behave for now?

Hi, can you tell me how to do this with Ubuntu?

Thanks

Worked it out; this is how far I've got so far. I'll update when finished.
Somewhere to work:
mkdir Kernel4-17-14
cd Kernel4-17-14/

Get the kernel:
git clone git://git.launchpad.net/~ubuntu-kernel-test/ubuntu/+source/linux/+git/mainline-crack v4.17.14

Get the Ubuntu patches:
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.17.14/0001-base-packaging.patch
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.17.14/0002-UBUNTU-SAUCE-add-vmlinux.strip-to-BOOT_TARGETS1-on-p.patch
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.17.14/0003-UBUNTU-SAUCE-tools-hv-lsvmbus-add-manual-page.patch
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.17.14/0004-UBUNTU-SAUCE-no-up-disable-pie-when-gcc-has-it-enabl.patch
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.17.14/0005-debian-changelog.patch
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.17.14/0006-configs-based-on-Ubuntu-4.17.0-8.9.patch

Apply the patches in order:
cd v4.17.14/
patch -p1 < /root/Kernel4-17-14/0001-base-packaging.patch
patch -p1 < /root/Kernel4-17-14/0002-UBUNTU-SAUCE-add-vmlinux.strip-to-BOOT_TARGETS1-on-p.patch
patch -p1 < /root/Kernel4-17-14/0003-UBUNTU-SAUCE-tools-hv-lsvmbus-add-manual-page.patch
patch -p1 < /root/Kernel4-17-14/0004-UBUNTU-SAUCE-no-up-disable-pie-when-gcc-has-it-enabl.patch
patch -p1 < /root/Kernel4-17-14/0005-debian-changelog.patch
patch -p1 < /root/Kernel4-17-14/0006-configs-based-on-Ubuntu-4.17.0-8.9.patch
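The six patch commands above can also be written as a loop that applies every NNNN-*.patch file in numeric order and stops at the first failure, which also catches a missing or misnamed download early. This is a sketch: the function name and the PATCH_CMD override are hypothetical, and it assumes you run it from the top of the v4.17.14 tree.

```bash
#!/usr/bin/env bash
# Sketch: apply a numbered patch series (0001-*.patch, 0002-*.patch, ...) in
# order, stopping at the first patch that fails. Run from the kernel tree root.
# PATCH_CMD is a hypothetical override (default "patch -p1").
apply_patch_series() {
  local dir="$1" p
  # Zero-padded prefixes make the glob expand in the intended order.
  local -a series=("$dir"/[0-9][0-9][0-9][0-9]-*.patch)
  # Without a match, bash leaves the literal pattern in the array.
  if [[ ! -e "${series[0]}" ]]; then
    echo "no NNNN-*.patch files found in $dir" >&2
    return 1
  fi
  for p in "${series[@]}"; do
    echo "applying $(basename "$p")"
    # Deliberately unquoted so "patch -p1" splits into command and option.
    ${PATCH_CMD:-patch -p1} < "$p" || { echo "failed on $(basename "$p")" >&2; return 1; }
  done
}
```

Usage, following the layout above: `cd v4.17.14 && apply_patch_series /root/Kernel4-17-14`.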

cp /boot/config-$(uname -r) .config
nano .config
Search (Ctrl+W) for CONFIG_CRYPTO_DEV_SP_PSP and set it to =n
Search (Ctrl+W) for CONFIG_KVM_AMD and set it to =y
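If you would rather not hunt through .config in nano, a small sed-based helper can make the same two changes. This is a sketch with a made-up function name; it writes "n" in Kconfig's usual "# CONFIG_... is not set" form and any other value as CONFIG_...=value.

```bash
#!/usr/bin/env bash
# Sketch: set one Kconfig symbol in a .config file non-interactively.
# set_kconfig is a hypothetical helper; the CONFIG_ prefix is optional.
set_kconfig() {
  local file="$1" sym="CONFIG_${2#CONFIG_}" val="$3"
  # Remove any existing entry for exactly this symbol (set or unset form).
  # "^CONFIG_KVM_AMD=" does not match CONFIG_KVM_AMD_SEV lines.
  sed -i -e "/^${sym}=/d" -e "/^# ${sym} is not set\$/d" "$file"
  if [[ "$val" == n ]]; then
    echo "# ${sym} is not set" >> "$file"
  else
    echo "${sym}=${val}" >> "$file"
  fi
}
```

For example: `set_kconfig .config CRYPTO_DEV_SP_PSP n`, then `set_kconfig .config KVM_AMD y`, then `make olddefconfig` so Kconfig re-checks dependencies. The kernel tree also ships scripts/config, which should do the same job (`scripts/config --file .config --disable CRYPTO_DEV_SP_PSP --enable KVM_AMD`). As far as I know, KVM_AMD depends on KVM, so it can only stay =y if CONFIG_KVM is =y as well.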

I fixed the error "Makefile.lib:196: recipe for target ‘scripts/kconfig/zconf.tab.c’ failed" by running apt-get install bison flex.

I highly recommend you use make-kpkg and build an actual package rather than invoking make directly.

make-kpkg kernel_image --initrd
dpkg -i ../linux-image-.....deb

make-kpkg is part of the kernel-package package.

Thanks, I'll give that a go if I have time at the weekend.

Hm, I haven't had any issues. But I'm on kernel 4.15 (the stock Ubuntu kernel, to be specific), so maybe you only need to compile with that flag on newer kernels?

Thanks for the info. Perhaps I'll give Ubuntu another go for now, then; I forgot to mention I was using Fedora 28.

Just found this bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1608242

Steven Haigh 2018-08-14 12:29:45 EDT
Good news, I have received a reply from AMD regarding the issue after submitting further information as gathered above and from other sources.

Quote:
It would appear that the BIOS/firmware is advertising it supports SEV, when in fact it doesn’t. We currently don’t have a timeout associated with the SEV commands and so the module load is stuck - which would also explain the KVM issue.

The current approach is to add a timeout to the kernel that stops things sticking forever and submit that to the current stable releases.

I have also asked if this could be forwarded on to the BIOS team to not advertise SEV on hardware that doesn’t support it - I thought this could be an OEM thing, but it might be unlikely that it is multiple OEMs making the same mistake.


Hello @x3sphere,

I am interested in building a hardware setup similar to yours (based on the IOMMU output you posted on Aug 9th):

  • Asus ROG Zenith Extreme
  • (2x) Video Cards
  • Fresco or Sonnet USB Controller
  • The included 10G PCIe card (not sure if you are using it?)

I would really appreciate it if you could help me with a few questions before I go down this (expensive) road…

  1. Are you using BIOS version 1402 (AGESA 1.1.0.1 Patch A)?
  2. Are you using the PCIe x4 slot that is connected to the chipset? Is this in its own IOMMU group?
  3. Could you post a copy of the following output?
  • IOMMU groups output
  • lspci -vvvt
  • dmidecode --type slot

Thank you in advance for your time and help.

Yes, I'm using BIOS 1402. No, I'm not using the PCIe x4 slot. Last time I tried it, it was not in its own IOMMU group, but this may have changed with the latest BIOS. I'm also not using the 10G PCIe card.

As far as the full-length slots are concerned, I have my Sonnet Allegro Pro USB card in slot 1, the Radeon WX7100 in slot 2, the 1080 Ti in slot 3, and a GTX 1060 in slot 4. I put the USB card in slot 1 because my Noctua cooler partially covers that slot; there's barely enough room to fit the card in there.

Here is the output of those commands:

IOMMU groups:

https://pastebin.com/fWDcRi6G

lspci -vvvt

https://pastebin.com/sccac915

dmidecode --type slot

https://pastebin.com/v1dugnRF

@x3sphere,

Thanks a lot for providing the output of those commands. It was extremely useful, and you have a cool build :slight_smile:

It is interesting that dmidecode shows incorrect information, such as reporting the bus address of every slot as 0000:00:00.0 and every slot as in use.

[*1] As for the lspci output, I am guessing that 03.0-[06] and 04.0-[07] are the PCIEX4 and PCIEX1 slots on the X399 chipset, which most likely means they share an IOMMU group with the Intel I211 NIC, the Wireless/BT adapters, and the ASMedia USB controller. So I think this has probably not changed since you last tested it with the older BIOS, but I could be wrong.

[*2] I also noticed that two USB controllers are each in their own IOMMU group, so I am thinking I would no longer need the Sonnet Allegro Pro USB card if I can just pass these onboard USB controllers through to a Windows VM.

Sorry to trouble you again, but if possible, could you post a few more outputs covering the devices that support resetting and the USB controllers?

Thanks again!

https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#Passing_through_a_device_that_does_not_support_resetting

for iommu_group in $(find /sys/kernel/iommu_groups/ -maxdepth 1 -mindepth 1 -type d);do echo "IOMMU group $(basename "$iommu_group")"; for device in $(\ls -1 "$iommu_group"/devices/); do if [[ -e "$iommu_group"/devices/"$device"/reset ]]; then echo -n "[RESET]"; fi; echo -n $'\t';lspci -nns "$device"; done; done
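For reference, here is the same idea split over several lines as a function. It is a sketch rather than the wiki's version: it only reads sysfs and prints PCI addresses instead of calling lspci, and the optional sysfs root argument is a hypothetical addition so it can be tried against a fake tree.

```bash
#!/usr/bin/env bash
# Sketch: list IOMMU groups and flag devices that expose a sysfs "reset" file.
# Prints PCI addresses only; the sysfs root (default /sys) is a hypothetical knob.
list_iommu_resets() {
  local sysfs="${1:-/sys}" group dev flag
  # Group directories are plain numbers, so sort them numerically.
  while read -r group; do
    echo "IOMMU group $group"
    for dev in "$sysfs/kernel/iommu_groups/$group/devices"/*; do
      [[ -e "$dev" ]] || continue
      # The kernel only creates "reset" when it found a reset method for the function.
      if [[ -e "$dev/reset" ]]; then flag="[RESET]"; else flag="       "; fi
      printf '%s\t%s\n' "$flag" "$(basename "$dev")"
    done
  done < <(ls -1 "$sysfs/kernel/iommu_groups" 2>/dev/null | sort -n)
}
```

Pass an address to `lspci -nns <address>` if you want the device name. As far as I understand, a missing [RESET] flag only means the kernel found no reset method for that function; it does not by itself mean the device will misbehave in a VM.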

https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#USB_controller

for usb_ctrl in $(find /sys/bus/usb/devices/usb* -maxdepth 0 -type l); do pci_path="$(dirname "$(realpath "${usb_ctrl}")")"; echo "Bus $(cat "${usb_ctrl}/busnum") --> $(basename $pci_path) (IOMMU group $(basename $(realpath $pci_path/iommu_group)))"; lsusb -s "$(cat "${usb_ctrl}/busnum"):"; echo; done
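The USB one-liner can be unpacked the same way. This sketch drops the lsusb call and just prints the bus-to-controller mapping; the function name and the sysfs root parameter are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: map each USB root hub (usbN) to its PCI controller and IOMMU group.
# The sysfs root (default /sys) is a hypothetical parameter for testing.
usb_bus_map() {
  local sysfs="${1:-/sys}" ctrl pci_path group
  for ctrl in "$sysfs"/bus/usb/devices/usb*; do
    # Root hubs are symlinks into the controller's PCI device directory.
    [[ -L "$ctrl" ]] || continue
    pci_path=$(dirname "$(realpath "$ctrl")")
    if [[ -e "$pci_path/iommu_group" ]]; then
      group=$(basename "$(realpath "$pci_path/iommu_group")")
    else
      group="none"
    fi
    printf 'Bus %s --> %s (IOMMU group %s)\n' \
      "$(cat "$ctrl/busnum")" "$(basename "$pci_path")" "$group"
  done
}
```

Combining `usb_bus_map` with `lsusb -s <bus>:` while plugging a device into different ports is one way to work out which physical ports belong to which controller before picking one to pass through.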

[*1]

         +-01.1-[01-08]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 43ba
         |               +-00.1  Advanced Micro Devices, Inc. [AMD] Device 43b6
         |               \-00.2-[02-08]--+-00.0-[03]----00.0  Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter
         |                               +-01.0-[04]----00.0  Wilocity Ltd. Wil6200 802.11ad Wireless Network Adapter
         |                               +-02.0-[05]----00.0  Intel Corporation I211 Gigabit Network Connection
         |                               +-03.0-[06]--  
         |                               +-04.0-[07]--
         |                               \-09.0-[08]----00.0  ASMedia Technology Inc. Device 2142
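The guess in [*1] comes from spotting bridge ports with nothing behind them in the lspci -t tree (the lines ending in "--"). Here is a rough sketch for pulling those out; the function name is made up, and it assumes the lspci -t drawing style shown above.

```bash
#!/usr/bin/env bash
# Sketch: print bridge ports from `lspci -t` output that have no device behind
# them, i.e. "<dev>.<fn>-[<bus range>]--" at the end of a line.
empty_bridge_ports() {
  grep -oE '[0-9a-f]{2}\.[0-7]-\[[0-9a-f-]+\]--[[:space:]]*$' |
    sed -E 's/--[[:space:]]*$//'   # drop the trailing "--" and any spaces
}
```

Usage: `lspci -t | empty_bridge_ports`. On the tree above it prints 03.0-[06] and 04.0-[07]; which physical slots those are still has to be confirmed by populating them.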

[*2]

IOMMU Group 21 0c:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller [1022:145c]

IOMMU Group 51 4a:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller [1022:145c]

Yeah, you don't need the Sonnet USB card. Prior to the 1402 BIOS, none of the USB controllers were in their own IOMMU group, so I had to get this card at the time. Passing through one of the onboard USB controllers works fine now.

Here are the other outputs:
https://pastebin.com/enXgFJ4Z

https://pastebin.com/vaE42e8m

Thanks again for the help.

Looks like these two USB controllers also support RESET [*1], which I believe means passthrough should work well for them. I'm not sure which USB ports on the motherboard they drive, but I am hoping for either the front-panel headers or the I/O shield. I think you are right, though: I can probably drop the Sonnet card for the time being and, if worst comes to worst, pick one up if necessary… I just hope I won't need it hehe…

I also noticed that your NVIDIA 1080 Ti doesn't show RESET [*2], at least in the output of that command. I assume you are using KVM, but have you had any problems passing this card to VMs, for example the card no longer working after you shut down a VM, so that the computer has to be power-cycled before a VM can use it again?

[*1]

IOMMU group 19
[RESET]	0b:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller [1022:145c]
IOMMU group 47
[RESET]	49:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller [1022:145c]

[*2]

IOMMU group 16
0a:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] [10de:1b06] (rev a1)
0a:00.1 Audio device [0403]: NVIDIA Corporation GP102 HDMI Audio Controller [10de:10ef] (rev a1)

Edited by gnif: In future please format your posts, this is the 2nd time I have had to fix them for you.

Yes, I am using KVM. It's odd that it isn't listed as reset-capable, but I don't have any issues shutting down/restarting the VM. In general, all the NVIDIA cards I've had handle PCIe reset fine; I've only had issues with it on AMD GPUs.

What are your BIOS settings? Upgrading resets everything to defaults, and I recall there are a few hard-to-find ones.

Advanced/AMD PBS/Enumerate all IOMMU in IVRS = ON
SVM Mode = ON
Memory Interleaving = Channel

Those are the only changes I've made, apart from enabling the XMP/DOCP profile for my RAM.
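After re-applying these settings following a BIOS upgrade, a quick way to confirm the IOMMU is actually active is to count the groups under /sys/kernel/iommu_groups; zero usually means it is off in the firmware or in the kernel. Sketch only; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: count IOMMU groups; 0 suggests the IOMMU is not active.
# The directory argument (default /sys/kernel/iommu_groups) is for testing.
iommu_group_count() {
  local dir="${1:-/sys/kernel/iommu_groups}" n
  # No directory at all: treat as zero groups.
  [[ -d "$dir" ]] || { echo 0; return 0; }
  n=$(find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l)
  # Arithmetic expansion strips any padding wc may add.
  echo $((n))
}
```

Usage: `iommu_group_count` on the host, before and after changing "Enumerate all IOMMU in IVRS".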

Hi, is your build still working with the latest BIOS (3.30) on the Taichi? I'm trying to pass through an NVMe drive and it always ends up in the D3 state with the latest unpatched Xubuntu :frowning:

Hi,

You did not post the whole “lspci -vt” output - your other PCI segment is cut off.

Do you know how the PCI segments are assigned with physical slots? Which PCIe / M.2 slots are on which segments? It would be greatly appreciated if you could tell us!

Cheers!

Finally! All the issues related to Threadripper + Vega 56 are solved with kernel 4.18.16, using the .debs straight from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.18.16/

I can start and stop VM all day long, without need to reboot or shutdown.

I'm still setting these values in /boot/config-4.18.16-041816-generic:

CONFIG_CRYPTO_DEV_SP_PSP=n
CONFIG_KVM_AMD=y
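Worth knowing: as far as I know, /boot/config-<version> is only a record of the options a kernel package was built with, so editing it does not change an already-built kernel. To see what a given kernel was actually built with, a small reader like this sketch can help. The function name is hypothetical, and it treats "# CONFIG_... is not set" and missing symbols as n.

```bash
#!/usr/bin/env bash
# Sketch: print the value of one Kconfig symbol from a config file,
# e.g. /boot/config-$(uname -r). The CONFIG_ prefix is optional.
kconfig_value() {
  local file="$1" sym="CONFIG_${2#CONFIG_}" line
  [[ -r "$file" ]] || { echo "cannot read $file" >&2; return 1; }
  # If a symbol appears twice, use the last line; "|| true" keeps a
  # no-match grep from failing under set -e -o pipefail.
  line=$(grep -E "^(${sym}=|# ${sym} is not set\$)" "$file" | tail -n 1 || true)
  case "$line" in
    "" | "# ${sym} is not set") echo n ;;
    *) echo "${line#*=}" ;;
  esac
}
```

Usage: `for s in CRYPTO_DEV_SP_PSP KVM_AMD; do echo "$s=$(kconfig_value "/boot/config-$(uname -r)" "$s")"; done`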

You didn’t do anything else to get Vega resetting properly?

I updated to the same kernel and it’s still broken, for me. VM fails to reboot or start again after shutdown. I can sorta work around the issue by suspending to RAM after shutting down the guest, then starting up the VM again.

With my NVIDIA card (1080 Ti), however, it works fine.