Broken virtualisation (and more) on ASRock AB350 Pro4

Just tested on my own ASRock X370 Gaming K4 platform with arch and 4.16.3 kernel and I can’t reproduce this Virtualbox behavior.
It just works, even when I break the clocksource into hpet mode. Hpet is real slow btw. :wink:

This btw is the full extent of iommi kernel events when dealing when starting a VBox VM in my system.

journalctl -k | grep vbox

Note how IOMMU domains are only created and freed as vbox VM’s are started and stopped.

I’m also concerned about the clocksource instability. I don’t see that on my machine (which is not overclocked).

[email protected]:~# dmesg | egrep -i "clocksource|PSD"
[    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[    0.000000] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
[    0.043420] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[    0.310039] clocksource: Switched to clocksource hpet
[    0.322039] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    1.503318] tsc: Refined TSC clocksource calibration: 3600.282 MHz
[    1.503876] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33e55d41bc4, max_idle_ns: 440795207871 ns
[    2.511970] clocksource: Switched to clocksource tsc
[    8.406697] acpi_cpufreq: overriding BIOS provided _PSD data

…but I do have the _PSD data line.

1 Like

Yeah, the output of

cat /sys/devices/system/clocksource/clocksource0/current_clocksource

Is indeed hpet. Can you tell me/link me to more information about this?

As for Vbox I’m not exactly sure what’s going on there. It’s creating IOMMU domains and freeing them before the log ends. Chances are that the log is also not fully written out to disk when the crash occurs.

I just want to be clear that it’s the virtual machine that hangs, not the host machine. I’ve not been able to access any of the virtual machines over SSH, one of those demonstrated in the video is a vagrant machine which fails to provision as it always hangs before SSH is started.

Also I think that the IOMMU domains are being created/destroyed correctly, it was just hard to see in the earlier log, when looking at the output of journalctl you can see those actions were a few minutes apart, enough time to start the machine and for it to hang:

Apr 23 20:07:16 smeg kernel: VBoxNetFlt: attached to 'vboxnet0' / 0a:00:27:00:00:00
Apr 23 20:07:16 smeg kernel: device vboxnet0 entered promiscuous mode
Apr 23 20:07:16 smeg kernel: vboxdrv: 0000000001a6ec03 VBoxDDR0.r0
Apr 23 20:07:16 smeg kernel: vboxpci: created IOMMU domain 00000000746b3096
Apr 23 20:12:53 smeg kernel: vboxpci: freeing IOMMU domain 00000000746b3096
Apr 23 20:13:00 smeg kernel: device vboxnet0 left promiscuous mode
Apr 23 20:13:00 smeg kernel: vboxnetflt: 0 out of 4 packets were not sent (directed to host)
1 Like

Yours is about the same as mine on a stock 1700X

[email protected]:~$ journalctl -k | egrep -i "clocksource|PSD"
Apr 23 13:33:34 jupiter kernel: clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370452778343963 ns
Apr 23 13:33:34 jupiter kernel: clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
Apr 23 13:33:34 jupiter kernel: clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x30e974a77bc, max_idle_ns: 440795308615 ns
Apr 23 13:33:34 jupiter kernel: clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370867519511994 ns
Apr 23 13:33:34 jupiter kernel: clocksource: Switched to clocksource tsc-early
Apr 23 13:33:34 jupiter kernel: clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
Apr 23 13:33:34 jupiter kernel: tsc: Refined TSC clocksource calibration: 3393.623 MHz
Apr 23 13:33:34 jupiter kernel: clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x30eac649395, max_idle_ns: 440795252668 ns
Apr 23 13:33:34 jupiter kernel: clocksource: Switched to clocksource tsc
Apr 23 13:33:34 jupiter kernel: acpi_cpufreq: overriding BIOS provided _PSD data

The fact that it’s the VM that hangs is crucial, I was under impression that the entire host crashed.

First lets find a way to get your clocksource onto tsc.
Set your system to baseline stock clocks. It MUST be stable on stock settings and default to tsc again.

If it is not you may well have a hardware (mainboard/cpu) issue at hand.

1 Like

Yep. I reset to UEFI defaults and enabled CPU virtualisation, now running stock CPU and memory clocks. All three virtual machines featured in the video booted fine first time.

$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

And here is the full dmesg output, including starting the virtual machines: dmesg-2.txt (74.6 KB)

2 Likes

Odd question:
Are you by any chance using this system with a PS/2 keyboard?

Nope, all USB peripherals.

Hmmm…

What output does this now give:

journalctl -k | egrep -i "tsc|clocksource"

Also @MarcT can you post your’s too as a comparison?

Apr 23 13:33:34 jupiter kernel: tsc: Fast TSC calibration failed
Apr 23 13:33:34 jupiter kernel: tsc: Using PIT calibration value
Apr 23 13:33:34 jupiter kernel: clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370452778343963 ns
Apr 23 13:33:34 jupiter kernel: clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
Apr 23 13:33:34 jupiter kernel: tsc: Fast TSC calibration failed
Apr 23 13:33:34 jupiter kernel: tsc: PIT calibration matches HPET. 1 loops
Apr 23 13:33:34 jupiter kernel: tsc: Detected 3393.265 MHz processor
Apr 23 13:33:34 jupiter kernel: clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x30e974a77bc, max_idle_ns: 440795308615 ns
Apr 23 13:33:34 jupiter kernel: clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370867519511994 ns
Apr 23 13:33:34 jupiter kernel: clocksource: Switched to clocksource tsc-early
Apr 23 13:33:34 jupiter kernel: clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
Apr 23 13:33:34 jupiter kernel: tsc: Refined TSC clocksource calibration: 3393.623 MHz
Apr 23 13:33:34 jupiter kernel: clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x30eac649395, max_idle_ns: 440795252668 ns
Apr 23 13:33:34 jupiter kernel: clocksource: Switched to clocksource tsc
Apr 23 13:33:34 jupiter kernel: vboxdrv: TSC mode is Invariant, tentative frequency 3393610709 Hz

Note how calibration fails twice and then matches first hpet and then acpi_pm.
There’s something up with Ryzen and the clocksource calibration, particularly when overclocking.

Apr 23 21:12:25 smeg kernel: tsc: Fast TSC calibration failed
Apr 23 21:12:25 smeg kernel: tsc: Using PIT calibration value
Apr 23 21:12:25 smeg kernel: clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370452778343963 ns
Apr 23 21:12:25 smeg kernel: clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
Apr 23 21:12:25 smeg kernel: tsc: Fast TSC calibration failed
Apr 23 21:12:25 smeg kernel: tsc: PIT calibration matches HPET. 1 loops
Apr 23 21:12:25 smeg kernel: tsc: Detected 2993.962 MHz processor
Apr 23 21:12:25 smeg kernel: clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x2b27fd42f6e, max_idle_ns: 440795240346 ns
Apr 23 21:12:25 smeg kernel: clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370867519511994 ns
Apr 23 21:12:25 smeg kernel: clocksource: Switched to clocksource tsc-early
Apr 23 21:12:25 smeg kernel: clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
Apr 23 21:12:25 smeg kernel: tsc: Refined TSC clocksource calibration: 2994.373 MHz
Apr 23 21:12:25 smeg kernel: clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2b2981a153b, max_idle_ns: 440795305653 ns
Apr 23 21:12:25 smeg kernel: clocksource: Switched to clocksource tsc
Apr 23 21:12:25 smeg kernel: vboxdrv: TSC mode is Invariant, tentative frequency 2994354421 Hz
1 Like

Here’s mine (NB: Slackware doesn’t have systemd or journalctl, so dmesg for me):

[email protected]:~# dmesg | egrep -i "tsc|clocksource"
[    0.000000] tsc: Fast TSC calibration failed
[    0.000000] tsc: Using PIT calibration value
[    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[    0.000000] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
[    0.000000] tsc: Fast TSC calibration failed
[    0.002000] tsc: PIT calibration matches HPET. 1 loops
[    0.002000] tsc: Detected 3600.011 MHz processor
[    0.043420] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[    0.310039] clocksource: Switched to clocksource hpet
[    0.322039] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    1.503318] tsc: Refined TSC clocksource calibration: 3600.282 MHz
[    1.503876] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33e55d41bc4, max_idle_ns: 440795207871 ns
[    2.511970] clocksource: Switched to clocksource tsc
[   31.673139] vboxdrv: TSC mode is Invariant, tentative frequency 3600241268 Hz
1 Like

As an aside, my system does have PS/2 KB & mouse.

Interesting how this has the same timer calibration problem. Then succeeds later.
Something is up with the TSC Calibration code for Ryzen, in particular when overclocking via the BIOS things go out of whack.

Unfortunately kernel timers is not my specialty, I know enough to poke around but this may require an expert to review what exactly is going on here.

Of Note:

On an Intel Ivy Bridge system, not even a single failure of any sort.

Apr 23 09:24:56 saturn kernel: tsc: Fast TSC calibration using PIT
Apr 23 09:24:56 saturn kernel: clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
Apr 23 09:24:56 saturn kernel: clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484882848 ns
Apr 23 09:24:56 saturn kernel: tsc: Fast TSC calibration using PIT
Apr 23 09:24:56 saturn kernel: tsc: Detected 2494.496 MHz processor
Apr 23 09:24:56 saturn kernel: TSC deadline timer enabled
Apr 23 09:24:56 saturn kernel: clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
Apr 23 09:24:56 saturn kernel: clocksource: Switched to clocksource hpet
Apr 23 09:24:56 saturn kernel: clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
Apr 23 09:24:56 saturn kernel: tsc: Refined TSC clocksource calibration: 2494.333 MHz
Apr 23 09:24:56 saturn kernel: clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x23f45085418, max_idle_ns: 440795285711 ns
Apr 23 09:24:56 saturn kernel: clocksource: Switched to clocksource tsc

If I look at the tsc.c code the calibration issues may have to do with alternative clock sources not being available yet for some reason?

1 Like

That’s progress…! :slight_smile:

Yeah this is pretty much narrowed down to clocksource calibration now.
VM’s in particular are sensitive to this.

Possibly somewhere in between the BIOS update, tsc calibration and overclocking something has broken for sure. I do remember that TSC calibration wasn’t always an issue. On early BIOS versions I had some mad overclocks + running VM’s without any timer issues.

@wendell You might be interested to have a look here. Have a look if you can reproduce this.

2 Likes

Thanks for your input guys, for now I’m just glad I can get some work done. I’ll take a look at overclocking from within Linux later.

2 Likes

So, I updated the ASRock x370 Taichi BIOS 4.60 to 4.70, and also updated the kernel. Now I have this exact same problem. I suspect the BIOS update caused it.

Performance in my Win10 VM in VirtualBox now sucks.

[email protected]:/home/marc# dmesg | egrep -i "clock|TSC"
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[    0.000000] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
[    0.000000] hpet clockevent registered
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.001000] tsc: Detected 3600.317 MHz processor
[    0.022000] TSC synchronization [CPU#0 -> CPU#1]:
[    0.022000] Measured 4775348882 cycles TSC warp between CPUs, turning off TSC clock.
[    0.022000] tsc: Marking TSC unstable due to check_tsc_sync_source failed
[    0.028109] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[    0.244682] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[    0.291032] clocksource: Switched to clocksource hpet
[    0.302701] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    0.422562] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33e57f72b0e, max_idle_ns: 440795387883 ns
[    8.610009] PTP clock support registered
[   17.520286] vboxdrv: TSC mode is Invariant, tentative frequency 3800917500 Hz

zenstates output has changed from stock too.

ISTR there is an ASRock auto over-clock in the UEFI, so I’m off to disable that now. BRB

Nope - it wasn’t that. It was the setting labelled “Sapphire Boost”.

I didn’t have it enabled before & switching it off resolved the TSC sync issue that caused poor VM performance/stability.

Not sure if you resolved the issue but I ran into the same issue last night after upgrading from v3.2 to v3.4/v4.90. Stumbled upon your thread to confirm I wasn’t the only one with this issue thought I am using windows instead of linux. But I was also using virtualization for linux… though window vm was having the same performance issue.

created 3 threads in reddit but didn’t get an answer.

Finally ran into this http://forum.asrock.com/forum_posts.asp?TID=8141&title=asrock-ab350-pro4-470-attempted-write-to-readonly saying to revert back to a previous firmware everything is working now as expected. see : https://i.imgur.com/Q7KiQOH.png I couldn’t even get to the installer before

You must flash to bridge 3.4 and then to a stable firmware… for me that was v3.2… I don’t trust v3.3

I hope this helps.

I’m running windows 10 pro 1803

System details: Graphics Chipset - Radeon ™ RX 480 Graphics Windows Version - Windows 10 (64 bit) System Memory - 16 GB CPU Type - AMD Ryzen 5 1600 Six-Core Processor