How to make Linux obey kernel parameters?

My kernel keeps fucking with the pcie max payload size even though I’ve specified pci=pcie_bus_tune_off.
I have confirmed in grub AND /proc/cmdline that the parameter IS being passed to the kernel, yet I get payload size reductions anyways. This kills my system performance and is unacceptable.
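
For reference, a minimal sketch of how the `/proc/cmdline` check could be scripted inside the guest. The helper name `pci_options` is made up for illustration. It only shows which `pci=` options the running kernel was booted with, not whether the PCI core acted on them.

```python
from pathlib import Path
from typing import List


def pci_options(cmdline: str) -> List[str]:
    """Collect every comma-separated option passed via pci=... tokens."""
    options: List[str] = []
    for token in cmdline.split():
        if token.startswith("pci="):
            # pci= accepts a comma-separated list, and may appear more than once
            options.extend(opt for opt in token[len("pci="):].split(",") if opt)
    return options


if __name__ == "__main__":
    path = Path("/proc/cmdline")
    if path.exists():
        # Print the pci= options of the currently running kernel
        print(pci_options(path.read_text()))
```

If `pcie_bus_tune_off` shows up in the printed list, the boot loader side is working and the problem lies elsewhere.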

This is a libvirt VM.
Rebooting the VM over and over again, while changing absolutely nothing, until this fuckery just goes away on its own is the only way to make the system usable.

I’ve had this issue for a while but just recently learned it’s related to these “payload” thingies. Should be fixed if this payload system were disabled entirely, but this VM’s kernel is ignoring the parameter and screwing with it anyways. Only sometimes. Rebooting the VM repeatedly will reliably eventually produce a state where this behavior is not present and everything runs as it’s supposed to.

How do I force the kernel to ALWAYS obey my parameters, instead of obeying them seemingly at random, decided whenever the VM's kernel loads, as in my current setup?

VM XML

<domain type='kvm' id='2'>
  <name>TESSA1</name>
  <uuid>50ec8055-258b-47e0-a20f-7730cf37c879</uuid>
  <memory unit='KiB'>33554432</memory>
  <currentMemory unit='KiB'>33554432</currentMemory>
  <memoryBacking>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>12</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='8'/>
    <vcpupin vcpu='1' cpuset='24'/>
    <vcpupin vcpu='2' cpuset='9'/>
    <vcpupin vcpu='3' cpuset='25'/>
    <vcpupin vcpu='4' cpuset='10'/>
    <vcpupin vcpu='5' cpuset='26'/>
    <vcpupin vcpu='6' cpuset='11'/>
    <vcpupin vcpu='7' cpuset='27'/>
    <vcpupin vcpu='8' cpuset='12'/>
    <vcpupin vcpu='9' cpuset='28'/>
    <vcpupin vcpu='10' cpuset='13'/>
    <vcpupin vcpu='11' cpuset='29'/>
    <emulatorsched scheduler='fifo' priority='1'/>
    <vcpusched vcpus='0' scheduler='fifo' priority='1'/>
    <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
    <vcpusched vcpus='2' scheduler='fifo' priority='1'/>
    <vcpusched vcpus='3' scheduler='fifo' priority='1'/>
    <vcpusched vcpus='4' scheduler='fifo' priority='1'/>
    <vcpusched vcpus='5' scheduler='fifo' priority='1'/>
    <vcpusched vcpus='6' scheduler='fifo' priority='1'/>
    <vcpusched vcpus='7' scheduler='fifo' priority='1'/>
    <vcpusched vcpus='8' scheduler='fifo' priority='1'/>
    <vcpusched vcpus='9' scheduler='fifo' priority='1'/>
    <vcpusched vcpus='10' scheduler='fifo' priority='1'/>
    <vcpusched vcpus='11' scheduler='fifo' priority='1'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-7.0'>hvm</type>
    <loader readonly='no' type='rom'>/usr/share/ovmf/OVMFTESSA1.fd</loader>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv mode='custom'>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vpindex state='on'/>
      <runtime state='on'/>
      <synic state='on'/>
      <stimer state='on'/>
      <reset state='on'/>
      <vendor_id state='on' value='randomid'/>
      <frequencies state='on'/>
      <reenlightenment state='on'/>
      <tlbflush state='on'/>
      <ipi state='on'/>
    </hyperv>
    <kvm>
      <hidden state='off'/>
      <hint-dedicated state='on'/>
      <poll-control state='on'/>
    </kvm>
    <vmport state='off'/>
    <ioapic driver='kvm'/>
  </features>
  <cpu mode='host-passthrough' check='none' migratable='off'>
    <topology sockets='1' dies='1' cores='6' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
    <feature policy='require' name='svm'/>
    <feature policy='require' name='apic'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='invtsc'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='yes'/>
    <timer name='hypervclock' present='yes'/>
    <timer name='tsc' present='yes' mode='native'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0xf' hotplug='on'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x7'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x9' hotplug='on'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0xa'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0xb'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0xc'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0xd'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0xe'/>
      <alias name='pci.7'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x6'/>
    </controller>
    <controller type='pci' index='8' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='8' port='0x8'/>
      <alias name='pci.8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='9' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='9' port='0x18'/>
      <alias name='pci.9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='pci' index='10' model='pcie-to-pci-bridge'>
      <model name='pcie-pci-bridge'/>
      <alias name='pci.10'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </controller>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <serial type='pty'>
      <source path='/dev/pts/1'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/1'>
      <source path='/dev/pts/1'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <audio id='1' type='none'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0a' slot='0x10' function='0x2'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
      </source>
      <boot order='1'/>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x48' slot='0x00' function='0x0'/>
      </source>
      <boot order='2'/>
      <alias name='hostdev2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x48' slot='0x00' function='0x1'/>
      </source>
      <boot order='3'/>
      <alias name='hostdev3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x49' slot='0x00' function='0x3'/>
      </source>
      <alias name='hostdev4'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </hostdev>
    <memballoon model='none'/>
    <shmem name='looking-glass'>
      <model type='ivshmem-plain'/>
      <size unit='M'>32</size>
      <alias name='shmem0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </shmem>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+64055</label>
    <imagelabel>+0:+64055</imagelabel>
  </seclabel>
</domain>

Try pci=pcie_bus_perf instead of pci=pcie_bus_tune_off.

@mathew2214 may I ask what kind of scenario you have in which this parameter makes a noticeable difference? I have never heard of anyone using or needing this option, so you piqued my interest.

yes, i actually tried setting it to perf before i tried just disabling it.
off, perf, and safe all have the same behavior.
sometimes the VM starts, messes with the payload sizes, and runs terribly.
touch nothing and reboot the VM until the kernel eventually doesn't log any messages regarding pcie payload size, and then it runs perfectly.

these payload size messages are the only difference in dmesg between the VM’s perfect working state and its misbehaving state.

if i cannot get this to just work, then i will just make my rc.local check for these messages in dmesg and reboot if they're found. doing this manually every time the guest needs to reboot for any reason is very, very annoying.
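
A rough sketch of what such a boot-time check might look like, in Python rather than plain shell. The thread never quotes the actual log lines, so the pattern below is an assumption: it matches any dmesg line containing "Max Payload Size". The reboot itself is left commented out, because an unconditional reboot on a match could loop forever if the bad state persists.

```python
import re
import subprocess

# Assumption: the payload messages contain the phrase "Max Payload Size".
# Adjust the pattern to whatever the guest's dmesg really prints.
PAYLOAD_PATTERN = re.compile(r"max payload size", re.IGNORECASE)


def payload_messages(log_text):
    """Return the log lines that mention the PCIe max payload size."""
    return [line for line in log_text.splitlines() if PAYLOAD_PATTERN.search(line)]


def main():
    try:
        # Reading the kernel log may need root, depending on kernel.dmesg_restrict
        log_text = subprocess.run(
            ["dmesg"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not read the kernel log: {exc}")
        return
    hits = payload_messages(log_text)
    for line in hits:
        print(line)
    if hits:
        print("payload size messages found, this boot is probably the bad state")
        # Hypothetical policy, disabled on purpose to avoid a reboot loop:
        # subprocess.run(["systemctl", "reboot"], check=True)


if __name__ == "__main__":
    main()
```

A retry counter kept in a file would be a sensible guard before enabling the reboot line.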

the parameter seems to make zero difference, because it is being ignored. that is my question here.

You have remembered to regenerate the initramfs, yes?

is that necessary just to change parameters passed to the kernel?
i see no reason why my initramfs is involved here, but i will forcibly regenerate it and retest.

Might not be, but honestly I am a bit out of ideas here, since I know nothing about this parameter's behaviour.

Just as I predicted. No change in behavior.
Which means one or both of the following must be true:

  • The initramfs is not involved in PCIe payload sizes
    Or
  • PCIe payload sizes are the wrong rabbit hole and will not lead me to a solution.

I’m leaning towards the former as the kernel makes zero mention of any payload sizing when operating normally, only in the misbehaving state is it mentioned in dmesg. This is also the only difference in dmesg between the two states, and seems to be the only way to systematically and objectively tell apart the two states. Aside from physically observing the unacceptable performance.

Can you please provide the output you have read in the logs?

So I have read a mailing list conversation about this parameter, and it changes the message (payload) sizes your PCIe devices use to talk to one another. I cannot imagine that anyone except maybe hyperscalers and the like needs to change these settings. My guess is that this is just a symptom of a problem, much like the degraded performance is. The first step I would take here is to revert, or test without, all the virtualization settings you have in your configuration. You have set about a dozen options I have never used or needed to use, and my guess is that they might interfere with one another. My second guess is that you have some sort of problem with your PCIe devices, and the messages about the payload sizes are the result of a device misbehaving.
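
To compare the good and the bad state directly, the payload size each device supports and the size it is actually programmed with can be pulled out of `lspci -vv`. A minimal sketch follows, assuming the usual pciutils layout where `DevCap:` carries the supported `MaxPayload` and `DevCtl:` (possibly on a continuation line) carries the active one. The function name is made up for illustration.

```python
import re

LABEL = re.compile(r"^\s+(\w+):")            # e.g. "DevCap:", "DevCtl:", "DevSta:"
MPS = re.compile(r"MaxPayload (\d+) bytes")


def max_payload_by_device(lspci_text):
    """Map each device address to its DevCap/DevCtl MaxPayload values in bytes."""
    result = {}
    device = None
    section = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new device, e.g. "01:00.0 VGA ..."
            device = line.split()[0]
            section = None
            continue
        label = LABEL.match(line)
        if label:
            section = label.group(1)
        found = MPS.search(line)
        if device and found and section in ("DevCap", "DevCtl"):
            result.setdefault(device, {})[section] = int(found.group(1))
    return result
```

Feeding it the output of `lspci -vv` (run as root, otherwise the capability blocks are hidden) from a good boot and from a bad boot would show which device ends up with a `DevCtl` value that differs between the two.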

i tried to upload my dmesg logs as you asked, but it seems i dont have enough perms on the website.

which settings are you referring to? afaik my VM has nothing out of the ordinary for a typical VM with PCIe pass-through. also: the VM does run absolutely flawlessly when it decides to not screw itself over. surely a hardware failure would be more reliably unacceptable, right?

i am willing to change the VM’s settings and report results, just need to know which options specifically you might suspect to be the issue.

No idea what this is …

… also no idea what this stuff does …

… use nothing of that either …

… also don’t know what this is for.

all of those are strictly things that have been suggested to me to improve gaming performance of the VM, by various sources, mostly old redhat forums. shouldnt be anything too critical in there.
anyways, i will backup my current config, and retest with those options removed.

See, thing is I have none of those and my virtual machine runs perfectly fine. My suggestion here is when you don’t understand what the option does, don’t use it.

For example the following enables SMT on AMD processors …

… so I have no idea what the following is supposed to do here …

… since it enables nested virtualization, which is not recommended until you know what you are doing.

i am unable to remove the seclabel block, virsh refuses to even show me that block in the editor.

Either leave it be for the moment or create a new virtual machine, add only the passthrough device, the topoext-line, the looking glass device and use the block devices you use with the current virtual machine.

Did you do this on the host as well?

just finished testing with those things removed. no observable or logged change in VM behavior.

no, the host is behaving perfectly, all the time. i had no reason to suspect it of being at fault. i could try those things in the host, but it will have to wait until the next time i can get away with downing the host long enough to reboot it. likely tomorrow morning if not tonight.

I thought you tested it on the host! It is certainly not intended to be used in a virtual machine!

Just tried it. Renders the VM completely unusable. The VM doesn’t even load amdgpu when those parameters are passed to the host. AND the VM still messes with the payload sizes despite that being disabled on both host and guest.

Once again I ask: how do I force my guest to obey these parameters?