Hibernating a QEMU VM (from inside) sometimes fails to resume

So I have a Linux QEMU VM (AMD 5950X Linux host, OVMF) with an Nvidia GPU and a USB controller passed through to it, and I want to hibernate it the usual way, from inside the guest.

I’ve set up a swap disk for it, added the resume= kernel parameter and so on, and on systemctl hibernate it does hibernate just fine.
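
For reference, the guest-side setup is roughly the following (a sketch with illustrative device names; the dedicated swap disk is the swap.qcow2 NVMe drive from the QEMU config below, but how it enumerates inside the guest may differ):

# format and enable the dedicated swap disk inside the guest
mkswap /dev/nvme4n1
swapon /dev/nvme4n1
# make it permanent
echo '/dev/nvme4n1 none swap defaults 0 0' >> /etc/fstab
# point the kernel at it for resume, e.g. in /etc/default/grub:
#   GRUB_CMDLINE_LINUX_DEFAULT="... resume=/dev/nvme4n1"
# regenerate grub.cfg, reboot, then hibernate from inside the VM:
systemctl hibernate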

QEMU quits, I restart it, and there is roughly a 50% chance that the VM un-hibernates and works fine, GPU and all.

This post is about the other 50%, when it fails to resume and does a clean boot.

The error is:

[    9.860061] Hibernate inconsistent memory map detected!
[    9.860062] PM: hibernation: Image mismatch: architecture specific data
[    9.860065] PM: hibernation: Read 13405188 kbytes in 0.01 seconds (1340518.80 MB/s)
[    9.860960] PM: Error -1 resuming
[    9.860963] PM: hibernation: Failed to load image, recovering.
[    9.861363] PM: hibernation: Basic memory bitmaps freed

On some googling I found that it’s because the e820 memory map is slightly different between boots for some reason.
Boot #1:

[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000007fffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000000800000-0x0000000000807fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x0000000000808000-0x000000000080ffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000000810000-0x00000000008fffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x0000000000900000-0x000000007df09fff] usable
[    0.000000] BIOS-e820: [mem 0x000000007df0a000-0x000000007df0afff] reserved
[    0.000000] BIOS-e820: [mem 0x000000007df0b000-0x000000007e8b4fff] usable
[    0.000000] BIOS-e820: [mem 0x000000007e8b5000-0x000000007e8b8fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x000000007e8b9000-0x000000007e8bafff] ACPI data
[    0.000000] BIOS-e820: [mem 0x000000007e8bb000-0x000000007e8c2fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x000000007e8c3000-0x000000007e8dafff] reserved
[    0.000000] BIOS-e820: [mem 0x000000007e8db000-0x000000007e8fafff] usable
[    0.000000] BIOS-e820: [mem 0x000000007e8fb000-0x000000007e91afff] reserved
[    0.000000] BIOS-e820: [mem 0x000000007e91b000-0x000000007f99afff] usable
[    0.000000] BIOS-e820: [mem 0x000000007f99b000-0x000000007f9cafff] type 20
[    0.000000] BIOS-e820: [mem 0x000000007f9cb000-0x000000007f9f2fff] reserved
[    0.000000] BIOS-e820: [mem 0x000000007f9f3000-0x000000007f9fafff] ACPI data
[    0.000000] BIOS-e820: [mem 0x000000007f9fb000-0x000000007f9fefff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x000000007f9ff000-0x000000007fe5ffff] usable
[    0.000000] BIOS-e820: [mem 0x000000007fe60000-0x000000007fe7ffff] reserved
[    0.000000] BIOS-e820: [mem 0x000000007fe80000-0x000000007fffffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000b0000000-0x00000000bfffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ffe00000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x0000000e7fffffff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] efi: EFI v2.7 by EDK II
[    0.000000] efi: SMBIOS=0x7f9cc000 ACPI=0x7f9fa000 ACPI 2.0=0x7f9fa014 MEMATTR=0x7ea21018
[    0.000000] efi: Remove mem49: MMIO range=[0xffe00000-0xffffffff] (2MB) from e820 map
[    0.000000] e820: remove [mem 0xffe00000-0xffffffff] reserved
[    0.000000] SMBIOS 2.8 present.

Boot #2:

[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000007fffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000000800000-0x0000000000807fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x0000000000808000-0x000000000080ffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000000810000-0x00000000008fffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x0000000000900000-0x000000007e38afff] usable
[    0.000000] BIOS-e820: [mem 0x000000007e38b000-0x000000007e38bfff] reserved
[    0.000000] BIOS-e820: [mem 0x000000007e38c000-0x000000007e8b4fff] usable
[    0.000000] BIOS-e820: [mem 0x000000007e8b5000-0x000000007e8b8fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x000000007e8b9000-0x000000007e8bafff] ACPI data
[    0.000000] BIOS-e820: [mem 0x000000007e8bb000-0x000000007e8c2fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x000000007e8c3000-0x000000007e8dafff] reserved
[    0.000000] BIOS-e820: [mem 0x000000007e8db000-0x000000007e8fafff] usable
[    0.000000] BIOS-e820: [mem 0x000000007e8fb000-0x000000007e91afff] reserved
[    0.000000] BIOS-e820: [mem 0x000000007e91b000-0x000000007f99afff] usable
[    0.000000] BIOS-e820: [mem 0x000000007f99b000-0x000000007f9cafff] type 20
[    0.000000] BIOS-e820: [mem 0x000000007f9cb000-0x000000007f9f2fff] reserved
[    0.000000] BIOS-e820: [mem 0x000000007f9f3000-0x000000007f9fafff] ACPI data
[    0.000000] BIOS-e820: [mem 0x000000007f9fb000-0x000000007f9fefff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x000000007f9ff000-0x000000007fe5ffff] usable
[    0.000000] BIOS-e820: [mem 0x000000007fe60000-0x000000007fe7ffff] reserved
[    0.000000] BIOS-e820: [mem 0x000000007fe80000-0x000000007fffffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000b0000000-0x00000000bfffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ffe00000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x0000000e7fffffff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] efi: EFI v2.7 by EDK II
[    0.000000] efi: SMBIOS=0x7f9cc000 ACPI=0x7f9fa000 ACPI 2.0=0x7f9fa014 MEMATTR=0x7ea48118
[    0.000000] efi: Remove mem51: MMIO range=[0xffe00000-0xffffffff] (2MB) from e820 map
[    0.000000] e820: remove [mem 0xffe00000-0xffffffff] reserved
[    0.000000] SMBIOS 2.8 present.

The boundary between the sixth (usable) and seventh (reserved) entries sits at a slightly different address on each boot (0x7df0a000 in boot #1 vs 0x7e38b000 in boot #2), and this throws off the resume.
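
For anyone reproducing this: saving the map on each boot and diffing the captures makes the moving entry obvious. A sketch:

# capture the firmware-provided map on every boot
dmesg | grep BIOS-e820 > /root/e820-$(date +%s).txt
# then compare any two captures
diff /root/e820-old.txt /root/e820-new.txt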

The best I can tell from googling, this is a known issue the kernel devs have declined to work around: the BIOS/UEFI should be presenting a consistent map, and working around firmware bugs is not the kernel’s job.

So the question is: is there a way around this?
Can QEMU or OVMF somehow be convinced to provide a consistent map between runs?
Or can the kernel be made to ignore such shifts, or to blacklist the whole range, or something along those lines?

QEMU config:

qemu-system-x86_64 \
-nodefaults \
-nographic \
-enable-kvm \
-m 57344 -mem-path /dev/hugepages \
-cpu host,kvm=off,hv-vendor-id=PC,hv-frequencies=on,hv-reenlightenment=on,hv-relaxed=on,hv-reset=on,hv-runtime=on,hv-spinlocks=4096,hv-time=on,hv-stimer=on,hv-stimer-direct=on,hv-synic=on,hv-vapic=on,hv-vpindex=on \
-smp cores=32,threads=1,sockets=1 \
-machine q35,vmport=off,kernel_irqchip=on \
-drive if=pflash,format=raw,readonly=on,file=ovmf_code.fd \
-drive if=pflash,format=raw,file=ovmf_vars-1024x768.fd \
-smbios type=2 \
-netdev user,id=net0,hostfwd=tcp::5002-:22,hostfwd=tcp::5902-:5900 \
-device e1000,netdev=net0,mac=00:25:4B:00:00:02 \
-device nvme,drive=nvme0,serial=deadbeaf1,max_ioqpairs=8 -drive file=vm2_sys.qcow2,if=none,id=nvme0 \
-device nvme,drive=nvme1,serial=deadbeaf2,max_ioqpairs=8 -drive file=vm5_aux.qcow2,if=none,id=nvme1 \
-device nvme,drive=nvme2,serial=deadbeaf3,max_ioqpairs=8 -drive file=vm5_games.qcow2,if=none,id=nvme2 \
-device nvme,drive=nvme3,serial=deadbeaf4,max_ioqpairs=8 -drive file=data.qcow2,if=none,id=nvme3 \
-device nvme,drive=nvme4,serial=deadbeaf5,max_ioqpairs=8 -drive file=swap.qcow2,if=none,id=nvme4 \
-device pcie-root-port,chassis=1,id=root1,bus=pcie.0 \
-device vfio-pci,host=0a:00.0,bus=root1,multifunction=on,addr=00.0 \
-device vfio-pci,host=0a:00.1,bus=root1,addr=00.1 \
-device vfio-pci,host=0c:00.3 \
-vga none

I worked around the issue by adding memmap=48M\$0x7d000000 to the kernel command line, blacklisting the entire top of the 2 GB range where all the wiggly reserved sections are. So far it seems to be working reliably.
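
For anyone wanting to copy this, roughly how it looks with GRUB (the ‘$’ needs escaping or the bootloader treats it as a variable; exact escaping may vary with your setup, so double-check /proc/cmdline after boot):

# /etc/default/grub inside the guest
GRUB_CMDLINE_LINUX_DEFAULT="... memmap=48M\\\$0x7d000000"

# after regenerating grub.cfg and rebooting:
cat /proc/cmdline                                   # should contain memmap=48M$0x7d000000
dmesg | grep -A 30 'user-defined physical RAM map'  # kernel applied the override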

I feel like this is kind of a hack, however, so the question remains open.


Sigh. Some more use later, and apparently all this did was make it happen slightly less often.

It still comes up with a clean boot even with the blacklist workaround, same error, and I’m not entirely sure why now.

WTF.

Ok, so the check that actually fails is in arch/x86/power/hibernate.c:

int arch_hibernation_header_restore(void *addr)
{
        struct restore_data_record *rdr = addr;

        if (rdr->magic != RESTORE_MAGIC) {
                pr_crit("Unrecognized hibernate image header format!\n");
                return -EINVAL;
        }

        restore_jump_address = rdr->jump_address;
        jump_address_phys = rdr->jump_address_phys;
        restore_cr3 = rdr->cr3;

        if (rdr->e820_checksum != compute_e820_crc32(e820_table_firmware)) {
                pr_crit("Hibernate inconsistent memory map detected!\n");
                return -ENODEV;
        }
 
        return 0;
}

It detects the inconsistency by comparing a CRC32 of the e820 table as provided by the firmware (e820_table_firmware), before any user modifications are applied, which is why blacklisting the area with memmap= didn’t help.

I’m going to try patching this check out (sketched below), along with keeping the blacklisting, and see what happens.
The flickering region does not appear to be occupied by anything important like MMIO or ACPI tables, so it should be safe to ignore.
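
The sketch, against the function quoted above (untested, and the exact context will differ between kernel versions), is simply to downgrade the mismatch from a fatal error to a warning:

	if (rdr->e820_checksum != compute_e820_crc32(e820_table_firmware)) {
		/* Sketch: log the mismatch but let the resume proceed anyway. */
		pr_warn("Hibernate inconsistent memory map detected, resuming anyway!\n");
	}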

So, if anyone knows any reason why this might be a bad idea, please speak up before I find out the hard way.

Here is the rationale for when the check was added back in 2016: [PATCH][v12] PM / hibernate: Verify the consistent of e820 memory map by md5 digest - Chen Yu

“This is a potential and rare case we need to deal with in OS in the future.”
The future is now, Chen.

Looking through the OVMF code, I can’t quite figure out what that range is. OVMF also seems to receive some memory map from QEMU, and in QEMU’s code there is likewise no immediate clarity as to which range can end up reserved and why.
So that does not help unless I really start digging in with debug output enabled, which I am too lazy to do for now.
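
If I ever do, capturing OVMF’s debug output is roughly this (a sketch; it needs an OVMF build that still has DEBUG output enabled, a RELEASE build prints nothing):

# route OVMF's debug console (I/O port 0x402) to a file on the host
qemu-system-x86_64 ... \
  -debugcon file:ovmf-debug.log -global isa-debugcon.iobase=0x402
# ovmf-debug.log should then show what the firmware does with memory during boot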


Nice sleuthing. Thanks for being detailed in your follow-ups.


I have to ask, and maybe there is a real reason: why?

Why can’t it just be on or off? Maybe this is a broader question, but in this world of NVMe storage and ultra-low-power APUs, why does hibernate even still exist at all?


A mix of habit and peace of mind? I tend to have a lot of terminals, IDEs, and PDFs open, and I don’t like having to write it all down and reopen everything every day. So it’s nice to have the state saved and restored.

Why not just leave it on? It has fans that are never perfectly idle.

For laptops, hibernate is useful for when the battery is about to run out.

NVMe storage is what makes it practical, since it only takes seconds to save or restore the state.
