iSCSI-boot real PCs from KVM qcow2 images

EDIT: To save you some scrolling: it is possible, but it isn’t practical, because you’re mixing usermode binaries (qemu-nbd) with kernel modules (LIO + targetcli-fb), and the performance is miserable even on bleeding-edge fast machines with 10G Ethernet.

–Original Post–
Anyone think it’s possible to serve a linked-clone qcow2 as an iSCSI target?

I want to plop a customer’s PC on the counter and PXE-boot it into a working, updated Windows 10 OS. Example:

  1. Client boots. iPXE uses the client MAC address in a variable to sanhook an iSCSI target.
  2. The iSCSI target host sees it doesn’t have a matching file, takes the existing sysprep’d Windows 10 qcow2, and creates a linked clone matching the requested file.
  3. The iSCSI host creates an iSCSI target to match the request.
  4. The client finally sanhooks the requested iSCSI target successfully and does the OOBE first boot of Windows 10.
  5. The linked image retains drivers and software through reboots.
  6. If the customer decides to go for an OS reinstall, use the iSCSI target as the image source. Afterwards, delete the linked image.
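On the client side, steps 1 and 4 would look roughly like this (just a sketch: iPXE exposes the client MAC as ${net0/mac}, and the server address and IQN prefix here are placeholders):

#!ipxe
dhcp
# keep the SAN drive registered across the chainload, and name the target after our own MAC
set keep-san 1
sanboot iscsi:192.168.1.108:::1:iqn.test.net:${net0/mac}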

Can’t tell you how much I like the idea. If there are any problems rebuilding the customer’s OS, you run into them on your image before you commit it to the drive.

And PC shop owners, what’s your method of working on customer computers? If they won’t sign the liability waiver, I tell them to take it somewhere else. I’ve heard of more than a few small shops getting ruined over lost data. SSDs are cheap, and OS loading over a 1Gb LAN has decent speed. But a good lawyer is expensive and our courts are slow.

I’m an old hat with PXE, TFTP, NFS, and Samba OS deployment methods. I’ve got a bit more to learn about iSCSI and iPXE to make this work together, but I think it’s doable. I could tail the iSCSI log, sed out the requested image name, and feed it to qemu-img create -b. Then mount the linked clone on the local machine, point an iSCSI target at the new mount point, and restart the iSCSI service. I bet that would make it take two runs of sanhook to work properly for the first boot… this method lacks elegance, but I think it would work. A possible workaround: the server could also be the DHCP server using dnsmasq, so it could generate the iSCSI target in advance of the sanhook request… better.
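Strictly as a sketch of that inelegant log-watcher idea (the log path and image locations are hypothetical, and newer qemu-img also wants -F to declare the backing format):

tail -Fn0 /var/log/iscsi.log | while read -r line; do
  # assumes a failed request logs the wanted target as iqn.test.net:<mac>
  mac=$(sed -n 's/.*iqn\.test\.net:\([0-9a-f:]*\).*/\1/p' <<<"$line")
  [ -z "$mac" ] && continue
  clone="/srv/images/win10-${mac}.qcow2"
  # only create the linked clone the first time a client shows up
  [ -e "$clone" ] || qemu-img create -f qcow2 -b /srv/images/win10sysprep.qcow2 -F qcow2 "$clone"
done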

Please post thoughts and comments.

Interesting idea.
First off, you can make any device an iSCSI target; it’s only a matter of how. You could have the qcow2 attached to another operating system and present the disk as a target:

  • Microsoft: StarWind VSAN Free has options to create targets.
  • Linux: the tgt and LIO (targetcli) packages have the same capability, but you don’t get a GUI.

You could theoretically add this to the PXE boot config pretty easily.
The only part you’d have to configure on a per-customer basis would be PXE boot in the BIOS of their computer (and maybe remove it afterwards).

Let us know if you end up doing this. I’d be interested to hear the results.

Also, side note, there is software that can do this already:
Ansible
Chef
Puppet

Bonus, these can do updates automatically:
Solarwinds
WSUS (Win)
Spacewalk (RHEL)

Edit: Microsoft does not offer iSCSI targets easily. You can create them with StarWind though (free software).


Update: Thanks for the response @1ncanus. I’m still having fun getting a guestfs-tools-mounted qcow2 to play nicely with open-iscsi.

Also, annoyingly, during research I found [ccboot](https://www.ccboot.com/), which is the commercial Windows version of what I’m trying to accomplish. I think it’s using a shared network block boot source and a local RAM cache for changes, so I’m not sure how it handles sysprep’d images. But I need reboot persistence long enough to get a copy of Windows ready to write to the customer’s physical drive.

Clarification: my server presently hosts DHCP with iPXE and wimboot capability. Installation occurs from a Samba source that’s as scripted as I can get non-enterprise versions of Windows. I’d like to expand this further: first using iSCSI targets as the Windows 10 installation destination, keeping persistence during a few reboots, and later flipping the iSCSI target into the installation source for the physical (customer’s PC) hard drive, reinstalling as an image. That could save an enormous amount of time over the course of a month’s worth of OS reinstalls.

The use case is a bit niche: for the odd times when I get a customer that doesn’t want a new hard drive, or wants me to use their (not here) drive, or other BS.

Has it been 3 months? A few days ago I booted a zeroed machine into a custom USB Debian; the same USB stick had a copy of my Windows 10 VM from qemu. Playing around with no expectation of productivity, I tried:

# -p shows progress, -f is the source format, -O the output format
qemu-img convert -p -f qcow2 -O raw file.qcow2 /dev/nvme0n1

This gave me a clean working operating system on the physical machine without even bothering to sysprep the qcow2 image first!
And it’s activated. (I didn’t check before I connected to the internet, so I’m not sure if it was MS machine recognition or whether my activation was copied from my VM.)

And I’m back to this project.

qemu-nbd is a somewhat cleaner method for mounting qcow2 images, either locally or as a qemu-proprietary remote share; but yeah, it gives a local block device that can then be hosted as an iSCSI target. I’m still learning the ins and outs of iSCSI over iPXE, and I’m not liking this mixing of user-space drivers and kernel modules. But I’ll update when I get a working model.

Meanwhile, this is not my YouTube channel, but if anyone can find out who this guy is or recognizes the video style, I’d like to buy him a beer.
YouTube - Multi-OS iSCSI iPXE boot.

to shill my post a bit;

Well, it isn’t what you are trying to do. I guess I have to read your posts a few more times because I currently don’t get it, but meh.

iPXE can be booted from a USB stick, and might even do what you want in EFI mode. It was a small PITA to get a bootable iPXE EFI binary, but it is doable.

iPXE is very nicely scriptable and makes choosing machine-individual targets rather easy.

You can either have the script downloaded over TFTP, directed by the DHCP server, or you burn it into the iPXE image.

As far as I have understood your writing (my fault, I’m done for today), you want some kind of dynamic target configuration.
For example: there aren’t any iSCSI targets configured yet.
You boot iPXE on the client machine, that does some magic, the server creates a target, tells iPXE, and it does something with it.

The iPXE-to-magic part is what I’m currently thinking about.
The best idea I currently have is that you could build a “REST”-like interface that iPXE tries to imageload or so, with custom parameters.
The interface detects those and does something with them.

I mean, I don’t think iPXE comes with any tools to let you do custom POST / GET requests or do TFTP uploads?
You can enable ping capabilities when you build it; otherwise, you can’t even ping.
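Correction while I think about it: the iPXE docs do describe a params / param mechanism that submits settings as an HTTP POST when chaining a URL, which might already be the hook needed here (the boot.php endpoint is made up):

#!ipxe
params
param mac ${net0/mac}
param serial ${serial}
chain http://192.168.1.108/boot.php##params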

But hey, it’s open source, and Linux, so why not hack something into it?
If that is what you want / need.

Reread the first post:

your dirty sanboot …:$Mac
followed by sanboot …:sysprep
isn’t dirty in my eyes.

to shill my post a bit;

Not a shill when it’s good info.

The iPXE-to-magic part is what I’m currently thinking about.
The best idea I currently have is that you could build a “REST”-like interface that iPXE tries to imageload or so, with custom parameters.
The interface detects those and does something with them.

The iSCSI server is also running dnsmasq (DHCP, DNS, and TFTP). I can scrape the MAC address of new clients from the DHCP service with a bit of scripting. That MAC is used to create a fresh snapshot (copy-on-write layer) for every new DHCP client, then bind the snapshot to a new nbd and then to a new iSCSI target. iPXE should support requesting its own network MAC as a target/LUN. That way I keep persistence between machines and reboots. The root image is a sysprep’d, fully updated Windows 10 image.

As of this writing I have working:

  1. qcow2 (the format KVM/qemu virtual machines use for hard disk images) mounted to the server as a network block device. Thank you, qemu-nbd.
    This is where the copy-on-write layers work, and I can fork off another snap for a new client when needed.
  2. tgt (the iSCSI server). The nbd (network block device) is a format tgt can recognize, rehosting the per-client snapshot as an iSCSI LUN. (The kernel-mode version needs a module reload, which takes too long, and the user-mode version isn’t as fast, grr.)
  3. The client machine PXE-boots the iSCSI LUN for its own specific MAC… but then gives me an “inaccessible boot device” BSOD. Maybe sysprep ripped out the network drivers?.. still troubleshooting this. WIP.

I’m a noob when it comes to Windows and iSCSI, so I’m still running into problems where sanbooting the WinPE kicks out the sanhooked iSCSI drive. Perhaps this can be fixed with a multi-LUN sanboot. Again, WIP.

Sorry for the long-winded update. Thanks @RageBone for all the input. That iBFT boot-order hack is FFFFFFF inspired. Nice.

your dirty sanboot …:$Mac
followed by sanboot …:sysprep
isn’t dirty in my eyes.

My server can saturate 10Gb Ethernet with NFS. But this iSCSI output is going through two usermode drivers, qemu-nbd and tgt, so it’s probably going to be slow. But I don’t need great speed, just a functioning OS to get drivers installed on, and to later use as a driver-laden baseline. Which, FYI: booting Linux on the client machine and pulling this same image over NFS is fast as FFFF. The baseline qcow is only 17GB, and the qemu-img process fills in the zeros on the physical device on the fly, so it takes about 5 minutes on an NVMe 10G system.

-Edit spelling, clarity, spelling, more spelling. FFFF it.


-WIP of my writeup. Come back later for the finished product. But for now, some working content of how I got this project this far. -

I’m using Debian sid. So if you run sid/stable you just need…

apt install tgt libguestfs-tools dnsmasq

–and all the crap for KVM/qemu to build the baseline. Not essential, and not needed on the same machine.

Step 1: get gud.
Make a Windows 10 baseline in KVM/qemu using qcow2 as the disk format.
No, I’m not making a guide for that.

Step 2: update, install apps, optimize, and sysprep it.
Not making a guide for that either.

Step 4: I suck at math.

Step 5: you want to make a snapshot (copy-on-write layer) here. I keep my lab snapshots in RAM because I’ve got RAM for days.

sudo mkdir /mnt/ramdisk
sudo mount -t tmpfs -o size=16G tmpfs /mnt/ramdisk
# newer qemu-img also wants -F qcow2 to declare the backing file's format
qemu-img create -f qcow2 -b /path/win10sysprep.qcow2 -F qcow2 /mnt/ramdisk/win10<MAC>.qcow2

Step 6: mount the qcow2 image file as a network block device.

sudo modprobe nbd
# note: no -r; the snapshot must stay writable for Windows to boot and keep changes
sudo qemu-nbd -c /dev/nbd0 -f qcow2 /mnt/ramdisk/win10<MAC>.qcow2
# test
fdisk -l /dev/nbd0

Step 7: tgt (the iSCSI service) can base a LUN off a network block device like so…

systemctl restart tgt
# create the target, named after the client's MAC
tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.test.net:<MAC>
# back LUN 1 with the nbd device that wraps the qcow2 snapshot
tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/nbd0
# no ACLs in the lab: accept any initiator
tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL
# verify
tgtadm --mode target --op show

Step 3 – WIP, come back later: dnsmasq scripting to fill in the MAC address on the fly; the rough shape is sketched below. I need someone I trust to call me an idiot and peer-review it before I post it online. But for now you can statically run the commands above to emulate what I have so far.
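The rough shape of the dnsmasq hook will be something like this (an untested sketch, not the peer-reviewed version: dnsmasq runs the script named in dhcp-script= with “add <MAC> <IP>” on each new lease, and the tid/nbd bookkeeping here is deliberately naive):

#!/bin/bash
# called by dnsmasq (dhcp-script=/usr/local/bin/new-client.sh) as:
#   new-client.sh add|old|del <MAC> <IP> [<hostname>]
action="$1" mac="$2"
[ "$action" = "add" ] || exit 0
snap="/mnt/ramdisk/win10-${mac}.qcow2"
[ -e "$snap" ] && exit 0                  # returning client, target already exists
qemu-img create -f qcow2 -b /path/win10sysprep.qcow2 -F qcow2 "$snap"
qemu-nbd -c /dev/nbd1 -f qcow2 "$snap"    # naive: assumes /dev/nbd1 is free
tid=2                                     # naive: assumes tid 2 is free
tgtadm --lld iscsi --op new --mode target --tid "$tid" -T "iqn.test.net:${mac}"
tgtadm --lld iscsi --op new --mode logicalunit --tid "$tid" --lun 1 -b /dev/nbd1
tgtadm --lld iscsi --op bind --mode target --tid "$tid" -I ALL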

For testing you can boot the iPXE ISO with qemu, hit Ctrl-B, then:

dhcp
sanboot iscsi:192.168.1.108:::1:iqn.test.net:<MAC>

Step 9: deal with Windows 10 iSCSI problems. WIP

Step 10 ???

Step 11 Profit.

To repeat, or rather summarize, my findings described at length in my post:

Windows has this Lightweight Filter (LWF) Driver that is a core component of the Windows system.
It only has a blacklist, and you can only block known and already-bound NICs. Since Windows 10, it is a bit more resilient to actual changes of the network card itself, but if the system changes enough that the boot NIC is detected as a new, non-blacklisted device, the driver will bind to it and will then cause the INACCESSIBLE_BOOT_DEVICE BSOD.

The currently known way of fixing it is getting the target system to boot once, probably from an SSD, HDD, or whatever, then have it detect the hardware, go in and blacklist the boot NIC, and then image it back to the iSCSI target.

Possible options I have thought about are:
Killing the driver completely, which doesn’t seem possible.

Having code run in the kernel on boot that automatically blacklists all the NICs, which seems to be the most realistic option to me currently.

Quick update: successful boot of Windows 10 from qcow2 over iSCSI. I still don’t have the “new snapshot per MAC” script working yet. This is a spare-time project after all.

@RageBone, does your iSCSI solution support using an ISO as a LUN? I’m thinking of using a multi-LUN target, with the other LUN being WinPE. So WinPE boots up first, but with the second LUN being the sysprep’d disk image. That way WinPE’s dism is able to patch in essential drivers and perhaps fix the lightweight filter driver. On the next lun_disconnect the server can change the multi-LUN sanboot configuration to remove the WinPE.
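If tgt behaves the way its docs suggest, the WinPE ISO would just be a second file-backed LUN on the existing target; untested, and the ISO path is made up:

# add the WinPE ISO as LUN 2 alongside the disk image on tid 1
tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 2 -b /srv/iso/winpe.iso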

I really hate this idea, as I want to do as little modification to the image as possible, since I would later use it as a gold source image to reimage the machine’s local drive.

Yes, all iSCSI target software, open-iscsi or the FreeBSD one, supports files as targets.
You can choose between files and zvols in FreeNAS, and I think I did boot “win 10 edu.iso” successfully over iSCSI, to then get “can’t install to this target”.

The driver binding is of course also a registry key, though it will be a bit tricky to pinpoint since everything is device-ID specific, and I have no clue if you can change the registry externally.

Thinking about it, you should be able to change the registry externally, since what does the installer do when installing to an iSCSI target? The same, I guess!

Thinking about this more:
An external reg-edit from WinPE is no problem.

But if you boot WinPE first, your target Windows will never have seen any of the network devices and will not have any bindings that you could blacklist. So the question would be how to create the new bindings, and how to then blacklist the ones you need to blacklist.

Or you boot Windows first, have it crash miserably, and then boot WinPE with the hope that Windows created the new bindings permanently, to then disable the relevant ones.

Does the hotfix work on Win10?

What hotfix? The un-binding of the LWF Driver?

Yes, that works.

I was talking about the hotfix download here: https://support.microsoft.com/en-us/help/976042/windows-may-fail-to-boot-from-an-iscsi-drive-if-networking-hardware-is

But apparently it has been removed. I was wondering if we could set up the hotfix to run at every logout to avoid Inaccessible Boot Device errors.

Well, the hotfix isn’t actually applicable in this case, and it’s already part of Windows 10.

What was the problem before this patch:
The moment you change one super small thing on the diskless client (for example, you exchange the NIC that you boot over with an identical one whose only difference is the serial number or MAC address), Windows will crap itself and BSOD with “inaccessible boot device”.

After this patch, and in Windows 10, you can change certain things around without causing any problems.
But the moment Windows thinks it is a new NIC that you want to boot from, LWF will bind it, and BSOD.

That is also the main reason why a script on shutdown won’t help.
The problem occurs only at boot, when the kernel detects the hardware and loads drivers for it; in the case of new hardware or an error, it binds the LWF driver to the boot NIC and kills itself.

One possible way would be to have a kernel module or driver that looks for “blacklisted” NICs, loads first, and kills the bindings to the blacklisted NICs, or even all LWF bindings.

But I have no clue about Windows kernel drivers / modules.

So this basically happens when you change hardware/firmware or when Windows updates to a new version (big feature update).

I guess one possible workaround would be Windows LTSB/LTSC versions.

Yes.
Because LWF works through a blacklist: the moment Windows thinks there is new hardware, that hardware isn’t on the blacklist, so LWF binds it and Windows kills itself.

In Windows 7 it was as bad as changed device serials, IDs, or PCI IDs
causing the blacklist to not include the NIC.
With this patch, at least there is some tolerance, so that you can boot systems with a very similar configuration.

The last upgrade from 18 to 19 broke my Windows install, which actually is a bug, since Windows already knows that it’s being booted over the network. It should never, in any case, fuck with that blacklist. Hail the almighty M$ and their flawless software.


Information update.
Markus Partheymueller @ cyberus-technologies.de has an excellent write-up on patching Windows 7 sysprep’d (golden) images, and the under-the-hood registry mechanics for getting that to work. I am trying to apply this to Windows 10, but Windows isn’t my jam. @RageBone, I tested the iSCSI performance of my natively installed iSCSI client. I haven’t gotten speeds over 60MB/s, and that’s on an NVMe, 1Gb Ethernet system that has hit 107MB/s pulling from an NFS server. I only compare the two to emphasise that Microsoft iSCSI needs an overhaul, or we need to figure out how to tweak iSCSI for more performance.
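For anyone else chasing throughput before blaming the initiator, the usual first knob is jumbo frames end to end; the interface name is an assumption, and the switch and both NICs have to support MTU 9000:

# raise the MTU on the server's storage NIC; client and switch must match
sudo ip link set dev eth0 mtu 9000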

Side note: I was using virtual machines earlier, so mounting an ISO to the machine was cheating. But if you also host Apache/HTTP, you can set your iPXE script to something like this:

#!ipxe
dhcp
# clear the gateway (common workaround so SAN traffic stays on the local subnet)
set net0/gateway 0.0.0.0
# keep the hooked SAN drive registered across the chainload
set keep-san 1
sanhook --drive 0x80 iscsi:192.168.1.30:::1:iqn.vhome.net:windows
# pull the WinPE ISO into RAM and boot it via memdisk
initrd http://192.168.1.108/qemu_winpe.iso
chain http://192.168.1.108/memdisk iso raw

Doing this is way more convenient for mounting my custom WinPE, but I’m getting a helluva long boot time now, though the 400-ish MB of WinPE loads into RAM on the client in seconds. Everything else works fine.

A community help request.

I need to learn about the early process of bootmgr.efi and winload.exe: how it knows which drivers to use for what MS used to call boot-critical devices. Specifically, I need to inject a network driver into an offline, generalized, sysprep’d image and add it to the boot-time/critical/early driver load process. The method of injection will be a simple WinPE boot with the sysprep’d image staged as an iSCSI target.

What I really need are reference resources and correct terminology. I am probably using the wrong terminology for half the Windows functions I describe, as I’m primarily a Linux user. From what I’ve learned, I can inject drivers into an offline image. What I can’t figure out is how to make them available to the early boot process.
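For anyone following along, the injection half looks something like this from a WinPE prompt (a sketch only: the drive letters are assumptions, C: being the iSCSI-attached Windows volume and X:\drivers holding the NIC driver; whether DISM also wires the driver into the early-boot load list is exactly the open question above):

rem inject the NIC driver into the offline image
dism /Image:C:\ /Add-Driver /Driver:X:\drivers /Recurse
rem confirm it landed in the offline driver store
dism /Image:C:\ /Get-Drivers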

Let’s recap.

I have a server that can successfully build and boot a fully updated Windows 10 diskless client. The client PXE-boots, then pulls WinPE while mounting an iSCSI disk from the same server. I have to use the setup /noreboot option to regedit out pagefile.sys, but that’s fine; a pagefile on diskless systems doesn’t make sense anyway. In addition, a custom default sysprep’d image will install if run through the above process using setup /installfrom custom.wim .

The iSCSI backend image is full-sized and unique to the machine that requested the iPXE boot. iPXE can use a client variable like the MAC to generate a unique request for a particular iSCSI target, meaning the server can contain multiple iSCSI targets for multiple diskless clients. Good so far.

The server supports block-level snapshots, and these snapshots can be iSCSI targets, therefore individual diskless machines can retain persistence. The goal is to optimize my server to store a single generalized sysprep’d image and have the snapshots act as differencing disks per diskless client.

The problem with doing this the optimized way is simply drivers. The sysprep image will not contain network drivers for unknown hardware after a sysprep /generalize has run. I am trying to learn how to patch a sysprep image for iSCSI the same way setup does on /installfrom, without having to do the full install. The reason being you’re storing too much data per client: the backend’s individually unique systems are block-level different, therefore the block-level deduplication function of the snapshot engine is useless here. The block-level storage simply doesn’t see the file-level storage of the NTFS volumes to realize they are 99.9% similar.

I do all this in Debian. A Linux server hosting Windows, you say? Blasphemy! … I get it.
List of functions for tags and searchability: I integrated this with pfSense as well, for the DHCP options of pfSense are in fact dnsmasq. Still advisable to use a separate host for:
samba tftp ipxe apache2 iscsi targetcli-fb lio .