[Guide] Manual XCP-ng installation on OVHCloud/SoYouStart Server - Suffering through weird issues so you don't have to

A bit of background: I’ve been struggling to secure Proxmox on my new host, and Wendell suggested using XCP-ng instead. One issue though: OVH doesn’t offer XenServer or XCP-ng on most of their servers, and installing it manually is… pain. I decided to write this guide because I ran into a surprising number of issues trying to get it up and running, and some unfortunate soul in the future might find it lifesaving.

Server specifications this guide was made with:
SYS-LE-4 Server
Intel Xeon E3-1230v6
32GB DDR4 ECC 2400MHz
2x 450GB NVMe

This guide is meant as a supplement to the normal setup documentation, covering OVH-specific quirks:
https://xcp-ng.org/docs/install.html

Prerequisites

  • OVH, OVHCloud or SoYouStart server that has IPMIv1 with iKVM or IPMIv2 with iKVM capability
  • Java installed to use the KVM on either Windows or Linux (Sorry Mac users, the KVM is broken there)
  • XCP-ng netinstall ISO
  • Monitoring DISABLED in the control panel for this specific server

Step 1: Zap the drives if you already have some system installed
Due to a bug where the software RAID setup does not wipe the drives, you have to boot into OVH’s rescue distro and zap each drive/NVMe with dd to make them appear clean.

root@rescue:~# dd if=/dev/urandom of=/dev/nvme0n1 bs=1M count=2
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.0130416 s, 161 MB/s
root@rescue:~# dd if=/dev/urandom of=/dev/nvme1n1 bs=1M count=2
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.0125005 s, 168 MB/s
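
If the server has more than two drives, the same zap can be scripted. This is only a minimal sketch, assuming a POSIX shell in the rescue system; `zap_commands` is a hypothetical helper (not part of OVH's rescue tooling) that just prints the dd invocations used above so they can be reviewed before anything is overwritten:

```shell
# Hypothetical helper: print (do not run) the dd command for each device given.
# The device names below are the ones from this guide; yours may differ.
zap_commands() {
    for dev in "$@"; do
        printf 'dd if=/dev/urandom of=%s bs=1M count=2\n' "$dev"
    done
}

# Review the printed commands first.
zap_commands /dev/nvme0n1 /dev/nvme1n1
```

Once the output looks right, piping it to sh (`zap_commands /dev/nvme0n1 /dev/nvme1n1 | sh`) would actually run the commands. Double-check the device names, since this destroys the start of each listed drive.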

Step 2: Booting into XCP-ng installer
Primary way:
Bring up the iKVM window, mount the ISO under Device>Redirect ISO and reboot the host. Be prepared to press F6 to get to the boot manager (soft keyboard recommended) and select the CD drive. Once you see the XCP-ng GRUB menu and have picked your option, go make a cup of coffee or tea.

OVH’s ISO redirection is slow, and it takes between 15 and 30 minutes for the netinstall ISO image to start booting. It’s not stuck, you just have to wait.

Secondary way:
OVH uses iPXE on their infrastructure, and if you have another server handy, you can load XCP-ng installer through iPXE.
https://xcp-ng.org/docs/install.html#ipxe-over-http-install

If you start seeing garbled text or glitching during boot, close the KVM, go into the control panel and restart IPMI, because your keyboard input will stop working (or worse, ISO redirection randomly stops working and your install media breaks until you restart the session).

If you see “[DEPEND] Dependency failed for XCP-ng installer” during boot, or you are stuck at “Started Update UTMP about System Runlevel Changes.” with the installer not starting:
There’s a XenServer bug (XSO-967) that sometimes causes this, and nobody knows why. To get around it:

  1. Switch to tty2 with ctrl+alt+F2
  2. Check the status and manually start systemd-udev-settle.service using systemctl start systemd-udev-settle.service
  3. Run /opt/xensource/installer/init to start the installer. It will display syslog messages during the install, so don’t be alarmed if things start to look weird. I needed to do this on the SYS-LE-4 server SKU even though the bug is mainly caused by HP hardware.
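
The three steps above can be collected into one function to type in on tty2. This is only a sketch built from the commands already named in this guide; the function name `start_xcp_installer` and the `DRY_RUN` switch are my own additions for illustration, not part of the XCP-ng installer:

```shell
# Hypothetical wrapper around the workaround steps; run it from tty2.
# Set DRY_RUN=1 to print the commands instead of executing them.
start_xcp_installer() {
    run=${DRY_RUN:+echo}
    # status exits non-zero when the unit is not running, so do not stop on it
    $run systemctl status systemd-udev-settle.service || true
    $run systemctl start systemd-udev-settle.service
    $run /opt/xensource/installer/init
}

# Dry run: only shows what would be executed.
DRY_RUN=1 start_xcp_installer
```

Calling `start_xcp_installer` without `DRY_RUN` set runs the commands for real.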

Step 3: Setup RAID

If, when you are asked to pick a primary disk during installation, you see something similar to this before you have set up software RAID, YOU HAVE TO zap the drives with dd as described in step 1, or the installation will FAIL and the host will try to boot into the broken old OS.

[Screenshot: what you should be seeing before setting up software RAID]

Pick whatever you think is best for you. I went with RAID 1 and LVM/thick-provisioned storage.

If you are stuck on a blue screen after setting up RAID:
Once it has been stuck for a while, gracefully reboot the host with CTRL+ALT+DEL and boot back into the installer. The RAID array is repaired automatically if something broke during initialization. After you agree to the EULA, XCP-ng should either go straight to picking VM storage or let you pick the RAID you created before the reboot.

Step 4: Network Setup

Set it to static configuration.

IP Address: The IP address you received with the server and that your reverse DNS points at, NOT one of the additional addresses assigned to you by RIPE/IANA.
Subnet Mask: /24 = 255.255.255.0
Gateway: Your IP address with the last octet replaced by 254
Nameserver: DNS server of your choice.
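
As a quick illustration of the gateway rule above, here is a small sketch. `ovh_gateway` is a hypothetical helper name, and 203.0.113.57 is a documentation placeholder rather than a real OVH address; it assumes the /24 layout described in this step:

```shell
# Hypothetical helper: derive the gateway from the server's primary IPv4
# address by replacing the last octet with 254.
ovh_gateway() {
    echo "${1%.*}.254"
}

ovh_gateway 203.0.113.57   # prints 203.0.113.254
```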

Do not change the settings for the management interface and keep the eth0 settings if prompted.


After this it should be smooth sailing, and one reboot later you’ll get to enjoy your fancy XCP-ng hypervisor. Now go secure it. You can also probably enable monitoring now. I did, without OVH engineers screaming at me.


FYI, you can use all the fancy features of XCP-ng for free; some you just have to DIY. Nice work. Is it better? :slight_smile:

It’s fine, but with XCP-ng I have the exact same issue I had with Proxmox: I can’t put the management interface and SSH inside the internal VM network to stop them being exposed to the public. This time I can’t even attempt to assign the management interface to the virtual NICs. It’s a simple thing to do in ESXi with vSwitches, since you can assign the management to a virtual interface for these scenarios, but XCP-ng is apparently just not capable of it (neither XOA, XCP-ng Center nor xe will allow it).

Yes, I can disable the management interface via the console, but by doing that I would lose access to most functions like backup, which is the opposite of what I want. Exposing the management interface to the public is really bad practice, and I’m getting increasingly frustrated with trying to make it somehow secure, to the point where I want to jump ship to ESXi.

What about ssh forwarding the port? You can allow an inbound connection via connecting a port in your local machine to the root port? The only thing open before you install the VM to manage is ssh and the helper “omg install the appliance” web page?

Once you install XCP-ng, all types of management are wide open.

I need to lock down SSH as well, because you cannot disable SSH access to the root account, you cannot add any other system users without joining the server to an Active Directory server, you cannot use SSH keys for authentication (only for CloudInit VM deployment), and installing DenyHosts or fail2ban to ease the spambots can destroy the system.

XenAPI and HA clustering stuff is also open wide. Just take a peek at the default iptables (Ignore Zabbix that was added by me). I didn’t even install the XOA locally on the hypervisor itself and I could access the server via Xen Orchestra running inside of Docker in my homelab (Controlling XCP-ng from a Hyper-V portainer VM, now that’s fun).

# sample configuration for iptables service
# you can edit this manually or use system-config-firewall
# please do not ask us to add additional ports/services to this default configuration
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:RH-Firewall-1-INPUT - [0:0]
-A INPUT -j RH-Firewall-1-INPUT
-A FORWARD -j RH-Firewall-1-INPUT
-A RH-Firewall-1-INPUT -i lo -j ACCEPT
-A RH-Firewall-1-INPUT -p icmp --icmp-type any -j ACCEPT
# Zabbix
-A RH-Firewall-1-INPUT -p tcp --dport 10050 -j ACCEPT
# DHCP for host internal networks (CA-6996)
-A RH-Firewall-1-INPUT -p udp -m udp --dport 67 --in-interface xenapi -j ACCEPT
-A RH-Firewall-1-INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Linux HA hearbeat (CA-9394)
-A RH-Firewall-1-INPUT -m conntrack --ctstate NEW -m udp -p udp --dport 694 -j ACCEPT
-A RH-Firewall-1-INPUT -m conntrack --ctstate NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A RH-Firewall-1-INPUT -m conntrack --ctstate NEW -m tcp -p tcp --dport 80 -j ACCEPT
-A RH-Firewall-1-INPUT -m conntrack --ctstate NEW -m tcp -p tcp --dport 443 -j ACCEPT
# dlm
-A RH-Firewall-1-INPUT -p tcp -m tcp --dport 21064 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m multiport --dports 5404,5405 -j ACCEPT
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
COMMIT

Xen Orchestra also doesn’t support all XCP-ng features. I need to jump between the XCP-ng Center Windows client, the xe CLI and XOA if I want to switch the management interface properly.
Alternatively, you can do it from the local console/IPMI.


What my current plan is:
Since I got a static IP, pfSense and OpenVPN working, I can modify the iptables to accept access to management only from that static IP from within the local VM network.

I know I will have to make a bunch of static routes and get them pushed via OpenVPN somehow, with the “force all traffic to go through VPN” option disabled.

This also gives me an easy configuration rollback in case the pfSense VM decides to die. I can just comment out the new rules, uncomment the rules that expose all management, restart the iptables service and voilà, I’m in to fix stuff. I have no idea how I’m going to handle backups with this configuration.
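
To make the plan a bit more concrete, here is a rough sketch of what the restricted rules could look like. It is untested on a real host; `mgmt_rules` is a hypothetical helper, and 192.0.2.10 is a placeholder for the static IP inside the VM network. The rule format is copied from the default iptables listing above, with only a `-s` source match added:

```shell
# Hypothetical helper: print iptables rules that accept new connections to the
# management ports (SSH, HTTP, HTTPS) only from a single source address.
mgmt_rules() {
    src="$1"
    for port in 22 80 443; do
        echo "-A RH-Firewall-1-INPUT -s $src -m conntrack --ctstate NEW -m tcp -p tcp --dport $port -j ACCEPT"
    done
}

# Placeholder address; substitute the static IP you actually use.
mgmt_rules 192.0.2.10
```

The printed lines would take the place of the three unrestricted `--dport 22`, `80` and `443` rules and have to stay above the final REJECT rule. Keeping the original rules commented out next to them is what makes the rollback described above possible.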
