Noob Question: Why Run VMs?

This kind of passthrough has the disadvantage of being locked to the host's drivers, which may become incompatible with the container runtime, so it's better to keep them in sync. Also, if you crash the driver you may need to reboot the whole host instead of just the VM.

I haven't encountered this situation with my personal stuff. I run Stable Diffusion, Text-Generation-Webui and similar software in a rootless Podman container. I just need to pass the /dev/kfd and /dev/dri/* paths to the container. I think transcoding is just as easy, if not easier.
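
For reference, the invocation is roughly this - a minimal sketch where the image name and the extra flags are just examples and may need tweaking for your distro/SELinux setup:

    # pass the ROCm device nodes into a rootless podman container
    # keep-groups preserves the host user's video/render group membership;
    # label=disable sidesteps SELinux relabeling pain; the image is just an example
    podman run -it --rm \
      --device /dev/kfd --device /dev/dri \
      --group-add keep-groups --security-opt label=disable \
      docker.io/rocm/pytorch:latest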

I'm not so sure about that… Hardware stuff gets a lot of press, but I think that relates more to the old Schneier quote: "The very definition of news is something that hardly ever happens".

Meanwhile, the borderline disaster of software security gets relatively little press, because buffer overflows and friends are such an old problem, and are still being discovered constantly, that they have become non-news in the general case.

Consider the Debian -- Security Information page, since I use Debian:

February 28, 2024…chromium…could result in the execution of arbitrary code…
February 23, 2024…thunderbird…could result in denial of service or the execution of arbitrary code…
February 23, 2024…chromium…could result in the execution of arbitrary code…
February 22, 2024…imagemagick…potentially the execution of arbitrary code…
February 21, 2024…firefox-esr…could potentially result in the execution of arbitrary code…

I'm not at work, so I have little interest in digging deeper into those specific issues, and I expect/hope a lot of them won't be easily exploitable, but that's only the last 10 days, and these are major/popular projects that many of us will have installed. If I look at 365 days I'm bound to find a whole lot more arbitrary-code-execution possibilities, some of which will be exploitable in my setup.

I would go as far as to say, if we had to bet on which gets rooted first:

  • An Internet server that was running on just-released hardware/firmware/microcode but software from a year ago

  • An Internet server that was running on just-released software but hardware/firmware/microcode from a year ago

I’d be throwing a decent sum of money on the first!

Linux containers are typically a user-friendly way to use Linux namespacing features, so I'd start experimenting with:
ip link set eth1 netns $CONTAINER_NETNS

It would be nice if docker had a user-friendly way to do this plumbing; maybe it does, I haven't seen it.
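
If anyone wants to try it against a running docker container, the plumbing looks roughly like this (container name, interface and addresses are placeholders):

    # find the container's init PID, then move the host NIC into its netns
    PID=$(docker inspect -f '{{.State.Pid}}' mycontainer)
    sudo ip link set eth1 netns "$PID"
    # configure the interface from inside that namespace
    sudo nsenter -t "$PID" -n ip addr add 192.168.1.50/24 dev eth1
    sudo nsenter -t "$PID" -n ip link set eth1 up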

You don’t have to choose, do whatever comes naturally.

In some cases it takes more elbow grease and smarts to build software that results in a good end user installation experience.

VMs have the advantage that they look more like a computer, which is what non-developers are used to. Just think about it: "why doesn't all software ship in containers?"

Here’s where VMs are doing better:

  • High availability - as weird as it may sound, to build HA stuff the apps themselves need to support it, whereas Proxmox lets you migrate VMs. Or at least, you can snapshot a VM more easily for backup to another host; CRIU, gVisor and friends aren't there yet for containers.
  • Containerization - I've seen a bunch of software that runs in a container, but the instructions then say: "oh, you need another postgres/mariadb container, oh, and another nginx container, oh, and a grafana container" - and then the example docker-compose.yml comes with all ports exposed instead of sockets shared over a tmpfs, containers starting as root, and no project name in the compose file, so noobs run into conflicts.
  • Standardization - docker? podman? docker-compose or docker compose, k3s, k8s, OCI or non-OCI, and then there's TrueCharts and LXC and LXD. systemd can also spawn whole OS containers, but nobody seems to care for some reason - it might even be broken, I haven't bothered with it in half a decade.
  • Networking - isolating a VM is simpler for the end user: they make a new bridge, hook it up to a VLAN with some UI clicks, the VLAN goes to a VLAN-capable router they're already running where, with a few more clicks, they have firewall rules and can route things, and voila. You can do macvlans, ipvlans and physical interfaces with containers… but docker basically only runs veth on bridges and barely does macvlans (a rough sketch follows this list). If you want to hand a physical interface to a container through docker-compose.yml, you can't. If you want to provision a WireGuard interface and pass it into a container through docker-compose.yml, you need to cobble together your own "plugins" or "provisioners". VXLAN provisioning is a paid premium nonsense feature of Swarm that people think solves their problems but doesn't, and having to pay leaves people salty. With VMs, in contrast, the guest OS does its own networking, probably with just DHCP out of the box, and network people have clear instructions for physical interface stuff.
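
To be fair to the macvlan point, about the closest you can get today is pre-creating the network out of band and marking it external in compose - a rough sketch where the interface, subnet and names are just examples:

    # macvlan network bound to a physical NIC
    docker network create -d macvlan \
      --subnet 192.168.10.0/24 --gateway 192.168.10.1 \
      -o parent=eth1 lan_macvlan
    # docker-compose.yml then has to reference it as a pre-existing network:
    #   networks:
    #     lan_macvlan:
    #       external: true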

A lot of these "problems" are emergent properties of previous organizational situations, and fixing them is like the tail wagging the dog: it's incredibly hard and you'd need incredible talent across a diverse set of technologies and people skills.

  • Make project names mandatory in compose (you can at least set one explicitly today - see the sketch after this list)
  • Make AF_UNIX filesystem sockets a first-class citizen in compose
  • Make compose work with k3s/k8s
  • Make the OCI image format the default everywhere
  • Make physical / pre-existing interfaces first-class citizens in docker and docker-compose
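
On the first two points, there are manual workarounds today, they're just not first-class - a rough sketch (names are examples):

    # an explicit project name avoids directory-name/"default" collisions
    docker compose -p myapp up -d
    # or, in newer compose versions, as a top-level key in docker-compose.yml:
    #   name: myapp
    # AF_UNIX sockets can be shared by mounting a common named volume
    # (e.g. at /var/run/postgresql) into both containers, but it's DIY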

Personally, I run proxmox as my hypervisor because I’ve got a single “server” for my whole homelab, and it makes sense for me to do so with my workload.

That server runs

  • pfSense in a VM to manage networking & firewall for the whole network.
  • TrueNAS in a VM (gasp) with a 16-port SAS card passed through to provide network storage.
  • A couple of Fedora Server boxes for projects where I want dedicated compute resources (pinned cores, reserved RAM, GPUs assigned).
  • 3x templated Arch VMs hosting my Kubernetes cluster, which runs all the other bits and bobs.
  • Other random boxes as needed, but they usually don’t last more than a few weeks.

One of the things I like about running things this way is that there's "physical separation" between processes for the critical stuff, and everything else I can reboot as needed without having to drop my whole network.

I also like that trying out new stuff (a recent example is terraform) doesn’t require me to break anything else in my setup.

Don't really get how you think C# makes needing to do a Windows syscall any more likely than working with virtually anything else. You need Windows syscalls if you decide to do something that requires one. The language is irrelevant as far as that goes. Replace C# with virtually anything else and you'd be making the same syscall.

It depends on what you are trying to do. The real benefit of VMs is that they are a complete system. That means I will never run into compatibility issues, and each one is configured the way I need it to be configured.

Containers are good for single applications, but you can't necessarily use them to run games or to do more complex tasks.

It really depends on what you want to do.

Containers offer some advantages. You don't need a dedicated, isolated memory pool, and performance is much better, particularly for disk operations, as you're copying data around less.

If you are on a rootless container system it gives you reasonable isolation. But with root-based systems, if a container is broken out of, your entire host is compromised. That means your attack surface is basically the kernel, and kernel CVEs happen.

VMs have a more complete feature set, like letting the abstraction handle its own firewall. The security boundary is better. You pay for this in performance and resource utilization.

You can also run containers in VMs BTW, no reason you can’t run both at the same time.

I noticed a lot of people mention security. That makes sense, but do you really need that posture at home? I think it's often "because I can", not "because I have to". That said, these are homelabs, so it makes sense to try new things or go above and beyond what is really required.

I have one server with XCP-ng (I might add a second one in the future).
Some time ago I tried Docker, and there were two things that completely put me off it.

  1. Networking is really strange / counterintuitive, especially when I already have an overcomplicated network at home.
  2. I wanted to go rootless Docker for added security, as I do expose some of my self-hosted services, and I kept running into problems. The documentation seemed lacking.

Recently I wanted to try containers again, so I spun up another VM, installed Podman and set up Jellyfin in a container. Everything worked great, except after a week or two I started getting problems.
It was harder for me to troubleshoot and fix the issue. After a month of fighting I spun up Jellyfin in a separate VM and it's been running fine ever since.

The problem I have with containers is that, after building a dedicated server with a type 1 hypervisor, it is hard for me to find a reason to run containers in a small environment. With VMs you lose a little bit of cheap storage and RAM (less than you would think, with thin provisioning of storage and RAM managed by the hypervisor), but that's it. You gain better isolation and flexibility, and it's easier to troubleshoot problems in a VM.

And I'd need to run containers in a VM anyway, so it seems like an extra step for no gain.


Some services work better in a VM or are easier to set up. Some other things work better in containers. I make a choice depending on whatever my goals are for a given project.

In my experience, VMs behave in a more predictable way and basically any service can run in a VM, whereas certain services just aren’t suited for a container.

Kubernetes and Docker Swarm do this, at the cost of some downtime. It's different in that there's no teleportation of services from one machine to another. Do you really need that? Are you running a cloud provider, with constant maintenance and constant overbooking of resources on your infrastructure?

I also think that people underestimate the amount of complexity that goes into these live-migration solutions.

That's Docker-style containers. You can have multiple services running in a container, just like in a virtual machine, with LXC or systemd containers.

It’s a bit weird that you mention this because it’s something that VMs can’t replicate properly without jank.

Docker and Podman use the same Dockerfile and compose format.

I disagree that nobody cares; otherwise there would be no people working on those projects. Docker is just a layer on top of functionality that already exists in the kernel. Declarative-style containers are more popular because of IaC and how easy it is to publish and share.

You are doing the Containers == Docker thing again, when in reality Containers (including Docker) == cgroups.

You can have DHCP on containers. The bridged interface works just like any other.

You can run WireGuard in containers, both user-space and kernel mode. But wait a minute, how and why would you pass the host's WireGuard to a VM? Is that even possible? WireGuard uses a layer 3 tunnel.
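
For the container side, the same netns plumbing mentioned earlier in the thread works with kernel WireGuard - a rough sketch, where the container name, config path and address are placeholders:

    # create wg0 on the host, then hand it to the container's network namespace
    sudo ip link add wg0 type wireguard
    PID=$(podman inspect -f '{{.State.Pid}}' mycontainer)
    sudo ip link set wg0 netns "$PID"
    # configure it in place (nsenter -n only changes the network namespace,
    # so the host config file path is still visible)
    sudo nsenter -t "$PID" -n wg setconf wg0 /etc/wireguard/wg0.conf
    sudo nsenter -t "$PID" -n ip addr add 10.0.0.2/32 dev wg0
    sudo nsenter -t "$PID" -n ip link set wg0 up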

But I keep seeing this about allocating network cards to VMs, and I can't think of a non-niche situation where I would need to do this, and even in those the trade-offs of added complexity eclipse the marginal gain in latency and throughput. Don't virtualize your firewall, folks. The L1 folks branded it the 'Forbidden Router' for a reason.

I run six sites, two on-prem DCs, two cloud tenants and 400-ish VMs on VMware. We're a Microsoft shop, with some Linux boxes here and there for special projects. We also have a few bare-metal servers for specific applications.

My home lab reflects that. I have a pair of HPE MicroServer Gen10s running enterprise VMware (VMUG). I mirror all of the enterprise stuff at a smaller scale so I can run updates to see what their impact is and test various features in a non-production setting. If I kill my network, or toast a VM, it's no biggie. That's why you have backups.

Docker has its uses and is an excellent tool, but in the Enterprise environment it has limited application for what we do.

There is no system, OS, programming language or network that is 100% invulnerable to all attack vectors. Some are better, some are worse, but that is a continually sliding scale over time. The best defense is layered: immutable multi-site backups, coupled with VLAN isolation, access controls and continuous reviews of security, access and patching. So if there is a breach, it remains in its own bubble. All you can ever do is minimize the risk.

Containers, not Docker. We migrated from VMs to containers 2 years ago: an Azure environment with more than 5 thousand subscriptions for a company that spans the globe. Commits build images that are automatically shipped to the dev and test Kubernetes clusters. After passing through all controls and approvals, production artifacts are built, scanned and fingerprinted before finding their way to the production AKS cluster. We never touch any kind of compute.

Rollbacks are an absolute walk in the park. We just set the image to the previous version by having an authorized person trigger a pipeline job pointing at that release.

Doing this with VMs would probably be an absolute nightmare. So assuming you mean 'enterprise environment' in a general way (as opposed to your enterprise environment), that would be a hard disagree.

Yeah, sure, the nodes are Azure VMs, but it's a cloud environment. It's not a comparison, just a comment about containers in an enterprise environment. If I dig into the crusty old layers of the business, which is a financial institution, I will find some COBOL crap running on Solaris Zones, a container system.

The keyword is 'threat model'. We are talking all enterprisey here, but most people running stuff in their homes don't have that kind of threat model.


I dropped VMs for containers several years ago and I'm never looking back. There are certainly reasons, already elaborated on above, for running VMs; in general it boils down to security or incompatible kernels.

For a homelab I don't think there is a compelling reason unless it's to learn; you can just as easily lock down a container to the level required for a home lab (a rootless Podman user for each service + SELinux is already most of the way there), and you need to be mindful of the attack surface either way. You're just massively increasing the maintenance burden at home, which makes it more likely you'll ignore stuff (updates etc.).
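
For the curious, a rough sketch of the "rootless podman user per service" pattern (user name, image and port are just examples):

    # one unprivileged user per service
    sudo useradd -m svc-jellyfin
    sudo loginctl enable-linger svc-jellyfin   # keep its services running without an active login
    # get a proper user session for it, then run the container rootless
    sudo machinectl shell svc-jellyfin@.host
    podman run -d --name jellyfin -p 8096:8096 docker.io/jellyfin/jellyfin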

I’d love to work in a place that got to that level of CI/CD.

On building VMs for CI/CD, I've had solid results with Packer and Ansible backed by solid git workflows for the teams. In all cases, good git workflows really matter if you want to avoid nightmares.

The first use case was nowhere near your scale and didn't do prod go-live; as a startup, no one had time to write enough tests, so it wasn't true CD in that case. It was a deep learning startup. One of the builds was a relatively simple web server (OS config, nginx plus the app front end, CIS benchmarks with SELinux enabled and appropriate policies in place), but the backend was a GPU server with an automated NVIDIA driver install (that used to be tricky, not so much now), then a 6 hour build of the complete application software stack from source, as we needed everything bleeding edge for the latest features. OpenCV, OpenBLAS, Caffe… 6 hours of make -j $(nproc) on a high-end AWS GPU server. CIS including SELinux and policies again. You get the idea.

A single Packer build pipeline took the latest commit from master and, running mostly Ansible and a bit of bash, reliably produced build artifacts for multiple platforms - KVM for the office/colo Linux dev servers, AWS EC2 AMIs and Azure VM images for cloud dev and prod, and VMware images for the odd on-prem client. It just ran, apart from when there was a bug in the code (usually mine), and those could either be fixed or the commits reverted to get back to a known-good build.
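
The overall shape was roughly this (template and playbook names are made up, and the real pipeline had more validation around it):

    # build multi-platform images from the latest master commit
    git checkout master && git pull
    packer validate server-image.json
    packer build server-image.json   # builders for the qemu/KVM, AWS, Azure and VMware targets
    # provisioning inside the template is mostly ansible plus a bit of bash, roughly:
    #   ansible-playbook -i localhost, -c local site.yml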

We use Packer and Ansible in the financial enterprise I work in now for the AWS and VMware image bakery. Not such a complex build, just a standardised and approved OS image for RHEL and Windows with all our lovely enterprise service agents running, sigh. Again, it just runs as a standard pipeline spitting out the monthly images, from the master branches of all the repos, with automated tests, vulnerability scans and semi-automated approval processes.

I suspect I've maybe missed your point a little, but I think good IaC and CI/CD is available across a range of platforms and use cases. I can see that with docker compose files or helm charts for containers, IaC and CI/CD can be more "manageable" (thinking code peer review/governance/compliance/audit/ops etc.), as they have really pushed declarative code to a new functional limit; it's really impressive what can be done with some yaml. And if you are running on a managed cluster like EKS or Fargate, then AWS does a lot of the heavy lifting on the platform. (That's not to take away from the skill required to manage, operate or build apps on a container platform, btw.) When I contrast that with our enterprise VMware IaC, which needs "wrappers" around services and software to make it work, it def could stray into nightmare level easily unless carefully architected and managed. And the audit/compliance/governance stuff is def "harder". But it works, mostly, and likely at a much slower pace of release than you manage :slight_smile:

Fwiw, home use is just docker. nvidia-docker and good git practice make my life v simple, and are the right tools for the job here :slight_smile:

It was about containers' value in enterprise environments.

Speaking of security… there was some relevant news for Docker users last month:

Everything depends on what you’re trying to do, but realistically speaking, a lot of the stuff used to manage containers and infrastructure is way overkill and just introduces a bunch of extra complexity which adds insane cost to something that could be much cheaper if a simpler solution was used instead.

Why use a VM? Because a simple monolithic application which runs in an HA environment is probably going to be cheaper than trying to manage everything it takes to develop, maintain and support a microservice-style application.

On the other hand, if you’re developing small services which run in cloud functions or lambdas then you might want to dabble in containers so you can do development. Or, if you’re running something which absolutely, positively, cannot have a momentary blip in service, or you have wild swings in traffic or resource usage at mega scale — then containerized applications can be useful.

Like I said, everything depends on what your specific needs are, but seriously, it may not be sexy, but simplicity is king and cost control keeps a company alive.



I’m going to repeat most of what was said here, but add my own twist.

Why run VMs? First, to run containers inside VMs, in a cluster. Most homelabs don't get to 200 containers per host (worker node), but when you reach those numbers on a high-end EPYC server, you will need VMs to make use of all the available hardware. Just make more worker nodes (because k8s has, or at least had a few years ago, a big limitation on using all the available hardware).
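
(The per-node pod cap is one of those limits: kubelet defaults to 110 pods per node, so on a big box you either raise it or split the host into several worker-node VMs. A quick way to check, assuming the usual kubeadm paths:)

    # kubelet caps pods per node (110 by default); check or raise it in the kubelet config
    grep -i maxpods /var/lib/kubelet/config.yaml   # typical kubeadm location
    # if nothing is set explicitly, the 110 default applies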

Second, to run OSes other than Linux. Sometimes you need to run other software, like, idk, BioStar 2 (from Suprema, for X-Pass RFID keys, access control and time sheets), which only runs on Windows. There is other software, like NVR servers, that runs on Windows Server.

Security is another one which I'll repeat: breaking out of a VM with virtual hardware is harder than breaking out of a container. And containers share the host's kernel and memory; if you don't configure strict memory limits, a container can keep requesting memory and starve the host and the other containers.
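
The memory part is at least easy to rein in - a rough sketch, where the limits and image are just examples:

    # cap a container's memory so it can't starve the host or its neighbours
    docker run -d --name app --memory 2g --memory-swap 2g docker.io/library/nginx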

More important, and a big advantage over containers, is live migration of a VM to another host. Containers will shut down and get started somewhere else, which introduces something like 3 seconds of delay (maybe a bit more if your server is large). And if you're running a database like PostgreSQL, which will check its integrity (i.e. replay its write-ahead log) when it crashes, you'll have to wait more than just a few seconds for the DB to start up (depending on how large your DB is).

And no, I highly advise against running DBs in containers, unless you accept that the host of the DB container will need to always be up and you understand the limitations around DB downtime. Of course, you could get away with running DB replication between 2 or more DB containers (like DBs on bare metal do), but at that point you're just making your services more prone to failure (DB replication is not exactly instant; log shipping and roll-forward take a few minutes).

DBs are surely a reason to run VMs. Just live-migrate the VM (HA + fencing). Whatever you're running in the backend won't even notice that the DB changed hosts, and the DB will keep running just as it was.
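
On Proxmox that really is a one-liner (VM id and target node are examples):

    # live-migrate a running VM to another cluster node
    qm migrate 105 pve2 --online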

Because the leftover enterprise gear uses lots of power. Unless you're in your own solar-powered house, your power bill will be pretty spicy.

It can be automated easily, just like a container deployment; it's just that container deployments are already automated by someone else (via the build files). You can have a VM template that you just instantiate and launch, then have an automation tool set up some minor stuff (like a static IP).
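
On Proxmox, for example, that instantiation can look roughly like this (IDs, names, key path and addresses are just examples), with cloud-init handling the per-VM bits:

    # clone a template VM and let cloud-init set the minor stuff
    qm clone 9000 123 --name app01 --full
    qm set 123 --ipconfig0 ip=192.168.1.123/24,gw=192.168.1.1
    qm set 123 --sshkeys ~/.ssh/id_ed25519.pub
    qm start 123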

It’s more like LXC (OS containers) than program containers.

Another reason for VMs is that some programs are just unable to run in containers, because they need access to special kernel features, like huge pages. I think IBM doesn't allow DB2 to run in any kind of container.

Technically you could have a pod defined there that runs all your containers and upgrades each one. That way you have some separation between them but still get updates for them. When you do it for a few services, having them all in a VM is not that bad. But when you run a few dozen, it'll get difficult to keep all of them up to date with just apt (unless you automate). Technically you still need to pull a new pod / container build template to update your services, so it's not like that's fully automated either.
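
Something like this with podman pods, as a rough sketch (names, ports and images are just examples):

    # group related containers in one pod; they share a network namespace
    podman pod create --name mystack -p 8080:80
    podman run -d --pod mystack --name web docker.io/library/nginx
    podman run -d --pod mystack --name db -e POSTGRES_PASSWORD=example docker.io/library/postgres:16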

Having dealt with live migration on Proxmox and XenServer, I don't think it's that complicated. Sure, setting up HA + fencing takes a few more steps, but nothing really unachievable. Just not that worth it for home, even homeprod.

I think Wendell called it that, because most people in IT are recommending against it (including myself).


I'm personally a big fan of containers, because they allow us to run services on very cheap and unreliable computers at home and save a ton on power (when you give each service its own VM, the CPU utilization for emulating hardware starts to add up - ask me how I know).

I prefer system containers, like Incus (previously LXD). I'd love to see a middle-of-the-road approach, like Firecracker but with all the VM features (the QEMU folks are working on microVMs) and less hardware emulated.

But containers don't have a lot of features that come standard in VMs, which makes life a bit miserable. Even when you try to run containers on NFS with no_root_squash, they won't launch (at least that's one of the latest oddities I'm facing). This makes transferring OS containers way more difficult. IDK how program containers would behave, but I'd guess similarly if you use a persistent path on an NFS volume.

If there was a cheap and somewhat reliable way to build a Ceph cluster at home with enough redundancy and not a lot of power usage, that’d be what I’d use for backend storage for containers. For right now, single NAS it is.


One of the reasons containers are popular is that devs like being able to deploy the same thing from testing to prod and know that it'll "just work". It's mostly a distribution issue. Sure, service containers make for easy recovery when you have a container orchestrator, but most of the popularity I see is about reproducibility. And I find NixOS to be way better in all regards than OCI containers, with the exception of actually writing the thing. I haven't figured out flakes or how to create a Nix package, but deploying one is insanely easy in NixOS.
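
For what it's worth, the deploy/rollback loop that makes it so easy is just this (a sketch, assuming a plain non-flake configuration.nix):

    # apply the current configuration, or step back to the previous generation
    sudo nixos-rebuild switch
    sudo nixos-rebuild switch --rollback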