Server Boot Environment Poll

TL;DR: I’m looking at revamping my server onboarding procedure, so I wanted to see what people here are using.

Currently, on my Linux servers, I am running mostly RHEL distros booting from an MD mirror with LVM for snapshots. I established that procedure before ZFS on Linux and BTRFS were as reliable as they are now. I am aware of the pitfalls of MD/LVM, and of the advantages of ZFS. I have a broad-strokes understanding of BTRFS but no hands-on experience with it.
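
For context, the procedure boils down to something like this (a minimal sketch with placeholder device names and sizes, not my exact commands):

# Mirror the two small SSDs, put LVM on top, and leave free extents in the VG for snapshots
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
pvcreate /dev/md0
vgcreate vg_sys /dev/md0
lvcreate -L 40G -n root vg_sys
mkfs.xfs /dev/vg_sys/root

# Snapshot before a risky change, drop it once the change is verified
lvcreate -s -L 5G -n root_pre_update /dev/vg_sys/root
lvremove vg_sys/root_pre_update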

Ideally, I would boot from a ZFS mirror, but I don’t want to use FreeBSD for everything. I need a Linux solution that is easily automated/scaled. I know it is possible to get Linux to boot on ZFS, but at a glance it doesn’t look like something I’d want to do on a production system.

Despite the shortcomings of MD and LVM, I haven’t had any issues with that setup. I don’t use it for large or high-performance storage. For a simple boot environment (always a mirror of 2 small SSDs), it provides redundancy and snapshots. So unless someone has a silver bullet for deploying CentOS/RHEL/Fedora onto a ZFS mirror, I think I’m looking at either sticking with MD/LVM or moving to BTRFS.

Thoughts? What are people using in practice? Is there another option I’ve overlooked?

3 Likes

i just throw /boot on a different drive or usb stick for my servers though nothing i do is critical

1 Like

Yeah, I like that for home lab or something I’m physically around, but my stuff is an hour away in a datacenter.

What do you put the base OS on?

1 Like

depends on what im doing but usually ext4 with an hourly backup schedule to a different drive

1 Like

Based on this, I’m assuming you’re working in production/enterprise.

My Linux stuff is all CentOS or Ubuntu (depending on workload)

We use Ubuntu for the bare metal hosts for openstack and CentOS to host our proprietary software.

When working with the bare metal machines, we use MD RAID 1 with ext4 on top for our system volumes; data volumes (/var, /home, /srv) get the same treatment, except on RAID 5 (we’re only using 2TB disks at the moment).
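
Roughly, the arrays get created along these lines (hand-wavy sketch; device names and disk counts are placeholders):

# System volume: two-disk mirror with ext4 on top
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
mkfs.ext4 /dev/md0

# Data volumes (/var, /home, /srv): RAID 5 across the 2TB disks
mdadm --create /dev/md1 --level=5 --raid-devices=4 /dev/sd[c-f]1
mkfs.ext4 /dev/md1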

Our OpenStack storage is Ceph, so we just keep 3 copies and Ceph works against the direct-attached disks. For example, an fstab entry looks like:
/dev/sdb /var/lib/ceph/sdb ext4 noatime 0 0

I’m moving my VMs over to BTRFS because its pooling is much nicer than ZFS’s, in that you can remove devices easily, and block-level backups and snapshots are easy. I’m toying with the idea of using BTRFS for all non-RAID-5 volumes next time I build a cluster, but I’m going to have to see how that works out.
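
For anyone who hasn’t used it, the device handling and snapshot/backup flow I mean looks roughly like this (sketch only; mount points and subvolume names are made up):

# Grow or shrink the pool online
btrfs device add /dev/sdd /srv/vmstore
btrfs device remove /dev/sdc /srv/vmstore

# Read-only snapshot of a VM subvolume, then ship it block-level to a backup target
btrfs subvolume snapshot -r /srv/vmstore/vm100 /srv/vmstore/.snap/vm100-pre-migration
btrfs send /srv/vmstore/.snap/vm100-pre-migration | btrfs receive /mnt/backup/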


That said, I think the important questions, so we can help you best, are: what is the server’s workload, how important is the data, and is virtualization a factor?

3 Likes

Thank you for your response, this is exactly the kind of answer I was looking for.

My situation is that I’m running IT for multiple small businesses. I have machines split between onsite and in a datacenter. There is a VPN tunnel from each site to the datacenter. Some services are consolidated at the datacenter, and some are onsite. This lets me share some compute/storage across several clients. The clients prefer this to using AWS or other cloud infrastructure.

Currently, I am running mostly CentOS VMs on a small vSphere cluster. I don’t like being married to vSphere, and I am looking at oVirt, but haven’t gotten in too deep yet. I’m supporting a lot of Macs, so I want to dedicate the vSphere license to virtualizing macOS (I believe that’s the only legit way to virtualize macOS) for a few Mac things that have to run in their own OS.

With oVirt in mind (or any RHEL-based virtualization), I want to nail down a solid onboarding procedure for the hypervisors, starting with the boot environment/install partitioning. I also like to have some basic services running on metal before setting up the virtualization cluster. I know this is kind of overkill, but I’m really particular about having a comprehensive procedure for everything. I will not be onboarding hypervisors en masse or on a regular basis, but what I don’t want is to have to do it in a year and not remember exactly how I did it or why I did it a certain way.

1 Like

The only legal way to virtualize macOS is to run it on Apple hardware. In other words, throw a Mac Pro (trashcan) in the DC, install vSphere on it and run your VMs on there. Is there anything that needs to run on OSX? If it’s mostly just file sharing, I’d recommend looking into FreeNAS. It’s got great privilege separation, supports AFP, SMB, NFS and iSCSI for endpoint connections, and is built for ZFS.

How much hardware are we talking? 50 cores, 250 cores, 2000 cores? I’m asking because if you’re going to be scaling out significantly, you’re probably going to want to look at something that’s a bit more size friendly, like openstack.

Alright, let’s talk about your services. Are you going to be doing log aggregation (ELK stack) and monitoring (Nagios)? If so, you’re going to want one medium-sized machine for those services for now; I’m thinking 6-8 cores and 32GB of RAM will work. Keep in mind that you’re going to want a lot of storage on this host (4TB is probably good to start) for the log storage. You’ll be surprised how fast this grows if you log metrics through ELK. My cluster does 80M entries a day just for bare-metal logging.

On top of that, you’re probably going to want to run some sort of config management. At my datacenters, we use Ansible to deploy OpenStack. That can be run from any host, but should be run at the datacenter, so you’ll probably want to store the files on the same server the logging is done on.


Now, as far as actual server config goes, for virtualization I’ll recommend this: CoW filesystems are slow. Don’t use them.

This is how I would set up a server that’s a compute node for virtualization:

CPU: 2x16core
RAM: 256GB 
BOOT HDD: 2x 500GB 2.5in RAID 1 (software) (ext4 on LVM2)
STOR HDD: 6-8x 2TB   2.5in RAID 10 (software) (ext4)


Partition layout as follows:
/dev/mapper/vg-root (150GB used, rest for snapshots)
    - /dev/vg-root/root    -> /             (150GB)
/dev/mapper/vg-stor (8TB used, complete capacity)
    - /dev/vg-stor/storage -> /srv/storage/ (8TB)
/dev/tmpfs
    - tmpfs -> /tmp/ (4GB)
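
If it helps, provisioning that layout looks roughly like this (sketch; the md device names are placeholders, the VG/LV names match the layout above):

# Boot mirror -> vg-root; only allocate 150GB so the VG keeps room for snapshots
pvcreate /dev/md0
vgcreate vg-root /dev/md0
lvcreate -L 150G -n root vg-root
mkfs.ext4 /dev/vg-root/root

# Storage RAID 10 -> vg-stor; use the full capacity
pvcreate /dev/md1
vgcreate vg-stor /dev/md1
lvcreate -l 100%FREE -n storage vg-stor
mkfs.ext4 /dev/vg-stor/storage

# /tmp as tmpfs via fstab:
# tmpfs  /tmp  tmpfs  size=4G  0 0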

Keep in mind I work for a big data company, so we need a ton of power. One of these nodes would probably be enough for you to manage most of your clients.

3 Likes

All of my servers are Windows XP or OSX.

Because the only actual server I own doesn’t have ram in it.

Though when it DID have RAM in it, it had CrunchBang installed, and it was 3 years before that died.

I miss crunchbang…

I Run a Napp-IT Server

NAPP-IT "Ready to use and comfortable ZFS storage appliance for iSCSI/FC, NFS and SMB"
https://www.napp-it.org/index_en.html
This is going to be Napp-IT on OpenIndiana (OpenSolaris-ish), so full ZFS support, and the Solaris kernel CIFS server provides SMB.
https://www.openindiana.org/

So, the hardware. I’m going to find out how much support there is for this.

Mobo: ASUS P7P55D-E LX
CPU: Xeon X3450
RAM: 2x 4GB DIMMs of Ballistix Tracer
HDDs: 5x 2TB Seagates + 1x 2TB Hitachi

NICs: unknown. I’ve got a 10Gb SFP+ card with a Juniper LR SFP+ module in it. Might change to an Intel quad-port gigabit NIC or the dual Broadcom NICs; need to see what is compatible.

So this is the system in its current state. The 5850 will be removed for something more appropriate.

Link to Build log

So my virtualization infrastructure is currently a few ESXi hosts, one of which is on Mac hardware so I can run a few Apple services that can only be run on Macs (not file sharing). I also have some Mac VMs running onsite in VMware Fusion. It’s mostly for device management, similar to GPO in Windows. There’s also some proprietary software that the clients use that runs on Macs.

Because of the cost of vSphere, and because I’d like to stick with the RHEL platform as much as possible, I want to only use vSphere for Mac services. That means 1 ESXi host per client (replacing Fusion) and 1 for the DC. This leaves me looking at oVirt for my infrastructure services, but again, I have not taken the deep dive on oVirt and am open to alternatives (OpenStack, Xen, whatever).

It’s small. I currently have 36 cores of hypervisor in the DC and then VMware Fusion running various Mac hardware onsite. This is all working fine and the clients are happy. For me, it’s about learning how to deploy scalable infrastructure on a small scale so I can work out the kinks and be confident in bringing on more clients or a larger client.

I’m not familiar with ELK. A syslog server and some e-mail notifications are where I’m at right now. I totally understand that I’ll need something more scalable in the future but one step at a time.

I have played with icinga2, but I haven’t deployed it yet. It seems like there are lots of people hating on every monitoring solution, so it’s difficult to commit to one.

Yes, Ansible is definitely on the roadmap, especially since Red Hat is opening up Ansible Tower. Another reason I am trying to stay in the RHEL world as much as I can.

Yes, I could probably run everything on one of those, although I am running a 60TB NAS as my primary datastore. MD/LVM/EXT4 answers the original question of my post, and it’s interesting that that’s basically what I’m using now for my CentOS machines (except XFS instead of EXT4). Do you see any real advantage to running a BTRFS mirror for the base OS instead?

Thanks for all this information, I really appreciate your time. It’s very helpful to get a glimpse of how things look at a much larger operation.

Instead of Nagios, you may want to look at Prometheus; IMHO it’s ideal for when you have a couple of well-instrumented servers, or for when you have a few metrics across thousands of servers.

As for the bare-metal stack, Puppet and oVirt would get my vote. I like the idea of Ansible, but it just uploads its agent on demand, which isn’t really that different.

1 Like

Hey @SgtAwesomesauce I’ve been very lightly tinkering with OpenStack, and would love to go the Ansible route. Have you open-sourced your playbooks, or would you consider doing so (if it’s not against your company policies/NDAs)?

I’m keen to find a working Ansible-based approach to bootstrapping a minimum OpenStack ‘stack’, if you will. Sorry to others for being OT.

For the work that I do, so far I’ve been able to rely on snapshots in AWS; bootstrapping though, right now, is done via Ansible. My most recent stack is a Docker-based CD-pipeline for a Rack/Rails platform. There are quite a few custom aspects to it, such as running cmake to compile a C++ IPC tool the first time containers spawn, pulling ENV vars from S3 etc.
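
To give a feel for the shape of it (heavily simplified; the bucket, paths and app server below are stand-ins, not our real setup), the container entrypoint does roughly:

#!/bin/sh
set -e

# Pull environment from S3 (made-up bucket/key) and source it
aws s3 cp s3://example-config-bucket/app.env /tmp/app.env
. /tmp/app.env

# Compile the C++ IPC tool the first time this container spawns
if [ ! -x /usr/local/bin/ipc-tool ]; then
    cmake -S /opt/ipc-tool-src -B /tmp/ipc-build
    cmake --build /tmp/ipc-build
    cp /tmp/ipc-build/ipc-tool /usr/local/bin/
fi

# Hand off to the Rack/Rails app server
exec bundle exec puma -C config/puma.rb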

Ideally, I would have liked to have gone completely 12-factor, with a Vault HA stack, ELK etc. but I’ve so far not been given a budget to do so :confounded:

We use a slightly modified deployment of Rackspace’s OpenStack playbooks. I can’t release our playbooks; however, Rackspace has theirs on GitHub, link above.

That’s the balance of IT, huh. As long as you have the compute/storage resources to spare, you should be able to build an ELK stack. Elastic is open source; it’s just X-Pack that costs money. (Also, just an FYI, I got quoted 18k/yr for a 3-node deployment of X-Pack. Didn’t do it.)


The benefit of Ansible is that all you need to do is open SSH on the target machine and you can use it, as opposed to installing an agent. That’s why I prefer Ansible to Puppet. I also find that the Ansible config flows better.
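
To illustrate: the only requirements on the target are sshd and Python, and from the control host it’s just (trivial example; the inventory path is whatever you use):

# Agentless ad-hoc check over SSH against every host in the inventory
ansible all -i /etc/ansible/hosts -m ping

# Running a full playbook works the same way
ansible-playbook -i /etc/ansible/hosts site.yml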


I don’t use Xen, so I can’t comment on its merits. I’m thinking that you’re going to need a bit more hardware to get OpenStack working properly, so I’m not sure if that’s something you’d be interested in. I plan on doing an article on OpenStack at some point in the near future, so I’ll explain it then. The problem with OpenStack is that while it’s a wonderful system, it’s immensely complex and requires a ton of time invested to conquer it. I’m going on 3 years’ experience with it and I’m just starting to get to the level where I’d be comfortable if Stack Exchange went under.

So with all that said, I think you were right with oVirt as your base. It’s a solid system (I’ve been doing research into it and I’m liking what I see!) and it’s more or less a drop-in replacement for ESXi.

I’m assuming this is 2 or 3 machines. This is definitely the prime use case for oVirt, and more importantly, you’re going to become familiar with the structure of it.

I’m planning to do an in-depth talk on this one at some point. The long and short of it is that ALL monitoring solutions suck; it’s about finding the one that sucks in a way you’re not concerned with. For my company, it’s Nagios; for yours, it may be icinga2, Prometheus, Monit or any of the other systems that have been coming out of the woodwork lately.

Damn, how’d I miss that! That’s been the main thing that’s frustrating me lately with Ansible. Thanks for the link. RHEL is a wonderful world, but don’t get too married to it. Ubuntu and SLES both have their merits and use cases.

yeah, we do have about 800TB of SSD in our ceph cluster for high-iops storage and about 5x that for low-priority storage on rust.

I use EXT4 over XFS because I like the flexibility and it’s just what I know. I’m actually familiar with manually repairing the journal on EXT3, so I guess that more or less makes me an expert on it. If you’re more comfortable with XFS, go for it. There are no real downsides, and I’m hoping more distros adopt it in preference to EXT4 in the near future.
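
(For the curious, the journal repair I’m talking about goes roughly like this, from memory, against an unmounted filesystem with a placeholder device name:)

# Drop the damaged journal, force a full check, then recreate the journal
tune2fs -f -O ^has_journal /dev/md1
e2fsck -fy /dev/md1
tune2fs -j /dev/md1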

A BTRFS mirror is a difficult situation. I’ve seen kernel updates (that implement BTRFS patches/updates) nuke a filesystem before. That’s my only concern with using it at this point in time. I’m thinking my company will be dropping MD/LVM in the next year or two, since my test lab has been running BTRFS for about 5 months with no problems; I just don’t want to have a production failure caused by a stupid filesystem.
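
For what it’s worth, the lab checks are nothing fancy, roughly (assuming the mirror is mounted at /):

# Periodic integrity pass over the BTRFS mirror, plus per-device error counters
btrfs scrub start -B /
btrfs scrub status /
btrfs device stats /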

Glad I could help!


EDIT: I was digging through the Ansible docs and found out that they have a module for oVirt, so you can go full circle and deploy VMs from Ansible. This is cool.

2 Likes

Appreciate the link @SgtAwesomesauce

1 Like

But on the target machine you generally need a whole bunch of stuff anyway; you can upload and install Puppet over SSH just the same.

If, on the other hand, you decided to go all cloud, you could create a root image as part of the make/build process, the same way you build your apps into packages. To upgrade, you would drain useful traffic away from the machine, reconfigure PXE for that host to boot into a special mode where you can rsync the new image over the existing one, then reconfigure PXE again and boot into the new image. The machine would then rejoin your pool, and your cluster manager (Kubernetes, for example) would start giving it stuff to do as needed.
There’s also machine configuration to worry about, and data on the machine.
It sounds simple, but there’s complexity in this approach too.
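
In shell terms the cycle is vaguely this (very hand-wavy; the PXE config layout, image names and hostnames are all made up):

# 1. Drain workloads off the node (Kubernetes example)
kubectl drain node01 --ignore-daemonsets

# 2. Point this host's PXE entry at a rescue image and reboot into it
cp pxelinux.cfg/rescue pxelinux.cfg/01-aa-bb-cc-dd-ee-ff
ssh node01 reboot

# 3. From the rescue environment, lay the new root image over the old one
rsync -aHAX --delete imageserver:/images/node-root-v42/ /mnt/target/

# 4. Switch PXE back to local boot, reboot, and let the node rejoin the pool
cp pxelinux.cfg/localboot pxelinux.cfg/01-aa-bb-cc-dd-ee-ff
ssh node01 reboot
kubectl uncordon node01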

I’ve never encountered this (except on SUSE).

OK, so I think our discussion has satisfied the purpose of this poll. I just want to do a TL;DR synopsis for anyone who runs across this in the future.

  1. LVM on a software RAID mirror (MD) is viable and in use as a boot/OS environment (as of late 2017). EXT4 and XFS are both fine.

  2. Booting from ZFS on Linux is technically possible, but it isn’t something I’d want to maintain on a production system (see the quote below).

  3. BTRFS is maturing and will work, but it is safest to give it another year or two.

@SgtAwesomesauce anything you’d add/change?

EDIT: quoted @SgtAwesomesauce on ZFS

ZFS is purpose built and works for scalable production environments, but for this workload, you’re better off not using it because synchronous writes are the bane of ZFS.