Linux UEFI boot - Partitioning & Software Raid

Dear All,

I intend to build a power workstation with Linux as primary OS. (Fedora or Debian)

On top I aim to run some virtual environments using KVM. W10 shall use GPU pass-through, for this reason I have 2 GPUs.

Now I’d like to ask the L1Techs community for best practices on Linux OS partitioning, software RAID and LVM setup for the Gigabyte Aorus Master mainboard, which as I understand it boots via UEFI.

My aim is to have 2× 1TB NVMe as RAID1 for
/ (the OS)

and 2× 4TB SATA HDDs as RAID1 for
/data or /home

I intend to use Linux software RAID, as the Gigabyte BIOS fake RAID seems to work only with W10 directly, based on other threads in here.

My approach would be to:
Partition the NVME
nvme1p1 - primary - bootable -for RAID use (1024 MB)
nvme2p1 - primary - bootable -for RAID use (1024 MB)

nvme1p2 - logical - (1 GB) - for swap
nvme2p2 - logical - (1 GB) - for swap

nvme1p3 - logical - (remaining space ~1TB) for Raid use
nvme2p3 - logical - (remaining space ~1TB) for Raid use

Partition the HDDs:
sdap1 - primary - for RAID use (4TB)
sdbp1 - primary - for RAID use (4TB)

Setup RAIDs:
nvme1p1 & nvme2p1 as RAID1
mount: /boot/efi
EFI Partition in FAT Format

nvme1p3 & nvme2p3 as RAID1
mount: /

sdap1 & sdbp1 as RAID1
mount: /data or /home
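The planned layout above could be sketched with mdadm roughly as follows. Note the device names are assumptions (under Linux, NVMe partitions are named like `nvme0n1p1`, not `nvme1p1`), and these commands are destructive, so treat this as an illustration only:

```shell
# Hypothetical sketch of the planned RAID1 layout; device names assumed.
# Mirror the ESP partitions with metadata 1.0 so the md superblock sits
# at the END of the partition and the firmware still sees plain FAT.
mdadm --create /dev/md/esp --level=1 --raid-devices=2 --metadata=1.0 \
      /dev/nvme0n1p1 /dev/nvme1n1p1
mkfs.vfat -F 32 /dev/md/esp          # to be mounted at /boot/efi

# Root mirror with the default 1.2 metadata
mdadm --create /dev/md/root --level=1 --raid-devices=2 \
      /dev/nvme0n1p3 /dev/nvme1n1p3

# Data mirror on the HDD partitions
mdadm --create /dev/md/data --level=1 --raid-devices=2 \
      /dev/sda1 /dev/sdb1
```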

My questions are:

Will this work with Fedora or Debian on my mainboard as a single-boot Linux OS, or should I consider an alternative approach like LVM? My last build was a while ago in the pre-UEFI days, so I might have a false understanding of how best to deal with it in Linux.

Also, is there any reason not to go for the full RAID1 setup? So far I have used similar setups for ages and really never had to do a total rebuild of my systems because of it.

Any suggestions for the actual install? I saw many posts having issues with GRUB.

Feedback would be appreciated. I still have to wait until my mainboard is delivered, but I might test the setup in a virtual machine beforehand.

Practically all my Linux systems run LVM on mdraid RAID 1 and I’ve never encountered issues directly due to the use of RAID 1 requiring a rebuild. For partitioning, I’d strongly recommend you use LVM. With LVM, you can add more disks subsequently and use them to enlarge the existing file systems with ease.

That said, I’ve not used btrfs so the experience might be different.

I’m half way through writing a guide for using Debian with btrfs as a NAS.

Not planning on LVM - not needed for the use case.

  • Forget about the primary/logical stuff. Use GPT, and just make 3 partitions on your NVMe drives: 1. ESP, 2. boot, 3. root

  • I think the ESP (/boot/efi) was limited to 512 MB by spec (IIRC, I may be mistaken). mdadm RAID1 is fine. As this is only for GRUB, 512 MB is probably overkill, but whatever - disk is cheap.

  • I plan on having /boot on btrfs raid1 (GRUB reads it fine, even with zstd compression enabled), and I plan to size it to 2GB, which should be more than enough, but not too much.

  • I was going to use a third partition for root, but it’d be a cryptsetup luks partition, and btrfs in it for all the stuff, if you prefer LVM, go with LVM.

  • I don’t think swap partitions make much sense in this day and age.

  • btrfs can scrub the devices, and it verifies checksums on reads; if the data is corrupted, it can recover by reading from the other copy. If you expose an LVM logical volume to btrfs, it will still do checksumming, but it will only ever see one copy of the data, and LVM won't recover anything along the way.

If you want to use LVM with btrfs, it’s better if you leave raid handling and snapshots to btrfs, and use LVM for dynamic partitioning only.
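As a sketch of letting btrfs handle the mirroring itself (partition names are assumptions, and mkfs wipes them):

```shell
# Sketch: btrfs-native RAID1 across two devices; either device name
# mounts the whole filesystem afterwards. Device names are assumed.
mkfs.btrfs -d raid1 -m raid1 /dev/nvme0n1p3 /dev/nvme1n1p3
mount /dev/nvme0n1p3 /mnt

# Verify checksums across both copies and repair from the good one
btrfs scrub start /mnt
```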

If you wanted a block device instead of a filesystem, for example for a VM, you can either use qcow2 on btrfs, or you can use LVM raid.
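For the qcow2-on-btrfs route, one common precaution is disabling copy-on-write for the image directory, since random writes into a CoW file fragment badly. A sketch (the paths are assumptions, not anything from this thread):

```shell
# Sketch: VM disk image on btrfs. chattr +C (nodatacow) must be set
# while the directory is still empty; it disables checksumming for
# files created there, trading integrity checks for less fragmentation.
mkdir -p /var/lib/libvirt/images
chattr +C /var/lib/libvirt/images
qemu-img create -f qcow2 /var/lib/libvirt/images/w10.qcow2 100G
```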

I’ll send you a preview of the guide, I’m using hyper-v and vhdx for experimentation.

Forget the swap partition, it's useless. You can always add a swap file on the NVMe if you really need it, but better yet, add more memory.
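A swap file takes a couple of commands to set up later (size here is just an example; on btrfs the file additionally needs to be nodatacow, which is handled differently):

```shell
# Sketch: swap file instead of a swap partition (4G is an example size).
# Works as-is on ext4/xfs; btrfs needs a nodatacow file instead.
fallocate -l 4G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile none swap defaults 0 0' >> /etc/fstab
```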

As for partitioning I would go for (both nvme):
999MB - EFI (md RAID 1.0 metadata with FAT32)
~100G - OS (md raid 1.2 metadata with ext4)
900G (rest) - ZFS mirror

HDD’s - ZFS mirror full devices

Optionally you can put OS on ext4 on LVM for snapshots before upgrade if you need it. Or just ZFS too but you have to mind initrd, depending on distro.
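The ZFS parts of that layout could look roughly like this (pool names and ashift are assumptions, and the commands are destructive):

```shell
# Sketch: ZFS mirrors matching the layout above; pool names made up.
# ashift=12 forces 4K sector alignment, a common choice for NVMe/modern HDDs.
zpool create -o ashift=12 fastpool mirror /dev/nvme0n1p3 /dev/nvme1n1p3

# HDDs as whole devices, as suggested above
zpool create -o ashift=12 tank mirror /dev/sda /dev/sdb
```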

EDIT: Ah, forgot. DON'T use the ASMedia/Promontory fake RAID. It can work on Linux (as of kernel 5.4, with community patches) but it sucks big time, and there is no support for it from AMD.

thanks a lot for the feedback. I’m looking forward to your guide…

Basically the latest “stable” version of Debian is “buster”, which uses an old 4.19 kernel from 2018; they haven’t had a release in a while, and that means old GRUB, old drivers, and old btrfs code. What gets installed will be old and will need updates from backports.

I prefer rolling-release distros, so when I use Debian I just switch it to testing and get that experience, but with Debian package management and its software library.

I would advise against mdraid with fast NVMe drives. Just use the mirroring inside LVM.
Regarding how I would do it: just use LVM, and you can decide whether you want one volume group or multiple.

There are some benefits to keeping the OS independent from data, but it requires you to split the NVMe into more physical partitions, thus decreasing flexibility and increasing allocated empty space.

Because you’re mixing uses (swap, KVM block devices, regular filesystems), LVM is the best way to approach it outside of an industrial solution like ZFS or Ceph. I still believe it is not butter. (btrfs)

FYI: Funtoo warns in their install docs against using btrfs (or zfs for that matter) for / and/or /boot, as this is quite complicated. It’s fairly trivial to use ext4 and/or jfs instead. Of course, other parts of the tree can use btrfs just fine.

I don’t know how safe it is to use RAID-1 copies of an EFI System partition. This would create two devices with identical UUIDs.

I suppose that as long as the UEFI doesn’t crash, it won’t matter which one it picks to boot from if they are really always in sync.


@sharky345 … actually, I think there’s a use case for LVM/device mapper underneath btrfs … tiered storage / writeback cache for your spinning rust drives.

With LVM as a dynamic partition manager, you can eventually more easily shrink the logical block devices underlying the two btrfs legs and dedicate some of the NVMe space as writeback cache.
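That writeback-cache idea could be sketched like this with lvmcache (all VG/LV names here are made up for illustration):

```shell
# Sketch: NVMe-backed writeback cache in front of an HDD-backed LV.
# vg0 and slowdata are hypothetical names, not from this thread.
vgextend vg0 /dev/nvme0n1p3                      # add the NVMe partition as a PV
lvcreate -L 50G -n fastcache vg0 /dev/nvme0n1p3  # cache volume on the NVMe
lvconvert --type writecache --cachevol fastcache vg0/slowdata
```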

Yep, exactly how it works. The data there is mostly static, and UEFI / grub can just pick any of them (as long as it doesn’t pick one on a broken drive), and boot.

If UEFI ends up picking one on a broken drive, you can redirect it through some boot ui menu options, or just pull the drive, boot, then plug it back in and fix the mess with the drive.

mdraid is just an easy way to keep them in sync for that once-in-a-while GRUB upgrade; what you end up mounting in Linux will be just the md device name, not any kind of UUID (the funny-looking FAT32 UUID).


Exactly. Even if you mount it by UUID, the md device takes precedence over vfat, but it may be distro-dependent. So the best way is to put
/dev/md/host\:efi /boot/efi vfat ...
in /etc/fstab if you named your md array.

Grub may complain a bit at boot, but didn’t notice any problems otherwise.

EDIT: I suppose saving of the boot-choice option may not work as it should, but that’s probably more an OS /boot/grub md issue than an EFI one.

Again, thank you all for your feedback !

Short update.

I have tested my primary setup on a VM with Fedora 33.

sda 20GB
sdb 20GB

p1: RAID1 of 1024 MB from each disk: EFI partition, mount: /boot/efi
p2: RAID1 of 19 GB from each disk: btrfs, mount: /

worked fine in fedora graphical installer.

I aim to do one more test without software RAID, using LVM’s RAID functions, but I guess I will need to do so at the shell level, as the Fedora graphical installer did not offer options for LVM RAID as far as I saw.
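From the shell, LVM RAID1 volumes could be created roughly like this (VG and LV names are made up, and pvcreate/vgcreate are destructive):

```shell
# Sketch: RAID1 logical volumes from the shell; names are assumptions.
pvcreate /dev/nvme0n1p3 /dev/nvme1n1p3
vgcreate vg0 /dev/nvme0n1p3 /dev/nvme1n1p3

# -m 1 means one mirror copy in addition to the original
lvcreate --type raid1 -m 1 -L 50G -n root vg0

# Watch the mirror legs sync up
lvs -a -o name,copy_percent,devices vg0
```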

Still waiting for my mainboard to be delivered, so I have to wait with my build and have a little time to test my intended setup some more.

Has anyone more insights on setting up /boot/efi and / as RAID using LVM?
The goal remains to be able to replace a broken NVMe or HDD and keep the system running.

Swap itself isn’t totally useless, but yeah one can just use a swap file instead of a partition.

UEFI firmware cannot read LVM, so you can’t exactly put /boot/efi on LVM; it expects a FAT partition. I’d suggest just partitioning them the same here (if avoiding swraid) and then, since the data on these doesn’t change often, using some method to keep them in sync, such as a hook in an update script or a cronjob.
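A minimal sync sketch for that hook/cronjob idea, assuming the second ESP is mounted at a path like /boot/efi2 (that path is made up):

```shell
# Sketch: keep a backup copy of the ESP in sync with the primary one.
# The /boot/efi2 destination is an assumption; wire this up as a cron
# job or a package-manager hook that runs after GRUB/kernel updates.
sync_esp() {
    src="$1"
    dst="$2"
    mkdir -p "$dst"
    cp -a "$src/." "$dst/"
}

# e.g.: sync_esp /boot/efi /boot/efi2
```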

@lae: great thanks for confirmation!

I was getting nowhere when I tried an install with LVM; I couldn’t get the Fedora 33 setup to accept lv_boot as LVM RAID1, not even when I forced mkfs.vfat -F 32 on it:

… failed to find a suitable stage1 device …

For me personally, LVM is nice but does not really work in my setup, so I will skip it and stick with Linux software RAID so that I can get a solid RAID1 setup for my OS NVMes.

For my data RAID on HDDs I might actually go for lvraid, but not for my OS boot and root.

Short update: using btrfs on an LVM LV is a very bad idea, as lvextend -r does not seem to support btrfs. Back to ext4 on my data RAID. Other than this, it works like a charm.

btrfs filesystem resize max will do it.
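So growing a btrfs filesystem on an LV is just two manual steps instead of one `-r` flag (VG/LV names and mountpoint here are assumptions):

```shell
# Sketch: grow a btrfs filesystem on an LVM LV by hand, since
# lvextend -r (fsadm) does not handle btrfs. Names are made up.
lvextend -L +100G vg0/data           # grow the block device
btrfs filesystem resize max /data    # then grow the filesystem to fill it
```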