Proxmox 8 Multipath / Multipathd Quick Start

Premise

I was working on the level1 storage server and decided to move from Red Hat to Proxmox 8 (mainly because of the newer kernel, and because I wanted to switch back to Docker from Podman).

This system has an LSI disk shelf; the first shelf comprises 24 12TB SAS HGST helium-filled drives.

I am working on adding a second disk shelf of 18TB SAS Seagate Exos Mach.2 drives as well.

These disk shelves can operate in active/active mode over SAS-6, which is quite fast. Unfortunately, this will still bottleneck my especially zippy Exos Mach.2 drives… so I am working on a solution for that. In the meantime I wanted to set up proper World Wide Name (WWN) multipath, because this is the correct way to handle disks when you have true multiple paths.

Instead of the ZFS pool being built from /dev/sda and friends, it is built from mpatha, mpathb, etc. This is because, through the SAS machinery, there are multiple physical paths to each disk – hence multipath.

Multipath is slowly becoming obsolete, because multipathed disk storage is less of a thing than it used to be – when a fault occurs, an entire chassis is ejected from the cluster (rather than a single machine carrying a high level of internal redundancy, as used to be the case).

Getting Started

apt install multipath-tools multipath-tools-boot

It is a good idea to man multipath.conf – multipath doesn’t do much for you out of the box.

defaults {
        polling_interval        2
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        uid_attribute           ID_SERIAL
        rr_min_io               100
        failback                immediate
        no_path_retry           queue
        user_friendly_names     yes
}

These are the contents of a sensible /etc/multipath.conf starting point. Note especially that round-robin is not active/active, and you want active/active if your hardware supports it. I recommend starting with this configuration to ensure stability, then switching to active/active.
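After editing, it's worth confirming what the daemon actually parsed. A minimal check, assuming the stock Debian/Proxmox multipath-tools packaging:

multipath -t | head -n 20      # dump the configuration multipathd is actually using
systemctl reload multipathd    # pick up /etc/multipath.conf changes without a restart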

Every disk has a globally unique World Wide Name (WWN). That's what multipath uses to identify disks.
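To see the identifiers multipath will key on for a given disk (sde here is just an example):

lsblk -dno NAME,WWN /dev/sde
udevadm info --query=property --name=/dev/sde | grep ID_SERIAL   # the uid_attribute from the config above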

Sample lsblk output:

sde                8:64   0  10.9T  0 disk
sdf                8:80   0  10.9T  0 disk
sdg                8:96   0  10.9T  0 disk
sdh                8:112  0  10.9T  0 disk
sdi                8:128  0  10.9T  0 disk
sdj                8:144  0  10.9T  0 disk
sdk                8:160  0  10.9T  0 disk
sdl                8:176  0  10.9T  0 disk
sdm                8:192  0  10.9T  0 disk
sdn                8:208  0  10.9T  0 disk
sdo                8:224  0  10.9T  0 disk
sdp                8:240  0  10.9T  0 disk
sdq               65:0    0  10.9T  0 disk
sdr               65:16   0  10.9T  0 disk
sds               65:32   0  10.9T  0 disk
sdt               65:48   0  10.9T  0 disk
sdu               65:64   0  10.9T  0 disk
sdv               65:80   0  10.9T  0 disk
sdw               65:96   0  10.9T  0 disk
sdx               65:112  0  10.9T  0 disk
sdy               65:128  0  10.9T  0 disk
sdz               65:144  0  10.9T  0 disk
sdaa              65:160  0  10.9T  0 disk
sdab              65:176  0  10.9T  0 disk
sdac              65:192  0  10.9T  0 disk
sdad              65:208  0  10.9T  0 disk
sdae              65:224  0  10.9T  0 disk
sdaf              65:240  0  10.9T  0 disk
sdag              66:0    0  10.9T  0 disk
sdah              66:16   0  10.9T  0 disk
sdai              66:32   0  10.9T  0 disk
sdaj              66:48   0  10.9T  0 disk
sdak              66:64   0  10.9T  0 disk
sdal              66:80   0  10.9T  0 disk
sdam              66:96   0  10.9T  0 disk
sdan              66:112  0  10.9T  0 disk
sdao              66:128  0  10.9T  0 disk
sdap              66:144  0  10.9T  0 disk
sdaq              66:160  0  10.9T  0 disk
sdar              66:176  0  10.9T  0 disk
sdas              66:192  0  10.9T  0 disk
sdat              66:208  0  10.9T  0 disk
sdau              66:224  0  10.9T  0 disk
sdav              66:240  0  10.9T  0 disk
sdaw              67:0    0  10.9T  0 disk
sdax              67:16   0  10.9T  0 disk
sday              67:32   0  10.9T  0 disk
sdaz              67:48   0  10.9T  0 disk

This system only has a total of 24 physical 10.9T disks, but lsblk shows 48 entries. Two paths per disk means each disk shows up in two places.
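A quick sanity check, assuming the shelf disks are the only sdX devices on the system: every WWN should appear exactly twice.

lsblk -dno NAME,WWN | grep '^sd' | awk '{print $2}' | sort | uniq -c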

From here it is easy enough:

# multipath -lv /dev/sde
mpathb (350000c9000389aa8) dm-1 HGST,HUH721212ALE60SA
size=11T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 6:0:0:0  sde  8:64   active undef running
  `- 6:0:25:0 sdac 65:192 active undef running

Run multipath -lv for each disk you want to add to the multipath. The output above tells us this particular disk shows up at both /dev/sde and /dev/sdac. You could also write a shell script to iterate over the entries and add them, as sketched below.
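A minimal sketch of such a loop, assuming all sdX devices belong to the shelf (exclude any local boot disks as appropriate); multipath dedupes by WWID, so touching both paths of the same disk is harmless:

for dev in $(lsblk -dno NAME | grep '^sd'); do
    multipath -a "/dev/$dev"   # record this path's WWID
done
multipath -r                   # rebuild the multipath maps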

Adding disks this way records their World Wide Names explicitly in multipath's WWID file:

cat /etc/multipath/wwids

# Multipath wwids, Version : 1.0
# NOTE: This file is automatically maintained by multipath and multipathd.
# You should not need to edit this file in normal circumstances.
#
# Valid WWIDs:
/350000c9000389aa8/
# ... and so on ... 
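One line per registered disk, so a quick count should come back as 24 on this system:

grep -c '^/' /etc/multipath/wwids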

Checking the contents of the /etc/multipath/wwids file should confirm we've added all the disks. Running lsblk can also help us confirm:

lsblk

... snip ... 

sde                8:64   0  10.9T  0 disk
└─mpathb         253:1    0  10.9T  0 mpath
sdf                8:80   0  10.9T  0 disk
└─mpathc         253:2    0  10.9T  0 mpath
sdg                8:96   0  10.9T  0 disk
└─mpathd         253:3    0  10.9T  0 mpath
sdh                8:112  0  10.9T  0 disk
└─mpathe         253:4    0  10.9T  0 mpath
sdi                8:128  0  10.9T  0 disk
└─mpathf         253:5    0  10.9T  0 mpath
sdj                8:144  0  10.9T  0 disk
└─mpathg         253:6    0  10.9T  0 mpath
sdk                8:160  0  10.9T  0 disk
└─mpathh         253:7    0  10.9T  0 mpath
sdl                8:176  0  10.9T  0 disk
└─mpathi         253:8    0  10.9T  0 mpath
sdm                8:192  0  10.9T  0 disk
└─mpathj         253:9    0  10.9T  0 mpath
sdn                8:208  0  10.9T  0 disk
└─mpathk         253:10   0  10.9T  0 mpath
... snip ... 

Now the output shows block devices accessible through their multipath names. The same devices appear in /dev/mapper/:

ls /dev/mapper/
control  mpatha  mpathb  mpathc  mpathd  mpathe  mpathf  mpathg  mpathh  mpathi  mpathj  mpathk  mpathl  mpathm
mpathn   mpatho  mpathp  mpathq  mpathr  mpaths  mpatht  mpathu  mpathv  mpathw  mpathw-part9  mpathx

I recommend also enabling and starting the multipath service:

systemctl enable multipathd
systemctl start multipathd

From here we can create our ZFS storage pool on top of multipath and carry out initial burn-in and integration tests.
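To double-check the daemon sees everything, a quick count of the maps (expect 24 here):

multipath -ll | grep -c '^mpath'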

The nicest thing about multipath is that a disk stays accessible even if a cable or controller flakes out on the storage bus, but you can also enjoy a performance benefit. Once you're satisfied it is stable, it is time to try an active/active configuration.
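Before trusting it with data, a rough failover check on an idle pool; a sketch, assuming sde is one of mpathb's two paths as in the output above:

echo offline > /sys/block/sde/device/state   # drop one path
multipath -ll mpathb                         # the map should stay up on the surviving path
echo running > /sys/block/sde/device/state   # bring the path back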

Proxmox Quirks

Once drives are part of a multipath, they no longer show up in the Proxmox ZFS/disk/storage GUI. To be honest, this GUI in Proxmox has always been kind of half-baked – all but the simplest use cases will require the CLI anyway.

zpool create elbereth -o ashift=12 -f raidz2 mpatha mpathb mpathc mpathd mpathe mpathf mpathg mpathh mpathi mpathj mpathk mpathl raidz2 mpathm mpathn mpatho mpathp mpathq mpathr mpaths mpatht mpathu mpathv mpathx

… and I’m adding some special devices…

zpool add -f elbereth special mirror /dev/nvme2n1 /dev/nvme3n1 mirror /dev/nvme0n1 /dev/nvme1n1
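Worth confirming the final layout before loading data:

zpool status elbereth   # should list both raidz2 vdevs plus the two special mirrors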

Finally, recordsize. The default is 128K; 512K or 1M makes more sense for my default use case. Remember you probably want a smaller record size on other datasets used for things like… VM storage.

zfs set recordsize=512k elbereth
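For example, a hypothetical dataset for VM images with a smaller record size (the dataset name is just illustrative):

zfs create -o recordsize=16k elbereth/vms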