ZFS zpool replace disk with same device ID

Hi there

I have a setup with several NetApp DS4246 shelves attached to my Ubuntu box.
I have created a nice setup in the vdev_id.conf file so that it creates nice device names that match the shelf slots… Since I work with NetApps for a living, they are “0a.00 → 0a.23” for the shelf on SAS HBA A, and “0b.00 → 0b.23” on SAS HBA B… etc…
I was unable to get multipathing working, not sure why, because it works fine when using a NetApp controller… but anyway… this has been working fine, and I am able to identify my disks in the shelves, which was the point of this setup…

Now a disk has failed, and I have to replace it…
Let’s say it was 0a.16; I then replace the disk in the shelf.
The new disk shows up with the same vdev name…

But I cannot get “zpool replace aggr0 0a.16 0a.16” to work, maybe because it thinks this is still the same disk that I am trying to replace…

The way I got it working was with “zpool replace aggr0 0a.16 sdaq”, but of course now my fine setup is “broken” and the sdaq device stands out like a sore thumb :wink:

What should I have done here in order to keep my vdev ID?

Once the resilvering is done, is there any way to fix this back to the correct vdev name? Other than “zpool export aggr0” and “zpool import -d /dev/disk/by-vdev” ?

Any help is welcome :slight_smile:

BTW: Since ZFS and NetApp WAFL have a similar feature set, has anyone thought of creating a “shell” where it is possible to use NetApp-specific commands, which are then translated into ZFS commands? Would be great for someone like me who has been working with NetApps for over 15 years :wink:
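
Something like this toy wrapper, maybe (a purely hypothetical sketch; the command mappings are illustrative only, not a real ONTAP translation layer):

#!/bin/sh
# toy "ONTAP-ish" front end for ZFS -- hypothetical, mappings illustrative only
case "$1 $2" in
  "aggr status") shift 2; zpool status "$@" ;;         # aggr status -> zpool status
  "vol status")  shift 2; zfs list "$@" ;;             # vol status  -> zfs list
  "snap list")   shift 2; zfs list -t snapshot "$@" ;; # snap list   -> zfs list -t snapshot
  *) echo "unmapped ONTAP command: $*" >&2; exit 1 ;;
esac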


This is literally what I was gonna ask, but since I use /by-id I was not sure if one can do it by /label or /vdev or whatever.

Have you tried it? Did it not work?

Well, it’s 8 TB disks, so as soon as it has completed resilvering I will tell you :slight_smile:

But it is kind of a bummer that you have to take it offline, export it, etc… would be nice if you could just do a “mount -o remount”-kinda thing…

/Heino


Do you seriously have a /by-vdev folder? I don’t have one of those. Then again, I don’t have shelves of carefully organised drives addressed by shelf, bay, controller or anything.

If it were just a dummy/inconsequential disk, then maybe starting the replace to /dev/sdx, then immediately offlining it and replacing it with the /dev/whatever might work before it fully resilvers, but I don’t personally play fast and loose with drives, and that is pure speculation.

I tested this in my ZFS test VM (you have one of those, right :grin: )

Assuming you mapped to the physical device ports in vdev_id.conf like:

alias 0a.00       /dev/disk/by-path/virtio-pci-0000:09:00.0
alias 0a.01       /dev/disk/by-path/virtio-pci-0000:0a:00.0
alias 0a.02       /dev/disk/by-path/virtio-pci-0000:0b:00.0
alias 0a.03       /dev/disk/by-path/virtio-pci-0000:0c:00.0
...
This is what it looks like on my physical box:
# ZFS DISK ALIASES
#
# SSD
# BOOT
alias Boot00 /dev/disk/by-path/pci-0000:01:00.0-nvme-1-part3
alias Boot01 /dev/disk/by-path/pci-0000:02:00.0-nvme-1-part3
# SPECIAL
alias Special00 /dev/disk/by-id/nvme-SAMSUNG_MZPLJ6T4HALA-00007_S55KNG0R100269
alias Special01 /dev/disk/by-id/nvme-SAMSUNG_MZPLJ6T4HALA-00007_S55KNG0R100265
# HDD
alias Bay00 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy0-lun-0
alias Bay01 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy1-lun-0
alias Bay02 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy2-lun-0
alias Bay03 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy3-lun-0
alias Bay04 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy4-lun-0
alias Bay05 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy5-lun-0
alias Bay06 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy6-lun-0
alias Bay07 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy7-lun-0
alias Bay08 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy8-lun-0
alias Bay09 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy9-lun-0
alias Bay10 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy10-lun-0
alias Bay11 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy11-lun-0
alias Bay12 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy28-lun-0
alias Bay13 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy29-lun-0
alias Bay14 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy30-lun-0
alias Bay15 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy31-lun-0
alias Bay16 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy32-lun-0
alias Bay17 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy33-lun-0
alias Bay18 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy34-lun-0
alias Bay19 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy35-lun-0
alias Bay20 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy36-lun-0
alias Bay21 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy37-lun-0
alias Bay22 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy38-lun-0
alias Bay23 /dev/disk/by-path/pci-0000:41:00.0-sas-exp0x500304802094857f-phy39-lun-0

These are the steps

Paranoid long version steps

Identify the physical device

grep -i 0a.01 /etc/zfs/vdev_id.conf   # or whatever disk is in question

list the block device

lsblk /dev/..... <--output  from previous command
   # *may* return the device info, depends on how bad off the drive is.

tell zfs to offline the disk

zpool offline __pool__ __device__

remove the disk (I believe you can hotswap based on your setup)
verify it’s no longer listed

lsblk /dev/..... <--same as other  lsblk command
   # should error with "not a block device"

install the replacement
list the block device

lsblk /dev/..... <--same as other  lsblk command
   # should return the device info

Online the disk in ZFS

zpool online __pool__ __device__

officially replace the device in the pool

zpool replace __pool__ __device__
    # notice that I don't indicate a replacement device,
    # as it's the same device as the source
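
Putting that together with the names from this thread (aggr0 / 0a.16), the paranoid version would look something like this sketch (pool and device names assumed from the OP’s setup):

zpool offline aggr0 0a.16    # take the failed disk out of service
# ...physically swap the disk in the shelf...
zpool online aggr0 0a.16     # bring the slot back online
zpool replace aggr0 0a.16    # same name, no replacement device needed
zpool status aggr0           # watch the resilver progress
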
Just do it steps

YANK THE FAILED DISK OUT, SLAP THE NEW ONE IN
TELL ZFS TO DO ALL THE WORK, NOW DAMNIT

zpool replace __pool__ __device__

REFs:

https://openzfs.github.io/openzfs-docs/man/8/zpool-offline.8.html
https://openzfs.github.io/openzfs-docs/man/8/zpool-replace.8.html


Also, in your case now… you should be able to

zpool replace pool sdaq 0a.xx

after taking sdaq offline as noted above, to get back where you want to be… with another resilver

I think this will work without any extra steps, but if not, you may need to

Offline the device

remove the zfs label
See zpool-labelclear(8): https://openzfs.github.io/openzfs-docs/man/8/zpool-labelclear.8.html

To be safe, clear the disk partition table

Online the drive in zfs

Replace the drive using

zpool replace pool sdaq 0a.xx
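
In shell terms, the label-clearing part usually looks like this (a sketch; /dev/sdX is a placeholder, and both commands are destructive, so only point them at a disk that is no longer an active pool member):

zpool labelclear -f /dev/sdX   # strip the stale ZFS label
wipefs -a /dev/sdX             # also drop any old partition-table signatures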

Let us know how it goes

Just for reference, this is my vdev_id.conf file…

multipath no
topology sas_direct
phys_per_port 4
slot bay
enclosure_symlinks yes
channel 03:00.0 0 0a.
channel 03:00.0 1 0b.
channel 03:00.0 2 0c.
channel 03:00.0 3 0d.

I do not alias every drive in my shelves… that would defeat the purpose of this vdev file, and be a total nightmare to maintain :wink:
With this file, I get the /dev/disk/by-vdev/… devices 0a.0 → 0a.23 etc…
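
For anyone following along, after editing vdev_id.conf the links can be regenerated and checked with something like this (a sketch; the exact udev steps can vary by distro):

udevadm trigger            # re-run udev rules so vdev_id re-reads the conf
udevadm settle             # wait for the symlinks to be (re)created
ls -l /dev/disk/by-vdev/   # verify the 0a.* names point at the right devices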

Once the current resilvering is done, what would happen if I just did another replace… like (zpool replace aggr0 sdaq 0a.16)? Would it do a whole new resilvering, or would it see that the disk already has the data on it and do nothing?

I believe it would resilver, as it is a “new” device, but that’s an educated guess, and it wouldn’t take the pool offline. Next time (when, not if), try using ZFS to offline the disk as noted above; it works for my setup (ZFS 2.1 branch).

As for vdev_id.conf, with my setup I don’t trust that the devices won’t be renamed (it’s just an HBA going to a backplane), and it is a one-time activity to define them in the conf with an alias. If I see a red light on the front, it maps to the Bay entries in the conf I posted, no thought required.

With disk shelves, you are probably accomplishing the same thing, but I’ve never used NetApp shelves outside of NetApp hardware running ONTAP though.

This may be my paranoia more than anything else.

…about NetApp shelves: I have worked with them for a lot of years and they are rock solid, both with NetApp hardware and with HBAs attached to a server…
Of course you will get more advanced features if it is connected to a NetApp controller… such as multipathing and out-of-band shelf management etc…
But the shelf is essentially just two SAS expanders… I have seen the same shelves used on HP storage systems as well… In fact I think someone told me that it is actually Fujitsu that OEMs these shelves for NetApp and others…

In all the years I have been working with NetApp equipment I think I have replaced one disk shelf… of course many disks, some PSUs and a few IOM modules (the SAS expanders)…

It will take a while to resilver my pool because I first had to format the NetApp disks from 520-byte to 512-byte sectors with the sg_format tool… which takes just as long as the resilver :slight_smile:
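
For anyone else reformatting NetApp drives, that step looks roughly like this (a sketch; sg_format ships with the sg3_utils package, /dev/sdX is a placeholder, and the format wipes the disk):

sg_format --format --size=512 /dev/sdX   # low-level reformat to 512-byte sectors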

I think I will go with the zpool offline approach, as it just seems a bit safer :wink:

I will keep the thread updated once I have completed the process…


Hello again,

Resilvering done, and export/import -d /dev/disk/by-vdev did the trick :slight_smile:
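
For the record, that sequence would be something like (pool name from this thread):

zpool export aggr0
zpool import -d /dev/disk/by-vdev aggr0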


Nice, good to hear!

Out of interest, which OS/distro/spin are you running?