Recover Bluestore OSD in Ceph cluster

The journey to CephFS metadata pool’s recovery

The epic intro

Through self-inflicted pain, I'm writing here to ask for volunteers to join me on the journey of recovering the lost partitions that housed the CephFS metadata pool.

The setup

1 Proxmox host (I know)
1 replication rule only for NVMes (2x OSD)
1 replication rule only for HDDs (8x OSD)
Each with failure domain set to osd.
Each OSD configured to use the bluestore backend on an LVM volume.
No backups (I know, I know).

The cause (me)

Long story short: I needed the PCIe lanes and decided to remove the two NVMes that were hosting the metadata pool for CephFS and the .mgr pool. I proceeded to remove the two OSDs (out and destroy).
This is where I goofed: I didn't change the replication rule to the HDDs' one first, so the cluster never moved the PGs stored on the NVMes to the HDDs.
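For anyone following along, the step I skipped would have looked roughly like this (pool and rule names here are examples, not my actual ones):

```shell
# Hypothetical pool/rule names -- substitute your own.
# Point the pools at the HDD replication rule *before* removing
# the NVMe OSDs, so Ceph backfills the PGs onto the HDDs:
ceph osd pool set cephfs_metadata crush_rule replicated_hdd
ceph osd pool set .mgr crush_rule replicated_hdd

# Then wait for the cluster to report active+clean
# before touching the OSDs:
ceph -s
ceph pg dump_stuck
```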

What I’ve done until now

  1. Re-seated the NVMes in their original slots.
  2. Found out that the LVs didn’t have the OSD labels applied.
  3. Forced the backed-up LVM config onto the two NVMes (thanks to the holy entity that thought archiving the LVM config was a good idea, it paid off).
  4. Tried ceph-volume lvm activate 8 <id>, only to find out that it’s unable to decode the label at offset 102 on the LVM volume for that OSD.
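For reference, the label error from step 4 can be inspected directly with ceph-bluestore-tool (the device path below is an assumption, use the LV backing your OSD):

```shell
# Example device path -- substitute the LV backing OSD.8.
DEV=/dev/ceph-block-vg/osd-block-8

# Dump the bluestore label; on a healthy OSD this prints osd_uuid,
# size, btime, whoami, etc. On a damaged one it fails to decode.
ceph-bluestore-tool show-label --dev "$DEV"

# If the label decodes, a consistency check can give more detail:
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-8
```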

Wishes

  1. Does anyone know a way to recover what I fear is a lost partition, given that the “file system” is Ceph’s bluestore?
  2. Is there a way to find out how the partition was nuked, and possibly a way to reverse that process?

Closing statement

Eternal reminder: If you don’t want to lose it, back it up.
Thanks for your time, to the kind souls who are willing to die on this hill with me, or come out victorious!

The fact that we’re on bluestore makes it a bit more difficult :frowning_face: Ceph is really designed to tolerate these single OSD failures and walk it off like nothing happened.

This means that recovery tools that support bluestore are likely few and far between.

You may be able to get the disk into a better state by goofing around with testdisk, but I would try this with a cloned image first before attempting to fuss with the physical disk.
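A sketch of the clone-first approach (device and image paths are examples):

```shell
# Clone the whole NVMe to an image before any recovery attempts.
# ddrescue keeps going past read errors and records progress in a
# mapfile, so an interrupted clone can be resumed.
ddrescue -f -n /dev/nvme0n1 /backup/nvme0n1.img /backup/nvme0n1.map

# Then point testdisk (or any other tooling) at a loop device over
# the image instead of the physical disk:
losetup --find --show --partscan /backup/nvme0n1.img
```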

Unfortunately, my Ceph experience is ancient, so I don’t know if I can provide a solid solution here. :confused:

That is actually my current strategy. Heh. I’ll keep this thread updated on the journey!
Thanks for tuning in!
Thanks for tuning in!


I would ask on the ceph-users mailing list if I were you.

So could you reconstruct all the lv_tags on the LVs? Did you use encrypted OSDs?
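In case it helps, rebuilding them by hand looks roughly like this (tag values below are placeholders, crib the real ones from a surviving OSD’s LV or from your archived config):

```shell
# Example LV name and tag values -- substitute your own.
# A healthy OSD's tags can be read with: lvs -o +lv_tags
LV=ceph-block-vg/osd-block-8

lvchange --addtag "ceph.osd_id=8" "$LV"
lvchange --addtag "ceph.osd_fsid=<osd-fsid>" "$LV"
lvchange --addtag "ceph.type=block" "$LV"
lvchange --addtag "ceph.cluster_name=ceph" "$LV"

# Verify the tags landed:
lvs -o +lv_tags
```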


Thanks a lot for the pointer!
Yes, I managed to re-apply the tags; the LVs are actually recognized as OSD.8 and OSD.9.
But unfortunately ceph-volume lvm activate is unable to find what I suppose is the superblock indicating that the volume is formatted as ceph_bluestore.
Trying to mount one of the HDDs, mount finds the bluestore superblock but doesn’t know how to mount it (it’s not supposed to; for that you need the FUSE tool Ceph provides).
But trying to mount the LV, mount itself doesn’t find a superblock at all.
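One quick sanity check, assuming an unencrypted bluestore main device (the device path is an example): the bluestore label sits at the very start of the device and begins with a plain-text magic string, so you can see whether anything survives without any Ceph tooling:

```shell
# Example device path -- substitute the LV backing the OSD.
# On an intact bluestore OSD, the first 4 KiB should contain the
# ASCII string "bluestore block device" followed by the OSD fsid.
dd if=/dev/ceph-block-vg/osd-block-8 bs=4096 count=1 2>/dev/null | strings | head
```

If that string is gone, whatever wiped the drive hit the first sectors, which would also explain mount finding nothing.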


Sorry, I missed that part. Fortunately, no. One less abstraction layer to deal with.