LVM RAID with SSD Cache Guide

This guide is the result of my sifting through hundreds of pages of Admin manuals, wikis, man pages and kernel developer notes. I am writing this so you don’t have to go through what I just did.

In this particular case, I will be using RAID 6, as that is what I just set up.

There are some terms you will need to become familiar with.

PV = Physical Volume = the physical hard drive or SSD.
VG = Volume Group = a collection of PVs.
LV = Logical Volume = think of these as partitions of the VG.

There are several RAID types that LVM can do such as RAID 0, 1, 4, 5, 6, 10.

The difference between RAID 5 and RAID 6 is that RAID 5 is single parity and RAID 6 is dual parity.

In this guide, I will be using the drive quantity of 8.
This guide will also include making a RAID 1 Cache-pool that will cache the RAID 6 array.

All told, there are 10 drives in this setup.

This guide assumes you have LVM installed on your server or computer.

The first step is ensuring your drives are clean. Remove any partitions and data from the drives before starting.
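
If you are not sure how to clear them, wipefs can remove old partition tables and filesystem signatures in one go. A minimal example, assuming (as in the rest of this guide) that the data drives are /dev/sdb through /dev/sdh and that they contain nothing you want to keep:

sudo wipefs -a /dev/sd[b-h]

Double-check the device names before running this; wipefs -a is destructive.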

Next, you will need to make the RAID drives into PVs.

Start by using the following to show all block devices.
sudo lsblk

You should see devices like /dev/sda.

Let’s make them Physical Volumes.

sudo pvcreate /dev/sd[b-h]

I shortened the amount of typing needed by using [b-h], which the shell expands to every device from /dev/sdb through /dev/sdh before handing the list to pvcreate.

It should look something like this:

[root@server ~]# pvcreate /dev/sd[b-h]
  Physical volume "/dev/sdb" successfully created.
  Physical volume "/dev/sdc" successfully created.
  Physical volume "/dev/sdd" successfully created.
  Physical volume "/dev/sde" successfully created.
  Physical volume "/dev/sdf" successfully created.
  Physical volume "/dev/sdg" successfully created.
  Physical volume "/dev/sdh" successfully created.

Next, run “pvscan” to show what is now available to use in LVM.

[root@server ~]# pvscan
  PV /dev/sdb                       lvm2 [<9.10 TiB]
  PV /dev/sdc                       lvm2 [<9.10 TiB]
  PV /dev/sdd                       lvm2 [<9.10 TiB]
  PV /dev/sde                       lvm2 [<9.10 TiB]
  PV /dev/sdf                       lvm2 [<9.10 TiB]
  PV /dev/sdg                       lvm2 [<9.10 TiB]
  PV /dev/sdh                       lvm2 [<9.10 TiB]

Now that we have made PVs, we can make the Volume Group (VG).

In this case we will use the example VG name: Library.
DON’T add the SSD(s) you are going to use for the Cache.

[root@server ~]# vgcreate Library /dev/sd[b-h]

The Volume Group “Library” should have successfully been created at this point.

Now, you can run “vgs” to show the Volume Groups on the system.
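
For example, to check the new VG (vgs gives a one-line summary, while vgdisplay shows more detail such as extent size, PV count and free space):

sudo vgs Library
sudo vgdisplay Library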

The next step in this process is making the RAID 6 array from the VG.

Since RAID 6 uses the capacity of 2 of your 8 drives for parity, you will need to subtract 2 from your drive count when setting the stripe count in the “lvcreate” command.

Example:

[root@server ~]# lvcreate --type raid6 -i 6 -l 100%FREE -n LibraryVolume Library

The “--type” option sets the RAID type.
The “-i” option is the stripe count, i.e. the number of drives that are NOT used for parity.
The “-l” option allocates space by extents; 100%FREE uses all of the free space in the VG.
The “-n” option is the name you want for the logical RAID volume.
The last argument is the VG; in this case the VG is Library.

Basically you are making a RAID 6 array called LibraryVolume from the Volume Group called Library.

Because we specified raid6 with -i 6 (six data stripes), LVM adds two parity stripes on top, using the capacity of all 8 PVs in the Library VG. Note that RAID 6 parity is rotated across all of the drives rather than being kept on two dedicated parity disks.

Now, if you run “lvs” you should see the LVs you have available.
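
If you add the -a flag, lvs will also show the hidden rimage/rmeta sub-LVs that make up the array, and the Cpy%Sync column lets you watch the initial RAID sync progress. For example:

sudo lvs -a -o +devices Library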

The next step is adding a cache pool.

Start by making the cache SSD(s) into PVs.
In my case, I used NVMe SSDs, so they showed up as /dev/nvme0n1 and /dev/nvme1n1.

pvcreate /dev/nvme0n1 /dev/nvme1n1

Next, you will add them to the Volume Group Library. This will not add them to the Logical Volume RAID.

We will use the command: vgextend

vgextend Library /dev/nvme0n1 && vgextend Library /dev/nvme1n1
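
As a small aside, vgextend accepts several PVs in one call, so the following single command is equivalent:

vgextend Library /dev/nvme0n1 /dev/nvme1n1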

We now need to make 2 additional LVs: the CacheLV and the MetadataLV.

Let’s start with the Cache LV.

I have 2x 250GB NVMe SSDs for caching.

I am going to use RAID1 to mirror the cache drives so that I have some redundancy in case one fails.

lvcreate --type raid1 -m 1 -L 225G -n cache1 Library /dev/nvme0n1 /dev/nvme1n1

“-m” is how many mirrors; in this case it is 1 (one mirror plus the original copy).
“-L” is the size of the cache. I took the total available capacity of ~238 GB and knocked a bit over 13 GB off to leave room for the Metadata LV.

Next let’s create the Metadata LV for the Cache.

lvcreate --type raid1 -m 1 -L 225M -n cache1meta Library /dev/nvme0n1 /dev/nvme1n1

Let’s break this one down.
The syntax is: lvcreate --type <raid type> -m <number of mirrors> -L <size> -n <name> <VG> <device1> <device2>

To get a decent metadata size, take the size of cache1, convert it to megabytes, and divide by 1000; that gives the number I used for “-L”. For example, 225G works out to roughly 230,000 MiB, and dividing by 1000 gives about 230 MiB, which I rounded down to the 225M used above.

The next step involves converting the two LVs we just made into 1 CachePool LV.

lvconvert --type cache-pool --poolmetadata Library/cache1meta Library/cache1

Here, we are making a “cache-pool” LV with its metadata attached.
The --poolmetadata argument takes VG/LV of the metadata LV (Library/cache1meta), and the final argument is VG/LV of the cache data LV (Library/cache1).

The final step in this process is converting the RAID6 LV into a Cached RAID6 LV.

lvconvert --type cache --cachepool Library/cache1 --cachemode writethrough Library/LibraryVolume

In this command, we attach the cache pool to the RAID 6 LV named LibraryVolume using writethrough mode. Writethrough ensures that any data written is stored on both the cache pool and the RAID, which helps prevent data loss should a device in the cache pool LV fail.

You could also use writeback mode (usually found on UnRAID, for example), which acknowledges writes as soon as they hit the cache; it is faster for writes but risks losing recently written data if the cache pool fails.
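
For what it's worth, the cache mode can be changed later without rebuilding anything, and the cache can be detached entirely (LVM flushes it first) if you ever need to work on the origin LV. Something like:

lvchange --cachemode writeback Library/LibraryVolume
lvconvert --splitcache Library/LibraryVolume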

After this, it is simply a matter of formatting the resulting Logical Volume with a filesystem like XFS or ext4, mounting it somewhere, and adding it to /etc/fstab so it gets mounted at boot.
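
As an example, assuming XFS and a mount point of /mnt/Library (the mount point you will see in the lsblk output below):

mkfs.xfs /dev/Library/LibraryVolume
mkdir -p /mnt/Library
mount /dev/Library/LibraryVolume /mnt/Library

And a matching /etc/fstab entry would look something like:

/dev/mapper/Library-LibraryVolume  /mnt/Library  xfs  defaults  0  0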

Basically that is all you have to do to make an SSD Cached RAID6 LVMRAID array.

The following is an example of lsblk after doing the above steps.

NAME                                        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                                           8:0    0   3.7T  0 disk 
└─sda1                                        8:1    0   3.7T  0 part 
  ├─Library-LibraryVolumeLMV_corig_rmeta_0  253:0    0     4M  0 lvm  
  │ └─Library-LibraryVolumeLMV_corig        253:27   0  21.9T  0 lvm  
  │   └─Library-LibraryVolumeLMV            253:16   0  21.9T  0 lvm  /mnt/Library
  └─Library-LibraryVolumeLMV_corig_rimage_0 253:1    0   3.7T  0 lvm  
    └─Library-LibraryVolumeLMV_corig        253:27   0  21.9T  0 lvm  
      └─Library-LibraryVolumeLMV            253:16   0  21.9T  0 lvm  /mnt/Library
sdb                                           8:16   0   3.7T  0 disk 
└─sdb1                                        8:17   0   3.7T  0 part 
  ├─Library-LibraryVolumeLMV_corig_rmeta_1  253:2    0     4M  0 lvm  
  │ └─Library-LibraryVolumeLMV_corig        253:27   0  21.9T  0 lvm  
  │   └─Library-LibraryVolumeLMV            253:16   0  21.9T  0 lvm  /mnt/Library
  └─Library-LibraryVolumeLMV_corig_rimage_1 253:3    0   3.7T  0 lvm  
    └─Library-LibraryVolumeLMV_corig        253:27   0  21.9T  0 lvm  
      └─Library-LibraryVolumeLMV            253:16   0  21.9T  0 lvm  /mnt/Library
sdc                                           8:32   0   3.7T  0 disk 
└─sdc1                                        8:33   0   3.7T  0 part 
  ├─Library-LibraryVolumeLMV_corig_rmeta_2  253:4    0     4M  0 lvm  
  │ └─Library-LibraryVolumeLMV_corig        253:27   0  21.9T  0 lvm  
  │   └─Library-LibraryVolumeLMV            253:16   0  21.9T  0 lvm  /mnt/Library
  └─Library-LibraryVolumeLMV_corig_rimage_2 253:5    0   3.7T  0 lvm  
    └─Library-LibraryVolumeLMV_corig        253:27   0  21.9T  0 lvm  
      └─Library-LibraryVolumeLMV            253:16   0  21.9T  0 lvm  /mnt/Library
sdd                                           8:48   0   3.7T  0 disk 
└─sdd1                                        8:49   0   3.7T  0 part 
  ├─Library-LibraryVolumeLMV_corig_rmeta_3  253:6    0     4M  0 lvm  
  │ └─Library-LibraryVolumeLMV_corig        253:27   0  21.9T  0 lvm  
  │   └─Library-LibraryVolumeLMV            253:16   0  21.9T  0 lvm  /mnt/Library
  └─Library-LibraryVolumeLMV_corig_rimage_3 253:7    0   3.7T  0 lvm  
    └─Library-LibraryVolumeLMV_corig        253:27   0  21.9T  0 lvm  
      └─Library-LibraryVolumeLMV            253:16   0  21.9T  0 lvm  /mnt/Library
sde                                           8:64   0   3.7T  0 disk 
└─sde1                                        8:65   0   3.7T  0 part 
  ├─Library-LibraryVolumeLMV_corig_rmeta_4  253:8    0     4M  0 lvm  
  │ └─Library-LibraryVolumeLMV_corig        253:27   0  21.9T  0 lvm  
  │   └─Library-LibraryVolumeLMV            253:16   0  21.9T  0 lvm  /mnt/Library
  └─Library-LibraryVolumeLMV_corig_rimage_4 253:9    0   3.7T  0 lvm  
    └─Library-LibraryVolumeLMV_corig        253:27   0  21.9T  0 lvm  
      └─Library-LibraryVolumeLMV            253:16   0  21.9T  0 lvm  /mnt/Library
sdf                                           8:80   0   3.7T  0 disk 
└─sdf1                                        8:81   0   3.7T  0 part 
  ├─Library-LibraryVolumeLMV_corig_rmeta_5  253:10   0     4M  0 lvm  
  │ └─Library-LibraryVolumeLMV_corig        253:27   0  21.9T  0 lvm  
  │   └─Library-LibraryVolumeLMV            253:16   0  21.9T  0 lvm  /mnt/Library
  └─Library-LibraryVolumeLMV_corig_rimage_5 253:11   0   3.7T  0 lvm  
    └─Library-LibraryVolumeLMV_corig        253:27   0  21.9T  0 lvm  
      └─Library-LibraryVolumeLMV            253:16   0  21.9T  0 lvm  /mnt/Library
sdg                                           8:96   0   3.7T  0 disk 
└─sdg1                                        8:97   0   3.7T  0 part 
  ├─Library-LibraryVolumeLMV_corig_rmeta_6  253:12   0     4M  0 lvm  
  │ └─Library-LibraryVolumeLMV_corig        253:27   0  21.9T  0 lvm  
  │   └─Library-LibraryVolumeLMV            253:16   0  21.9T  0 lvm  /mnt/Library
  └─Library-LibraryVolumeLMV_corig_rimage_6 253:13   0   3.7T  0 lvm  
    └─Library-LibraryVolumeLMV_corig        253:27   0  21.9T  0 lvm  
      └─Library-LibraryVolumeLMV            253:16   0  21.9T  0 lvm  /mnt/Library
sdh                                           8:112  0   3.7T  0 disk 
└─sdh1                                        8:113  0   3.7T  0 part 
  ├─Library-LibraryVolumeLMV_corig_rmeta_7  253:14   0     4M  0 lvm  
  │ └─Library-LibraryVolumeLMV_corig        253:27   0  21.9T  0 lvm  
  │   └─Library-LibraryVolumeLMV            253:16   0  21.9T  0 lvm  /mnt/Library
  └─Library-LibraryVolumeLMV_corig_rimage_7 253:15   0   3.7T  0 lvm  
    └─Library-LibraryVolumeLMV_corig        253:27   0  21.9T  0 lvm  
      └─Library-LibraryVolumeLMV            253:16   0  21.9T  0 lvm  /mnt/Library
sdi                                           8:128  0 223.6G  0 disk 
└─sdi1                                        8:129  0 223.6G  0 part /
nvme1n1                                     259:0    0 238.5G  0 disk 
├─Library-cache1_cdata_rmeta_1              253:19   0     4M  0 lvm  
│ └─Library-cache1_cdata                    253:21   0   225G  0 lvm  
│   └─Library-LibraryVolumeLMV              253:16   0  21.9T  0 lvm  /mnt/Library
├─Library-cache1_cdata_rimage_1             253:20   0   225G  0 lvm  
│ └─Library-cache1_cdata                    253:21   0   225G  0 lvm  
│   └─Library-LibraryVolumeLMV              253:16   0  21.9T  0 lvm  /mnt/Library
├─Library-cache1_cmeta_rmeta_1              253:24   0     4M  0 lvm  
│ └─Library-cache1_cmeta                    253:26   0   228M  0 lvm  
│   └─Library-LibraryVolumeLMV              253:16   0  21.9T  0 lvm  /mnt/Library
└─Library-cache1_cmeta_rimage_1             253:25   0   228M  0 lvm  
  └─Library-cache1_cmeta                    253:26   0   228M  0 lvm  
    └─Library-LibraryVolumeLMV              253:16   0  21.9T  0 lvm  /mnt/Library
nvme0n1                                     259:1    0 238.5G  0 disk 
├─Library-cache1_cdata_rmeta_0              253:17   0     4M  0 lvm  
│ └─Library-cache1_cdata                    253:21   0   225G  0 lvm  
│   └─Library-LibraryVolumeLMV              253:16   0  21.9T  0 lvm  /mnt/Library
├─Library-cache1_cdata_rimage_0             253:18   0   225G  0 lvm  
│ └─Library-cache1_cdata                    253:21   0   225G  0 lvm  
│   └─Library-LibraryVolumeLMV              253:16   0  21.9T  0 lvm  /mnt/Library
├─Library-cache1_cmeta_rmeta_0              253:22   0     4M  0 lvm  
│ └─Library-cache1_cmeta                    253:26   0   228M  0 lvm  
│   └─Library-LibraryVolumeLMV              253:16   0  21.9T  0 lvm  /mnt/Library
└─Library-cache1_cmeta_rimage_0             253:23   0   228M  0 lvm  
  └─Library-cache1_cmeta                    253:26   0   228M  0 lvm  
    └─Library-LibraryVolumeLMV              253:16   0  21.9T  0 lvm  /mnt/Library

You will notice that in my case, /dev/sdi is my Root (/) SSD for the OS and it is not in any LVM group.

Nitpick: You seem to switch between including /dev/sda and not including it. sda is part of your Library down at the end, but isn’t included in your setup instructions at the top.

Dumb question, but if I want to do the same with a ramdisk, should I look for a write-only cache, or is using writethrough enough?

What sort of advantage do you plan to get from a ramdisk that you wouldn’t get from the regular Linux file cache?

This is because /dev/sda is usually the root device. In this case, mine was /dev/sdi.

The point of this guide is for you to use lsblk to determine the device names to use BEFORE continuing the guide.

Also, the reference to /dev/sda in the beginning is to give an example of what you are looking for.

I simply have too much RAM (old video computer, 32 GB) and no spare SSD since a recent magic smoke accident :sweat_smile:. I am trying to set up a read cache for my iSCSI Steam library.

Hi Friends,
I am planning a new setup for my main machine based on lvmraid. I am using the Taichi X570 with a Matisse CPU, so I have 3 damn fast M.2 Gen4 SSDs and 8 SATA connectors.
The first plan was to create a RAID 5 with the 3 NVMe drives for root and home, and to bring in the SATA SSDs as on-demand mount points (backup, archive, media, etc.).
Could you estimate to what extent your setup, where I would use the 3 NVMe drives only for cache and metadata, would reach the speed of a native RAID 5 on the 3 NVMe drives? I don't have any “feeling” for that, so if you have any experience with your setup, you could help me a lot with the decision, even if it's only a rough guess!
Thanks a lot and best regards from Wurzburg, Germany
Daniel

Probably worth trying a few different setups to find what you like best.

Try root and home on their own SSD.

Then try the raid5 of them…

Test out performance on each… see what feels faster?

Without testing, I would guess that the RAID 5 over the 3 NVMe drives is the fastest solution. But with respect to cost/capacity, I would be happy with a rough guess at how much speed I lose with the NVMe cache + SATA storage solution from this concept here…
For my RAID 5 with NVMe drives I would have 1 TB of capacity with 3 disks.
With the SATA + cache solution I would have 4 TB with 6 disks + 2 small NVMe drives.
The costs would be comparable. So if I had an idea of how fast the caching solution is, I would definitely build it. But if I lose, let's say, 50% of the speed, I would compromise on capacity and go for the superfast NVMe RAID…

Little bits of understanding:
The cache is writeback, so I get full NVMe speed when writing, but I need a RAID 1 setup for the cache. No problem though: I could write files up to the cache size, let's say 250 GB, at full speed, so not too much loss there…
What about random reads? That is where the main speed loss of the SATA + cache solution is, correct? For the root FS the cache hit rate could be fairly acceptable, but on home… no clue… I think there will not be much more than SATA speed for random reads…

How do you run the maintenance commands on the RAID part of the cached LVs? None of the usual commands for checking sync and such seem to work on the hidden LVs behind the cache. At least I can't figure out how, and I haven't been able to find anything about it online either.

I'm starting to suspect that I have to turn off the cache first and then do all the checks and maintenance steps. That would be a massive pain if I have to do it for every blown fuse or the occasional power outage.

You should be able to run the scrub command just fine. I had no issues doing that.

I have since moved to ZFS, and from my experience with it I have a suggestion for how to build your RAID 5 arrays to get both resiliency and speed.

On ZFS I am using nested RAIDZ, aka nested RAID 5. I can have 1 drive fail per vdev, giving me the same as RAID 6.

Nested RAID 5 would also give you the R/W performance. It is either a mirror of RAID 5s or a stripe of RAID 5s.

That’s effectively raid 50… Not raid 6

RAID 6 is 2 drive redundancy. I am working with Mirrored RaidZ1 arrays:

where does it say those are mirrored?

I don't think you can mirror raidz vdevs…

It has been a while since I set it up. I think I striped the 2 vdevs in the pool.

But from some light reading, yes, you can mirror 2 RAIDZ vdevs in the same pool.

Using a RAIDz calc you can play with the numbers a bit and see. 2 groups, 4x4TB drives each. Striped groups of RAIDz1 disks.

Additional disks can be added similarly to a RAID-Z configuration. The following example shows how to convert a storage pool with one RAID-Z device that contains three disks to a storage pool with two RAID-Z devices that contains three disks each.

The total available space with striped RAIDz1 vdevs in the same pool matches my available space.

Also, that image is just from ZWatcher. It isn't super detailed. It just gives me the basic layout and whether there are errors.

Try lvchange --syncaction check <LV>_corig (or repair instead of check).
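
To make that concrete with the names used earlier in this thread (this is only a sketch; whether you can target the hidden _corig sub-LV directly may depend on your LVM version):

lvchange --syncaction check Library/LibraryVolumeLMV_corig
lvs -a -o +raid_sync_action,raid_mismatch_count Library

The raid_sync_action and raid_mismatch_count columns show whether the check is still running and whether it found any mismatches.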

This has been very helpful, thank you. I’m pretty well-versed in LVM but I’ve never done this before, and it seems like there are no other guides that cover the case when you want to mirror or stripe your cache drives. I’ve got a stripe of NVMe drives writeback-caching a mirror of SATA drives and it works great!

If your cache data and metadata are on the same device(s), you can skip these two steps. When you run the subsequent lvconvert --type cache --cachepool command exactly as-is, it will automatically reserve 0.1% of the cache LV for metadata.

The extra steps are useful if you have, for example, a larger but slower cache drive and a separate smaller but faster cache drive, and want to put cached blocks on the former and metadata on the latter.

EDIT: Oops! I am quite mistaken. It will automatically reserve 0.1% of the cache LV for metadata, but it'll put it on the first available extents in your volume group. If you want it striped or mirrored and/or stored on another device, these extra steps are indeed required.

No, you can’t. Sorry. You can’t put a vdev in a vdev, and if you have more than one vdev they are always striped.

That was quite some time ago. I now use ZFS.

Thanks for putting this guide together, very helpful. I’m curious about your experience with ZFS. How is read/write performance in comparison to your raid6 setup?