Hi everyone! Long-time lurker, first-time poster.
I’ve got a Gluster replicated volume running under Proxmox: two data bricks plus an arbiter brick on another node. This Gluster cluster provides storage for my three-node Kubernetes cluster.
The Gluster nodes run Gluster 10.1 on Ubuntu 22.04.1, each with 1TB of HDD storage and a ~19GB NVMe cache.
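For context, a replicated volume with an arbiter like this one is typically created along these lines (the volume name, hostnames and brick paths below are placeholders, not my exact layout):

# replica 3 / arbiter 1: two full data bricks plus a metadata-only arbiter brick
gluster volume create k8s-vol replica 3 arbiter 1 \
    gluster01:/data/brick1 gluster02:/data/brick1 gluster-arb:/data/arbiter1
gluster volume start k8s-vol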
Let me give you some specs:
Proxmox nodes:
- CPU: i5-6500T
- RAM: 16GB
- HDD (pve01): Seagate ST2000LM007 (2TB 2.5’’)
- SSD (pve01): Crucial CT500P2SSD8 (500GB nvme m2)
- HDD (pve02): WD WDC_WD20SPZX (Blue 2TB 2.5’’)
- SSD (pve02): Samsung 970 EVO Plus (500GB nvme m2)
Each Gluster VM has 2GB of RAM, 1 vCPU, a 1000GB HDD-backed disk and a 20GB SSD-backed disk.
I'm using LVM caching to speed the volume up.
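The cache is attached with the standard cache-pool approach, roughly like this (the device path and size here are illustrative, not my exact commands):

# create a cache pool on the NVMe-backed PV
lvcreate --type cache-pool -L 19G -n lv_cache gluster /dev/sdb
# attach it to the data LV; the default cache mode is writethrough
lvconvert --type cache --cachepool gluster/lv_cache gluster/data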
lvs output on gluster01 (pve01)
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
data gluster Cwi-aoC--- <1000.00g [lv_cache_cpool] [data_corig] 15.41 0.50 0.18
ubuntu-lv ubuntu-vg -wi-ao---- <15.00g
lvs output on gluster02 (pve02)
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
data gluster Cwi-aoC--- <1000.00g [lv_cache_cpool] [data_corig] 99.99 0.74 0.01
logs ubuntu-vg -wi-ao---- 10.00g
ubuntu-lv ubuntu-vg -wi-ao---- <15.00g
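For anyone who wants more detail than the plain lvs output, the cache-specific counters can be pulled with something like this (field names can vary a bit between lvm2 versions):

# cache policy, usage and hit/miss counters for the cached LV
lvs -o name,cache_policy,cache_used_blocks,cache_dirty_blocks,cache_read_hits,cache_read_misses gluster/data
# lower-level view straight from device-mapper
dmsetup status gluster-data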
I have noticed that the Gluster node running on pve01 (gluster01) has much higher latency.
PVE01 also has iowait consistently over 10%, while PVE02 is usually under 5%.
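To take Gluster out of the picture, the two cache SSDs could be compared directly on the PVE hosts with something like this (the device path is just an example; the fio job is read-only, so it won't touch data):

# per-device latency and utilisation, refreshed every 5 seconds
iostat -x 5
# non-destructive 4k random-read latency test against the cache SSD
fio --name=nvme-read --filename=/dev/nvme0n1 --readonly --rw=randread \
    --bs=4k --iodepth=16 --direct=1 --runtime=60 --time_based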
These PVE nodes also run some OpenSearch nodes; here are their iowait metrics:
My questions are:
- I understand that the SSD on pve01 is worse than the one on pve02, but is the difference between the SSDs enough to explain this gap in performance?
- Why would gluster01 have its cache used blocks (the Data% column above) at 15.59% while gluster02 is at 99%?

