I know this is an old conversation, but I was hoping for a little help with my own ZFS HA cluster. I built what you originally wrote up at github.com/ewwhite/zfs-ha/wiki and it is partially functional, but I'm seeing some strange behavior and was wondering if anyone has any ideas.
I have 2 Dell servers as my CentOS 7 HA cluster, connected via 2 HBAs to an HP D2700 JBOD. When my primary controller is up, my zpool status looks perfect and all drives are healthy. When I do a test failover and my secondary controller is up, the zpool shows a degraded state with 3 drives (out of 25) marked as bad. I have tried using zpool clear to reset the drive status, but they stay degraded. Not sure why one controller shows all disks as good while the other shows 3 bad drives. Any thoughts? Could this be caused by a bad HBA? A mini-SAS cable? A configuration issue?
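Roughly what I've been running on the secondary controller after the failover (the pool and device names here are just examples from my lab, not the real ones):

```shell
# After failing over to the secondary controller, check pool health.
# "tank" is an example pool name -- substitute your own.
zpool status -v tank

# Try to clear the reported errors on the whole pool
# (this is the step that hasn't helped in my case).
zpool clear tank

# You can also target a single reportedly-bad device:
zpool clear tank /dev/disk/by-id/scsi-EXAMPLEDISK1
```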
Thanks in advance
A ZFS/cluster NOOB!!
P.S. - This is just a test lab, not production so there is no emergency, more of a curiosity.
Figured out my own issue. I had been trying to get this working for a while, so there was apparently an old zpool left over on the JBOD from an earlier attempt, and it wasn't importable. I had to run the zpool labelclear command on each disk still carrying labels from the old pool to remove the stale data, and now both my controllers report fine.
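For anyone who hits the same thing, the cleanup looked roughly like this (device paths are placeholders; check yours first, since labelclear is destructive):

```shell
# Make sure the disks aren't part of an active pool before clearing.
zpool status

# Wipe the stale ZFS labels from each disk that still belonged to
# the old, un-importable pool. The -f flag forces the clear; the
# device paths below are examples only -- verify yours with
# `lsblk` or `ls /dev/disk/by-id/` before running this.
zpool labelclear -f /dev/disk/by-id/scsi-EXAMPLEDISK1
zpool labelclear -f /dev/disk/by-id/scsi-EXAMPLEDISK2
zpool labelclear -f /dev/disk/by-id/scsi-EXAMPLEDISK3
```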
Well, I am still learning, and while I did read about both Ceph and Gluster, they both seemed to revolve around object storage. I know they can both do block, but I thought they did block via some sort of translation layer (and any time you have to add a layer of translation, I would assume you're sacrificing some performance). Also, the point of this project was to get my feet wet with ZFS. I know I am going way overboard by building a cluster while still figuring out ZFS, but I do love a challenge (I blame Wendell for this). Eventually I do intend to investigate Ceph more (as it's part of OpenStack), but right now I want to dive deeper into ZFS.