Recommendation for a Cloud Backup solution

I have to build a backup solution for roughly 10k users, with enough redundancy to lose multiple drives or multiple servers while still retaining availability.

I’ve been thinking of using GlusterFS across multiple servers, but I’ve also been considering S3.

If anyone has a better idea, I’m open to suggestions. High speed isn’t a priority. Storage will be roughly 1 petabyte.

If you want to host it yourself, Gluster or Ceph are going to be your best options.

1PB is going to be prohibitively expensive on S3. My recommendation: just build your own cluster.

We built 840TB in 2015 for about 250k. I’m sure you can do it much cheaper now.

If you need 1PB of usable storage and want to tolerate multiple server failures, you’re going to need to follow the 3 copies, multi-rack rule. This means you’ll want 3PB of raw disk for your OSDs (I’ll get to this later). I’ll give an example using Ceph because that’s what I’m familiar with.
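
To make that concrete, here’s a rough back-of-the-envelope sketch (in Python) of the usable-to-raw math under 3x replication. The ~1PB target is from the original post; the rest is just arithmetic.

```
# Back-of-the-envelope sizing for 3x replication.
usable_tb = 1000   # ~1PB of usable storage (the stated target)
replicas = 3       # the "3 copies, multi-rack" rule

raw_tb = usable_tb * replicas
print(f"Raw disk needed for OSDs: {raw_tb} TB (~{raw_tb / 1000:.0f} PB)")
# -> Raw disk needed for OSDs: 3000 TB (~3 PB)
# In practice you'd buy a bit more than this, since Ceph misbehaves
# when OSDs start approaching full.
```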

As far as connection goes, there are a few options:

  • RBD (RADOS Block Device) is a virtual block device.
  • CephFS is going to be similar to Gluster’s filesystem.
  • RGW (Ceph Object/RADOS Gateway) is a REST API that is S3- or Swift-compatible (see the sketch after this list).
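
As a quick illustration of the RGW option, here’s a minimal sketch using boto3 (the standard Python S3 client) pointed at a self-hosted gateway. The endpoint URL, credentials, and bucket/key names are placeholders, not anything specific to your setup.

```
# Minimal sketch: talking to a Ceph RGW endpoint with a stock S3 client.
# Endpoint, credentials, and names below are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.internal.example:7480",  # your RGW node(s) or load balancer
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="user-backups")
s3.upload_file("/tmp/backup.tar.gz", "user-backups", "alice/2015-06-01.tar.gz")

# Because the API is S3-compatible, existing backup tools that speak S3
# can usually just be pointed at the gateway instead of Amazon.
```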

You’ll need a few management servers. With this amount of data, you’re going to need dedicated metadata and monitor servers. They operate in a cluster of their own, with a quorum, so an odd number (3 or more) is recommended. Dedicated gateway nodes are also recommended if performance is a concern; expect about 2/3 of wire speed on sequential reads or writes if you skip them.

On to OSD (object storage daemon) nodes. Each disk gets its own daemon. Each server should also have a couple of SSDs for journaling/caching (my recommendation, at least); that will make the network your bottleneck rather than the disks.
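
As a rough illustration of how the daemons and journal SSDs map out on a single node, here’s a quick sketch. The 36-disk / 3-SSD layout is just an example, not a recommendation.

```
# Hypothetical single OSD node: one daemon per data disk, journals on shared SSDs.
data_disks = 36     # one OSD daemon per disk
journal_ssds = 3    # example count only

osd_daemons = data_disks
journals_per_ssd = osd_daemons // journal_ssds

print(f"{osd_daemons} OSD daemons, {journals_per_ssd} journal partitions per SSD")
# -> 36 OSD daemons, 12 journal partitions per SSD
# If a journal SSD dies, every OSD journaling to it goes down with it,
# so don't pile too many journals onto one device.
```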

When you build out your cluster, you can tell it (in what’s called the CRUSH map) which server resides in which rack and which disk resides in which server. Ceph then uses that map when placing placement groups, so the 3 copies of an object land on different racks, which helps protect against a whole-rack failure.
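
If it helps, here’s a rough sketch of the common placement-group-count rule of thumb (target roughly 100 PGs per OSD, divided by the replica count, rounded up to a power of two). The OSD count is illustrative, not a recommendation.

```
# Rule-of-thumb PG count: (OSDs * ~100) / replicas, rounded up to a power of two.
# The OSD count is illustrative (e.g. ~500 x 6TB disks for ~3PB raw).
osds = 500
replicas = 3
target_pgs_per_osd = 100

raw = osds * target_pgs_per_osd / replicas
pg_num = 1
while pg_num < raw:
    pg_num *= 2

print(f"Suggested pg_num for the main pool: {pg_num}")
# -> Suggested pg_num for the main pool: 32768
```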

I’m not sure what your performance requirements are, but a cluster this size should be able to fully saturate 10GbE, up and down.

This is extremely oversimplified, but I thought it would be a good jumping-off point for you.


Addendum, hardware ramblings:

I highly recommend at least 10GbE with Ceph. If you ever need to reweight or recover from a failure, you’re going to thank me.

As far as RAM goes, more is better. 1GB per TB of storage is a good rule of thumb, but that’s not always realistic. As long as you have, at the very minimum, 500MB per daemon on OSD nodes, you’ll be fine. On metadata nodes, you’ll want 1GB per daemon instance, or as much as you can shove into it.
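
For what it’s worth, here’s that RAM rule of thumb as a quick sketch, using a hypothetical 36-disk node with 6TB drives (both numbers are assumptions, not part of the original recommendation).

```
# RAM sizing for a hypothetical OSD node: 36 x 6TB disks, one daemon per disk.
disks = 36
disk_tb = 6
gb_per_tb = 1.0          # the "1GB of RAM per TB of storage" rule of thumb
min_gb_per_daemon = 0.5  # the absolute floor: 500MB per OSD daemon

ideal_gb = disks * disk_tb * gb_per_tb
floor_gb = disks * min_gb_per_daemon

print(f"Ideal:   {ideal_gb:.0f} GB of RAM")   # -> Ideal:   216 GB of RAM
print(f"Minimum: {floor_gb:.0f} GB of RAM")   # -> Minimum: 18 GB of RAM
```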

Make sure your disk controllers are pure passthrough (IT mode). It’s critically important.

Hardware recommendations:

OSD:

  • RAM: 1GB per TB
  • Storage: 1 disk per daemon
  • Journal: 1 SSD partition per daemon (only needed if you want to maximize throughput)
  • NIC: 2x10GbE
  • CPU: single socket, 8 cores.

MON:

  • RAM: 1GB per daemon
  • DISK: 10GB per daemon
  • NIC: 2x10GbE
  • CPU: 4 cores minimum (for a deployment of this scale, at least)

MDS:

  • RAM: 1GB per daemon
  • DISK: 1MB per daemon
  • NIC: 2x10GbE
  • CPU: 16 cores minimum (for a deployment of this scale)

I recommend using 4 or 6TB disks with ~36 storage disks per server. Supermicro makes some nice 36-bay chassis. If you make the single-node storage much larger than that, you could start having serious problems when a node or rack fails and all of that data has to re-replicate.
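
To put that in perspective, here’s a rough sketch of what ~3PB raw looks like with 36-bay nodes; the 6TB drive size is just one of the options mentioned above.

```
# How many 36-bay nodes does ~3PB of raw disk take? (Rough sketch, 6TB drives.)
import math

raw_tb_needed = 3000   # ~3PB raw for 1PB usable at 3x replication
disks_per_node = 36
disk_tb = 6

node_raw_tb = disks_per_node * disk_tb
nodes = math.ceil(raw_tb_needed / node_raw_tb)

print(f"{node_raw_tb} TB raw per node -> {nodes} OSD nodes")
# -> 216 TB raw per node -> 14 OSD nodes
# Spread those across at least 3 racks so a whole rack can fail
# without losing all copies of anything.
```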


Yes! Thanks for this info. I’ve got some reading to do on it, but this is quality info. Thank you very much.

No problem. I’ve added a bit more info to my previous post.