I’m currently experimenting with scaling databases with containers. Specifically in Docker Swarm. I’ve found many technologies for scaling databases, but the structure for this type of thing is something that’s not intuitive to me.
The objectives I’m trying to achieve are:
Synchronous writes.
Shared storage of database files.
Master-Slave replication.
Database Proxying.
I believe all are achievable in some form. The point of #2 is that I don’t want to make data redundant. i.e. say if I went with a “simple” solution and put 1 Slave MariaDB process on each node in the swarm, the database is being replicated to every node which doesn’t seem very storage-efficient. I want to handle data redundancy with a separate system (Gluster), but that means pointing any database file systems to the shared storage.
Of those 3, I’ve tried Galera’s image, but their documentation lead to me having issues after the initial database creation. Once the seed service is scaled to 0 containers, the node service containers start failing with no errors in the logs.
I’m probably going to try MariaDB TX next because I’m not 100% on if CrateDB would be “different enough” that it’d be annoying to support (i.e. changing applications & queries therein for compatibility).
Does anyone else have experience doing this? Or experience with these specific solutions?
Are there other solutions I’m not aware of (i.e. not listed here)?
You’re going to have a bad time. Galera is the best of the bunch, and it’s not amazing. What you can do is restrict the number of tables/databases replicated to each container, but in practice, you’re better off having the full database replicated to each node in the slave pool. Honestly, containerization doesn’t help you all that much for most databases, especially if you’re already in a virtualized environment, and this is coming from a guy that commonly wears a “Docker all the things” t-shirt. The moment you start having stateful information inside of a docker container, you’ve done things completely wrong.
DO NOT USE GlusterFS/NFS/ETC FOR MYSQL STORAGE.
There are really only two answers for scaling relational databases [reliably]. Slave Pools and In-Memory caches (redis/memcached and the like). God help you if you’re write heavy, because Galera deadlocks are a nightmare, and master<->master replication will also make you contemplate picking up garbage for a living.
You’re going to run into file locking issues with a ReadWrite-Many Filesystem (like GlusterFS Fuse or NFS), and even if you didnt have that issue, the latency of the nfs protocol is enough to bring those machines to their knees.
Could you explain the structure of how this would work then, in a Docker Swarm environment? That’s where I think my understanding is really faltering.
Would you have a separate master MySQL instance from the Swarm, then have slave instances on each node in the swarm? I may be misunderstand what Slave Pool means, but I think that’s what it is?
It’s also important to understand that [My,Postgres,MS]SQL were never designed to scale out, so it’s best to scale up as far as you can before trying to scale out. The clustering features were really hacked on quite poorly.
If you want proper scale-out, you might consider a different database. (I know NoSQL is a bit of a meme, but lots of implementations are better at scaling)
So if you’re dead set on using swarm for this, you’d have X MySQL slaves, each with an iSCSI (preferable) volume attached to /var/lib/mysql, and then those slaves would replicate from your master. You’d need your entrypoint command in the dockerfile to do the slave initialization process only on first startup (you can put a touch file in /var/lib/mysql to avoid reinitializing on restart). That bootstrap process could be configured to only replicate certain tables/databases, or the whole dataset.
Additionally in code, and I’ll use Java/Springboot as an example here, you can have your application cache the result of methods defined in your code in Redis by simply using the CacheManager plugin for springboot. This will take a load off your mysql slaves, and allow the application servers to perform “dirty reads” against the redis database instead of hitting the MySQL server for a clean read each time. Depending on the size of your dataset, and the performance penalty of re-warming a cache in the event of a redis failure, you can either set redis to save the database state to an iSCSI volume periodically, or you can just let the dataset live in memory and just be rewarmed whenever it comes back online.
Option 2 is how I’d recommend doing it if i were you. Most modern frameworks have plugins to cache returns in redis, so it should be nearly trivial for your developers to implement.
This is a subject we’ve argued ad-infinitum about at work. The consensus of the experienced old hands, or boring old farts if you prefer is…
…that for production you are better off building a dedicated data platform to host your databases (usually on dedicated VM’s). The default scale out approach would be to send read traffic to the replica/mirror databases and only allow write traffic to the master. This way you should be able to achieve <30 second fail-over of the master databases with zero data-loss and have a database platform that is simpler to manage than a K8’s or Swarm cluster. Since the majority of database activity is usually only 20-30% write this model can work well.
By all means put databases in containers for Dev/Test purposes, likewise for Redis caching layers if you prefer.
The production database workloads we have that do run in containers are where we are using 3rd party PaaS e.g. Azure SQL DB which runs inside of Azure Service Fabric. The running and administration of which we don’t have to worry about.
My overall advice with databases is to work out your RTO & RPO’s that MUST be met and then aim for the simplest and most mature solution. Dealing with data recovery because the Developers insisted on using the latest tech is not fun
Best solution for scaling out mysql is sharding it. Galera is OK, but it doesn’t scale infinitely. It isn’t truly synchronous, it uses certificates rather than 2-phase commit, but it’s close enough for horseshoes. NoSQL DBs are much better suited for scaling, but the problem is developers despise working with them as anything but a write-only datastore.
Shared DB file storage doesn’t really exist outside of enterprise DBs. Oracle has it with OCFS, but MySQL with glusterFS is a bad idea.
MySQL does, of course, support master/slave replication natively.
It’s aces. Query caching that actually works unlike the built-in MySQL one that is really a trap for inexperienced DBAs, intelligent query routing, and real connection pooling. All makes life so much better.
I’m not sure if this is any use to you, but here is a high-level diagram I used at work to explain how we would build a new highly available SQL Server Data-Platform that could host databases for several projects with as few VM’s and instances as possible.
SQL Server has partially contained databases (isolate users completely from each others databases) and a Resource Governor (limit applications accounts to a % of CPU/IO etc.)
One of the main reasons to do it was that DBA retraining would be kept to a minimum and there would be less need for support tickets to be passed to other teams (which are not always with the same 3rd party support vendor). Obviously this is for a slow movong big corporate environment. The platform life-expectancy would be approx. 5 years but it will be simple enough to add new ESXi hosts to the main cluster and new Windows Core VM’s to the Windows Cluster - or even use the Availbility Group to migrate to a new VM guest cluster (it will even be possible to swap to a Linux OS with minimal downtime).
id go with a loadbalancing solution spread on multiple computers/servers. unless you tera bytes, and more tera bytes going in and out every day, mysql really does hold it’s own.
unless you spread the docker containers over a multiple of systems/computers, all you’ll be achieving is basically what the OS/DB daemon allready does e.g. spread the workload onto cores on the same system, just with the overhead of “virtualization”, since docker.
I registered just so that I could respond to this thread. Collect garbage for a living. Man, that was awesome! I’ve been feeling the pain, and I just stopped myself from testing GlusterFS this weekend. Sync and Async replication in MariaDB will have to be it, but man, it can not keep up.