How to best structure a DB for a system replacing spreadsheets

Context: apart from CMS solutions, I've only really built custom systems for small businesses, so I lack knowledge on scaling. Two years ago I was hired to help reduce costs and improve the speed of a Drupal system with a bloated 50 GB database, and that really opened my eyes to the importance of choosing the correct data structure for a system.

I am now tasked with building a system that replaces a business's reporting system of spreadsheets and paper. Because reports need to be approved and individual cells on a spreadsheet can contain incorrect data, each field needs a status column so we know whether that field has been approved or not. I recently found out how many reports they process in a year and realized the field table will hit 6 million rows per year, possibly more. Is this an issue?
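For concreteness, this is roughly the normalized shape I have in mind right now. The table and column names are just placeholders, not a final design, and I've left out constraints and indexes:

```sql
-- Rough sketch: one row per spreadsheet cell (placeholder names).
CREATE TABLE reports (
    id         BIGINT AUTO_INCREMENT PRIMARY KEY,
    title      VARCHAR(255) NOT NULL,
    status     VARCHAR(20)  NOT NULL DEFAULT 'pending',  -- report-level approval
    created_at DATETIME     NOT NULL
);

CREATE TABLE report_fields (
    id        BIGINT AUTO_INCREMENT PRIMARY KEY,
    report_id BIGINT       NOT NULL,                     -- which report the cell belongs to
    name      VARCHAR(100) NOT NULL,
    value     TEXT,
    status    VARCHAR(20)  NOT NULL DEFAULT 'pending'    -- per-cell approval
    -- constraints/indexes omitted from this rough sketch
);
-- report_fields is the table that would grow to ~6 million rows per year.
```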

One of my mates said 6 million is not a lot and that 30 million over 5 years is manageable. I reminded him that he works at a bank with crazy big servers. He then recommended database sharding, which seems like crazy overkill for a business this small.

I've now confirmed that we won't ever have to filter reports on their fields or do weird calculations based on field data; it will strictly be used to store, update and view. This means the field data doesn't have to be in a relational DB.

One solution I have is storing the field data as JSON. This doesn't help with overall database size, but it should make retrieving the data quicker than pulling it from a 6-million-row table. I've never done this, though: is there a limit to a column's total size if, say, a report stores 100 fields? What other complications are there?
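Something like this is what I'm imagining for the JSON approach, as an alternative to the normalized layout above. Untested, and the names are placeholders:

```sql
-- Hypothetical single-table layout: all of a report's fields in one JSON column.
CREATE TABLE reports (
    id         BIGINT AUTO_INCREMENT PRIMARY KEY,
    title      VARCHAR(255) NOT NULL,
    fields     JSON         NOT NULL,  -- e.g. {"revenue": {"value": "1200", "status": "approved"}, ...}
    created_at DATETIME     NOT NULL
);

-- Whole documents are written and read in one go; no per-field rows to join.
INSERT INTO reports (title, fields, created_at)
VALUES ('Q1 sales', '{"revenue": {"value": "1200", "status": "pending"}}', NOW());

SELECT fields FROM reports WHERE id = 42;
```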

Another solution is storing the field data in flat files on the server, but this complicates backups.

My final option is, instead of storing the data in flat files, storing it in a MongoDB document. This solves the backup complication but introduces new ones by creating a hybrid database setup. Plus, I've never used MongoDB, and maybe this solution is not as good as I think.

One of my friends suggested that I look into PostgreSQL instead of MySQL because people are doing all sorts of cool things in PostgreSQL, but I don't know where to start with that one.
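From what I can tell, PostgreSQL's JSONB type would be the rough equivalent of the MySQL JSON idea above, and apparently it can even be indexed if we ever change our minds about querying fields. This is just a sketch of what I've pieced together, not something I've tried:

```sql
-- Hypothetical PostgreSQL version of the JSON-column idea, using JSONB.
CREATE TABLE reports (
    id         BIGSERIAL   PRIMARY KEY,
    title      TEXT        NOT NULL,
    fields     JSONB       NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Optional: a GIN index would make querying inside the documents possible later,
-- even though we don't plan to filter on fields today.
CREATE INDEX reports_fields_gin ON reports USING GIN (fields);
```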

What are your thoughts? Am I overreacting? Which solutions have complications I don't foresee? I am desperate for input.

Sounds like you’re using MySQL to power Drupal with default parameters.

6 million rows, in terms of data volume, is nothing for a relational database, even on consumer hardware. However, bad queries and insufficient resource allocation can bring even the smallest relational database to its knees.
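For example, running EXPLAIN on your slow queries is usually the quickest way to see whether they are doing full table scans instead of index lookups. Using the hypothetical report_fields table from your post just to illustrate:

```sql
-- A full scan over millions of rows shows up immediately in the plan.
EXPLAIN SELECT * FROM report_fields WHERE report_id = 42;

-- If the plan shows a table scan rather than an index lookup, add the index:
CREATE INDEX idx_report_fields_report_id ON report_fields (report_id);
```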

Relational databases (RDBMSs) are generally designed to never fail. That means if they run out of their preferred resource, e.g. RAM, they silently switch to less efficient algorithms that "get the job done" (e.g. ones that fall back to disk).
Since all of this happens internally in the RDBMS, it is not obvious to the uninitiated; e.g. you don't see OS-level swapping.

The default parameters for MySQL and PostgreSQL allow them to run on the smallest of hardware (think Raspberry Pi). Most tuning advice starts by adjusting the amount of RAM allocated to the RDBMS to match the hardware being used. These basic tuning steps are typically not performed when MySQL/PostgreSQL is deployed as part of an app (such as Drupal).
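For example, the usual first knob is the main buffer size, which defaults to values far too small for a real workload. The exact numbers depend entirely on how much RAM the server has and what else runs on it; these are illustrative, not recommendations:

```sql
-- MySQL (InnoDB): size the buffer pool to a meaningful share of RAM on a
-- dedicated database box. Value is in bytes; resizable at runtime in recent MySQL versions.
SET GLOBAL innodb_buffer_pool_size = 4 * 1024 * 1024 * 1024;

-- PostgreSQL: same idea; ALTER SYSTEM writes postgresql.auto.conf,
-- and shared_buffers only takes effect after a server restart.
ALTER SYSTEM SET shared_buffers = '2GB';
```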

I would start by implementing these basic tuning steps in your RDBMS before you consider the more drastic options you laid out above.


How did this end up? Did you manage to add some indexes?