How far is too far for data separation when designing a database?

risk · August 23, 2017, 5:18am

Maybe middle management read some sharding guide, or a guide on how to reduce locking contention. The downside is that you’re generating a lot more I/O (assuming you’re using a clustered index and not some LSM-tree DB).

If you must keep the tail of the table locked for one or more network round-trips, you can save some space and use uint64s assigned to clients through a sequencer helper table (one batch of statements is enough to increment and get the ID). IDs may be non-contiguous in that case, but you save on both space and I/O.
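To make the sequencer idea concrete, here is a minimal sketch, not a definitive implementation. It assumes SQLite through Python's `sqlite3` module purely for illustration; the table name `sequencer` and the helpers `make_sequencer` and `reserve_ids` are made-up names. SQLite integers are signed 64-bit, so this only approximates a uint64 counter.

```python
import sqlite3


def make_sequencer(conn, name):
    """Create the (hypothetical) sequencer helper table and one named counter."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequencer ("
        "name TEXT PRIMARY KEY, next_id INTEGER NOT NULL)"
    )
    # Start the counter at 1; do nothing if it already exists.
    conn.execute(
        "INSERT OR IGNORE INTO sequencer (name, next_id) VALUES (?, 1)", (name,)
    )
    conn.commit()


def reserve_ids(conn, name, count):
    """Reserve `count` IDs for one client in a single short transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE sequencer SET next_id = next_id + ? WHERE name = ?",
            (count, name),
        )
        if cur.rowcount != 1:
            raise KeyError(f"unknown sequence: {name}")
        (next_id,) = conn.execute(
            "SELECT next_id FROM sequencer WHERE name = ?", (name,)
        ).fetchone()
    # The block [next_id - count, next_id) now belongs to this caller only.
    return range(next_id - count, next_id)
```

Each client only touches the one sequencer row for the length of that small transaction, then inserts with its reserved IDs at its own pace. If a client reserves a block and doesn't use all of it, the leftover IDs are simply skipped, which is where the non-contiguous IDs come from.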

Levitance · August 23, 2017, 12:01pm

Yeah, sounds like something management would do. Sacrifice performance now for something we probably won’t ever implement.

Vitalius · August 23, 2017, 1:39pm

TBH, that sounds like something I’d do personally if I were making decisions without a certain future plan or vision for what the database will end up looking like.

Particularly if I felt performance was secondary to functionality and easier to accommodate.

It sounds like a similar argument to starting off with an organized and structured database using many tables vs using few tables.

The idea is you’re building for the future so you do more work now so you can do less work later. I guess it has everything to do with what you can discern about what the future of the database’s usage will look like, but I feel like switching from GUID to vanilla IDs is easier than switching from vanilla to GUIDs? Never done it. Uninformed guess.

Levitance · August 23, 2017, 2:36pm

It is indeed a very similar argument. But you can’t prepare everything for every possibility. Especially where something so niche as sharding is concerned. You really need to look at the application being designed and ask if it is ever intended to have a reach far and wide enough to warrant sharding.

Edit:
I can use words today.

risk · August 23, 2017, 3:26pm

If you make a system more complicated today, you’ll be paying the cost of it being more complicated from now on, as well as the cost of not doing something else instead.

If you make a system too simple, you’ll be paying the cost of it being too simple in the future, but you’ll be reaping the benefits now.

In various software project management disciplines/philosophies/frameworks there’s this concept of a “product owner” who should be educated enough and/or vested enough to make future-proofing decisions with long-term goals, roadmaps, and the value of various product features in mind.

As a good entry-level engineer, it’s on you to ask and to try to assess your own implementation cost. Management and leadership roles can assess the implementation cost in more detail and balance it against the opportunity cost of not doing something else. The product owner can then make a more informed decision.

The question you should be asking yourself is: “do you take pride in your work?” A good manager/lead will be able to be a good conduit for a “product owner” and allow you to understand the decisions the “owner” makes, as well as give you the opportunity to influence them.

It doesn’t always work this way, and even when it does, it doesn’t always work out well for everyone, or sometimes for anyone.

If unsure, ask, start simple, ask, move on… would be my recommendation…

With sharding in particular, these days with various replication systems that allow you to move data into secondary systems for various analytics workloads, you can get pretty far without sharding.

FaunCB · August 23, 2017, 3:45pm

Please tell me you have a hard drive for every line lol

risk · August 23, 2017, 6:05pm

I apologize for my lengthy replies, I find it hard to explain things like those the OP was asking about succinctly.

I understand your question was rhetorical, but fun thing is I actually really might. (It’s less than 30 lines on my tablet, lol.)