What is a good software management 'meta'

PT1 - Your requirements


  • I'm new blood in this DevOps-like space. I've done some cool things, but that doesn't mean I know the ideal way of doing things, so don't take this write-up as a sign that I really know what I'm talking about.

Software project needs: devs, sysadmins, and "The Project"

Almost all software developers seem to have a fairly similar set of needs:

  1. Consistent project structures for each language
  2. Quick, easy build for testing new features separately
  3. Cross project integration testing
  4. Server upgrade testing
  5. Bug testing
  6. Environment management
  7. Disaster (damage) control
  8. Not to lose hours of time due to software lifecycle / tooling

Almost all projects seem to need the following:

  1. Non-intrusive release cycle
  2. Reliable software deployment/updating
  3. Version retention
  4. Verification of environment statuses and functionality
  5. Low cost

Almost all sysadmins want:

  1. Working software
  2. Reliable logging
  3. Knowledge of requirements for any given project/environment
  4. Equivalent servers through all environments
  5. Not to be called at 2AM, or called at 7PM on date night.
  6. Low overhead

Your developers:

Consistent project structures for each language

When a developer is asked to work on project x, y, or z, being able to open that project and find a familiar, known structure lets them get started much faster than having to track down someone who already knows the project for an overview of its structure, tooling, documentation, etc.

Quick, easy build for testing new features separately

Your devs want to be able to run a command, click a button, or cast a magical spell and instantly have a working piece of software so they can see whether what they wrote works. Furthermore, they want what they built to build seamlessly on your build/deployment tools.

Cross project integration testing

Typically, you're not going to have just one project. You're going to have multiple in house projects talking and working with each other. You're going to want the developers to be able to test against each other's components effectively without hindering each other from testing new features.

Server upgrade testing

Server-side vulnerabilities and upgrade needs happen. Whatever you do needs a reliable way to test new versions of the DB backend, SSL configuration, or whatever else is being upgraded, without interrupting the standard development process.

Bug testing

Bugs in the software happen. A developer needs to patch the bug and then verify it was patched without hindering the other developers' work (unless it's a core design flaw... then refactor the code under another branch/project, I guess).
On top of this, you'll need a stable staging environment to run your tests in before promoting fixes to your higher environments.

Environment management

You're now looking at multiple environments, and you don't want your devs trying to manage the configuration of all of them. You need configuration that is easy to update and that propagates up through the environments as development continues, plus a way to adjust specific features for certain environments.
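One common way to get "update once, propagate everywhere, tweak per environment" is layered configuration: a shared base that every environment inherits, plus a small override per environment holding only what differs. Below is a minimal Python sketch of that idea. The keys, hostnames, and environment names are made up for illustration, and in practice the layers would likely live in version-controlled files rather than in code.

```python
from copy import deepcopy

# Hypothetical shared defaults that every environment inherits.
BASE = {
    "db": {"host": "localhost", "pool_size": 5},
    "features": {"new_checkout": False},
    "log_level": "INFO",
}

# Hypothetical per-environment overrides: only the keys that differ from BASE.
OVERRIDES = {
    "dev": {"log_level": "DEBUG", "features": {"new_checkout": True}},
    "staging": {"db": {"host": "staging-db.internal"}},
    "prod": {"db": {"host": "prod-db.internal", "pool_size": 20}},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with override applied on top of base, recursing into nested dicts."""
    merged = deepcopy(base)  # never mutate the shared base
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def config_for(env: str) -> dict:
    """Build the effective config for one environment; unknown envs get the base unchanged."""
    return deep_merge(BASE, OVERRIDES.get(env, {}))


if __name__ == "__main__":
    for env in ("dev", "staging", "prod"):
        print(env, config_for(env))
```

A change to `BASE` reaches every environment on the next build, while a feature flip for one environment stays in that environment's override.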

Disaster (damage) control

Unfortunately, you won't catch everything. Stuff breaks in prod, and it usually breaks in a way where the devs need access to the servers/DBs/logs/etc. to figure out what is going on.

Not to lose hours of time due to software lifecycle / tooling

Finally, your developers will want to be able to work without interruption. If builds, verification, promotions, or deployments each take more than 20 minutes, then when things go badly you risk losing half a day for a team that is already under the gun for the next release. That means risking a delay, a broken deployment, improper testing, and more.

Your "The Project":

Non-intrusive release cycle

Just as the developers need their work to go uninterrupted, the project needs a quick developer > production release cycle and tooling that doesn't break.

Reliable software deployment/updating

When you say your software is going to be on the server, it needs to actually be there, and that needs to be quickly verifiable.

Version retention

Audits happen. If you have to produce something, it's best not to spend energy (and take on risk) trying to retrieve or rebuild a release made 3 years ago. The same probably goes for your server configuration from that time.

Verification of environment statuses and functionality

Once you have multiple projects, environments, and many interacting systems, you need to know what is where, what its status is, and when the next version push is happening.

Low Cost

Having this project is cool, but the cost of just keeping it 'running' can't eat the project's entire budget. You will get 3x the traffic you were expecting, and you'll have found out that everyone's predictions were wrong 5 alerts ago...

Your sysadmins:

Working software

When software is deployed to a server, you don't want your sysadmin called in to fix the server, only to find out the developer released something that had no chance of working in the first place.

Reliable logging

When you have multiple environments and multiple projects, and you just went live with new versions of 14 different pieces of software, having something come up and needing to go to 'the logs' is overwhelming. It's made worse if you need access to every server in every environment, and have to avoid interrupting the application server just to read logs that are spitting out 3,000,000,000+ lines of debug output.

Knowledge of requirements for any given project/environment

Project 31 is new, and it needs another environment plus integration with all of its dependencies for another thing by tomorrow. Get it running. When this happens, you want to know exactly what is needed. The last thing you want is to be ready for deployment only to find out that everything is wrong because assumptions were made.

Equivalent servers through all environments

Anything used for functionality testing should be equivalent to the final server the software will end up on, and the software should (ideally) be delivered the same way. The last thing a sysadmin wants to discover is that env3, which everyone was testing against, had /proc/sys/net/ipv4/tcp_mem tripled, that this was the only reason the software worked there, and that prod broke 2 hours ago while you've only just found out this is the difference that fixes it.
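A cheap guard against the env3 problem is to snapshot the kernel tunables you care about on each server and diff them against prod. The sketch below reads values from /proc/sys on Linux and reports every key that differs. The list of watched keys is a hypothetical example, and how you collect snapshots from remote hosts is left out.

```python
from pathlib import Path

# Hypothetical list of tunables to compare; extend it to whatever matters for your stack.
WATCHED = ["net/ipv4/tcp_mem", "net/core/somaxconn", "vm/swappiness"]


def snapshot(keys=WATCHED, root="/proc/sys") -> dict:
    """Read the current value of each sysctl key on this host (Linux only).

    Whitespace is normalized (tcp_mem is tab-separated), and unreadable or
    missing keys map to None so they still show up in a diff.
    """
    values = {}
    for key in keys:
        path = Path(root) / key
        try:
            values[key] = " ".join(path.read_text().split())
        except OSError:
            values[key] = None
    return values


def drift(reference: dict, candidate: dict) -> dict:
    """Return {key: (reference_value, candidate_value)} for every key whose values differ."""
    keys = set(reference) | set(candidate)
    return {
        k: (reference.get(k), candidate.get(k))
        for k in sorted(keys)
        if reference.get(k) != candidate.get(k)
    }
```

Running `snapshot()` on prod and on env3 and passing both results to `drift()` would flag a tripled tcp_mem as a difference before anyone relies on env3 test results.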

Not to be called at 2AM, or called at 7PM on date night.

When something is broken, make sure you've set up systems and processes that let the minimum number of people get the job done and fix the problem. The more people and steps involved in working with your environments, the greater the risk of missing something and making a mistake.

Low overhead

Whatever you use to solve all of the above requirements doesn't help if its overhead is obscene. High overhead makes it hard to keep hardware requirements down and means you can't do as much with the same hardware (vCPUs and RAM).

PT2 - Your tools (placeholder)

PT3 - Putting it together (placeholder)

PT4 - Unsolved (placeholder)

In PT2, I'll go over the tools I'm using to address as many of these concerns as possible.

Disclaimers:

  • This overview comes from my experience in a software development shop, where I had to overhaul many systems due to a lack of scalability and maintainability. I don't know how much of this translates to websites or other oddities I haven't personally worked with.
  • The OS I've worked with the most for servers is CentOS, but most of this should be adaptable to other distros.
  • I do not use Windows/Mac. I do not think my solutions presented will translate easily to a Windows/Mac shop.
  • Finally, I've been thinking about this as a... platform(?) that I hope can be used to manage multiple projects in different languages.

Any thoughts on PT1? What have I missed? What do I have absolutely WRONG?

That took a while to write up. Going outside for a bit and will resume with PT2