PT1 - Your requirements
- I'm a newblood DevOps-like person in this space. I've done some cool things, but that doesn't mean I know the ideal way of doing things. Don't take this blurb of text as an indication that I really know what I'm talking about.
Software project needs: Devs, Sysadmins, "The Project"
Almost all software developers seem to have a fairly similar set of needs:
- Consistent project structures for each language
- Quick, easy build for testing new features separately
- Cross project integration testing
- Server upgrade testing
- Bug testing
- Environment management
- Disaster (damage) control
- Not to lose hours of time due to software lifecycle / tooling
Almost all projects seem to need the following:
- Non-intrusive release cycle
- Reliable software deployment/updating
- Version retention
- Verification of environment statuses and functionality
- Low cost
Almost all sysadmins want:
- Working software
- Reliable logging
- Knowledge of requirements for any given project/environment
- Equivalent servers through all environments
- Not to be called at 2AM, or called at 7PM on date night.
- Low overhead
Consistent project structures for each language
When a developer is asked to do work on project x, y, or z, being able to go to that project and find a familiar, known structure lets them get started faster than having to find someone who already knows the project for an overview of its structure, tooling, documentation, etc.
Quick, easy build for testing new features separately
Your devs tend to want to be able to run a command, click a button, or cast a magical spell and instantly have a working piece of software so they can see if what they wrote works. Further, they want what they built to build seamlessly on your build/deployment tools.
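As a sketch of that "one command and it works" idea, here is a thin wrapper that hides the build tooling behind a single entry point. The registry name, image tags, and docker/pytest commands are assumptions for illustration, not a prescription - the point is that every project exposes the same single command.

```python
# Hypothetical one-button build: one command per project, same shape everywhere.
# "registry.internal" and the pytest-in-container step are made-up conventions.
import subprocess

def build_commands(project: str, tag: str = "dev") -> list[list[str]]:
    """Return the shell commands a one-button build would run for a project."""
    image = f"registry.internal/{project}:{tag}"   # hypothetical registry
    return [
        ["docker", "build", "-t", image, f"./{project}"],   # build the image
        ["docker", "run", "--rm", image, "pytest", "-q"],   # run its tests
    ]

def build(project: str, tag: str = "dev") -> None:
    """Run each step, failing fast on the first error."""
    for cmd in build_commands(project, tag):
        subprocess.run(cmd, check=True)
```

Because the steps are data (a list of commands) rather than a shell script per project, the same wrapper can feed a CI pipeline, which is what makes the local build and the build/deployment tooling line up.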
Cross project integration testing
Typically, you're not going to have just one project. You're going to have multiple in house projects talking and working with each other. You're going to want the developers to be able to test against each other's components effectively without hindering each other from testing new features.
Server upgrade testing
Serverside vulnerabilities/upgrade needs happen. Whatever you do needs to have a reliable way to test the new versions of the DB backend, SSL configurations, or whatever is being upgraded - without interrupting standard development process.
Bug testing
Bugs in the software happen. A bug needs a developer to patch it, and then to verify it was patched, without hindering the development of the other developers (unless it is a core design flaw... then refactor the code under another branch/project, I guess).
On top of this, you'll need a stable staging environment to do your tests in before allowing them up to your higher environments.
Environment management
You're looking at multiple different environments now. You don't want your devs trying to manage the configurations of all those environments. There needs to be something easy to update that propagates up as the software moves along through development, along with a way to adjust specific settings for certain environments.
Disaster (damage) control
Unfortunately you won't catch everything. Stuff breaks in prod, and usually it breaks in a way that the devs will need access to the servers/dbs/logs/etc to figure out what is going on.
Not to lose hours of time due to software lifecycle / tooling
Finally, your developers will want to be able to work without interruption. If your builds/verification/promotions/deployments take more than 20 minutes each, you are setting yourself up for the day when, with things already going badly, you cost half a day to a team that is already up against the gun for the new version release - risking a delay, a broken deployment, improper testing, and more.
Your project's needs:
Non-intrusive release cycle
Just as the developers need their work not to be interrupted, you also need a quick developer-to-production release cycle and tooling - and it needs to not break.
Reliable software deployment/updating
When you say your software is going to be on the server, it needs to be on there, and quickly verifiable.
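One way to make "it's on the server" quickly verifiable is to record a checksum for each artifact at release time and compare it against what actually landed. A minimal sketch - the manifest format and artifact names here are assumptions:

```python
# Deployment verification by checksum: the release records a SHA-256 for each
# artifact; verification re-hashes what is on disk and compares.
# The manifest shape ({name: hexdigest}) is a made-up convention.
import hashlib

def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()

def verify_deploy(artifact: bytes, manifest: dict, name: str) -> bool:
    """True iff the deployed bytes match the checksum the release recorded."""
    return sha256_of(artifact) == manifest.get(name)
```

The same manifest can be kept with the release (see version retention below is not required - any durable record works), so verification is one hash comparison rather than a person eyeballing a server.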
Version retention
Audits happen. If you have to produce something, best not to spend energy (and risk) trying to get back, or rebuild, that release that was made 3 years ago - and probably your server configuration from that time as well.
Verification of environment statuses and functionality
When you make a move to have multiple projects, environments, and many interacting systems, you need to know what is where, what the status is, and when the next version push is happening.
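A status board answering "what is deployed where, and is it healthy?" can start out very small. In this sketch the rows are hard-coded; in practice each row would come from a health endpoint or a deployment record (both assumptions here):

```python
# Sketch of an environment status board: one line per (project, environment)
# pair. Project names, versions, and health values are illustrative.

def status_board(rows: list[dict]) -> list[str]:
    """Render one aligned line per (project, environment) pair."""
    return [
        f"{r['project']:<10} {r['env']:<8} {r['version']:<8} "
        f"{'OK' if r['healthy'] else 'DOWN'}"
        for r in rows
    ]

ROWS = [
    {"project": "billing", "env": "staging", "version": "1.4.0", "healthy": True},
    {"project": "billing", "env": "prod",    "version": "1.3.2", "healthy": True},
    {"project": "auth",    "env": "prod",    "version": "2.0.1", "healthy": False},
]
```

Even this much tells you at a glance which version is where and what is down - the "when is the next version push" column is the natural next field to add.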
Low cost
Having this project is cool, but you can't let just 'running' be the cap of the project's budget. You will get 3x the traffic you're expecting, and you just found out everyone was wrong in their predictions of what you'd get 5 alerts ago...
Your sysadmins' needs:
Working software
When software is deployed to a server, you don't want your sysadmin called in to fix the server, only to find out the developer released something that had no chance of working in the first place.
Reliable logging
When you have multiple environments and multiple projects, and you went live with new versions of 14 different sets of software, having something come up and needing to go to 'the logs' is pretty overwhelming. It is made worse if you have to have access to every server in every environment, without interrupting the application server, just to look at logs spitting out 3000000000+ lines of debug output.
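Once logs from all those deployments land in one place, the first thing anyone needs is filtering by service, environment, and level. A tiny in-memory version of that query, with made-up log records:

```python
# Sketch: querying centralized logs as structured records instead of grepping
# 14 servers. The record fields and sample data are illustrative assumptions.

def filter_logs(records: list[dict], **match) -> list[dict]:
    """Return records whose fields equal every given filter value."""
    return [r for r in records if all(r.get(k) == v for k, v in match.items())]

LOGS = [
    {"service": "billing", "env": "prod", "level": "ERROR", "msg": "payment timeout"},
    {"service": "billing", "env": "dev",  "level": "DEBUG", "msg": "retrying"},
    {"service": "auth",    "env": "prod", "level": "INFO",  "msg": "login ok"},
]
```

The point of the structure is that nobody needs shell access to an application server just to read its logs, and the 3-billion-line debug firehose reduces to one query.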
Knowledge of requirements for any given project/environment
Project 31 needs another environment, with integration with all of its stuff, for another thing tomorrow - and it is new. Get it running. When this happens, you want to know exactly what is needed. The last thing you want is to be ready for deployments only to find out everything is wrong and assumptions were made.
Equivalent servers through all environments
Anything used for functionality testing should be equivalent to the final server the software will end up on, and the software should be delivered in the same way (ideally). The last thing a sysadmin wants to find is that env3, which everyone was testing against, had /proc/sys/net/ipv4/tcp_mem tripled - which was the only reason the software was working - and prod broke 2 hours ago, but you've only just found out this is the difference that fixes it.
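That tcp_mem story is exactly the kind of drift you can catch mechanically: capture a baseline of settings from one environment and diff it against another before anyone trusts the test results. A minimal sketch - the setting names and values below are illustrative, not real captures:

```python
# Sketch: detect configuration drift between environments. Compare a captured
# baseline of kernel settings (e.g. from prod) against another environment's
# values. Settings and values here are made up for illustration.

def config_drift(baseline: dict, actual: dict) -> dict:
    """Return {setting: (baseline_value, actual_value)} for every mismatch."""
    keys = set(baseline) | set(actual)
    return {
        k: (baseline.get(k), actual.get(k))
        for k in keys
        if baseline.get(k) != actual.get(k)
    }

PROD = {"net.ipv4.tcp_mem": "381705 508943 763410", "vm.swappiness": "60"}
ENV3 = {"net.ipv4.tcp_mem": "1145115 1526829 2290230", "vm.swappiness": "60"}
```

Run as a scheduled check, a non-empty drift report surfaces the tripled tcp_mem on day one instead of two hours into a prod outage.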
Not to be called at 2AM, or called at 7PM on date night.
When something is broken, make sure you've set up systems and processes that allow a minimum amount of people involved to get their job done to fix the problem. The more people and steps involved to work with your environments, the greater the risk of missing something and making a mistake.
Low overhead
Whatever is used to solve all the above requirements doesn't help if its overhead is obscene. High overhead makes it hard to keep hardware requirements down, and means you can't do as much with the same hardware/vCPUs and RAM.