Re-architecture of cloud and infrastructure

Alright, my DevOps Lords and Legends!

I have a pretty big challenge ahead of me!

My company has been captured by the Azure ecosystem and we are continuously developing and deploying new solutions to the platform.

We are looking to re-architect the entire platform and rethink subscriptions, resource groups, management groups and policies from scratch.

Once we have our reference architecture, we will move our existing deployments into our new house.

Infrastructure as code will be a critical success factor. I have looked at Terraform and think it looks like a great way to go.

I especially like that Terraform promotes a multi-cloud strategy.

Our Microsoft designated service engineer recommends that we go with this reference architecture:
https://registry.terraform.io/modules/Azure/caf-enterprise-scale/azurerm/latest

My question to you, my Lords and Legends:
If you were starting your cloud development from scratch, what would you do differently? What key concepts should we keep in mind as we step into these design decisions?

I’d like to mention that we are already 100% cloud based. Nothing is left on-prem.

Depending on the size of your deployments, you may want to have a look at

https://terragrunt.gruntwork.io/


The Terraform Azure landing zone part would be a module within the above infrastructure.


First, it would be helpful to understand what problem your company is trying to solve with this re-architecture; that way you can fix the scope and staffing and get an accurate timescale.

Microsoft already has a lot of documentation on how to migrate onto Azure.

Microsoft Cloud Adoption Framework for Azure - Cloud Adoption Framework | Microsoft Learn

The CAF registry is a good place to start. I would make sure your state is going to be manageable. For example, when I worked on something similar with AWS, we had several Terraform state files, one for each account. It's just something to consider; it may not be a problem in your case.
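To make the one-state-per-account idea concrete on Azure, here is a minimal Python sketch that renders a separate `azurerm` backend configuration per project and environment, so each subscription gets its own state blob. The project, resource group and storage account names are hypothetical placeholders, and the `project/environment` key layout is just one possible convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    project: str      # hypothetical project name, e.g. "billing"
    environment: str  # e.g. "dev", "test", "prod"


def backend_config(dep: Deployment, resource_group: str, storage_account: str,
                   container: str = "tfstate") -> str:
    """Render azurerm backend settings that give each project/environment its own state blob."""
    key = f"{dep.project}/{dep.environment}.terraform.tfstate"
    settings = {
        "resource_group_name": resource_group,
        "storage_account_name": storage_account,
        "container_name": container,
        "key": key,
    }
    # One `name = "value"` line per setting, the format used by -backend-config files.
    return "".join(f'{name} = "{value}"\n' for name, value in settings.items())
```

Each rendered file could then be passed to `terraform init -backend-config=<file>`, so a plan against one subscription never touches another subscription's state.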

“I especially like that terraform promotes a multi cloud strategy”
I would be very careful with this. It is very difficult to be cloud native and cloud agnostic at the same time. If you mean one “language” for several providers, that makes sense, and it does make it easier to find engineers, since a lot of people write in it now.
If you mean cloud agnostic, then you have to write a translation layer to make your modules work on different providers, as you don’t really want to replicate your code and technical debt. I assume you would be using Kubernetes; personally, I would put your cloud-agnostic functionality within the cluster and have your native services handled by Azure.
Kubernetes should not normally be assumed to be cloud agnostic in its implementation, but it is easier to lift and shift. For example, secrets are a native Kubernetes feature. However, if you want to incorporate Azure secrets (e.g. from Key Vault), you’re going to need a CSI driver, ideally on the cluster, and that driver is cloud native. Still, you could swap the driver if you wanted to leave Azure, and if it is written well, that would not affect your cluster code.
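The same swap-the-driver reasoning can be sketched in application code, assuming the code reads secrets from the provider itself rather than through a CSI driver: the code depends on a small neutral interface, and each cloud gets a thin adapter. The class names and the secret name are hypothetical; the adapters expect an `azure.keyvault.secrets.SecretClient` or a `boto3` Secrets Manager client to be passed in, and those SDKs are not imported so the sketch runs on its own.

```python
from typing import Protocol


class SecretStore(Protocol):
    """Provider-neutral interface that application code depends on."""

    def get_secret(self, name: str) -> str: ...


class KeyVaultSecretStore:
    """Adapter for Azure Key Vault.

    `client` is assumed to be an azure.keyvault.secrets.SecretClient, whose
    get_secret() returns an object exposing the secret in `.value`.
    """

    def __init__(self, client) -> None:
        self._client = client

    def get_secret(self, name: str) -> str:
        return self._client.get_secret(name).value


class SecretsManagerStore:
    """Adapter for AWS Secrets Manager.

    `client` is assumed to be boto3.client("secretsmanager"); get_secret_value()
    returns a dict containing "SecretString" for string secrets.
    """

    def __init__(self, client) -> None:
        self._client = client

    def get_secret(self, name: str) -> str:
        return self._client.get_secret_value(SecretId=name)["SecretString"]


def database_url(store: SecretStore) -> str:
    # Only the neutral interface is used here, so moving clouds means swapping
    # the adapter passed in, not rewriting this function.
    return store.get_secret("db-connection-string")  # hypothetical secret name
```

The translation layer is the adapters; as the post warns, every extra provider is another adapter to write and maintain.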

If you intend to be fully cloud native, I would advise not using Kubernetes at all unless it’s a technical requirement. I would highly recommend Azure Container Apps so you avoid spending a lot of time configuring and maintaining Kubernetes. The same goes for Kafka versus Queue Storage/Service Bus.

Ultimately it goes back to the start: what problem are you trying to solve?

Otherwise, I can only offer generic advice, for example with regard to Terraform: use only HashiCorp partner modules where possible.

That’s my two cents, hope your project goes well!!


You may also find this of interest.

Azure Landing Zones - Community Call - 27th April 2023 · Issue #1191 · Azure/Enterprise-Scale · GitHub

Thanks for the detailed explanation, you’ve clearly got some experience with this.

The scope is honestly quite small today. I’ve got a few key vaults, a few Azure Functions, some SQL servers and blob storage that are going to be moved.

Maybe multi-cloud isn’t the correct strategy for us; rather, we should try to make our code (Azure Functions) as easy as possible to migrate when/if the time comes.

The re-architecture is mostly for management reasons. The way our architecture is set up today collects everything into a single subscription. And that’s not ideal.

We need to set up an architecture that silos different projects into their own subscriptions, and make sure we have a management system in place to collect logs, monitor service health, set up virtual networks and apply policies (limit which resource types are allowed within a subscription, or limit where services are hosted).
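The two policy examples in brackets correspond to Azure Policy's built-in "Allowed resource types" and "Allowed locations" definitions, which can be assigned at management group or subscription scope. As a toy model only (this is not how the Azure Policy engine evaluates anything), the Python sketch below shows the check those policies express for a single resource; the guardrail class and the allowed lists are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    type: str      # Azure resource type, e.g. "Microsoft.KeyVault/vaults"
    location: str  # Azure region name, e.g. "westeurope"


@dataclass
class SubscriptionGuardrails:
    """Hypothetical per-subscription guardrails: allowed resource types and regions."""
    allowed_types: set[str]
    allowed_locations: set[str]

    def __post_init__(self) -> None:
        # Normalise so resource types and regions are compared case-insensitively.
        self.allowed_types = {t.lower() for t in self.allowed_types}
        self.allowed_locations = {loc.lower() for loc in self.allowed_locations}

    def violations(self, resource: Resource) -> list[str]:
        """Return a human-readable reason for each rule the resource breaks."""
        problems = []
        if resource.type.lower() not in self.allowed_types:
            problems.append(f"{resource.name}: resource type {resource.type} is not allowed")
        if resource.location.lower() not in self.allowed_locations:
            problems.append(f"{resource.name}: location {resource.location} is not allowed")
        return problems
```

For the workload described above, a project subscription might allow only `Microsoft.KeyVault/vaults`, `Microsoft.Web/sites` (Function Apps), `Microsoft.Sql/servers` and `Microsoft.Storage/storageAccounts` in one or two regions; in Azure itself that would be a policy assignment with a `deny` effect rather than application code.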

But it’s an opportunity to course correct and make smart long term decisions.

Looks like I have some reading material before going to bed. What a great source! Have you personally followed this write-up?

Like @coltmarshmallow said, creating a framework for managing infrastructure in general, and cloud in particular, is never as clear-cut as the PowerPoint and YouTube warriors make it out to be.
One thing is certain: the more flexibility, scalability and security you need in your operations, the more time/money/sweat/resources you’ll need.

I have architected (as in designed, coded, and managed the first deployments to production) a framework that loosely follows and expands on what is presented in the Gruntwork blog, adding years of experience on top.

Our framework was created for a different purpose, namely to allow our consulting company, which works for many different customers at once, to deploy and manage infrastructure using DevOps patterns.
We target mainly AWS and Oracle Cloud, but if needed the framework can be extended (with loads of additional work) to other providers.

It is based on Terraform and Terragrunt; the latter comes into play when you start hitting the limits of what Terraform state files can safely manage in single runs (think, to start with, of not wanting a change in your test subscription to involve anything in your production subscriptions).
Terragrunt also allows you to write much DRYer infrastructure code than plain Terraform, and to implement patterns like using a CI pipeline tool to handle automatic (but controlled, managed and documented) deployment of infrastructure changes.
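The DRY point can be illustrated with a hypothetical example: shared settings are written once, and each environment only states what differs, which is the idea behind layering a common Terragrunt configuration under per-environment ones. The Python sketch below is only a deep merge of plain dictionaries, not Terragrunt's implementation; the keys, regions and SKU names are made up.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where nested dicts are merged and override values win."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


# Settings shared by every environment, written once (hypothetical values).
COMMON = {"location": "westeurope", "tags": {"owner": "platform", "managed_by": "terraform"}}

# Each environment only lists what differs from COMMON.
ENVIRONMENTS = {
    "dev": {"sku": "B1", "tags": {"env": "dev"}},
    "prod": {"sku": "P1v3", "location": "northeurope", "tags": {"env": "prod"}},
}


def inputs_for(env: str) -> dict:
    """Resolve the full set of inputs for one environment."""
    return deep_merge(COMMON, ENVIRONMENTS[env])
```

Because `deep_merge` copies rather than mutates, resolving `prod` never leaks its overrides into `COMMON` or into `dev`, which mirrors the isolation argument made above.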

It may be that your use case is much simpler and you can get going with a couple of Terraform modules and/or a couple of separate deployments, but it never hurts to read about enterprise-grade deployments in case you end up in a much more complex situation (for example, if your company is looking to apply for SOC or PCI compliance, having your Terraform runs performed from a dev laptop will be frowned upon) :)

You’re welcome.

Splitting your expenses makes sense, I can see why you would want different subscriptions. Ultimately I think the biggest factor is how expenses are handled at your company, both monetary and human.

If you practice some good abstraction, you can build resources that work across your projects. I have often written Helm library charts that any repository can then build its own chart from. Another example: I like working with GitHub Actions for its feature set, and I create my own actions and reusable workflows, so my workflows are responsible for building 100+ repositories from one code base.

There is a higher initial cost, since you’re going to have to design and implement your abstraction (be careful not to go too general and end up with leaky abstractions; object-oriented programming suffers from this). But if your company cannot expense a flat amount, you’ll find it difficult to communicate that internally. In plain English, project A is not going to want to pay for project B’s need for X. So you need a cost structure that can be applied horizontally rather than vertically.
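One possible reading of a "horizontal" cost structure: shared platform costs (for example central logging, monitoring or a hub network) are split across all projects by an agreed metric, instead of being billed to whichever project needed them first. The sketch below assumes a proportional split by some usage figure; the function name, the metric and the numbers are purely illustrative.

```python
def allocate_shared_cost(shared_cost: float, usage: dict[str, float]) -> dict[str, float]:
    """Split a shared platform cost across projects in proportion to a usage metric.

    `usage` maps project name -> agreed metric (hypothetically, e.g. number of
    deployed resources or ingested log volume).
    """
    total = sum(usage.values())
    if total <= 0:
        raise ValueError("usage must have a positive total")
    return {project: shared_cost * amount / total for project, amount in usage.items()}
```

For example, `allocate_shared_cost(1000.0, {"project-a": 1, "project-b": 3})` returns `{"project-a": 250.0, "project-b": 750.0}`: each project pays for its share of the common platform, so project A never ends up funding project B's need for X.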
I’ve run into that bureaucracy before, sigh.

I’ve not handled that at Microsoft, but at AWS we would lay out what we were trying to do and speak to an AWS architect. Someone in that role may or may not be able to give you more company-level context on how to approach things than a designated service engineer.

GitHub - tfutils/tfscaffold

Just thought I’d throw this your way, MadMatt.

I indirectly know the person who developed this, in case you’re not allowed to adopt Terragrunt for whatever reason.