We are in the process of upgrading some of our internal switches (one of them has been in an on state for 9̶9̶4̶5̶ d̶a̶y̶s̶ ±3600 days).
It was originally over-designed; I plan on removing one or two switches from the current setup and using VLANs to separate the networks. (We are upgrading all the switches in this image besides the main and backup ingress switches.)
I would be interested to hear your opinion on the best way to design and implement this.
My initial thought is to get rid of C1-A and make use of C1-B for its ingress.
Also, stack C7 + C8 and isolate the networks with VLANs.
I am fairly new to this, so any input or advice would be appreciated.
Must be an RTC bug; it sounds impossible that a router/switch from 1996 would still be in use and not have been removed earlier for causing some kind of network issue.
Pic of the device … if possible?
In any case, I wouldn’t expect anything today to last as long, … I’d say you have a 2-5% chance of getting a device with some kind of Heisenbug.
The other thing I find somewhat hard to understand is your use of the term DDoS. In my experience, if someone has a thing for you and decides to DDoS you, there’s nothing you, and sometimes not even your ISP, can do to redirect the botnet. This is where Cloudflare, Fastly, and various cloud providers come in.
Regarding other aspects of your setup, it looks like you have a typical “corp”/“prod”/“internet” separation of concerns - and you should treat them as 3 networks.
prod-prod traffic is probably huge, and you don’t want to filter it much - probably not at all.
internet-prod can go through some ingress/reverse proxy/filter
corp-prod can happen for trusted corp employees
corp-corp is probably tiny and you want to filter it
corp-internet … whatever
prod-internet and prod-corp should generally only be allowed if initiated from internet or from corp.
internet-corp only via VPN.
This way you can apply probabilistic filtering on the internet-prod route and can lock down most of the other stuff behind device certs.
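The zone-pair rules above can be sketched as a simple default-deny policy matrix. This is just an illustration of the logic, not any particular firewall’s syntax; the zone names and policy labels are made up here.

```python
# Coarse zone-to-zone policy matrix for the "corp"/"prod"/"internet" split.
# Keys are (source_zone, dest_zone); anything not listed is denied.
POLICY = {
    ("prod", "prod"): "allow",                 # huge east-west volume, don't filter
    ("internet", "prod"): "ingress-filter",    # via ingress/reverse proxy/filter
    ("corp", "prod"): "allow-trusted",         # trusted corp employees (device certs)
    ("corp", "corp"): "filter",                # tiny, so filter it
    ("corp", "internet"): "allow",             # ... whatever
    ("prod", "internet"): "established-only",  # only if initiated from internet
    ("prod", "corp"): "established-only",      # only if initiated from corp
    ("internet", "corp"): "vpn-only",          # only via VPN
}

def policy(src: str, dst: str) -> str:
    """Return the coarse policy for a zone pair; default deny."""
    return POLICY.get((src, dst), "deny")
```

A zone pair you never thought about falls through to "deny", which is the safe failure mode here.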
If you need high availability (e.g. to upgrade switches, firewalls, and servers without dropping connections), you’ll need to “double up” some of your network gear, and your servers and network gear will need to be aware of some internal routing protocol. Some links will need to be doubled up too, so that you can “drain” network traffic away from devices and links you want to service. It means running things like VRRP, pfsync if you go with pfSense, conntrackd with Linux, and probably OSPF or BGP internally, e.g. between your container and VM hosts if you don’t have too many of them, and maybe towards your ISP too.
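For the VRRP part, the failover rule itself is simple enough to sketch: the highest priority wins, with ties broken by the highest IP. This is a deliberately simplified model (real VRRP also has preemption settings, advertisement timers, and priority 255 for the address owner), and the firewall names are hypothetical.

```python
from ipaddress import IPv4Address

def vrrp_master(routers):
    """Pick the VRRP master among peers on a segment: highest priority
    wins; ties are broken by the highest primary IP (simplified model)."""
    return max(routers, key=lambda r: (r["priority"], IPv4Address(r["ip"])))

# "Draining" a box for maintenance amounts to lowering its priority
# so its peer takes over the virtual IP.
peers = [
    {"name": "fw-a", "ip": "10.0.0.2", "priority": 200},
    {"name": "fw-b", "ip": "10.0.0.3", "priority": 100},
]
assert vrrp_master(peers)["name"] == "fw-a"
peers[0]["priority"] = 50  # drain fw-a for servicing
assert vrrp_master(peers)["name"] == "fw-b"
```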
The only thing I would say about removing C1-A is that you are moving your internet lines to a single point of failure. If C1-C goes down in the new layout, then having both primary and backup internet won’t matter.
Right now, if the main line goes down, traffic can go out the backup; if the backup goes down, traffic still goes out the main; if C1-C goes down, traffic goes out the backup; if C1-A goes down, traffic still goes out the main. Any one of those can go down and you are fine. If multiple fail at the same time, then it would go down, but that would most likely mean other issues as well.
Though, that may not matter anymore if you are also moving the networks onto a single switch and running them through a single “DDoS” box. If any one of those went down, the whole thing would be down. So I guess moving the front part to a single switch as well is fine.
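One way to sanity-check this kind of argument is to model the layout as a graph and enumerate single-device failures. The topology below is only a guess at the two shapes being discussed (the uplink and device names besides C1-A/C1-C are made up), just to show the technique:

```python
from collections import deque

def reachable(links, down, src, dst):
    """BFS over an undirected link list, skipping one failed device."""
    if src == down or dst == down:
        return False
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, ()):
            if nxt != down and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def single_points_of_failure(links, src, dst):
    """Every intermediate device whose failure alone cuts src off from dst."""
    devices = {n for link in links for n in link} - {src, dst}
    return sorted(d for d in devices if not reachable(links, d, src, dst))

# Hypothetical model of the current layout: two ingress switches, two uplinks.
current = [("core", "C1-A"), ("core", "C1-C"),
           ("C1-A", "wan-main"), ("C1-C", "wan-backup"),
           ("wan-main", "internet"), ("wan-backup", "internet")]
# Proposed layout: both uplinks funnel through C1-C.
proposed = [("core", "C1-C"),
            ("C1-C", "wan-main"), ("C1-C", "wan-backup"),
            ("wan-main", "internet"), ("wan-backup", "internet")]

assert single_points_of_failure(current, "core", "internet") == []
assert single_points_of_failure(proposed, "core", "internet") == ["C1-C"]
```

The second assert is the point being made above: in the collapsed layout, C1-C alone takes out both internet lines.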
You could build what network folks call an SDN, maybe: basically a bunch of relatively dumb L2 switches and a couple of beefy EPYC boxes running LVS/IPVS-based load balancing of individual flows across software firewalls. You can probably squeeze 100Gbps of IMIX from a single EPYC box relatively easily if all it does is LVS/IPVS and basic health checking for load balancing.
When you pop in a new client, assign some IP space to them on your balancer and they can have at it. … and if they want DDoS protection, you can rent them a fleet of VMs for that, … and so on.
Dumb but VLAN-capable 100G switches are also fairly cheap these days, and VLANs are generally enough for isolation.
For more than 100G of WAN ingress, there are people out there doing cheap “Maglev”-style setups with IPVS too; plenty of folks have written blog posts on how they did it. I think even Cloudflare had such a setup back in the 2016-2018 era… across a bunch of physical sites, of course.
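For a flavor of what “Maglev-style” means: each backend gets its own permutation of a prime-sized lookup table, and backends take turns claiming slots, which spreads flows near-evenly and moves few flows when a backend dies. A toy sketch of that table-population idea, with arbitrary hash choices, a tiny table size, and made-up firewall names (real deployments use a much larger prime table):

```python
import hashlib

def _h(s: str, seed: int) -> int:
    """Stable 64-bit hash; choice of hash function is arbitrary here."""
    return int.from_bytes(hashlib.sha256(f"{seed}:{s}".encode()).digest()[:8], "big")

def maglev_table(backends, m=13):
    """Build a Maglev-style lookup table of prime size m: each backend
    walks its own permutation of the slots and claims the next free one,
    round-robin, until the table is full."""
    offsets = {b: _h(b, 0) % m for b in backends}
    skips = {b: _h(b, 1) % (m - 1) + 1 for b in backends}  # coprime to prime m
    next_j = {b: 0 for b in backends}
    table = [None] * m
    filled = 0
    while filled < m:
        for b in backends:
            # walk b's permutation until a free slot is found
            while True:
                slot = (offsets[b] + next_j[b] * skips[b]) % m
                next_j[b] += 1
                if table[slot] is None:
                    table[slot] = b
                    filled += 1
                    break
            if filled == m:
                break
    return table

def pick_backend(table, flow_5tuple):
    """Hash a flow's 5-tuple into the table so a flow always hits
    the same software firewall."""
    return table[_h(repr(flow_5tuple), 2) % len(table)]

table = maglev_table(["fw-a", "fw-b", "fw-c"])
assert set(table) == {"fw-a", "fw-b", "fw-c"}  # every firewall gets slots
```

Because each flow’s 5-tuple hashes to a fixed slot, established connections keep landing on the same firewall, which is what makes draining or losing one box tolerable.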