I admit I don’t really understand this “degraded exponentially” business when you add more hosts.
It sounds very vague to me. In trying to understand you better, the impression I get is that your concerns are based on unresolved issues from years ago that you did not fully understand then (hence unresolved, obviously) and don’t remember well enough today, except that it didn’t work for you. I’ve been there myself, and I can’t deny you had a bad time back then, but my experience so far using Linux for routers has been a lot more positive: challenging at times, but positive overall. Judging by your description, it sounds like something was misconfigured for you back then. Was it HTB, was it the route cache size or route cache GC, or was it connection tracking? It’s hard to tell now, and the kernel networking stack has indeed changed a lot since 2015.
For example these days we have: https://www.kernel.org/doc/Documentation/networking/nf_flowtable.txt
This bypasses most of the routing and firewall work, and it likely would have resolved (or at least masked) whatever tuning problem you had back then. It doesn’t sound like you had driver/interrupt-balancing/chipset issues; otherwise your performance would probably have suffered at far fewer connections. This netfilter module is still relatively new; it wouldn’t have been available 5 years ago when you were testing.
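To give a rough idea of what enabling it looks like, here is a minimal nftables sketch based on the flowtable documentation; the table/flowtable names and the eth0/eth1 device names are made-up placeholders, and the exact `flow add` syntax depends on your nft version:

```
table inet ft_demo {
    flowtable ft {
        hook ingress priority 0
        devices = { eth0, eth1 }
    }
    chain forward {
        type filter hook forward priority 0; policy accept;
        # once a TCP/UDP flow is established, offload it to the flowtable,
        # skipping most of the per-packet netfilter/routing path
        ip protocol { tcp, udp } flow add @ft
    }
}
```

Load it with `nft -f` and established flows should start showing up in `nft list flowtables`.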
The issue I have with recommending “accelerators” is that they’re either too cheap when it comes to capabilities or too expensive in $$$.
If they’re cheap, they’re likely both not programmable and equipped with only a small amount of inflexible CAM for storing the FIB (I guess you could still call it a FIB even though there’s all kinds of junk in those entries). All it typically takes is one kid doing a bunch of DHT / P2P lookups or ramping up connections to start thrashing your CAM entries, at which point you start writing complicated firewall rules to limit them, … ugh.
On the other hand, if you buy enough of those expensive cards, you get sent a bunch of sad consulting engineers to go with them, and both the cards and the people will probably be hard to use effectively, … poor folks, god bless them.
There’s also no reason not to consider having a secondary x86 router and doing some kind of VRRP / moving VIPs between them, so that your network is still up while you’re upgrading one of the routers. … well, except that 2 machines cost twice the money. … but you also get to spread the load across 2 machines when both are working.
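For the VRRP part, a minimal keepalived sketch on the primary router might look like this; the instance name, interface, priorities, and the 192.0.2.1 VIP are all illustrative placeholders:

```
vrrp_instance WAN_VIP {
    state MASTER             # the secondary router uses state BACKUP
    interface eth0           # interface that carries the VRRP adverts (assumed name)
    virtual_router_id 51     # must match on both routers
    priority 150             # give the secondary a lower priority, e.g. 100
    advert_int 1
    virtual_ipaddress {
        192.0.2.1/24         # the VIP your LAN uses as its default gateway (example)
    }
}
```

When the MASTER stops advertising (crash, reboot, upgrade), the BACKUP takes over the VIP within a few advert intervals.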
In any case, … regardless of the chosen option, setting up a simulated test network should be easy enough these days thanks to network namespaces and ipvlan/macvlan, and it is worth doing for @dual_brot before they rip out the currently “working” router.
For example, one could easily grab a Linux laptop and write a shell script to create, say, 200 virtual NICs, each with its own MAC address (macvlan), assign them IPs, and then have e.g. nginx listening on all of them.
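A sketch of that script could look like the following; it only prints the `ip` commands so you can review them before piping to a root shell, and the parent NIC name and the 10.77.0.0/24 range are made-up examples:

```shell
#!/bin/sh
# gen_macvlan_cmds PARENT N: print the ip(8) commands that create N macvlan
# interfaces on PARENT (each gets its own MAC automatically) and give each
# an address in the assumed 10.77.0.0/24 test range.
gen_macvlan_cmds() {
  parent="$1"
  n="$2"
  i=1
  while [ "$i" -le "$n" ]; do
    echo "ip link add mv$i link $parent type macvlan mode bridge"
    echo "ip addr add 10.77.0.$i/24 dev mv$i"
    echo "ip link set mv$i up"
    i=$((i + 1))
  done
}

# Review the output first, then run for real with:
#   gen_macvlan_cmds eth0 200 | sudo sh
gen_macvlan_cmds eth0 3
```

From there, pointing nginx at the new addresses is just a matter of `listen` directives (or listening on a wildcard address).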
And then on the other side, you can have your load-generator Linux laptop: just set up another 200 network namespaces and run something like github.com/wg/wrk once per namespace against a random subset of those IPs. You’ll have thousands of real connection tuples on the router in no time, and you’ll be able to observe its behavior, including overall throughput.
One thing people don’t realize is that typical MacBook-Pro-level hardware you can just get from a computer store (or order online these days) can issue enough legitimate-looking requests to rival what, 10 years ago, would have made the news as a DoS attack.
Like: OMG! The RPM on our spammy blog is over 60,000! … oh, it’s just Bob testing, ah ok … 5 minutes later … oh damn it Bob, you’re running with multiple threads now, stop pointing loadtest traffic at prod. It’s that ridiculous.
My point is that you don’t really need to buy expensive “solutions” just to do a basic low-bandwidth (<10 Gbps) loadtest. (Well, you probably need a desktop with a free PCIe slot for >1 Gbps; there’s a shortage of Thunderbolt 10 Gbps NICs for some reason.)
There’s no reason not to do it, and whichever issues you resolve during the loadtest are issues you won’t have to resolve after stuffing the router into the “prod” environment. You can ask for help here on the forums, on IRC, or on the various mailing lists where developers hang out.
The thing I don’t know much about is pfSense. … I sort of gave up on it because I wanted to run my router virtualized on a KVM host with virtio NICs, and FreeBSD (and pfSense) had a bunch of issues with both the e1000 and virtio drivers; I couldn’t really get useful performance out of it even at home, and I ran out of time to debug. In contrast, Linux worked out of the box, so I just used that. Now my home router is just a plain old Linux box running Debian testing, not any kind of fancy routing distro.