So I was thinking about how routers and switches are normally rated for a certain speed (say 10GbE), and wondering how much of a role the firmware or operating system plays in that. Obviously a lot of it depends on the speed of the hardware, the NIC, and the connection to the outside.
But provided you had fast hardware, very high bandwidth NICs, and a very fast connection to the outside, how much traffic can a router OS like pfSense route at once?
Note: I tried searching but couldn't find a discussion on this, so if there is an existing one then oops
I don't know if there is a maximum; it depends on the hardware. It's not as fast as a hardware firewall with ASICs. If you have the hardware to support 1Gbps then pfSense will do 1Gbps, if that's what you're asking.
Might get a better idea by understanding the steps PF takes to forward a packet.
Faster NICs tend to have multiple packet buffers (queues) and will hash inbound traffic across them. I believe FreeBSD, which pfSense is based on, supports spreading these queues across the cores to help distribute load. The heavy lifting is done through DMA, but the CPU still needs to be involved so the operating system knows where in RAM the packets have landed and the NIC driver can process them.
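To make the queue hashing idea concrete, here is a minimal sketch in Python. It is only an illustration: real NICs compute the hash in hardware (typically a Toeplitz hash), and `pick_queue`, `NUM_QUEUES`, and the use of CRC32 are stand-ins I made up for the example. The point is just that every packet of a given flow lands on the same queue, and therefore the same core.

```python
import zlib

NUM_QUEUES = 4  # assumed number of receive queues, e.g. one per core

def pick_queue(src_ip, src_port, dst_ip, dst_port, proto, num_queues=NUM_QUEUES):
    """Map a flow's 5-tuple to a receive queue index.

    CRC32 is a stand-in for the hardware hash; it is stable, so all
    packets of one flow map to the same queue.
    """
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}/{proto}".encode()
    return zlib.crc32(key) % num_queues

if __name__ == "__main__":
    flows = [
        ("10.0.0.5", 40000, "93.184.216.34", 443, "tcp"),
        ("10.0.0.6", 40001, "93.184.216.34", 443, "tcp"),
        ("10.0.0.7", 5353, "224.0.0.251", 5353, "udp"),
    ]
    for flow in flows:
        print(flow, "-> queue", pick_queue(*flow))
```

Keeping a flow on one queue avoids packet reordering and lets each core work on its own slice of the traffic.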
Once the packet is in RAM, PF (the actual packet filter code) has to do a state table lookup, which means searching a table for key data from the packet. If no state matches, it has to traverse the ruleset to see if a matching rule exists and create a state from it (unless "no state" is specified).
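The state-then-ruleset flow can be sketched roughly like this. It is a toy model, not PF's actual code: `Packet`, `Rule`, `filter_packet`, and the default of "block" are all names and choices I picked for the example, and rules here only match on protocol and destination port. I used last-match-wins, which is how pf evaluates rules when `quick` is not used.

```python
from collections import namedtuple

# Hypothetical, heavily simplified packet and rule shapes.
Packet = namedtuple("Packet", "src sport dst dport proto")
Rule = namedtuple("Rule", "action proto dport keep_state")

def filter_packet(pkt, states, rules, default="block"):
    """Return (action, how_it_was_decided) for one packet."""
    key = (pkt.src, pkt.sport, pkt.dst, pkt.dport, pkt.proto)
    # Fast path: an existing state means no ruleset walk at all.
    if key in states:
        return "pass", "state hit"
    # Slow path: walk every rule, the last matching rule wins.
    matched = None
    for rule in rules:
        if rule.proto == pkt.proto and rule.dport == pkt.dport:
            matched = rule
    if matched is None:
        return default, "no rule"
    if matched.action == "pass" and matched.keep_state:
        # Record both directions so replies also take the fast path.
        states.add(key)
        states.add((pkt.dst, pkt.dport, pkt.src, pkt.sport, pkt.proto))
    return matched.action, "ruleset"

if __name__ == "__main__":
    states = set()
    rules = [Rule("block", "tcp", 23, False), Rule("pass", "tcp", 443, True)]
    syn = Packet("10.0.0.5", 40000, "93.184.216.34", 443, "tcp")
    print(filter_packet(syn, states, rules))  # first packet walks the ruleset
    print(filter_packet(syn, states, rules))  # later packets hit the state
```

The cost difference is the point: a state hit is a single hash lookup, while a miss walks the whole ruleset.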
Now, assuming the traffic isn't destined for the local host, it needs to be put into an output queue for the proper interface. This means a route table lookup, unless the route has already been determined by an existing state entry.
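For anyone unfamiliar with what a route table lookup involves, here is a small longest-prefix-match sketch using Python's standard `ipaddress` module. The routes and interface names (`em0`, `em1`, `em2`) are made up, and a real kernel uses a radix tree or similar structure rather than a linear scan, so treat this as an illustration of the rule, not of the implementation.

```python
import ipaddress

# Hypothetical routing table: (prefix, output interface).
ROUTES = [
    ("0.0.0.0/0", "em0"),    # default route
    ("10.0.0.0/8", "em1"),
    ("10.1.2.0/24", "em2"),
]

def lookup_route(dst, routes=ROUTES):
    """Return the interface of the most specific route containing dst."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, iface in routes:
        net = ipaddress.ip_network(prefix)
        # Longest prefix match: prefer the route with the most prefix bits.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None

if __name__ == "__main__":
    for dst in ("10.1.2.5", "10.9.9.9", "8.8.8.8"):
        print(dst, "->", lookup_route(dst))
```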
I don't personally use pfSense, but I have used OpenBSD a fair bit, which is where FreeBSD took the basis for its PF implementation from. OpenBSD has since changed a significant portion of the code, but the basic premise mostly stays the same. Mind you, I didn't go down the deeper rabbit hole of prioritization, policing/rate limiting, or the intricate details of network driver handling.
Currently, though, OpenBSD has a limitation where only CPU0 can handle interface interrupts and run the PF code. However, on a modern server this can still process ~800-900Mbps of traffic with 20,000 packets per second. Keep in mind it's really the pps that matters, not so much the bps. If you're pushing a bunch of max-sized packets, it takes fewer state table lookups (and thus less CPU time) than a bunch of minimum-sized packets. Distributing this across multiple cores should raise the processing rate substantially.
Small packets are where traditional routers excel, because they don't do stateful lookups at all and optimize the routing table lookup with specialized hardware. A lower-end hardware router like the Juniper MX80 can do 55-60 million packets per second with a single Trio ASIC chip. With the right software and hardware you can probably reach that on modern machines. For example, Intel touts that with DPDK on their NICs you can get up to 80Mpps on a single-processor machine. I believe Juniper is using this for their vMX product for exactly this purpose. However, you forgo the features of PF like stateful NAT in order to reach this kind of performance.
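To put numbers on the pps vs bps point, here is the standard Ethernet line-rate arithmetic as a quick sketch. The function name `max_pps` is mine. The 20 bytes of per-frame overhead are the preamble and start-of-frame delimiter (8 bytes) plus the inter-frame gap (12 bytes), and the frame sizes include the Ethernet header and FCS.

```python
ETH_OVERHEAD = 20  # bytes on the wire per frame: preamble + SFD (8) + inter-frame gap (12)

def max_pps(link_bps, frame_bytes):
    """Theoretical maximum packets per second at a given frame size."""
    return link_bps / ((frame_bytes + ETH_OVERHEAD) * 8)

if __name__ == "__main__":
    gige = 1_000_000_000
    print(int(max_pps(gige, 1518)))  # 81274 pps with maximum-sized frames
    print(int(max_pps(gige, 64)))    # 1488095 pps with minimum-sized frames
```

So the same 1Gbps link needs roughly 18 times as many lookups per second when it is full of minimum-sized packets.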
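A rough way to see why those packet rates are hard for a general purpose CPU is to work out the cycle budget per packet. The 3GHz clock and the single-core assumption below are mine, purely for illustration, and `cycles_per_packet` is a made-up helper.

```python
def cycles_per_packet(clock_hz, pps, cores=1):
    """CPU cycles available per packet if the work is spread evenly over cores."""
    return clock_hz * cores / pps

if __name__ == "__main__":
    clock = 3_000_000_000  # assumed 3GHz core
    print(cycles_per_packet(clock, 60_000_000))           # 50.0 cycles at 60Mpps
    print(cycles_per_packet(clock, 60_000_000, cores=8))  # 400.0 cycles across 8 cores
```

A single cache miss to main memory can cost more than the whole single-core budget, which is part of why the stateless, hardware-assisted path is so much faster than a stateful lookup in software.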
I've seen pfSense on a Supermicro handle 1Gbps (970/970) splendidly in a couple of malls (public wifi) with roughly a hundred simultaneous active users. I haven't bumped into higher speeds or more traffic than that, so that's all I know.
Incredible write-up, thank you very much for all the detail you put into that. By the looks of it the limitation is more in the hardware than the OS, so hardware routers are able to be so fast because ASICs perform a more singular, specialized job incredibly fast versus a multipurpose x86 CPU?
Thank you for adding your experience. That's quite a workload, so it seems to be able to handle quite a bit.
You also have to consider that the PF software is doing much more than a high-throughput stateless router does with its ASIC. If you wanted to implement PF, or at least parts of it, in an ASIC, it would probably be pretty costly and less adaptable. Software vs hardware is always a tradeoff, as is ASIC vs general purpose processing.
Yeah, good point. From your post it sounds like there are many more steps in the OS slowing things down. That makes sense, and I can definitely see the tradeoff there. I guess if you need more than 1Gbps you need to go with a different OS so you can use a hardware solution and take advantage of that to maximize pps.
Out of curiosity, how powerful is your pfSense box?