General Purpose Compute Cards via PCIe

I’m sitting here at home wondering if it’s possible to install a PCIe card for extra compute into an existing system (i.e. installing an extra CPU on a PCIe card). The main question I have is: if someone can have graphics/AI compute via PCIe, why can’t they have general purpose compute via PCIe? Does this exist already?

I’ve never heard of a product that supports this (though conceptually the Raspberry Pi Compute Module carrier boards are kind of what I’m thinking about). I’d think this would be a big market for people and companies looking to get more compute out of existing systems without spending beaucoup bucks on entirely new systems.

A follow-up question comes to mind. Can we connect computers together via PCIe? Say two AMD EPYC systems for example?

On multi-socket EPYC systems the processors are interconnected over the same lanes that would otherwise be PCIe (half of each socket’s lanes run Infinity Fabric/xGMI between the sockets instead of PCIe), so… kinda?

Sort of. You have FPGA boards for the “specialized” compute, e.g. Xilinx Alveo, and you have GPGPU cards like the MI250 / H100. On either of these you can do CPU-like computation, though depending on the workload it will be more or less efficient than on a “real” CPU.
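
To give a flavour of what general purpose compute on such a card looks like from the host side, here is a minimal sketch using OpenCL, which runs on GPUs and (via the vendor toolchains) on FPGA cards like Alveo. It’s only illustrative: error checking is omitted, and “first platform, default device” is an assumption for brevity, not something from this thread.

```c
/* Hedged sketch: offload a plain vector add to whatever OpenCL device is present.
 * Build (Linux, with an OpenCL SDK installed): gcc vadd_cl.c -lOpenCL -o vadd_cl */
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <stdio.h>

static const char *src =
    "__kernel void vadd(__global const float *a,\n"
    "                   __global const float *b,\n"
    "                   __global float *c) {\n"
    "    size_t i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

int main(void)
{
    enum { N = 1024 };
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    cl_platform_id plat; cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);                             /* first platform */
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);  /* default device */

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vadd", NULL);

    /* The buffers live in the card's own memory; the data crosses PCIe here. */
    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof a, a, NULL);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof b, b, NULL);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, NULL);

    clSetKernelArg(k, 0, sizeof da, &da);
    clSetKernelArg(k, 1, sizeof db, &db);
    clSetKernelArg(k, 2, sizeof dc, &dc);

    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);  /* blocking read */

    printf("c[10] = %f (expect 30.0)\n", c[10]);
    return 0;
}
```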

You don’t really see a “real” CPU on a PCIe board because it would need its own memory, power delivery, cooling, probably storage, etc. - it would essentially be a new, small box on its own. For those cases, compute blades seem to be a better solution.

EDIT: well, technically, Intel NUC Compute Elements exist, but as far as I’m aware they aren’t configured to act as devices in the PCIe tree, so you’d need another way to communicate with them - Ethernet or maybe Thunderbolt?

The Xeon Phi cards are the closest thing I’m aware of:

You plug them into PCIe and then usually offload computations to them, similar to CUDA/OpenCL on a GPU. But they are x86, so they can run x86 code natively; you can even ssh into them and use them ‘standalone’. They are discontinued, though…
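
To make the offload model concrete, here is a minimal sketch using standard OpenMP target directives, the portable cousin of the Intel-specific offload pragmas the Phi shipped with. The array size and compiler invocation are just illustrative; without an offload-capable toolchain and device, the loop simply falls back to running on the host CPU.

```c
/* Hedged sketch of the "offload a loop to the accelerator" model.
 * Build e.g.: gcc -fopenmp vecadd.c -o vecadd */
#include <stdio.h>
#include <stdlib.h>

#define N 1000000

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* Copy a and b to the device, run the loop there, copy c back -
     * conceptually the same round trip as a CUDA/OpenCL kernel launch. */
    #pragma omp target map(to: a[0:N], b[0:N]) map(from: c[0:N])
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[42] = %f (expect 126.0)\n", c[42]);
    free(a); free(b); free(c);
    return 0;
}
```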

A follow-up question comes to mind. Can we connect computers together via PCIe? Say two AMD EPYC systems for example?

Not really. The closest we have is RDMA (remote direct memory access) over Ethernet or InfiniBand. It lets the connected machines directly access each other’s memory over the network, and it’s how supercomputers are built up from multiple nodes.


Interesting info. Thanks!

What you are thinking of was actually done in the early PC days. Back then the CPU didn’t have all the functionality built in, so you could add something like a floating-point unit or another math co-processor. You see this sort of thing on the Raspberry Pi and its compute boards because they are in the same situation now that PCs were in way back then.

We still have add-in cards today, though, like the GPU: an expansion unit designed to accelerate graphics and, nowadays, various compute loads as well. We have network expansion units designed to process network traffic (NICs). We have storage expansion units designed for accelerating or connecting more storage, like a RAID card. There are less common ones as well, like cryptographic accelerators. We used to have sound cards, but I don’t think I’ve seen a new one released in a very long time, because sound processing is such a menial task for a CPU now. Processing types that are needed in 99% of PC use cases end up being integrated into what we already have (like FP units in CPUs, AES-NI acceleration, video encode/decode for watching a movie, a network port with basic functionality). Integrating things brings the cost down compared to discrete solutions and often brings higher performance by moving them closer to the CPU. It is a step backwards to move them away from the CPU onto a card, though we still do that for devices that are either too big to put into a CPU (like a high-end GPU) or still too niche to warrant the space in a CPU.

Generally, an add-in card for general-purpose computing like an extra CPU isn’t desired, so no one makes one in consumer form factors. You can just put a faster CPU in the motherboard you already have, which would save a ton of money, and if you really need to add something like 64 CPU cores you could just buy a higher class of PC that has them. An add-in card with a 64-core Threadripper would likely cost just as much as a new motherboard, RAM, and CPU bought separately for a whole PC anyway. A card like that would also likely bottleneck on the PCIe bus, limiting its usefulness to very long-running tasks, which are often the same ones that benefit from offloading to a GPU instead, for a higher performance gain at lower cost. So it just doesn’t make sense for a general-purpose CPU type of device.
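
To put rough numbers on that PCIe-bottleneck point, here is a small back-of-the-envelope sketch. The figures are published theoretical maxima (PCIe 4.0 x16 and eight channels of DDR4-3200), not measurements from any particular system mentioned in this thread.

```c
/* Hedged arithmetic: how much bandwidth a CPU-on-a-card would have to host
 * memory over PCIe, versus what a socketed CPU gets from local DRAM. */
#include <stdio.h>

int main(void)
{
    /* PCIe 4.0: 16 GT/s per lane, 128b/130b encoding -> ~1.969 GB/s per lane. */
    double pcie4_x16 = 16 * 1.969;      /* ~31.5 GB/s per direction */
    /* DDR4-3200: 3200 MT/s * 8 bytes = 25.6 GB/s per channel, 8 channels.     */
    double ddr4_8ch  = 8 * 25.6;        /* ~204.8 GB/s              */

    printf("PCIe 4.0 x16 link   : ~%.0f GB/s\n", pcie4_x16);
    printf("8-channel DDR4-3200 : ~%.0f GB/s\n", ddr4_8ch);
    printf("Ratio               : ~%.1fx less bandwidth over the bus\n",
           ddr4_8ch / pcie4_x16);
    return 0;
}
```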


In the server space, such things do exist, though they are much less common nowadays than 20 years ago. This is what’s referred to as a blade server: a blade enclosure plus the server blades that slot into it, so as your compute needs grew you could add more blades. IBM was famous for them, though other companies make them too. You can also still find PICMG 1.3 backplane boards into which a large single-board computer slots. That SBC carries a full modern CPU, RAM, etc., and these systems can jump the CPU across entire generations (different sockets) because the whole computer changes out with the new SBC you slot in. They are typically large systems, and you can get them with up to 19 expansion slots. Additionally, there are very large and powerful core routers from the likes of Cisco or Juniper that are effectively blade servers meant for network processing: you buy the main router chassis and slot in the various line cards you need, each adding a big CPU and a group of network ports. These are the sort of thing you use at an internet backbone or as the main router at a massive corporate office spanning multiple locations.


Beyond Xeon Phi, there are the usual suspect GPUs in GPGPU use cases, like the Tesla series cards from NVIDIA.

Other than that, there are FPGA kits.