GPUs vs FPGAs for Neural Networks

In my digital systems class last semester we learned about FPGAs.
So I went poking around to see if people are using FPGAs like GPUs.
I found a Stack Overflow post from 8 years ago where a similar question was asked.

Why don’t we use FPGAs for neural network processing?
And how do you think an FPGA would perform vs an off-the-shelf GPU?

I’m betting that the price of FPGAs just stops it right there. But if so, assume you are a CS/EE student who already has a dev board.

mainly complexity of development

writing a bitstream that could train a DNN or an adversarial network would

a) be incredibly difficult compared to the object oriented frameworks and tools available for GPUs

b) require a big, top-of-the-line FPGA with a massive amount of RAM onboard. These usually run 5-10k.

so tl;dr, ease and price/performance. With tensor cores on Volta, that advantage only gets better.

Worse.
FPGAs are ideal for processing simple data at extreme speeds. GPUs are for medium-difficulty data at high speeds, and CPUs are for complex data at low speeds.

I think it was an Altera dev board with 8 DDR4 slots on it. Recommendation said “populate them all”

FPGAs make sense when a special operation is required which cannot be efficiently performed by existing ASICs. But neural networks require almost exclusively additions and multiplications, both operations that GPUs are optimized for. So any FPGA would effectively be emulating a GPU anyway, only slower.

Besides, specialised AI accelerators are in development, and the Titan V contains units targeting AI as well, so there’s really no point in using FPGAs.
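
To make the "almost exclusively additions and multiplications" point concrete, here is a minimal sketch of one fully connected layer in plain Python. The names `dense_forward` and `count_macs` and the tiny weights are made up for illustration; real frameworks run the same multiply-accumulate pattern as a batched matrix multiply, and the activation function (not shown) adds a cheap elementwise step on top.

```python
def dense_forward(weights, bias, x):
    """One fully connected layer: y[i] = bias[i] + sum_j weights[i][j] * x[j]."""
    out = []
    for row, b in zip(weights, bias):
        acc = b
        for w, xj in zip(row, x):
            acc += w * xj  # one multiply-accumulate (MAC)
        out.append(acc)
    return out


def count_macs(weights):
    """Number of multiply-accumulates needed for one input vector."""
    return sum(len(row) for row in weights)


# Hypothetical toy layer: 3 inputs, 2 outputs.
W = [[1.0, 2.0, 3.0],
     [0.5, -1.0, 0.0]]
b = [0.0, 1.0]
x = [1.0, 1.0, 2.0]

y = dense_forward(W, b, x)
macs = count_macs(W)
print(y, macs)
```

Every step in the inner loop is one multiply and one add, which is exactly the workload GPU hardware is built around.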

Well, they’re being used as AGA and AAA/AA chipsets in Amiga tuning boards. That’s the most GPU-like thing I’ve heard.

The operations that need to be done are usually just incredibly simple arithmetic, but they need to be done on a large amount of memory. There’s a market out there that would happily pay for 256GB of HBM2 on a card in a PCIe slot, housing some hot and inefficient ASIC connected to it that can run through all of it a couple of times in a millisecond.

On the other hand if you say, “oh we can have 4GB worth of registers connected by simple ALUs” … go for it (I don’t know how you’d do that, but nobody’s stopping you).
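
For a sense of scale on the "256GB a couple of times in a millisecond" figure, here is a rough back-of-envelope sketch. It assumes "a couple" means 2 passes and that 1 GB is 10^9 bytes; the function name is made up for illustration.

```python
def required_bandwidth(memory_bytes, passes, window_ms):
    """Bytes per second needed to sweep `memory_bytes` `passes` times in `window_ms` ms."""
    return memory_bytes * passes * 1000 / window_ms


GB = 10**9  # assumption: decimal gigabytes

bw = required_bandwidth(256 * GB, passes=2, window_ms=1)
print(f"{bw / 1e12:.0f} TB/s")
```

Under those assumptions the hypothetical card needs to stream on the order of hundreds of terabytes per second, so the memory system, not the arithmetic, is the hard part.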