Cats per second on Vega

Mandrewoid · January 8, 2018, 10:33pm

Hey all, you may remember about 6 months ago L1 did a review of the ASUS GTX 1080 Ti Strix, which featured some machine learning in TensorFlow.

TensorFlow is a Python machine learning library that leverages GPU hardware with CUDA. There has been much grumbling in many corners of the internets about porting it and other machine learning libraries to OpenCL so that they aren’t tied to Nvidia hardware… From what I’ve read on the topic, it seems the major sticking point was the cuDNN and cuBLAS libraries, which provide a lot of low-level primitives and are deeply integrated into every machine learning library from TensorFlow to Torch.

A few years ago, AMD announced their intention to change things with their ‘ROCm’, ‘HIP’, and a few other catchy acronyms. Their approach was to solve things in the classic computer science way: Add another layer of abstraction, hooray!
As far as I understand it, the HIP toolchain translates CUDA source into portable C++ (HIP) that can then be compiled to run on AMD GPUs, or something like that.

TL;DR

There’s a version of TensorFlow that should run on Vega.
EDIT: In case you’re interested and have AMD but not Vega, I think it is supposed to run on >= 380X

I haven’t been able to find any benchmarks anywhere yet, and I for one am really interested in what the performance is like on AMD.
I’m kind of in the market for a GPU and I am loath to pay money to Nvidia, but I also want to be able to use TF. If anyone on the forums has any experience with this, hit me up.

L1 Team: If you guys are interested in trying this out, I think it would make good content, and you would be literally the first to cover it. I may be able to help sponsor it, if that makes any difference. (And maybe hit up Ed from Sapphire too, sounds like something he would dig.)

pFtpr · January 8, 2018, 10:37pm

This site has some Vega benchmarks. TLDR, performance is abysmal. TensorFlow will need a lot of rework for AMD to become competitive.

https://www.pcper.com/reviews/Graphics-Cards/NVIDIA-TITAN-V-Review-Part-3-Deep-Learning-Performance

Mandrewoid · January 8, 2018, 11:21pm

Hm, that is pretty terrible.
It does show they have the AMDGPU-PRO driver installed. Apparently you’re supposed to uninstall that.

I’m not sure if that would make it perform better or not, but it does at least show they didn’t follow the instructions.

The quickstart at

https://github.com/ROCmSoftwarePlatform/hiptensorflow/blob/hip/rocm_docs/tensorflow-quickstart.md

shows CifarNet processing 6662.7 examples/sec, but it doesn’t specify which card that is, or what a comparably priced Nvidia card would do, so that’s kind of unfortunate.
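
For anyone who wants to compare cards themselves, here is a minimal sketch of how an examples/sec figure like that could be measured. It is not taken from the quickstart: `examples_per_sec` and `fake_step` are hypothetical names, and `fake_step` just sleeps in place of a real training step. I’m also assuming each step call blocks until the GPU work is done; if it doesn’t, the number will be too optimistic.

```python
import time


def examples_per_sec(step_fn, batch_size, num_steps=20, warmup_steps=2):
    """Time num_steps calls to step_fn and return examples processed per second."""
    # Warm-up calls are excluded so one-time setup cost does not skew the rate.
    for _ in range(warmup_steps):
        step_fn()
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return batch_size * num_steps / elapsed


def fake_step():
    # Hypothetical stand-in for one training step on one batch.
    time.sleep(0.01)


rate = examples_per_sec(fake_step, batch_size=128)
print(f"{rate:.1f} examples/sec")
```

Swap `fake_step` for whatever runs one batch through your model, keep the batch size the same on both cards, and the two numbers should at least be comparable.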

FurryJackman · January 9, 2018, 2:51am

Just butting in here with this…

The reason TensorFlow is currently optimized for Nvidia is that it was kind of built with Nvidia GPUs in mind. Titan V has “Tensor Cores” built for the kind of matrix operations TensorFlow relies on, and can do about 110 TFLOPS of that on a single card.

Mandrewoid · January 9, 2018, 9:27am

@FurryJackman you could mail it to me if you’d like
I’d love a GTX 1080 Ti. Unfortunately I missed all the Christmas sales, and now it seems prices have gone up (Canada) and most of the cards are out of stock.
I should’ve bought a GTX 1070 when they were 500. They’re either 650-700 now or out of stock -_-

SuperNeutral · February 17, 2018, 9:08am

I’m no expert in cat… but I’m pretty sure that one is about to destroy the universe.