Benchmark Request Thread

merry · November 20, 2020, 2:08pm

This is a thread for requesting specific benchmarks on specific hardware, to help users with niche needs determine if a piece of hardware is right for them. Apologies if such a thread already exists; if it does, I can’t find it.

What are the odds, I have a benchmark request. If it goes smoothly it shouldn’t take long assuming you already have a working OpenCL environment.

edit: This request is still open, I’ve reworded it to simplify.

Requirements: A Big Navi card, a working OpenCL installation, and an internet connection to download the repo of the program under test and its requirements
Purpose: A preliminary investigation to determine how viable Big Navi is for Mersenne prime hunting, by running some short tests with gpuowl (a Prime95 equivalent for PRP testing using OpenCL)
Prep (Ubuntu, alter as necessary for your OS):

sudo apt install libgmp-dev
git clone https://github.com/preda/gpuowl && cd gpuowl && make

With luck it compiles with no fuss. The best chance of that is to use Ubuntu 20.04 or another recent distro.
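Before running the prep lines above, it can help to confirm that the tools they rely on are actually on the PATH. The following is a minimal sketch, not part of the original request: `check_tools` is a hypothetical helper name, and `clinfo` is an optional extra (an assumption here; on Ubuntu it comes from the `clinfo` package and `clinfo -l` lists the visible OpenCL devices).

```shell
# Report which of the given commands are missing from PATH.
# Prints one line per missing tool; returns 0 only if all are present.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return $missing
}

# git, make and g++ are used by the clone-and-build line above;
# clinfo is only a convenience for checking the OpenCL install (assumption).
check_tools git make g++ clinfo || echo "install the tools listed above before building"
```

If nothing is reported as missing, the clone and `make` step has at least the basic toolchain it needs; it says nothing about whether the OpenCL runtime itself works.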

Test (run from gpuowl directory):

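#!/bin/bash
# Quotes below must be plain ASCII; curly quotes from copy-paste break the script.
if [ ! -f ./gpuowl ]; then
    make
fi
ITERS=200000
MAXMEM=14000
for VAR in "" "-carry short" "-maxAlloc $MAXMEM"; do
    for NUM in 57885161 77936867; do
        echo "[$NUM, $VAR]" >>gpuowltest.log
        # $VAR is left unquoted on purpose so it splits into separate arguments
        ./gpuowl -prp $NUM -iters $ITERS $VAR >>gpuowltest.log
        # delete the directory named after the exponent that the run leaves behind
        rm -r "$NUM"
    done
done
echo "[332220523, ]" >>gpuowltest.log
./gpuowl -prp 332220523 -iters 50000 >>gpuowltest.log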

With luck it all goes well. I’m after the entire contents of the gpuowltest.log file, which should contain the results from seven tests. A proper picture would need a lot more tweaking, but these preliminary tests should be enough to determine if it’s worth me attempting to get a card to find out. Thanks.
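A quick way to check that the log is complete before posting it is to count the `[exponent, options]` header lines that the test script itself echoes into the file. This is a rough sketch: `count_tests` is a hypothetical helper, and it assumes that none of gpuowl’s own output lines happen to match the same pattern.

```shell
# Count the "[exponent, options]" header lines written by the test script.
count_tests() {
    grep -c -E '^\[[0-9]+, .*\]$' "$1"
}

log=gpuowltest.log
if [ -f "$log" ]; then
    n=$(count_tests "$log")
    if [ "$n" -eq 7 ]; then
        echo "all 7 test headers found in $log"
    else
        echo "expected 7 test headers in $log, found $n"
    fi
fi
```

A count below seven suggests the script stopped early; a count above seven usually means the log from an earlier run was appended to rather than replaced.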

edit:
@Praetorian fulfilled my benchmark request, thank you very much. For future reference, quoting a script inline as above borks the script, so future requests should attach it instead. OpenCL in ROCm 4.0 works with Big Navi from a standard install even if the rest doesn’t, which is a result in itself.

Result:
The benchmarks show that Big Navi is of interest for prime hunting with gpuowl and for FP64-heavy scientific workloads generally. tl;dr: a larger exponent requires a larger FFT to test it, and therefore more working memory. The 77 million bit number test performs ~84% as well on the 6900XT as it does on a Radeon VII, and the 332 million bit number test performs ~66% as well. gpuowl is memory bound on a Radeon VII despite that card’s 1TB/s memory bandwidth; these results show that the infinity cache does compensate nicely for Big Navi’s lower memory bandwidth when the cache is not overwhelmed. gpuowl is an odd workload in that every iteration is basically one giant FFT multiplication in which all used memory is accessed (and each iteration requires the previous one to have completed), which makes the cache less useful at high bit count exponents. This isn’t a dealbreaker for practical testing, as the current wavefront is at ~110 million bits, well within the range where the infinity cache does a good job.
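To put rough numbers on why the largest exponent behaves differently, here is a back-of-the-envelope sketch of how one FFT buffer grows with the exponent. The figure of 18 bits stored per FP64 word is an illustrative assumption, not gpuowl’s actual choice, and `fft_words` and `fft_buffer_mib` are hypothetical helper names. Real FFT lengths are rounded up to sizes the program supports and several buffers are in play at once, so treat the results as a lower bound.

```shell
# Assumption: roughly 18 bits of the number are stored per FP64 FFT word.
BITS_PER_WORD=18

# Minimum number of FFT words for an exponent: ceil(exponent / bits per word).
fft_words() {
    echo $(( ($1 + BITS_PER_WORD - 1) / BITS_PER_WORD ))
}

# Size, in whole MiB (rounded down), of one buffer of that many 8-byte doubles.
fft_buffer_mib() {
    echo $(( $(fft_words "$1") * 8 / 1048576 ))
}

for exp in 57885161 77936867 332220523; do
    echo "$exp: at least ~$(fft_buffer_mib "$exp") MiB per FP64 buffer"
done
```

Under these assumptions a single buffer for the two smaller exponents sits well under the 128 MB of infinity cache on the 6900XT, while a single buffer for the 332 million bit exponent already exceeds it, which fits the larger performance drop seen in that test.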

I suspect that the 77 million bit test is compute bound rather than memory bound. Big Navi is not a compute card, but thankfully it still has a decent enough 1:16 DP ratio to serve some FP64 compute workloads like this one. In that case a 6800XT may perform similarly to a 6900XT by running higher clocks to compensate for its 8 missing CUs, although a worse bin and higher clocks will hurt efficiency slightly.
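As a rough illustration of how much clock the 6800XT would need to make up, the sketch below assumes that compute-bound throughput scales linearly with CU count times clock speed, which is a simplification. The CU counts are 72 for the 6800XT and 80 for the 6900XT; `clock_uplift_tenths` is a hypothetical helper name.

```shell
# Clock increase, in tenths of a percent, that a card with $1 CUs needs
# to match a card with $2 CUs, assuming throughput ~ CUs x clock.
clock_uplift_tenths() {
    echo $(( ($2 * 1000 + $1 / 2) / $1 - 1000 ))
}

t=$(clock_uplift_tenths 72 80)
echo "72 CUs need ~$((t / 10)).$((t % 10))% higher clocks to match 80 CUs"
```

Under that assumption the gap is on the order of 11%, so whether a given 6800XT closes it depends on how far that particular card clocks.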