New benchmark, who dis? -- A cpu benchmark from devember

I'm running the benchmark on my PC; in the meantime, a question via mobile: would you accept a PR adding CMake support (side by side with the current Makefile, of course)? It would make switching options and configuring the build easier.


yeah, definitely

I don’t have full results yet, but something weird is going on here: clang usually produces faster code than gcc, yet here gcc is clearly much faster. Also, this bench puts a lot of pressure on memory; I noticed because my PC became almost unresponsive.

Anyway, here are the results from my 5900X (RAM is 4x16GB 3733@16-18-18-38). Full results are only for gcc; for clang, I unfortunately closed the terminal before copying them.

gcc

$ time ./helsing -n 12
time ./helsing -l 1000000000000000 -u 1020000000000000
time ./helsing -n 14
Checking interval: [100000000000, 999999999999]
Found: 4390670 vampire number(s).
./helsing -n 12  75.57s user 0.02s system 2341% cpu 3.228 total
Checking interval: [1000000000000000, 1020000000000000]
Found: 61785276 vampire number(s).
./helsing -l 1000000000000000 -u 1020000000000000  8861.24s user 1.81s system 2274% cpu 6:29.73 total
Checking interval: [10000000000000, 99999999999999]
Found: 208423682 vampire number(s).
./helsing -n 14  8327.88s user 2.38s system 2371% cpu 5:51.29 total

clang

$ time ./helsing -n 12
Checking interval: [100000000000, 999999999999]
Found: 4390670 vampire number(s).
./helsing -n 12  100.46s user 0.06s system 2347% cpu 4.281 total


The performance characteristics are a bit odd.

GCC always produced a faster executable, -O2 was always faster than -O3/-Ofast, and the benchmark doesn't like SMT (simultaneous multithreading).

On a 12-core, 24-thread system:

./helsing -t 12 ...

will usually outperform:

./helsing -t 24 ...

To be honest, I expected it the other way around. We once benchmarked a friend’s project and clang showed something like a 40% improvement over GCC on multiple CPUs, but that’s math-heavy compute code.

As for scaling, there seems to be almost no gain above 12 threads. I guess yours is one of those edge cases where SMT doesn’t help, or the Ryzen is that memory-starved. I still remember that back in 2011 or so, Autodesk recommended turning HT off on Intel processors.

$ for T in 1 2 4 8 12 16 20 24 28; do time ./helsing -n12 -t $T; done
Checking interval: [100000000000, 999999999999]
Found: 4390670 vampire number(s).
./helsing -n12 -t $T  32.54s user 0.01s system 99% cpu 32.555 total
Checking interval: [100000000000, 999999999999]
Found: 4390670 vampire number(s).
./helsing -n12 -t $T  33.32s user 0.04s system 197% cpu 16.889 total
Checking interval: [100000000000, 999999999999]
Found: 4390670 vampire number(s).
./helsing -n12 -t $T  33.84s user 0.00s system 397% cpu 8.523 total
Checking interval: [100000000000, 999999999999]
Found: 4390670 vampire number(s).
./helsing -n12 -t $T  35.88s user 0.05s system 794% cpu 4.524 total
Checking interval: [100000000000, 999999999999]
Found: 4390670 vampire number(s).
./helsing -n12 -t $T  39.85s user 0.00s system 1178% cpu 3.381 total
Checking interval: [100000000000, 999999999999]
Found: 4390670 vampire number(s).
./helsing -n12 -t $T  52.35s user 0.00s system 1560% cpu 3.355 total
Checking interval: [100000000000, 999999999999]
Found: 4390670 vampire number(s).
./helsing -n12 -t $T  65.16s user 0.03s system 1981% cpu 3.290 total
Checking interval: [100000000000, 999999999999]
Found: 4390670 vampire number(s).
./helsing -n12 -t $T  75.51s user 0.03s system 2316% cpu 3.261 total
Checking interval: [100000000000, 999999999999]
Found: 4390670 vampire number(s).
./helsing -n12 -t $T  76.26s user 0.05s system 2323% cpu 3.285 total
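To put numbers on the scaling, here is a small sketch that computes speedup (wall time at 1 thread divided by wall time at N threads) and parallel efficiency (speedup divided by N) from the "total" column of the loop above. The function names are my own, not part of helsing.

```c
#include <assert.h>
#include <stdio.h>

/* Speedup relative to the single-thread run: wall(1) / wall(n). */
static double speedup(double wall_1, double wall_n)
{
    assert(wall_1 > 0.0 && wall_n > 0.0);
    return wall_1 / wall_n;
}

/* Parallel efficiency: fraction of the ideal n-fold speedup achieved. */
static double efficiency(double wall_1, double wall_n, int threads)
{
    assert(threads > 0);
    return speedup(wall_1, wall_n) / threads;
}

/* Table for the "./helsing -n12 -t $T" runs (wall-clock "total" column). */
static void print_scaling(void)
{
    const int threads[] = {1, 2, 4, 8, 12, 16, 20, 24, 28};
    const double wall[] = {32.555, 16.889, 8.523, 4.524, 3.381,
                           3.355, 3.290, 3.261, 3.285};
    const size_t n = sizeof threads / sizeof threads[0];

    printf("threads  wall(s)  speedup  efficiency\n");
    for (size_t i = 0; i < n; i++)
        printf("%7d  %7.3f  %7.2f  %9.0f%%\n", threads[i], wall[i],
               speedup(wall[0], wall[i]),
               100.0 * efficiency(wall[0], wall[i], threads[i]));
}
```

Called from a main(), print_scaling() shows a speedup of about 9.6x at 12 threads and only about 10x at 24, so efficiency falls from roughly 80% to roughly 42%.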

@frred For some reason, I’m only getting this warning when building a release build under CMake (which also segfaults right now). Of note: CMake defaults to -O3 in release builds, and so far I’m having a hard time figuring out how to change this.

/home/jaskij/projects/helsing/helsing/src/main.c: In function ‘main’:
/home/jaskij/projects/helsing/helsing/src/main.c:67:25: warning: ‘*threads_44 + _19’ may be used uninitialized [-Wmaybe-uninitialized]
   67 |                         pthread_join(threads[thread], 0);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
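Since it isn't obvious which flags a CMake Release build actually passes (for GCC and Clang the default CMAKE_C_FLAGS_RELEASE is "-O3 -DNDEBUG"), one way to check from inside the binary is to look at predefined macros. GCC and Clang define `__OPTIMIZE__` whenever optimization is enabled, and `NDEBUG` is only defined if something passes -DNDEBUG. This is a rough sketch with hypothetical function names; note that it cannot tell -O2 from -O3.

```c
#include <assert.h> /* assert(), whose behavior the NDEBUG macro controls */
#include <stdio.h>

/* Report whether this translation unit was compiled with optimization on. */
static const char *build_optimization(void)
{
#ifdef __OPTIMIZE__
    return "optimized (some -O level above -O0)";
#else
    return "not optimized (-O0)";
#endif
}

/* Report whether assert() is active; -DNDEBUG compiles asserts out. */
static const char *build_asserts(void)
{
#ifdef NDEBUG
    return "assert() disabled (NDEBUG defined)";
#else
    return "assert() enabled";
#endif
}

static void print_build_info(void)
{
    printf("optimization: %s\n", build_optimization());
    printf("asserts:      %s\n", build_asserts());
}
```

With CMake's Makefile generator, `make VERBOSE=1` also prints the full compiler command lines, and passing something like `-DCMAKE_C_FLAGS_RELEASE="-O2 -DNDEBUG"` at configure time should override the Release defaults.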

I was able to isolate which compiler option caused the segfault: it was not the optimization level, but adding -DNDEBUG.
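For anyone hitting something similar: a common way for -DNDEBUG alone to turn a working build into a crashing one is a side effect placed inside assert(), because NDEBUG removes the whole expression. Hypothetically, something like `assert(pthread_create(...) == 0)` would never start the threads in a release build, and a later pthread_join() on the uninitialized handles would likely crash, which would also fit the -Wmaybe-uninitialized warning above. I haven't checked whether helsing's source actually does this; the sketch below only shows the safe pattern, using a hypothetical worker() and run_workers().

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical worker: records that it ran. */
static void *worker(void *arg)
{
    int *ran = arg;
    *ran = 1;
    return NULL;
}

/*
 * Start and join `count` threads. pthread_create() is kept outside
 * assert(), so it still runs when -DNDEBUG compiles the assert out.
 * Returns 0 on success, or the first pthread_create() error code.
 */
static int run_workers(pthread_t *threads, int *ran, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        /* BAD: assert(pthread_create(...) == 0); vanishes under NDEBUG. */
        int err = pthread_create(&threads[i], NULL, worker, &ran[i]);
        assert(err == 0); /* debug-only check; the call above always runs */
        if (err != 0) {
            /* Release builds: join only the threads that were started. */
            for (size_t j = 0; j < i; j++)
                pthread_join(threads[j], NULL);
            return err;
        }
    }
    for (size_t i = 0; i < count; i++)
        pthread_join(threads[i], NULL);
    return 0;
}
```

Debug builds still abort loudly on a failed pthread_create(), while release builds keep creating the threads and handle the error instead of joining garbage handles.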
