1950X vs i9-7960X

Some personal testing. I mostly do pub/sub and NoSQL type stuff at work. I ran this benchmark this morning and found the results interesting. It largely tests execution within the L2 cache, since the benchmark doesn’t touch that many keys in simplified mode. The execution profile is basically a test of gettimeofday + message serialization + hashing + copying.

The takeaway is that the Intel part gets slightly better IPC than the AMD part (3.39 vs 3.20 insn per cycle), no surprise there. The reason appears to be slightly better branch prediction (0.10% vs 0.13% branch misses), even though the Intel part records more branches in “perf stat”. The Intel part is frequency locked; it could probably turbo higher if I let it, but I prefer the stability of a locked frequency when comparing performance runs. I’ve also noticed that the Intel part has better latency on cache reads, while AMD has better write latency when the memory is in exclusive cache write mode, although I haven’t spent a lot of time comparing that.

> redis-benchmark -q -p 8888 -c 1 -P 64 -n 1000000
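For reference, the flags in that command are: -q (quiet, only print requests/sec), -p (server port), -c 1 (one client connection), -P 64 (pipeline 64 requests per round trip) and -n (total requests per test). Here is a rough Python sketch for collecting the numbers into a dict instead of copying them by hand. The helper names (`parse_quiet_output`, `run_benchmark`) are made up for this example, and the line format is assumed to be "NAME: N requests per second", which is what -q prints on the builds I know of.

```python
import re
import subprocess

# Matches lines such as "SET: 4032258.25 requests per second".
# Anything after that phrase (e.g. latency fields) is ignored.
LINE_RE = re.compile(r"^(?P<test>.+?):\s+(?P<rps>[\d.]+) requests per second")


def parse_quiet_output(text):
    """Return {test name: requests/sec} from `redis-benchmark -q` output."""
    results = {}
    # -q redraws progress using carriage returns, so split on \r as well as \n.
    for line in re.split(r"[\r\n]+", text):
        m = LINE_RE.match(line.strip())
        if m:
            results[m.group("test")] = float(m.group("rps"))
    return results


def run_benchmark(port=8888):
    """Run the command shown above (needs redis-benchmark on PATH) and parse it."""
    cmd = ["redis-benchmark", "-q", "-p", str(port),
           "-c", "1", "-P", "64", "-n", "1000000"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_quiet_output(out)
```

Calling `run_benchmark()` once per machine gives two dicts that can be compared key by key.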

  Intel i9-7960X @ 3.5 GHz  AMD 1950X @ 4.15 GHz    AMD - Intel

             requests/sec
             ------------
PING_INLINE: 9708738.00     9900990.00      +192252.00 (+1.9%)
PING_BULK:   9090909.00     8928571.00      -162338.00 (-1.7%)
SET:         4032258.25     4366812.00      +334553.75 (+7.7%)
GET:         4761905.00     5154639.00      +392734.00 (+7.6%)
INCR:        4032258.25     4524887.00      +492628.75 (+10.8%)
LPUSH:       3745318.50     3484320.50      -260998.00 (-7.0%)
RPUSH:       2008032.12     1769911.50      -238120.62 (-11.9%)
LPOP:        4065040.50     3984064.00       -80976.50 (-2.0%)
RPOP:        4081632.50     4016064.25       -65568.25 (-1.6%)
SADD:        3717472.25     3623188.50       -94283.75 (-2.5%)
HSET:        3246753.25     3067484.50      -179268.75 (-5.5%)
SPOP:        4273504.50     4166666.75      -106837.75 (-2.5%)
LPUSH:       3773585.00     3496503.25      -277081.75 (-7.3%)
LRANGE_100:   185082.36      195541.66       +10459.30 (+5.3%)
LRANGE_300:    44458.28       48398.02        +3939.74 (+8.1%)
LRANGE_500:    26572.42       28996.43        +2424.01 (+8.3%)
LRANGE_600:    20656.46       22362.86        +1706.40 (+7.6%)
MSET_10:      690607.75      814995.94      +124388.19 (+15.3%)
                                                   avg  +1.7%
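A quick sketch of how the "AMD - Intel" column can be recomputed. The percent base is an assumption on my part: dividing by the larger of the two rates reproduces most rows of the table to within rounding, so that is what `compare` (a name made up for this example) does. Only three rows are included to keep it short.

```python
def compare(intel_rps, amd_rps):
    """Return (AMD - Intel, difference as a percent of the faster part's rate)."""
    diff = amd_rps - intel_rps
    # Assumed base: the larger of the two rates.
    pct = 100.0 * diff / max(intel_rps, amd_rps)
    return diff, pct


# (Intel requests/sec, AMD requests/sec), copied from the table above.
rows = {
    "GET": (4761905.00, 5154639.00),
    "RPUSH": (2008032.12, 1769911.50),
    "MSET_10": (690607.75, 814995.94),
}

for name, (intel, amd) in rows.items():
    diff, pct = compare(intel, amd)
    print(f"{name:8s} {diff:+13.2f} ({pct:+.1f}%)")
```

Using the Intel rate as a fixed baseline instead would give somewhat larger percentages on the rows where AMD wins.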

AMD "perf stat"
        106,916.39 msec task-clock:u              #    0.999 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             2,607      page-faults:u             #   24.384 M/sec
   443,205,373,114      cycles:u                  # 4145360.593 GHz                   (83.33%)
    91,057,431,294      stalled-cycles-frontend:u #   20.55% frontend cycles idle     (83.33%)
   167,896,924,202      stalled-cycles-backend:u  #   37.88% backend cycles idle      (83.33%)
 1,420,032,191,249      instructions:u            #    3.20  insn per cycle
                                                  #    0.12  stalled cycles per insn  (83.33%)
   246,230,102,295      branches:u                # 2303023890.671 M/sec              (83.33%)
       325,094,516      branch-misses:u           #    0.13% of all branches          (83.33%)

     107.042078374 seconds time elapsed

     106.368956000 seconds user
       0.004950000 seconds sys

Intel "perf stat"
        119,082.18 msec task-clock:u              #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             2,608      page-faults:u             #   21.901 M/sec
   416,102,969,383      cycles:u                  # 3494255.802 GHz
 1,409,782,444,643      instructions:u            #    3.39  insn per cycle
   253,367,186,982      branches:u                # 2127669899.582 M/sec
       256,069,859      branch-misses:u           #    0.10% of all branches

     119.081333908 seconds time elapsed

     118.913041000 seconds user
       0.012933000 seconds sys
       
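   clock ratio (Intel / AMD)  3.5 / 4.15 = .84
run time ratio (AMD / Intel)  107 / 119  = .90

AMD finishes in about 90% of the Intel run time while Intel runs at about 84% of the AMD clock, which is consistent with the higher Intel IPC.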
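The derived numbers can be rechecked from the raw counters. This is a minimal sketch with the counter values copied from the two "perf stat" runs above; the `derive` helper and the dict layout are just for illustration.

```python
# Raw counters copied from the two "perf stat" outputs above.
runs = {
    "AMD 1950X": dict(
        cycles=443_205_373_114, instructions=1_420_032_191_249,
        branches=246_230_102_295, branch_misses=325_094_516,
        task_clock_ms=106_916.39),
    "Intel i9-7960X": dict(
        cycles=416_102_969_383, instructions=1_409_782_444_643,
        branches=253_367_186_982, branch_misses=256_069_859,
        task_clock_ms=119_082.18),
}


def derive(c):
    """Derive IPC, branch miss rate (%) and effective clock (GHz) from counters."""
    seconds = c["task_clock_ms"] / 1000.0
    return {
        "ipc": c["instructions"] / c["cycles"],
        "miss_pct": 100.0 * c["branch_misses"] / c["branches"],
        "ghz": c["cycles"] / seconds / 1e9,
    }


for name, counters in runs.items():
    d = derive(counters)
    print(f"{name}: IPC {d['ipc']:.2f}, "
          f"branch miss {d['miss_pct']:.2f}%, {d['ghz']:.2f} GHz")
```

Note that cycles divided by task-clock comes out at roughly 4.15 and 3.49 GHz, so the "GHz" and "M/sec" figures perf printed in these runs look like they are scaled up by a factor of 10^6.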