Some personal testing. I do mostly pubsub and NoSQL type stuff at work. I ran this benchmark this morning and found it interesting. It largely tests execution within the L2 cache, since the benchmark doesn’t touch that many keys in simplified mode. The execution profile is basically a test of gettimeofday + message serialization + hashing + copying.
The takeaway is that the Intel part gets slightly better IPC than the AMD part (3.39 vs. 3.20 instructions per cycle), no surprise there. The reason appears to be slightly better branch prediction (0.10% vs. 0.13% branch misses), even though the Intel part records more branches in “perf stat”. The Intel part is frequency locked at 3.5 GHz; it could probably turbo higher if I let it, but I prefer the stability of a locked frequency when comparing performance runs. I’ve also noticed that the Intel part has better latency on cache reads, while AMD has better write latency when the memory is in exclusive cache write mode, although I haven’t spent a lot of time comparing that.
> redis-benchmark -q -p 8888 -c 1 -P 64 -n 1000000
requests/sec   Intel i9-7960X @ 3.5 GHz   AMD 1950X @ 4.15 GHz   AMD - Intel
------------   ------------------------   --------------------   -----------
PING_INLINE:                 9708738.00             9900990.00   +192252.00 (+1.9%)
PING_BULK:                   9090909.00             8928571.00   -162338.00 (-1.7%)
SET:                         4032258.25             4366812.00   +334554.00 (+7.5%)
GET:                         4761905.00             5154639.00   +392734.00 (+7.6%)
INCR:                        4032258.25             4524887.00   +492629.25 (+10.8%)
LPUSH:                       3745318.50             3484320.50   -260998.00 (-7.0%)
RPUSH:                       2008032.12             1769911.50   -238120.62 (-11.9%)
LPOP:                        4065040.50             3984064.00   -80976.50 (-2.0%)
RPOP:                        4081632.50             4016064.25   -65568.25 (-1.6%)
SADD:                        3717472.25             3623188.50   -94284.25 (-2.5%)
HSET:                        3246753.25             3067484.50   -179268.75 (-5.5%)
SPOP:                        4273504.50             4166666.75   -106837.75 (-2.5%)
LPUSH:                       3773585.00             3496503.25   -277081.75 (-7.3%)
LRANGE_100:                   185082.36              195541.66   +10459.30 (+5.3%)
LRANGE_300:                    44458.28               48398.02   +3939.74 (+8.1%)
LRANGE_500:                    26572.42               28996.43   +2424.01 (+8.3%)
LRANGE_600:                    20656.46               22362.86   +1706.40 (+7.6%)
MSET_10:                      690607.75              814995.94   +124388.19 (+15.3%)
avg +1.7%
Percentages are relative to the faster part in each row.
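For anyone who wants to redo the delta column, here is a rough Python sketch. It assumes the percentage is the AMD - Intel difference divided by the faster part's figure, which is what the rows appear to use; the function names `relative_delta` and `report` are made up for illustration, and only a few rows are copied in.

```python
def relative_delta(intel_rps, amd_rps):
    """Return (AMD - Intel, percent of the faster part's requests/sec).

    Assumption: the percentage base is the larger of the two figures.
    """
    diff = amd_rps - intel_rps
    pct = 100.0 * diff / max(intel_rps, amd_rps)
    return diff, pct


# (Intel, AMD) requests/sec copied from a few rows of the results above.
ROWS = {
    "GET": (4761905.00, 5154639.00),
    "RPUSH": (2008032.12, 1769911.50),
    "MSET_10": (690607.75, 814995.94),
}


def report(rows):
    """Format each row as 'NAME: +diff (+pct%)'."""
    lines = []
    for name, (intel, amd) in rows.items():
        diff, pct = relative_delta(intel, amd)
        lines.append(f"{name}: {diff:+.2f} ({pct:+.1f}%)")
    return lines


if __name__ == "__main__":
    print("\n".join(report(ROWS)))
```

Feeding in the remaining rows the same way is a quick check on the hand-computed percentages.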
AMD "perf stat"
        106,916.39 msec task-clock:u              #    0.999 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             2,607      page-faults:u             #   24.384 /sec
   443,205,373,114      cycles:u                  #    4.145 GHz                      (83.33%)
    91,057,431,294      stalled-cycles-frontend:u #   20.55% frontend cycles idle     (83.33%)
   167,896,924,202      stalled-cycles-backend:u  #   37.88% backend cycles idle      (83.33%)
 1,420,032,191,249      instructions:u            #    3.20  insn per cycle
                                                  #    0.12  stalled cycles per insn  (83.33%)
   246,230,102,295      branches:u                # 2303.024 M/sec                    (83.33%)
       325,094,516      branch-misses:u           #    0.13% of all branches          (83.33%)
     107.042078374 seconds time elapsed
     106.368956000 seconds user
       0.004950000 seconds sys
Intel "perf stat"
        119,082.18 msec task-clock:u              #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             2,608      page-faults:u             #   21.901 /sec
   416,102,969,383      cycles:u                  #    3.494 GHz
 1,409,782,444,643      instructions:u            #    3.39  insn per cycle
   253,367,186,982      branches:u                # 2127.670 M/sec
       256,069,859      branch-misses:u           #    0.10% of all branches
     119.081333908 seconds time elapsed
     118.913041000 seconds user
       0.012933000 seconds sys
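clock ratio (Intel / AMD): 3.5 / 4.15 = 0.84
run time ratio (AMD / Intel): 107 / 119 = 0.90
Intel runs at 84% of AMD's clock, yet AMD only finishes in 90% of Intel's time, which matches the Intel part's higher IPC.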
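The IPC, branch-miss and ratio figures can be rechecked from the raw counters with a small Python sketch like the one below. The counter values are copied from the two "perf stat" runs; the dictionary layout and the helper names `ipc`, `branch_miss_pct` and `summary` are just illustrative choices, and the run time ratio here uses the exact elapsed seconds rather than the rounded 107 and 119.

```python
# Counter values copied from the two "perf stat" runs above.
AMD = {
    "cycles": 443_205_373_114,
    "instructions": 1_420_032_191_249,
    "branches": 246_230_102_295,
    "branch_misses": 325_094_516,
    "seconds": 107.042078374,
    "ghz": 4.15,
}
INTEL = {
    "cycles": 416_102_969_383,
    "instructions": 1_409_782_444_643,
    "branches": 253_367_186_982,
    "branch_misses": 256_069_859,
    "seconds": 119.081333908,
    "ghz": 3.5,
}


def ipc(counters):
    """Instructions retired per cycle."""
    return counters["instructions"] / counters["cycles"]


def branch_miss_pct(counters):
    """Branch misses as a percentage of all branches."""
    return 100.0 * counters["branch_misses"] / counters["branches"]


def summary():
    """Collect the derived figures, rounded to two decimals."""
    return {
        "amd_ipc": round(ipc(AMD), 2),
        "intel_ipc": round(ipc(INTEL), 2),
        "amd_miss_pct": round(branch_miss_pct(AMD), 2),
        "intel_miss_pct": round(branch_miss_pct(INTEL), 2),
        "clock_ratio": round(INTEL["ghz"] / AMD["ghz"], 2),
        "runtime_ratio": round(AMD["seconds"] / INTEL["seconds"], 2),
        "cycle_ratio": round(INTEL["cycles"] / AMD["cycles"], 2),
    }


if __name__ == "__main__":
    for key, value in summary().items():
        print(f"{key}: {value}")
```

The `cycle_ratio` entry (Intel cycles over AMD cycles) is an extra figure not in the post; it is another way to see how many fewer cycles the Intel part needed for the same benchmark.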