AMD EPYC Turin Is OUT! Zen5 9000 Benchmarking w/ COMSOL & Phoronix Test Suite

Official Press Deck Slides

COMSOL

FGMRES Compute

Pardiso Compute

CFD Only 10GB

EM Only 260GB

Phoronix Test Suite

AMD EPYC Turin Benchmarks.rar (896.4 KB)

Test Systems


Summary



Timed Compilations

Timed Linux Kernel Compilation 6.8, Build: defconfig

Timed Linux Kernel Compilation 6.8, Build: allmodconfig

Timed FFmpeg Compilation 7.0, Time To Compile

Timed Godot Game Engine Compilation 4.0, Time To Compile

Timed Node.js Compilation 21.7.2, Time To Compile

Timed Gem5 Compilation 23.0.1, Time To Compile

Timed LLVM Compilation 16.0, Build System: Ninja

OpenSSL

OpenSSL 3.3, Algorithm: RSA4096

OpenSSL 3.3, Algorithm: RSA4096 (2)

OpenSSL 3.3, Algorithm: SHA256

OpenSSL 3.3, Algorithm: SHA512

OpenSSL 3.3, Algorithm: AES-128-GCM

OpenSSL, Algorithm: AES-128-GCM

OpenSSL 3.3, Algorithm: AES-256-GCM

OpenSSL, Algorithm: AES-256-GCM

OpenSSL 3.3, Algorithm: ChaCha20

OpenSSL, Algorithm: ChaCha20

OpenSSL 3.3, Algorithm: ChaCha20-Poly1305

OpenSSL, Algorithm: ChaCha20-Poly1305

John The Ripper

John The Ripper 2023.03.14, Test: Blowfish

John The Ripper 2023.03.14, Test: bcrypt

John The Ripper 2023.03.14, Test: WPA PSK

RocksDB 9.0, Test: Random Read

Speedb

Speedb 2.7, Test: Random Read

Speedb 2.7, Test: Read While Writing

Speedb 2.7, Test: Random Fill & Variant: Monero - Hash Count: 1M

SecureMark 1.0.4, Benchmark: SecureMark-TLS

Coremark 1.0, CoreMark Size 666 - Iterations Per Second

Google SynthMark 20201109, Test: VoiceMark_100

Algebraic Multi-Grid Benchmark 1.2

WRF 4.2.2, Input: conus 2.5km

ACES DGEMM 1.0, Sustained Floating-Point Rate

RELION 4.0.1, Test: Basic - Device: CPU

LULESH 2.0.3

miniBUDE

miniBUDE 20210901, Implementation: OpenMP - Input Deck: BM2

miniBUDE 20210901, Implementation: OpenMP - Input Deck: BM2

LAMMPS

LAMMPS Molecular Dynamics Simulator 23Jun2022, Model: Rhodopsin Protein

LAMMPS Molecular Dynamics Simulator 23Jun2022, Model: 20k Atoms

m-queens 1.2, Time To Solve

miniFE 2.2, Problem Size: Small

ASKAP

ASKAP 1.0, Test: tConvolve MPI - Degridding

ASKAP 1.0, Test: tConvolve MPI - Gridding

NAMD

NAMD 3.0b6, Input: ATPase with 327,506 Atoms

NAMD 3.0b6, Input: STMV with 1,066,628 Atoms

GROMACS 2024, Implementation: MPI CPU - Input: water_GMX50_bare

QuantLib

QuantLib 1.32, Configuration: Single-Threaded

QuantLib 1.32, Configuration: Multi-Threaded

QMCPACK 3.17.1, Input: Li2_STO_ae

GPAW 23.6, Input: Carbon Nanotube

High Performance Conjugate Gradient 3.1, X Y Z: 144 144 144 - RT: 60

Pennant

Pennant 1.0.1, Test: leblancbig

Pennant 1.0.1, Test: sedovbig

NAS Parallel

NAS Parallel Benchmarks 3.4, Test / Class: EP.D

NAS Parallel Benchmarks 3.4, Test / Class: LU.C

NAS Parallel Benchmarks 3.4, Test / Class: SP.C

NAS Parallel Benchmarks 3.4, Test / Class: IS.D

NAS Parallel Benchmarks 3.4, Test / Class: MG.C

NAS Parallel Benchmarks 3.4, Test / Class: CG.C

NWChem 7.0.2, Input: C240 Buckyball

Xcompact3d

Xcompact3d Incompact3d 2021-03-11, Input: input.i3d 193 Cells Per Direction

Xcompact3d Incompact3d 2021-03-11, Input: X3D-benchmarking input.i3d

BRL-CAD 7.38.2, VGR Performance Metric

OpenFOAM

OpenFOAM 10, Input: drivaerFastback, Small Mesh Size - Mesh Time

OpenFOAM 10, Input: drivaerFastback, Small Mesh Size - Execution Time

OpenFOAM 10, Input: drivaerFastback, Medium Mesh Size - Execution Time

OpenFOAM 10 & Speedb 2.7, Test: Sequential Fill

OpenRadioss

OpenRadioss 2023.09.15, Model: INIVOL and Fluid Structure Interaction Drop Container

OpenRadioss 2023.09.15, Model: Chrysler Neon 1M

Blender

Blender 4.1, Blend File: BMW27 - Compute: CPU-Only

Blender 4.1, Blend File: Classroom - Compute: CPU-Only

Blender 4.1, Blend File: Fishy Cat - Compute: CPU-Only

Blender 4.1, Blend File: Pabellon Barcelona - Compute: CPU-Only

Blender 4.1, Blend File: Barbershop - Compute: CPU-Only

Blender 4.1, Blend File: Junkshop - Compute: CPU-Only

LuxCoreRender

LuxCoreRender 2.6, Scene: DLSC - Acceleration: CPU

LuxCoreRender 2.6, Scene: LuxCore Benchmark - Acceleration: CPU

LuxCoreRender 2.6, Scene: Orange Juice - Acceleration: CPU

OSPRay

OSPRay 3.1, Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time

OSPRay 3.1, Benchmark: particle_volume/ao/real_time

OSPRay 3.1, Benchmark: particle_volume/scivis/real_time

OSPRay Studio 1.0, Camera: 1 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer - Acceleration: CPU

OSPRay Studio 1.0, Camera: 1 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer - Acceleration: CPU

OSPRay Studio 1.0, Camera: 1 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer - Acceleration: CPU

OSPRay Studio 1.0, Camera: 3 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer - Acceleration: CPU

OSPRay Studio 1.0, Camera: 3 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer - Acceleration: CPU

OSPRay Studio 1.0, Camera: 3 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer - Acceleration: CPU

Embree

Embree 4.3, Binary: Pathtracer ISPC - Model: Asian Dragon

Embree 4.3, Binary: Pathtracer ISPC - Model: Asian Dragon Obj

Embree 4.3, Binary: Pathtracer ISPC - Model: Crown

Intel Open Image Denoise

Intel Open Image Denoise 2.2, Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only

Intel Open Image Denoise 2.2, Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only

Intel Open Image Denoise 2.2, Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only

OpenVKL 2.0.0, Benchmark: vklBenchmarkCPU ISPC

7-Zip Compression

7-Zip Compression 22.01, Test: Compression Rating

7-Zip Compression 22.01, Test: Decompression Rating

Parallel BZIP2 Compression 1.1.13, FreeBSD-13.0-RELEASE-amd64-memstick.img Compression

PyBench 2018-02-16, Total For Average Test Times

Numpy Benchmark

SVT-AV1 2.0, Encoder Mode: Preset 8 - Input: Bosphorus 4K

WebP Image Encode 1.2.4, Encode Settings: Quality 100, Highest Compression

libavif avifenc

libavif avifenc 1.0, Encoder Speed: 0

libavif avifenc 1.0, Encoder Speed: 2

libavif avifenc 1.0, Encoder Speed: 6, Lossless

libavif avifenc 1.0, Encoder Speed: 10, Lossless

ASTC Encoder

ASTC Encoder 4.7, Preset: Thorough

ASTC Encoder 4.7, Preset: Very Thorough

ASTC Encoder 4.7, Preset: Exhaustive

GraphicsMagick 1.3.43, Operation: Noise-Gaussian

Liquid-DSP

Liquid-DSP 1.6, Threads: 1 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6, Threads: 1 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 1.6, Threads: 1 - Buffer Length: 256 - Filter Length: 512

Liquid-DSP 1.6, Threads: 64 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6, Threads: 128 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6, Threads: 128 - Buffer Length: 256 - Filter Length: 512

Liquid-DSP 1.6, Threads: 256 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 1.6, Threads: 256 - Buffer Length: 256 - Filter Length: 512

srsRAN Project

srsRAN Project 23.10.1-20240325, Test: PUSCH Processor Benchmark, Throughput Total

srsRAN Project 23.10.1-20240325, Test: PUSCH Processor Benchmark, Throughput Thread

srsRAN Project 23.10.1-20240325, Test: PDSCH Processor Benchmark, Throughput Total

srsRAN Project 23.10.1-20240325, Test: PDSCH Processor Benchmark, Throughput Thread

TensorFlow 2.16.1, Device: CPU - Batch Size: 512 - Model: ResNet-50

OpenVINO

OpenVINO 2024.0, Model: Face Detection FP16-INT8 - Device: CPU

OpenVINO 2024.0, Model: Face Detection FP16-INT8 - Device: CPU (2)

OpenVINO 2024.0, Model: Person Detection FP16 - Device: CPU

OpenVINO 2024.0, Model: Person Detection FP16 - Device: CPU

OpenVINO 2024.0, Model: Weld Porosity Detection FP16-INT8 - Device: CPU

OpenVINO 2024.0, Model: Weld Porosity Detection FP16-INT8 - Device: CPU

OpenVINO 2024.0, Model: Vehicle Detection FP16-INT8 - Device: CPU

OpenVINO 2024.0, Model: Vehicle Detection FP16-INT8 - Device: CPU

OpenVINO 2024.0, Model: Person Vehicle Bike Detection FP16 - Device: CPU

OpenVINO 2024.0, Model: Person Vehicle Bike Detection FP16 - Device: CPU

OpenVINO 2024.0, Model: Machine Translation EN To DE FP16 - Device: CPU

OpenVINO 2024.0, Model: Machine Translation EN To DE FP16 - Device: CPU

OpenVINO 2024.0, Model: Face Detection Retail FP16-INT8 - Device: CPU

OpenVINO 2024.0, Model: Face Detection Retail FP16-INT8 - Device: CPU

OpenVINO 2024.0, Model: Handwritten English Recognition FP16-INT8 - Device: CPU

OpenVINO 2024.0, Model: Handwritten English Recognition FP16-INT8 - Device: CPU

OpenVINO 2024.0, Model: Road Segmentation ADAS FP16-INT8 - Device: CPU

OpenVINO 2024.0, Model: Road Segmentation ADAS FP16-INT8 - Device: CPU

OpenVINO 2024.0, Model: Person Re-Identification Retail FP16 - Device: CPU

OpenVINO 2024.0, Model: Person Re-Identification Retail FP16 - Device: CPU

ONNX Runtime

ONNX Runtime 1.17, Model: GPT-2 - Device: CPU - Executor: Standard

ONNX Runtime 1.17, Model: GPT-2 - Device: CPU - Executor: Standard

ONNX Runtime 1.17, Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.17, Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.17, Model: yolov4 - Device: CPU - Executor: Standard

Xmrig 6.21, Variant: GhostRider - Hash Count: 1M

Helsing 1.0-beta, Digit Range: 14 digit

Stockfish 16.1, Chess Benchmark

Primesieve

Primesieve 12.1, Length: 1e12

Primesieve 12.1, Length: 1e13

Y-Cruncher

Y-Cruncher 0.8.3, Pi Digits To Calculate: 500M

Y-Cruncher 0.8.3, Pi Digits To Calculate: 1B

Y-Cruncher 0.8.3, Pi Digits To Calculate: 10B & OpenVINO, Model: Face Detection FP16

8 Likes

Let me see how much money I have saved up for this…

2 Likes

That is a lot of graphs…if you could measure scrolling down all the graphs in seconds, this test would probably hold the record so far.

I guess AMD didn’t like all the Xeon 6 attention lately. I’ll check the results later…how long does the entire suite run?

Does L1T have a standard run with selected benchmarks, or does it run everything that’s out there?

Maybe we’ll get an upgrade to Siena and Bergamo, as we did with the Genoa generation…Zen5 Siena is certainly more within John Travolta’s budget. And AMD plans to keep SP6 for more than a generation.
And the 9575F is only $15k. You saved some money on that SSD lately, so you’re getting closer.

edit: Amber streamlined the graphs and shortened the scroll time by 80%. She must be using one of these new Turin chips!

2 Likes

Oh I wish!

2 Likes

Yes, but my personal EPYC Genoa build just shifted price by $3k with the CPU alone. Faster RAM is pricier, but necessary…

@JayVenturi How will you reinforce the motherboard for the air coolers you’ll inevitably place on these 500 watt TDP monsters?

1 Like

@JayVenturi said he must use a custom water loop for his next EPYC upgrade. I want the 192-core EPYC, but I can’t justify the $15,000 price tag, especially since I want to purchase two, which comes to $30,000 just for CPUs.

Looking at the Xeon 6 at 500W, it seems the good old passive cooling block plus enough 11k RPM fans will do the job, just as before.

Interesting note…Turin uses Zen5c cores. So that’s more of an updated Bergamo than an updated Genoa. Seems like AMD can’t scale “standard” cores any higher. Are Zen4c and Zen5c becoming the standard soon?
Or they had a bad experience with the Bergamo+Genoa marketing and just merged everything under Turin.

So far only 5 of the Turin SKUs use Zen 5c cores, the majority are normal Zen 5 cores.

Yes, BUTTTT more cores better
It would be fiscally irresponsible to forego potential performance
And I am goin single socket. When I can now single socket 192 cores vs 128…
well that’s a 50% bump, which is 100% more than the 96 core I had in the cart!
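For anyone who wants to check the napkin math on those core counts, here is a minimal sketch. The helper name `percent_more` is made up for illustration; the core counts (96, 128, 192) are the ones mentioned in the post.

```python
def percent_more(new: int, old: int) -> float:
    """Return how many percent larger `new` is than `old`."""
    return (new - old) / old * 100.0


# Core counts taken from the post above.
print(percent_more(192, 128))  # 50.0  -> 192 cores vs 128 cores
print(percent_more(192, 96))   # 100.0 -> 192 cores vs 96 cores
```

Note that 192 is 200% *of* 96, which works out to 100% *more* than 96.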


1 Like

Imagine the bump when using a 2P system. Imagine all the threads in top were actual cores and not SMT cores. Where others have threads, you have cores.
A single 192-core system is really only a poor man’s workstation. Life starts >200

400% more cores are 800% more performance, just because dopamine and serotonin enhance the perceived performance by 2x. Placebo purchase is a thing

Ok, so they just merged Bergamo+Genoa into one and the higher core counts are Zen5c and everything below is Zen5. Makes sense. Because a 24-core Zen5c for 4x the price of a 24-core Siena would be outrageous :wink:

1 Like

I’m dyin here
My EPYC build was supposed to be a meme machine home server build
Duplicating what we’ve been building for customers.
Did some napkin math and it would have been able to replace at least 1 of my prod servers at the house.

Now, it can replace…still 1

1 Like

@Level1_Amber you forgot the most important slide

1 Like

Is this the part where I post a picture of two Rolls Royce turbine coolers dangling off some home workstation?

…going to have to come up with something new for dual 500W. It is feasible to go air, but no longer viable in a relatively small footprint.

Try holding, with yer bare palm, a full-on 500W glass floodlamp. Now imagine trying to dissipate the heat of TWO of those (1000W) using just air.
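To put a rough number on that, here is a back-of-the-envelope sketch of the airflow needed to carry 1000W away, using the steady-state relation flow = power / (density × specific heat × temperature rise). The air properties and the 15 K allowed temperature rise are illustrative assumptions, not figures from this thread.

```python
AIR_DENSITY = 1.2           # kg/m^3, assumed (roughly sea level, room temperature)
AIR_SPECIFIC_HEAT = 1005.0  # J/(kg*K), assumed
M3S_TO_CFM = 2118.88        # cubic metres per second -> cubic feet per minute


def airflow_m3s(power_w: float, delta_t_k: float) -> float:
    """Volumetric airflow needed to remove `power_w` watts of heat
    while the air warms up by `delta_t_k` kelvin on its way through."""
    return power_w / (AIR_DENSITY * AIR_SPECIFIC_HEAT * delta_t_k)


# Dual 500W CPUs, with an assumed 15 K rise between intake and exhaust.
flow = airflow_m3s(1000.0, 15.0)
print(f"{flow:.4f} m^3/s, about {flow * M3S_TO_CFM:.0f} CFM")
```

Under these assumptions the answer lands around 117 CFM for the CPUs alone, before counting RAM, VRMs, and drives. Halving the allowed temperature rise doubles the required airflow.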

Please include the words “silently” and “small form factor” in the build and I say bollocks!

It will have to be a very well thought out liquid cooling or phase-change system.

…not to mention PRICEY. I don’t know if I can add a third and fourth full time job for just this habit.

I will be available to gladly help other folks spend THEIR money on this.

2 Likes

We always used the Dynatron J12s in 4U cases

Dynamic fan control kept it under control…during installs

Then we got the calls, “there’s a LOUD noise comin from the server closet”

Looks like Dynatron already has a solution for gluttons for punishment:

4 Likes

…comes with ear plugs?

Looks like a screamer

1 Like

fixed ;p I added the official press deck slides to the post:

4 Likes

Oh for Turin, you have no idea xD Also, the AC couldn’t keep up with the room. When you walked by the server closet, there was a waft of heat

2 Likes

Janitors, cleaning personnel, or generally “the uninitiated”…treating the data center like some kind of witchcraft or a nuclear power plant where every flashing light or noise can result in an immediate meltdown. My standard answer: “If we can log into the AS/400 and nothing is burning → it’s fine! And don’t plug in any cables”

It’s a fan row that fits into 1U. So probably 40mm fans with 6A running at 11k RPM@100%. With 2U, you can at least physically use 80mm fans and keep RPM at a sane level. But I like that Dynatron AIO…provides a lot more surface.

2 Likes

The difference between you and me is that my new virtualization server is for personal use, while I hope your new server is used to make money. Even if I were to go with just one 192-core EPYC, my final budget would be between 20,000 and 25,000 dollars, and I couldn’t justify spending that much money on personal use.

1 Like