What's the fastest processor for running sha256sum single-threaded/single-process?

Yeah…I’m not sure whether that would cause other problems for me (hardware compatibility issues with NFSoRDMA, or software compatibility issues with the CAE programs I use that haven’t been certified to run on CentOS 8 yet).

It’s one of those “you fix one problem, but break or create many others” situations.

Yayyyy Linux!!!


It’s not a problem unique to Linux, but yeah, I feel your pain.

For a true apples-to-apples comparison, it needs to stay on your exact hardware setup.

@alpha754293 why don’t you just get a live USB of some newer version of Fedora running the 5.x kernel, and then run the test?


My bad, I’m relatively new to Linux.

No you’re fine. It’s one of those things you pick up.

RHEL freezes its kernel and backports patches.

Each major release is supported for 10 years, which is why it’s used so much in the enterprise.


Could it be that comparing my result to the OP’s is “flawed”? My OpenSSL version is newer.

Your hardware and software versions are different from his.

It would be interesting if you spun up a CentOS 7.8 live USB and ran the test again to see your performance delta.
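
For reference, a more apples-to-apples way to run it is to pin the benchmark to a single core and note the library version; something along these lines (assuming openssl and taskset are available on the live image):

openssl version
taskset -c 0 openssl speed -evp sha256   # EVP path, pinned to core 0
taskset -c 0 openssl speed sha256        # low-level path, for comparison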

I downloaded the CentOS KDE ISO from here, but on installation it asked me to install a kernel and I don’t know how to do that (lol, noob)

Lots of googling later, I maybe found something, but I’m currently booted into Windows, not Linux =/


I don’t know; that’s why I gave the disclaimer. It is a pretty solid way to judge maximum single-core performance between CPUs in general, and when averaging out a couple of runs back to back it is also very consistent. But I have no experience with benchmarking hashing algorithms, and most of my systems are currently packed up for moving, so I can’t even run stuff myself right now.

In case it’s relevant, here’s what my 2700X puts out:

OpenSSL 1.1.1d FIPS  10 Sep 2019
built on: Thu Oct  3 00:00:00 2019 UTC
options:bn(64,64) md2(char) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -Wa,--noexecstack -Wa,--generate-missing-build-notes=yes -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DZLIB -DNDEBUG -DPURIFY -DDEVRANDOM="\"/dev/urandom\"" -DSYSTEM_CIPHERS_FILE="/etc/crypto-policies/back-ends/openssl.config"
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256          248651.25k   634125.85k  1311330.13k  1773534.55k  1985853.13k  1996559.70k

IIRC, Zen is really good at hashing…

Yes and no. I do seem to recall Zen is particularly good at OpenSSL-related tasks; maybe it’s the huge cache vs. Intel, maybe it’s just instruction set optimisation. Either way, it was one of the benchmarks AMD cherry-picked to illustrate Zen performance back in the day, IIRC.

In any case, if you’re after fast hashing performance, checking a Zen 2 part might be worth it. For single thread, even something like a 3300X. Apparently they clock like a champ.
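
If I remember right, a big part of Zen’s hashing advantage is that it has the dedicated SHA instructions (the SHA extensions), which OpenSSL’s assembly should pick up automatically when they’re present. A quick way to check whether a CPU advertises them on Linux:

grep -m1 -o sha_ni /proc/cpuinfo && echo "SHA extensions present" || echo "no sha_ni flag"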

For what it’s worth, here’s Ice Lake in a MacBook Air: quad-core i7 at 1.2 GHz base, 3.8 GHz boost.

jrose@Jethros-Air ~ % openssl speed sha256
Doing sha256 for 3s on 16 size blocks: 12829129 sha256's in 2.98s
Doing sha256 for 3s on 64 size blocks: 7069845 sha256's in 2.98s
Doing sha256 for 3s on 256 size blocks: 2993711 sha256's in 2.98s
Doing sha256 for 3s on 1024 size blocks: 920891 sha256's in 2.99s
Doing sha256 for 3s on 8192 size blocks: 121577 sha256's in 2.98s
LibreSSL 2.8.3
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx) 
compiler: information not available
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha256           68852.37k   151846.20k   257157.92k   315589.15k   333756.95k
jrose@Jethros-Air ~ %
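
(Those LibreSSL numbers might say as much about the library build as about the chip; it could be interesting to compare against a regular OpenSSL build on the same machine, e.g. via Homebrew. The path below assumes an Intel Mac with the default Homebrew prefix:)

brew install openssl@1.1
/usr/local/opt/openssl@1.1/bin/openssl speed -evp sha256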

hashcat is one such implementation, but I’m not sure how useful it’d be to you.

Are you sure you need sha256? Also, why limit yourself to single core?

(I’m thinking it might be faster to encrypt/decrypt using aes-gcm or some such thing, not sure, but it might be)
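
A quick (and very rough) way to check that would be to compare single-core throughput of the two on your own hardware, e.g.:

taskset -c 0 openssl speed -evp sha256
taskset -c 0 openssl speed -evp aes-128-gcm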

In my experience, it appears to be more of a Linux problem than a Windows one.

Like I can upgrade the entire Windows OS and my CAE applications would still be certified to run on it without any issues.

With Linux, that’s generally NOT the case due to specific incompatibilities, not only BETWEEN packages, but also between different versions of the same package (which just seems crazy to me).

In other words, suppose a Linux application needs a specific version of a package or library. You would think that the newer version would be able to do everything the older versions can do, but that is certainly not the case, especially with some of the more “core” or basic libraries like libpng and/or libjpeg. The application HAS to use that specific, older version to work, and then you’re on rpmfind.net HOPING that someone has uploaded that specific version of the package individually so that you can download it and force it to install on your system.

I think the list of packages that I have to add post OS install is somewhere between maybe 30 and 50 now. (I’ve dumped the names of all of the packages that I need into a OneNote notebook because it got to be too many for me to remember off the top of my head.)

With Windows, I don’t have this kind of a problem.

My application doesn’t run on MacOS.

Again, because not all of the latest and greatest supports NFSoRDMA.

And it’s also my cluster headnode, so that system is used to manage my cluster, which again, goes back to the whole point about application compatibility.

No worries.

dammmmnnnn…

Those are some really good numbers.

Yeah…that’s why I wasn’t quite so sure.

The latest benchmarks for the Intel Core blah-blah 10XYZ-series processors – according to AnandTech, the single-core Cinebench R20 score is maybe 1 point better than the Ryzen 3950X’s, which is statistically insignificant because it’s well within the noise/test-to-test variation.

That almost 2 GB/s hash rate for 8 KiB blocks – that’s super impressive!!!

Yeah…my Core i7 3930K performs more like that.

Thank you everybody for all of your contributions.

It would appear that Ryzen (and maybe EPYC? by extension) is just kicking Intel’s arse when it comes to hashing.

Sheesh…that’s a huge difference.

If there are other people who have and/or would like to run their test, that would be greatly appreciated.

The more data, the better.

Thank you, everybody!

Have you decided on a budget for a new system?


Yea. MD5 has collisions.

And so does SHA-1, technically (there’s no 128-bit SHA; the smallest is SHA-1 at 160 bits, and even that has demonstrated collisions). SHA-256 is more resistant to collisions.

Mostly because I haven’t figured out a way to parallelise the sha256sum tool, and I’m not sure one exists when it is working on a single, large file.

I already have a script that will run it in parallel for many, many files, but I don’t have a way to parallelise it (yet) for a single, large file.
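
(For the many-files case, the general shape is something like this rough sketch, assuming GNU find/xargs and a hypothetical /data tree; not my actual script:)

find /data -type f -print0 | xargs -0 -P "$(nproc)" -n 1 sha256sum > checksums.sha256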

By default, when you fire up the sha256sum command, it only uses one core. I’m open to parallelising it (against a single, large file), but I don’t know enough about the SHA-2 algorithm to do that myself (because I don’t know, for example, if you were to break up the file into n pieces and then “sum up” the results of the individual pieces, whether you’d get the same answer as just running sha256sum on the large file without breaking it up).

Superficially, I wouldn’t think that you would get the same answer, but I don’t know enough about it to know how to begin to parallelise it. And I also think/trust that if the experts who do this for a living haven’t parallelised it by now, there’s probably a really good reason for that.
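
If I were to try it anyway, the rough shape would be something like the sketch below: split the file into fixed-size chunks, hash the chunks in parallel, then hash the ordered list of chunk digests. That top-level digest will NOT match a plain sha256sum of the whole file, so whoever verifies the file would have to use the exact same chunk size and scheme. The file name and chunk size here are just placeholders.

#!/usr/bin/env bash
# Rough sketch: chunked, parallel hashing of ONE large file.
# The result is NOT the same digest as running sha256sum on the whole file.
export FILE=bigfile.dat               # placeholder: the file to hash
export CHUNK=$((1024 * 1024 * 1024))  # 1 GiB per chunk
NCHUNKS=$(( ( $(stat -c %s "$FILE") + CHUNK - 1 ) / CHUNK ))

# hash each chunk on its own core; dd's skip= is in units of bs=
seq 0 $(( NCHUNKS - 1 )) | xargs -P "$(nproc)" -I{} bash -c '
    dd if="$FILE" bs="$CHUNK" skip={} count=1 2>/dev/null \
        | sha256sum | cut -d" " -f1 > "chunk-{}.sha"'

# top-level digest = SHA-256 over the ordered list of per-chunk digests
ls chunk-*.sha | sort -t- -k2 -n | xargs cat | sha256sum
rm -f chunk-*.sha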

Not quite yet.

There are a couple of different options, and also factors outside of this.

So…for example, the Ryzen 3950X might have really fast single-core performance, but it doesn’t supply enough PCIe lanes for a GPU AND a 4x EDR InfiniBand network card AND a 12 Gbps SAS RAID HBA. (My total demand for just those three devices alone requires 48 PCIe 3.0 lanes, which the Ryzen 3950X can’t supply.)

So…that automatically puts me into AMD EPYC territory due to the PCIe 3.0 lane demands, but now I can’t get as high in terms of single-threaded performance.

My “dream” HEDT processor would be something like a 3950X, but with like 64 PCIe 3.0 (or 4.0 – I mean, it’s irrelevant to me which generation, but 3.0 minimum, because I don’t have any 4.0 devices yet) lanes, so that I would be able to do it all with one processor.

But no such product exists.

There were older Intel processors that supplied up to 48 PCIe 3.0 lanes, but that also means that they’re not that much faster than what I currently have, which kind of defeats the purpose of trying to get/use those.

And then, on TOP of that, because of the way AMD EPYC is architected with the chiplet design, in order to maximize the total memory bandwidth I think I would need a minimum of either 16 or 32 cores (on the EPYC) so that enough chiplets (CCDs) are populated and connected to the IO die, etc., etc.

So, there are a lot of factors that go into that.

And if I go EPYC, I’m looking at a system that’s close to $8k, versus maybe getting by with a sub-$2k system if I were using the Ryzen 3950X.

So…yeah…

sigh

dilemmas, dilemmas, dilemmas.

I’m trying to be methodical in my approach to finding a solution for this problem, but there just isn’t a product on the market that actually does what I ACTUALLY want it to do (which is to have really fast single-core performance AND at least 64 PCIe 3.0 lanes).

I can run the test on my Epyc server.

One moment.


Well, what about Threadripper? The 3960X, for example. Clocks are pretty good, IPC is pretty much on par with AM4 Ryzen 3000, quad-channel memory, lots of PCIe 4.0 lanes…


Yeah Threadripper makes so much more sense.

The EPYC 7F72, which is the closest EPYC chip to the 3960X (I think), costs $2500-2600, making the TR 3960X look like a good deal.