Calculate diminishing returns per socket

I understand that with multi-socket systems, the more sockets you add, the more the returns diminish in terms of speed.

The next machine I'm building will be a quad socket system. I am already aware that quad socket systems are of course not 4x the speed of a single socket system running the same CPU. However, I don't know exactly what calculation is used to determine the final speed of the system relative to its single socket counterpart.

I do, however, currently have a dual socket Xeon system. I know that the second Xeon in my current system adds roughly 48% speed over the original single Xeon setup. Working under the assumption that each doubling of the number of chips adds 48% speed over the previous number of chips, a quad socket system would add roughly 119% speed (1.48 × 1.48 ≈ 2.19), meaning the system is a little over 2x as fast with 4 CPUs as with 1.

So if the original machine scored 100 with a single CPU, adding another CPU would bring it to 148, and adding another 2 CPUs on top of that would bring it to about 219.
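To make that arithmetic easy to re-run with other numbers, here is a minimal Python sketch. It only encodes the assumption stated above (each doubling of the socket count multiplies speed by 1.48); it is not a general scaling law, and the function name `relative_speed` and its parameters are made up for illustration.

```python
import math


def relative_speed(sockets: int, gain_per_doubling: float = 0.48,
                   base: float = 100.0) -> float:
    """Estimate a relative score for a given socket count.

    Assumption (from the post, not a measured law): every doubling of the
    socket count multiplies the score by (1 + gain_per_doubling).
    """
    if sockets < 1:
        raise ValueError("sockets must be at least 1")
    doublings = math.log2(sockets)  # 1 socket -> 0, 2 -> 1, 4 -> 2
    return base * (1.0 + gain_per_doubling) ** doublings


if __name__ == "__main__":
    for n in (1, 2, 4):
        score = relative_speed(n)
        # The per-socket figure shows how much each CPU effectively contributes.
        print(f"{n} socket(s): {score:.1f} total, {score / n:.1f} per socket")
```

With the default 48% figure this gives 100, 148 and roughly 219 for 1, 2 and 4 sockets, so each CPU in the quad setup is effectively contributing a bit over half of what it would on its own. Swapping in a different `gain_per_doubling` from your own dual socket benchmark is the only change needed.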

I only own one quad socket system, and it's from the early 2000s. When I tested it with CPU benchmarks, it showed the same trend of being just over twice as fast using all 4 sockets as with a single socket.

So if anyone could shed some light on the subject, or maybe confirm with their own quad socket setup, that would be helpful. Normally it would be easy to look up benchmark info for various systems and CPUs, but as you can imagine, not many people have a use for a quad socket system, so benchmarks are hard to come by.

I was reading this paper the other day:

Latch-free Synchronization in Database Systems: Silver Bullet or Fool’s Gold?

There are some benchmarks done on a quad E7-8850 (10 core) in the paper. They are testing various locking methods, which stresses the CPU to CPU synchronization.

It really depends on the type of workload you are running, though.


Agreed, this is very much workload dependent. In an RDBMS like SQL Server, the Database Engine is NUMA-aware and even lets you set a soft-NUMA configuration to make up for poorly configured VMs, etc.

The Database Engine also has a Max Degree of Parallelism setting, which caps the number of threads a single query can execute across (4 is a good default option). Then, of course, if you have hundreds of concurrent users, each of their queries will use 4 threads, all on the same NUMA node. The scaling in this example can be very good, and adding more CPUs to scale a single server up can go a long way before sharding or partitioning the database across multiple servers is required.


If your workload is read heavy, like Reddit or Cloudflare, adding CPUs can really scale.

Edit: It seems a lot of people use the Yahoo! Cloud Serving Benchmark for database loads.


Since this thread has so far taken a database slant (probably because quad socket systems will be hosting databases), I'll add that I have used the TPC benchmark results in the past to compare servers for database workloads.

http://www.tpc.org/tpce/default.asp

I think the TPC-E results are the ones of most use for OLTP style workloads. Even though the OP might not be interested in databases there might be some useful info in here.
