Benchmarks for Intel Xeon E5-4600 v2 CPUs

I am currently considering my options for a low cost machine that I can put a lot of memory in for electromagnetic simulations. One of the options is a used Dell PowerEdge R820, which has 48 DIMM slots for cheap DDR3 RAM. For about £1500 I could have a system with ~1TB RAM and 48 cores.

However, how good are those cores? I have had a look at the benchmarking sites, and I haven’t found any 4-CPU results. It seems that the E5-4600 CPUs are quite rare. Does anyone here know where I can find some benchmarks for them, or is perhaps able and willing to run the Phoronix Test Suite and post the results?

The 4 just refers to how many sockets the CPU can be used in at once.
For example:

  • E5-1603 v2 can only be used in a single-socket config
  • E5-2603 v2 can be used in a dual-socket config
  • E5-4603 v2 can be used in a quad-socket config

They’re all Ivy Bridge under the hood and will perform similarly at the same frequency, core count, and L3 cache.

But be aware it’s not as simple as a 48-core system with 1TB of RAM.

All of that has to talk to each other through QPI, and it will be very slow if it’s being used for one task.
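If you want to see the cross-socket penalty for yourself before buying, something as crude as the sketch below, run under numactl, makes it obvious (this is just a quick hack, not a proper STREAM run; the script name and array size are arbitrary):

# numa_copy.py -- rough memory-bandwidth probe to compare local vs. remote
# (cross-QPI) memory on a multi-socket box. The array just needs to be far
# larger than L3 cache; 100M doubles (~800 MB) is an arbitrary choice.
import time
import numpy as np

N = 100_000_000
src = np.random.rand(N)
dst = np.zeros(N)

best = float("inf")
for _ in range(5):
    t0 = time.perf_counter()
    np.copyto(dst, src)          # stream N doubles in and N doubles out
    best = min(best, time.perf_counter() - t0)

print(f"copy bandwidth: {2 * N * 8 / best / 1e9:.1f} GB/s")

Run it once with local memory and once forced onto the other socket’s memory, e.g. numactl --cpunodebind=0 --membind=0 python3 numa_copy.py versus --membind=1; the gap between the two numbers is the QPI tax you’d be paying far more often in a quad-socket box.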

These DDR3-era systems are really slow by today’s standards. For example, I’ve recently put together a dual 16-core Broadwell system that is 50% faster than quad-CPU v2 machines in the same benchmark.

In terms of power, heat, and performance, something slightly newer like Broadwell is a better choice. Another commenter mentioned inter-socket bandwidth as well, and that will clearly be a bigger issue with four CPUs than with two; QPI link speed also went up with Broadwell.

Plenty of Broadwell SKUs support >1TB per socket, and DDR4 speeds below 2666 are pretty cost effective on eBay these days.

What kind of solver are you using? Is it factorization with a direct solver, or an iterative approach like GMRES etc.? Usually iterative solvers can benefit from the many, albeit older, cores combined with a good preconditioner. I have a dual E5-2690 setup at home, and it is quite a bit slower compared to the newer AMD and Intel stuff (I have a Threadripper 5995WX at work, mainly solving the wave equation for very large problems), but not useless…

What are you running for your EM sims? I have some experience with Ansys and COMSOL hardware choices; I used to run a 4-socket Xeon E5-4657L v2 system for EM sim.

Most of the benchmarks, including the Phoronix Test Suite, will be highly misleading for large sparse matrix workloads. Basically what it’ll come down to is memory bandwidth (both single-threaded and multi-threaded), which sadly is not increasing at the same rate as “raw” CPU core performance.
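To make that concrete, here’s a rough Python/SciPy sketch (the matrix shape and sizes are arbitrary, just big enough to fall out of cache) showing how little arithmetic a sparse matrix-vector product does per byte it streams, which is why the RAM and QPI numbers end up dominating:

# spmv_bw.py -- why sparse solves track memory bandwidth rather than peak
# FLOPs: a CSR matrix-vector product touches every stored value and column
# index once per sweep, so the arithmetic intensity is tiny.
import time
import numpy as np
import scipy.sparse as sp

n = 2_000_000
# 1D Poisson-style tridiagonal matrix, ~3 nonzeros per row
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
x = np.random.rand(n)

reps = 50
t0 = time.perf_counter()
for _ in range(reps):
    y = A @ x
dt = (time.perf_counter() - t0) / reps

flops = 2 * A.nnz                              # one multiply + one add per nonzero
traffic = A.nnz * (8 + 4) + 2 * n * 8          # values + column indices + x and y, roughly
print(f"{flops / dt / 1e9:.2f} GFLOP/s, ~{traffic / dt / 1e9:.1f} GB/s streamed")

The GFLOP/s number comes out as a tiny fraction of what the cores can do on dense math, while the GB/s number sits near whatever the memory subsystem can deliver.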

Getting CPUs with 3 QPI links (E7-4800 series) would help because it would greatly increase (by ~50%) the socket-to-socket memory bandwidth; that’d be a Dell R920.

In the end I went back to running many dual-socket systems instead of quad-socket systems, because I was only seeing a ~20% speed increase in the quad E5-4657L v2 system compared to the dual E5-2650 v2 systems while running a ~200GB-memory solve (I think it was a ~50 million degree-of-freedom full EM (magnetic and electric fields) problem).


I’ve noticed that when I get into the multiple hundreds of gigabytes of working memory in an iterative solver, switching to a direct solver actually gets me to a solution more slowly, because of the sheer amount of memory and memory bandwidth the direct solver uses.
It’s like the iterative solver’s smaller memory footprint lets it go faster by “shifting” some of the workload onto CPU compute instead of relying so heavily on pure memory bandwidth.
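A toy SciPy comparison along those lines (grid size, drop tolerance, and names are placeholders, not anything from a real model): factor a small 2D Laplacian directly and look at the LU fill-in, then solve the same system with ILU-preconditioned BiCGSTAB, which only ever holds A plus a handful of work vectors:

# direct_vs_iterative.py -- contrast a sparse LU factorization with an
# ILU-preconditioned Krylov solve on the same system; the fill-in in the LU
# factors is what blows up memory (and bandwidth) for direct solvers.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def poisson_2d(m):
    """5-point Laplacian on an m x m grid, returned as CSR."""
    T = sp.diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(m, m))
    S = sp.diags([-1.0, -1.0], offsets=[-1, 1], shape=(m, m))
    I = sp.identity(m)
    return (sp.kron(I, T) + sp.kron(S, I)).tocsr()

m = 300                                   # 90,000 unknowns -- tiny next to a real EM model
A = poisson_2d(m)
b = np.random.rand(A.shape[0])

lu = spla.splu(A.tocsc())                 # direct: factor once, then back-substitute
x_direct = lu.solve(b)
print(f"LU fill-in: {(lu.L.nnz + lu.U.nnz) / A.nnz:.1f}x the nonzeros of A")

ilu = spla.spilu(A.tocsc(), drop_tol=1e-4)            # incomplete LU as a cheap preconditioner
M = spla.LinearOperator(A.shape, matvec=ilu.solve)
x_iter, info = spla.bicgstab(A, b, M=M)               # iterative: A plus a few work vectors
print("BiCGSTAB converged" if info == 0 else f"BiCGSTAB info={info}")

The fill-in ratio is basically the effect you’re describing: the factors end up many times larger than the matrix, and the solver has to drag all of that through the memory system.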


Speaking of good preconditioners (it’s not totally the preconditioner’s fault, there is some weird turbulence going on): look at this, I have been waiting over a week for this stupid thing to solve, with that huge scare in the middle there.

Thank you for posting those benchmarks. It is clear that you lose a lot of performance going from 2 to 4 sockets. The rule of thumb I had been using, based on the benchmarks I had seen so far, was that a dual-socket Xeon v2 wasn’t a lot slower than a dual-socket Xeon v3 with a similar core count.

I agree, but this is a personal hobby project and I don’t have much of a budget. The factor-of-two difference in price for 16GB DIMMs between DDR3 and DDR4 matters. My current system is a dual Xeon E5-2697 v3, and with 64GB DIMMs I can get to 1TB, but even today that costs nearly £3000.

So, a weird idea, but what about a small blade system? Maybe it’s better to have 4 nodes with 256GB or 512GB each than a single machine, with each node being a lot smaller.

I have an ASUS RS724Q in stock with me; it takes 16x DDR3 (32GB modules max).

Price-wise, the 32GB modules cost around 27€ each, totaling 1728€, plus the node for 120€ and maybe some better CPUs (an E5-2670 v0 is currently installed).

For connecting the nodes you could use the integrated Mellanox ConnectX-3 InfiniBand 40Gbps port.

Just an idea; or you might want to consider something like a Dell R620. But by splitting things up you get the same or more memory and CPU while having higher bandwidth (and higher power draw, tbh).

Assuming your simulations can be split across multiple nodes.
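If you do go multi-node, it’s worth sanity-checking what the link actually delivers before trusting it with MPI traffic. Something like this mpi4py ping-pong is enough for a first look (the script name, message size, and rep count are arbitrary choices):

# pingpong.py -- crude point-to-point bandwidth check between two MPI ranks,
# e.g. to see what an InfiniBand (or Ethernet) link really delivers.
# Run with one rank on each of two nodes, e.g.:
#   mpirun -np 2 --host node1,node2 python3 pingpong.py
import time
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

nbytes = 64 * 1024 * 1024                  # 64 MB messages
buf = np.zeros(nbytes, dtype=np.uint8)
reps = 20

comm.Barrier()
t0 = time.perf_counter()
for _ in range(reps):
    if rank == 0:
        comm.Send(buf, dest=1)
        comm.Recv(buf, source=1)
    elif rank == 1:
        comm.Recv(buf, source=0)
        comm.Send(buf, dest=0)
elapsed = time.perf_counter() - t0

if rank == 0:
    # each rep moves the buffer there and back
    print(f"~{2 * reps * nbytes / elapsed / 1e9:.2f} GB/s point-to-point")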

I am running the MEEP FDTD solver, trying to scale up from a 2D to a 3D problem. I am also experimenting with running GRChombo for general relativity simulations.
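For anyone who hasn’t seen MEEP, a minimal 3D run looks roughly like the standard straight-waveguide tutorial with a z extent added; the sizes below are placeholders rather than my actual model, and it assumes the parallel (MPI) build of MEEP:

# waveguide3d.py -- minimal 3D MEEP run: a dielectric waveguide in a 16x8x8
# cell with PML absorbing boundaries and a continuous-wave Ez source.
import meep as mp

cell = mp.Vector3(16, 8, 8)                 # going from 2D to 3D: give the cell a z extent
geometry = [mp.Block(mp.Vector3(mp.inf, 1, 1),
                     center=mp.Vector3(),
                     material=mp.Medium(epsilon=12))]
sources = [mp.Source(mp.ContinuousSource(frequency=0.15),
                     component=mp.Ez,
                     center=mp.Vector3(-7, 0, 0))]

sim = mp.Simulation(cell_size=cell,
                    boundary_layers=[mp.PML(1.0)],
                    geometry=geometry,
                    sources=sources,
                    resolution=20)
sim.run(until=200)

Memory grows roughly with resolution cubed in 3D, which is what pushes me towards the big-RAM machines. With the parallel build, mpirun -np N python3 waveguide3d.py divides the cell across the ranks, so per-core memory bandwidth and the interconnect end up mattering a lot, as people have pointed out above.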

I suspected that this might be the case - I have been using the OpenFOAM benchmarks as a proxy, as CFD has similar issues with memory bandwidth.

This is an option that I am also considering - both pieces of software support OpenMPI. A cluster of used R620/R630s is something I could get my hands on easily, and running a cluster has a certain appeal, but it raises a number of non-technical issues. :slight_smile:

That looks like an interesting machine - I had a look at some of the large Dell blade servers, but I wasn’t familiar with this. The integrated Infiniband looks useful.


That is interesting; I’ve never actually done time domain with EM, I always left that up to the wave optics guys, but I suppose it’s pretty important for GR.
Looks like MEEP uses BiCGSTAB for its solver, which is actually one of the most memory-efficient iterative solvers, if a little unstable compared to GMRES (which, comparatively, is a memory hog in the realm of iterative solvers).

In my very limited reading/understanding of MEEP, it uses the FDTD method as opposed to the Finite Element Method found in most other simulation software. FDTD really lends itself to solving time-domain problems, but it restricts you to fairly uniform grid meshes that don’t map efficiently onto irregular geometry, unlike the tetrahedral meshes an FEM program would use.
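For anyone who hasn’t looked inside an FDTD code, the core update is tiny; a bare-bones 1D Yee sweep looks something like this (grid size, step count, and source are arbitrary), and it shows both why the method wants a uniform grid and why each time step is basically a bandwidth-limited pass over big arrays:

# fdtd_1d.py -- bare-bones 1D Yee/FDTD update: E and H live on staggered
# points of a uniform grid, and each time step is a nearest-neighbour sweep.
import numpy as np

nx, nt = 400, 1000
E = np.zeros(nx)
H = np.zeros(nx - 1)
courant = 0.5                       # normalized dt/dx, kept below 1 for stability

for n in range(nt):
    H += courant * (E[1:] - E[:-1])                  # update H from the curl of E
    E[1:-1] += courant * (H[1:] - H[:-1])            # update E from the curl of H
    E[nx // 2] += np.exp(-((n - 30) / 10.0) ** 2)    # soft Gaussian source in the middle

In 3D the same kind of sweep runs over six field arrays instead of two, which is where the memory footprint and bandwidth appetite come from.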


OpenFOAM is an okay proxy, although most of the benchmarks for it online are kind of small problems compared to what we’re talking about so we don’t see the memory subsystem taxed as much.

The following blocks are probably closer benchmark approximations: one from a dual-socket E5-2650 v2 (16 total cores @ 3.0GHz) with 256GB of 1866MHz RAM, and one from a dual-socket E5-2697A v4 (32 total cores @ 3.1GHz) with 256GB of 2400MHz RAM, both using an FGMRES solver. The solve times are the fifth line from the bottom of each block. There’s a ~30% reduction in solve time going from Ivy Bridge to Broadwell, and I’m pretty sure the majority of that improvement comes from the increase in RAM speed rather than the doubling of cores.

<---- Compile Equations: EM 1 in Study 1/Solution 1 (sol1) ---------------------
Started at Oct 29, 2022, 12:38:41 PM.
Geometry shape function: Linear Lagrange
Running on 2 x Intel(R) Xeon(R) CPU E5-2697A v4 at 2.60 GHz.
Using 2 sockets with 32 cores in total on WIN256-PC.
Available memory: 262.04 GB.
Time: 108 s. (1 minute, 48 seconds)
Physical memory: 7.61 GB
Virtual memory: 7.94 GB
Ended at Oct 29, 2022, 12:40:29 PM.
----- Compile Equations: EM 1 in Study 1/Solution 1 (sol1) -------------------->


<---- Stationary Solver 5 in Study 1/Solution 1 (sol1) -------------------------
Started at Oct 29, 2022, 7:32:38 PM.
Linear solver
Number of degrees of freedom solved for: 43351744.
Nonsymmetric matrix found.
Scales for dependent variables:
Magnetic vector potential (comp1.A): 3.9
Electric potential (comp1.V): 0.95
Terminal voltage (comp1.mef.mi1.term1.V0_ode): 12
Orthonormal null-space function used.
Iter      SolEst     Damping    Stepsize #Res #Jac #Sol LinIt   LinErr   LinRes
   1       0.093   1.0000000       0.093    1    1    1   203  0.00095  9.5e-08
Solution time: 8490 s. (2 hours, 21 minutes, 30 seconds)
Physical memory: 166.43 GB
Virtual memory: 168.9 GB
Ended at Oct 29, 2022, 9:54:08 PM.
----- Stationary Solver 5 in Study 1/Solution 1 (sol1) ------------------------>
<---- Compile Equations: EM 1 in Study 1/Solution 1 (sol1) ---------------------
Started at Nov 7, 2022, 5:13:12 PM.
Geometry shape function: Linear Lagrange
Running on 2 x Intel(R) Xeon(R) CPU E5-2650 v2 at 2.60 GHz.
Using 2 sockets with 16 cores in total on DESKTOP-TEH272H.
Available memory: 262.08 GB.
Time: 111 s. (1 minute, 51 seconds)
Physical memory: 7.6 GB
Virtual memory: 7.92 GB
Ended at Nov 7, 2022, 5:15:03 PM.
----- Compile Equations: EM 1 in Study 1/Solution 1 (sol1) -------------------->


<---- Stationary Solver 5 in Study 1/Solution 1 (sol1) -------------------------
Started at Nov 8, 2022, 1:57:03 AM.
Linear solver
Number of degrees of freedom solved for: 43351744.
Nonsymmetric matrix found.
Scales for dependent variables:
Magnetic vector potential (comp1.A): 3.8
Electric potential (comp1.V): 0.91
Terminal voltage (comp1.mef.mi1.term1.V0_ode): 11
Orthonormal null-space function used.
Iter      SolEst     Damping    Stepsize #Res #Jac #Sol LinIt   LinErr   LinRes
   1       0.062   1.0000000       0.062    1    1    1   208  0.00093  9.3e-08
Solution time: 11965 s. (3 hours, 19 minutes, 25 seconds)
Physical memory: 162.77 GB
Virtual memory: 163.75 GB
Ended at Nov 8, 2022, 5:16:28 AM.
----- Stationary Solver 5 in Study 1/Solution 1 (sol1) ------------------------>
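Reading the two logs off quickly (just a back-of-the-envelope check in Python, nothing more):

# compare_logs.py -- ratios pulled from the two solver logs above
ivy_s, broadwell_s = 11965.0, 8490.0                               # solution times in seconds
print(f"time reduction: {(1 - broadwell_s / ivy_s) * 100:.0f}%")   # ~29%
print(f"speedup:        {ivy_s / broadwell_s:.2f}x")               # ~1.41x
print(f"DRAM speed ratio 2400/1866: {2400 / 1866:.2f}x")           # ~1.29x

The 1.41x speedup is a bit more than the 1.29x DRAM speed ratio, so the extra cores and newer core design presumably account for the remainder, but the RAM speed looks like the bigger share.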


And here’s what VTune says is going on during simulation (this is actually from a Skylake system but it’s largely the same as the previous generations):
