[Solved] AMD EPYC 7413 slows down to around 400 MHz when running iperf

Hi level1,

We are trying to set up a 40 Gbit connection between two servers and are getting weird CPU behaviour when using iperf. The link also only reaches around 10 Gbit/s of the possible 40.

Server specs:

AMD EPYC 7413
8x 16384 MB Multi-Bit ECC 3200 MHz memory
Supermicro H12SSL-CT
Intel XL710 40GbE
Ubuntu 20.04.3 LTS, kernel 5.4.0-84-generic

The servers are connected directly to each other via fibre; no switches.

Example

host1# iperf -s
host2# iperf -c host1 -i 1 -t 120
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 1.39 GBytes 12.0 Gbits/sec
[ 3] 1.0- 2.0 sec 1.00 GBytes 8.61 Gbits/sec
[ 3] 2.0- 3.0 sec 1.03 GBytes 8.88 Gbits/sec
[ 3] 3.0- 4.0 sec 1.04 GBytes 8.92 Gbits/sec
[ 3] 4.0- 5.0 sec 1021 MBytes 8.56 Gbits/sec
[ 3] 5.0- 6.0 sec 1.05 GBytes 9.01 Gbits/sec
[ 3] 6.0- 7.0 sec 1.02 GBytes 8.78 Gbits/sec
[ 3] 7.0- 8.0 sec 1.02 GBytes 8.74 Gbits/sec
[ 3] 8.0- 9.0 sec 1.01 GBytes 8.69 Gbits/sec
[ 3] 9.0-10.0 sec 1.02 GBytes 8.75 Gbits/sec
[ 3] 10.0-11.0 sec 1.05 GBytes 9.03 Gbits/sec
[ 3] 11.0-12.0 sec 1015 MBytes 8.51 Gbits/sec
[ 3] 12.0-13.0 sec 1.02 GBytes 8.72 Gbits/sec
[ 3] 13.0-14.0 sec 1014 MBytes 8.51 Gbits/sec
[ 3] 14.0-15.0 sec 974 MBytes 8.17 Gbits/sec
[ 3] 0.0-15.0 sec 15.6 GBytes 8.92 Gbits/sec

Any suggestions as to why iperf sends the CPU to sleep, or how I could improve single-thread TCP transmission speed? Running multiple TCP streams utilizes the bandwidth better, but that is not our use case.
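
For reference, the usual knobs for a single TCP stream on a fast link are larger socket buffers plus an explicit window in iperf; a rough sketch with illustrative values (not something we have applied yet):

# allow large socket buffers (example values, not tuned for this box)
sysctl -w net.core.rmem_max=268435456
sysctl -w net.core.wmem_max=268435456
sysctl -w net.ipv4.tcp_rmem="4096 87380 268435456"
sysctl -w net.ipv4.tcp_wmem="4096 65536 268435456"

# then ask iperf for a bigger window explicitly
iperf -c host1 -i 1 -t 120 -w 64M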

thank you

Btw, I asked the question on Server Fault too and was advised to ask here.

EDIT:
Here are some example readings.

without iperf running

cpupower -c 0-47 monitor
| Mperf || Idle_Stats
CPU| C0 | Cx | Freq || POLL | C1 | C2
0| 6,16| 93,84| 2822|| 0,00| 3,61| 90,27
24| 0,01| 99,99| 1532|| 0,00| 0,00|100,00
1| 0,04| 99,96| 1982|| 0,00| 0,00| 99,97
25| 0,04| 99,96| 1502|| 0,00| 0,00| 99,96
2| 0,00|100,00| 1498|| 0,00| 0,00| 100,0
26| 0,00|100,00| 1127|| 0,00| 0,00|100,00
3| 0,00|100,00| 1614|| 0,00| 0,00| 100,0
27| 0,00|100,00| 1567|| 0,00| 0,00|100,00

with iperf running

| Mperf || Idle_Stats
CPU| C0 | Cx | Freq || POLL | C1 | C2
0| 2,88| 97,12| 413|| 0,00| 0,17| 96,82
24| 12,30| 87,70| 416|| 0,00| 0,49| 87,13
1| 0,42| 99,58| 412|| 0,00| 4,21| 95,39
25| 5,50| 94,50| 420|| 0,00| 0,38| 94,09
2| 0,08| 99,92| 407|| 0,00| 0,00| 99,92
26| 12,54| 87,46| 463|| 0,00| 0,65| 86,73
3| 0,02| 99,98| 408|| 0,00| 0,00| 99,98
27| 15,66| 84,34| 440|| 0,00| 0,00| 84,27


Is the cpupower service running and set to performance?

Is a tuned-adm profile set?

Yes, I tried different CPU governors.

I also used taskset -c 0 iperf -s to use only a specific core. Then only that single core showed the behaviour.

I am going to try tuned-adm tomorrow.


I just installed tuned and set the profile to “network-throughput” with tuned-adm.
At first the speed improved to 20 Gbit/s.
However, changing back to “balanced” showed the same 20 Gbit/s. After a while the speeds dropped back to 8 Gbit/s.
When changing back to network-throughput the speeds stayed low.
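
For reference, the tuned side of this was just the stock workflow, something like:

tuned-adm profile network-throughput   # apply the throughput profile
tuned-adm active                       # show which profile is currently in effect
tuned-adm profile balanced             # switch back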

The CPU again throttled to 400 MHz. Where does this throttling come from?

Double-check thermals? Of both the NIC and CPU?

What chassis is this (DIY or Supermicro)? Do the fans ramp at all?


I have a bit of trouble getting sensors running, but I don’t think thermals are the issue here.

If some other workload utilizes the CPU, the clocks increase and so do the Gbit/s. The fans are running; I can even hear them from outside the server room. ^^

I'll check the chassis tomorrow, but I think it's custom; the seller does not specify any particular model.

Just for kicks I booted a Fedora 34 Workstation live CD. Same problem there.

Cat the scaling_governor files in sysfs to confirm the CPU perf governor. It smells like the sleep/wake problem.
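
Something like this, using the standard cpufreq sysfs files (cpupower ships in Ubuntu's linux-tools packages):

# show the active governor on every core
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c

# force the performance governor on all cores
cpupower frequency-set -g performance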

You can also change the perf bias in the BIOS to performance, and that may help a ton.

If you can boost the NIC speed by running nqueens or something, then it's the sleep/wake issue. What's happening is the CPU is juuuust idle enough to go to sleep, and then waking up takes a bit.

A perf governor tweak will for sure fix it if that is indeed it.

It seems to work now. I am getting 20 to 28 Gbit/s with single-thread TCP iperf over 10 minutes, and the CPU does not clock down during the test. (iperf -c $IP -i 1 -t 600)

I changed Global C-State Control in the BIOS from auto to disabled. I had done this previously without it having any effect. I could not find any other CPU performance options in the manual.
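
For what it's worth, roughly the same thing can be tried from the OS side without touching the BIOS; a sketch using cpupower (the setting does not persist across reboots):

# list the idle states the kernel exposes
cpupower idle-info

# disable every idle state deeper than POLL (exit latency > 0 µs) on all cores
cpupower -c all idle-set -D 0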

I am not convinced that this was the underlying issue, since I had already tried it, but it keeps working. ^^

Thank you

Still strange that the CPU only clocks down when iperf is started; CPU benchmarks had no effect. I tried sysbench and hardinfo. Since it is going to be a production machine, I don’t want to install a lot of complex benchmarking software.

Here is an example of the CPU clocking down, next to the iperf output.

Time     Clock (MHz)  iperf interval / Transfer / Bandwidth (Gbits/sec)
14:01:06 1841.117
14:01:07 1832.699
14:01:08 1987.393 iperf start
14:01:09 1789.887 1 sec 972 MBytes 8.15
14:01:10 399.995 2 sec 909 MBytes 7.63
14:01:11 399.996 3 sec 903 MBytes 7.57
14:01:12 399.995 4 sec 894 MBytes 7.5
14:01:13 399.995 5 sec 920 MBytes 7.72
14:01:14 399.996 6 sec 898 MBytes 7.53
14:01:16 399.995 7 sec 911 MBytes 7.64
14:01:17 399.996 8 sec 914 MBytes 7.67
14:01:18 399.995 9 sec 916 MBytes 7.68
14:01:19 399.996 10 sec 908 MBytes 7.62
14:01:20 399.995 11 sec 907 MBytes 7.61
14:01:21 399.996 12 sec 897 MBytes 7.52
14:01:22 399.996 13 sec 909 MBytes 7.62
14:01:23 399.995 14 sec 917 MBytes 7.69
14:01:24 399.995 15 sec 900 MBytes 7.55
14:01:25 2209.523 iperf done
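
One way to capture a log like the clock column above, as a rough sketch that only watches CPU 0:

# sample CPU 0's clock once per second next to a timestamp
while sleep 1; do
    echo "$(date +%T) $(awk '/cpu MHz/ {print $4; exit}' /proc/cpuinfo)"
done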

Oh, the speed is not negatively impacted? If so, then this is normal behavior?

@flaep should get 4 times this speed. He thinks he is only getting 7-8 Gbits/sec because the CPU is clocking down to 400 MHz.


Ohh. Well, 40 Gbit is 4x 10 Gb interfaces bonded. It might be that the switch port or possibly the NIC is misconfigured.

You need to run the NIC's tools to confirm it's a 1x 40 Gbps link and not possibly a 4x 10 Gbps link, or a 1x 10 Gbps link on a 40 Gb port.

I’ve seen all those problems before.

The tell is that it's 10 Gbps whether the CPU is fast or slow.
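
Something along these lines would confirm it (the interface name below is a placeholder):

ethtool enp65s0f0            # look for "Speed: 40000Mb/s"
ethtool -m enp65s0f0         # optic/module info, if the transceiver exposes it
ip -br link show enp65s0f0   # quick link-state check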

Sorry, there was a typo. I now get 20 to 28 Gbit/s with a single transmission.

@wendell The adapter is set to 1x 40 Gbit and there is no switch; the servers are connected directly to each other.

The last bit in my previous post was just an example of the CPU clocking down once iperf is started.

It is still kind of weird.

If I pin the iperf server to a single core, that core stays at 400 MHz.
I use the following command: taskset -c 0 iperf -s
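
For completeness, the pinning can also be made NUMA/IO-aware: check which node the NIC hangs off and pin to a core on that node (interface name and core number below are placeholders):

# NUMA node the XL710 is attached to (-1 means none reported)
cat /sys/class/net/enp65s0f0/device/numa_node

# pin the iperf server to a core on that node instead of always core 0
taskset -c 2 iperf -s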

Another issue is that I can't get the sensors to work.
sensors-detect only finds bnxt_en-pci-4601 and bnxt_en-pci-4600.

The board has an IPMI processor, so you should get thermals from there; install ipmi-sensors, or look at the temps in the management GUI …
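
On Ubuntu that should be roughly the following (assuming the usual Debian/Ubuntu packaging, where ipmi-sensors comes from freeipmi-tools):

apt install freeipmi-tools
ipmi-sensors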

root@pve:~# ipmi-sensors
ID | Name           | Type              | Reading    | Units | Event
1  | 3VSB           | Voltage           | 3.36       | V     | 'OK'
2  | 5VSB           | Voltage           | 5.01       | V     | 'OK'
3  | VCPU           | Voltage           | 1.15       | V     | 'OK'
4  | VSOC           | Voltage           | 0.86       | V     | 'OK'
5  | VCCM ABC       | Voltage           | 1.21       | V     | 'OK'
6  | VCCM EFG       | Voltage           | 1.20       | V     | 'OK'
7  | VPPM ABC       | Voltage           | 2.56       | V     | 'OK'
8  | VPPM EFG       | Voltage           | 2.56       | V     | 'OK'
9  | LAN_1.88V      | Voltage           | 1.80       | V     | 'OK'
10 | 1.8VSB         | Voltage           | 1.84       | V     | 'OK'
11 | 1.8V           | Voltage           | 1.83       | V     | 'OK'
12 | BAT            | Voltage           | 3.14       | V     | 'OK'
13 | 3V             | Voltage           | 3.36       | V     | 'OK'
14 | 5V             | Voltage           | 5.10       | V     | 'OK'
15 | 12V            | Voltage           | 11.90      | V     | 'OK'
16 | LAN_1.0V       | Voltage           | 0.99       | V     | 'OK'
17 | PSU1 VIN       | Voltage           | N/A        | V     | N/A
18 | PSU2 VIN       | Voltage           | N/A        | V     | N/A
19 | PSU1 IOUT      | Current           | N/A        | A     | N/A
20 | PSU2 IOUT      | Current           | N/A        | A     | N/A
21 | MB_A Temp      | Temperature       | 28.00      | C     | 'OK'
22 | MB_B Temp      | Temperature       | 31.00      | C     | 'OK'
23 | Card Side Temp | Temperature       | 37.00      | C     | 'OK'
24 | X710 Temp      | Temperature       | 38.00      | C     | 'OK'
25 | CPU Temp       | Temperature       | 46.00      | C     | 'OK'
26 | CPU Power      | Power Supply      | 57.00      | W     | 'OK'
27 | TR1 Temp       | Temperature       | N/A        | C     | N/A
28 | TEMP M_2_1     | Temperature       | N/A        | C     | N/A
29 | TEMP M_2_2     | Temperature       | N/A        | C     | N/A
30 | DDR4_A Temp    | Temperature       | 38.00      | C     | 'OK'
31 | DDR4_C Temp    | Temperature       | 38.00      | C     | 'OK'
32 | DDR4_D Temp    | Temperature       | 37.00      | C     | 'OK'
33 | DDR4_E Temp    | Temperature       | 42.00      | C     | 'OK'
34 | DDR4_G Temp    | Temperature       | 42.00      | C     | 'OK'
35 | DDR4_H Temp    | Temperature       | 40.00      | C     | 'OK'
36 | PSU1 Temp      | Temperature       | N/A        | C     | N/A
37 | PSU2 Temp      | Temperature       | N/A        | C     | N/A
38 | FAN1           | Fan               | 1000.00    | RPM   | 'OK'
39 | FAN2           | Fan               | 1100.00    | RPM   | 'OK'
40 | FAN3           | Fan               | 500.00     | RPM   | 'OK'
41 | FAN4           | Fan               | 500.00     | RPM   | 'OK'
42 | FAN5           | Fan               | 700.00     | RPM   | 'OK'
43 | FAN6           | Fan               | 700.00     | RPM   | 'OK'
44 | PSU1 PIN       | Power Supply      | N/A        | W     | N/A
45 | PSU2 PIN       | Power Supply      | N/A        | W     | N/A
46 | PSU1 POUT      | Power Supply      | N/A        | W     | N/A
47 | PSU2 POUT      | Power Supply      | N/A        | W     | N/A
48 | ChassisIntr    | Physical Security | N/A        | N/A   | 'OK'
49 | CPU_PROCHOT    | Processor         | N/A        | N/A   | 'OK'
50 | CPU_THERMTRIP  | Processor         | N/A        | N/A   | 'OK'
51 | PSU1 Status    | Power Supply      | N/A        | N/A   | 'OK'
52 | PSU1 AC lost   | Power Supply      | N/A        | N/A   | N/A
53 | PSU2 Status    | Power Supply      | N/A        | N/A   | 'OK'
54 | PSU2 AC lost   | Power Supply      | N/A        | N/A   | N/A

@MadMatt so it is expected not to get readings from EPYC processors?

The sensors are not reporting data to the motherboard directly (and thus to your booted OS); they are instead connected to the BMC that provides the IPMI services (the one that lets you connect and perform operations remotely) …
https://www.supermicro.com/en/solutions/management-software/bmc-resources


Thank you, MadMatt.

IPMI/BMC is already set up.

I never used one of these boards. I was just expecting the usual sensors like on other boards.

OK, so to recap what @wendell said: can you check the temps at the time you have the slowdowns? The fans spinning aren’t really a good indicator of whether your proc is thermal throttling or not …

OK, up to 28 gigabit with a single thread? Try -P 4 to run with 4 streams at once. 28 gigabit on just one thread is not awful?
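
For example, keeping the earlier flags:

iperf -c host1 -P 4 -i 1 -t 120   # four parallel TCP streams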

@MadMatt the temperatures in IPMI are around 35 °C when running iperf, mostly less, so thermal throttling is not the issue.

@wendell When using multiple streams we can reach up to 38 Gbit/s, but I think that is just due to using multiple slow cores. Should we be able to get that speed with a single TCP stream?

In our use case we only have a few applications, and they are mostly single-threaded.