Lost with 10Gb tuning of PopOS 18.04 LTS using x470 Taichi Ultimate 10Gb Aquantia, 10Gb RJ45 X550-T2, NETGEAR 10-PORT GS110MX-100NAS and D-Link DGL-4500, Help

Hi all,

I am trying to upgrade to a 10Gb network but am getting completely lost. While an iperf3 test shows that I should be able to achieve up to 10Gb (e.g., https://fasterdata.es.net/performance-testing/network-troubleshooting-tools/iperf/disk-testing-using-iperf/), I am not able to go beyond 1Gb when copying a 7GB CentOS image. I set up a pair of machines running PopOS 18.04 and I am copying a CentOS image file via samba. One host has an NVMe and the other has an SSD. One host has 64 GB ECC RAM and the other has 32 GB ECC RAM. The router is very old and probably needs to be replaced, so I am open to suggestions for that.

Specifically, I am getting stuck when attempting to tune for 10Gb because I get lost when going through the following sources (https://kb.netapp.com/app/answers/answer_view/a_id/1030783/~/how-to-configure-the-tcp-window-size-on-my-linux-%2F-unix-network-file-system, https://netbeez.net/blog/tcp-window-size/, https://www.kernel.org/doc/ols/2009/ols2009-pages-169-184.pdf, https://superuser.com/questions/992919/how-to-check-the-tcp-congestion-control-algorithm-flavour-in-ubuntu, https://gist.github.com/jedi4ever/903751). I sort of understand that I have to adjust the kernel parameters and I should probably bind samba to the 10Gb interface. I just need a pointer in the right direction.

The equipment setup is as follows:

  1. ROUTER - D-Link DGL-4500 Extreme-N Selectable Dual-Band Gaming Router (https://www.amazon.com/gp/product/B000Z7AKGC/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1)
  2. UNMANAGED SWITCH - NETGEAR 10-PORT GS110MX-100NAS with 2 10Gb RJ45 Ports (https://www.amazon.com/gp/product/B076642YPN/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1)
  3. HOST 1 - x470 Ultimate, 10Gb Aquantia
  4. HOST 2 - ThinkServer TS140 with 10Gb X550-T2 (https://www.amazon.com/gp/product/B075Q9Q6QZ/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&th=1)

Finally, I have developed the following questions:

  1. Do I need to configure samba so it is bound to the 10Gb interfaces, even though I defined a route from HOST1 to HOST2 and vice versa and specified that traffic should go via the 10Gb interface?
  2. Did I need to define the above routes in the first place?
  3. Given that I am using a 10 Gb un-managed switch and a router that is only 1Gb capable, is there a way for me to make sure that the traffic never really reaches the router and just goes through the switch? Or is this just part of TCP? This is really, really where I am confused.

Not sure what else to include here as far as configuration goes, so sorry about that. Any help would be great.

Hi. Your setup is scarily similar to mine and I don’t have any issues hitting multi-hundred MBps on file copies, so there is no logical reason why your setup shouldn’t work.

Couple of things to check - CAT6 cables - have you tried swapping these out?
Smaller files - does it have this behaviour with a smaller file?

Hi @Airstripone, I am using CAT6A and CAT7 cable. Don’t have any extra ones though. Which cables are you using? Have not tried smaller files. How small are we talking about? Did you end up binding samba to the 10gb interface?

My runs are Cat6 to the two 10Gig ports. It could be you have a faulty cable and it is falling back to a lower input speed when copying large files, hence the suggestion to test another cable.

Unless I have misunderstood your setup I believe you only have one active connection to each pc? If so there should be no reason to define special routes, the packet switching should simply route the data across the 10Gbps connections.

Could you run a network interface monitoring app (like gnome system monitor) whilst you copy the files (multiple, say 20GB mixed load of large and small) and take a screenshot of the load characteristics. Or run one in terminal and graph afterwards? If the load is not stable, it will likely be a cable issue or your NIC is faulty.

Finally have you tried a different data protocol, say FTP?

What about the rest of the connections? Are they also Cat 6? I ordered some cables (https://www.amazon.com/gp/product/B07Y5CSTCX?pf_rd_p=ab873d20-a0ca-439b-ac45-cd78f07a84d8&pf_rd_r=XGDY382ZRWJ0WZ3ZH1XY&th=1) so will try that as well.

Correct. The 10Gb machines are connected by Cat 7 to the switch and the switch is connected via Cat 6 to the router.

Have not tried yet. Will try today if I get a chance.

As long as what is connected to the switch are all on the same subnet, it shouldn’t hit the router
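A quick way to sanity-check that, as a minimal sketch: two hosts talk directly through the switch only when their addresses fall in the same network. The addresses and the /24 prefix below are made-up examples, not taken from anyone's actual config, so substitute your own values.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, prefix_len: int) -> bool:
    """Return True when both addresses belong to the same network."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{prefix_len}").network
    return net_a == net_b


# Hypothetical addresses on the 10Gb NICs, assuming a /24 netmask.
print(same_subnet("10.10.0.2", "10.10.0.3", 24))    # both in 10.10.0.0/24
# Hypothetical case where one host sits on the router's 1Gb subnet instead.
print(same_subnet("10.10.0.2", "192.168.0.5", 24))  # different networks
```

If the check comes back false for the pair you are copying between, the traffic has to go through a gateway, which here would be the 1Gb router.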

For tuning, I’d just load your appropriate drivers, set your interface IPs, and enable jumbo frames for a start, and then do tuning from there. The 10g network and 1g network should be on different subnets, and the 10g switch doesn’t really need to be connected to the internet, which also keeps the file share from being reachable from the internet. Something like this (sorry for transparency)

[Image: Untitled Diagram]

Hi @2bitmarksman,

thanks,

I kind of have that setup just have to connect the machines directly to the 1g router using another NIC. I will do that.

I did set jumbo frames, or at least I think I did. I set MTU to 9000.

Also, what can I do about getting drivers for the X550-T2 (https://www.amazon.com/gp/product/B075Q9Q6QZ/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&th=1) as it is not genuine Intel? Am I SOL here? I thought everything was fine since a basic iperf3 test was showing approx. 9 Gb.

Found this driver - (https://downloadcenter.intel.com/download/22283/Intel-Ethernet-Adapter-Complete-Driver-Pack?product=88209) will try to get it when back home.

You shouldn’t need to do anything special, it should ‘just work’

My gut feel is one of your cables is damaged. I had that with a run that worked fine at first but after a day or so built up some weird impedance and dropped to 1Gig or even 100Mb if it was feeling cruel.

Given you have ordered new cables try one of those. Make sure there are no coils in your wires and do a mixed load test… It may even be the file you are copying as test is funky.

Note even with “max speed” I only get 350MBps sustained from my thinkserver to my prod machine due to disk speed limits… Don’t expect 10Gig in real world without NVMe end to end. But your problem sounds more like protocol falling back to 1Gig. Annoying when you have spent $$$ on 10Gig kit I know!

@Airstripone,

I can’t wait for the cables to arrive now. I will definitely go through this one more time. So, iperf3 is not sufficient for diagnostics? I am looking at other tools (https://www.tecmint.com/linux-network-bandwidth-monitoring-tools/) but they seem like the same thing, or overkill with full-fledged monitoring. Will have to check for the gnome app on the PopOS system as I do not see it on this Fedora 30 Xfce, which I tend to run in lieu of gnome.

Yeah I wouldn’t over engineer it. I just use gnome system monitor because it is in the repo and I know it works. You can try conky as well but it never liked my threadripper system for some reason.

Basically you are looking for a visualisation of your data transfer. If it is jumpy, like a saw tooth, then you have something causing either packet drops or a bottleneck (maybe the CPU on the server or a misconfig of jumbo frames). If it is stable but capped the issue is likely the network. If it starts high then drops to a sustained low rate then it is the disks.
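If you would rather log numbers than watch a graph, here is a rough sketch that samples the byte counters Linux exposes in /proc/net/dev and prints the rate once per interval. The function names and the interface name in the usage comment are placeholders I made up; the column positions (receive bytes first, transmit bytes ninth) follow the standard /proc/net/dev layout.

```python
import time


def parse_net_dev(text):
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rate_mib_per_s(before, after, seconds):
    """Convert a byte-counter difference into MiB per second."""
    return (after - before) / seconds / (1024 * 1024)


def monitor(iface, interval=1.0, samples=30):
    """Print rx/tx rates for one interface, one line per interval."""
    def read():
        with open("/proc/net/dev") as f:
            return parse_net_dev(f.read())[iface]

    prev = read()
    for _ in range(samples):
        time.sleep(interval)
        cur = read()
        rx = rate_mib_per_s(prev[0], cur[0], interval)
        tx = rate_mib_per_s(prev[1], cur[1], interval)
        print(f"{iface} rx {rx:8.1f} MiB/s  tx {tx:8.1f} MiB/s")
        prev = cur


# Usage (interface name is a placeholder, check `ip link` for yours):
# monitor("enp5s0", interval=1.0, samples=60)
```

Run it on the receiving host during a copy and look for the same shapes: saw tooth, flat cap, or high start followed by a drop.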

Unfortunately the only way to be sure is to test various combinations. One thing to check is that you are using SMBv3 and it isn’t falling back to legacy signalling. This can cause drama.

I hope the new cables work!

It’d be nice if you could check iperf from the netgear switch, but I don’t think it comes with such a module :T

:frowning: it is unmanaged. I wanted easier lol. Learning a lesson here.

If it is a recent purchase consider returning it if you still can. The EMX version is only a few $ more and has a handy console. Again shouldn’t impact the actual switching, but may give you more diagnostics.

I apologize if this is too rudimentary, but I’ve found it solves many problems. The difference between Gb and GB (bits and bytes) is a factor of 8. Your network card is 10Gb/s (1.25GB/s), but many programs like FTP and SMB report speed in bytes. If you are able to achieve 1GBps, that is very close to the max speed, and I would be happy with this result.

If you are achieving 9+Gbps with iperf(3), but SMB is falling short, I would say your network card and driver are fine, it’s likely other systems, like hard drive access, etc, that are limiting your performance.
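To put numbers on the bits-versus-bytes point, a small sketch of the arithmetic. It ignores protocol overhead entirely, so real file copies will land somewhat below these figures; the helper names are just for illustration.

```python
def gbit_to_gbyte_per_s(gbit_per_s):
    """Line rate in decimal gigabits/s -> decimal gigabytes/s (8 bits per byte)."""
    return gbit_per_s / 8


def gbit_to_mib_per_s(gbit_per_s):
    """Line rate in decimal gigabits/s -> MiB/s, the unit many file managers show."""
    return gbit_per_s * 1e9 / 8 / 2**20


print(gbit_to_gbyte_per_s(10))           # 1.25
print(round(gbit_to_mib_per_s(10), 1))   # 1192.1
print(round(gbit_to_mib_per_s(1), 1))    # 119.2
```

So a copy that reports roughly 1.1 to 1.2 GiB/s is saturating a 10Gb link, while one stuck near 119 MiB/s is moving about what a 1Gb link can carry.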

Not recent unfortunately:(

Just replaced all cables.

For some reason, iperf3 gets me close

u0@LS0000:~/Downloads/The_Lord_of_the_Rings_The_Return_of_the_King_2003_Extended$ iperf3 -c 21.83.0.235 -B 21.83.0.236 -F The_Lord_of_the_Rings_The_Return_of_the_King_2003_Extended_1080p.MKV 

Connecting to host 21.83.0.235, port 5201
[ 4] local 21.83.0.236 port 34047 connected to 21.83.0.235 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 1.16 GBytes 9.93 Gbits/sec 12 1.46 MBytes
[ 4] 1.00-2.00 sec 1.15 GBytes 9.91 Gbits/sec 0 1.46 MBytes
[ 4] 2.00-3.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.48 MBytes
[ 4] 3.00-4.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.48 MBytes
[ 4] 4.00-5.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.49 MBytes
[ 4] 5.00-6.00 sec 1.15 GBytes 9.90 Gbits/sec 2 1.49 MBytes
[ 4] 6.00-7.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.49 MBytes
[ 4] 7.00-8.00 sec 1.15 GBytes 9.91 Gbits/sec 0 1.53 MBytes
[ 4] 8.00-9.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.53 MBytes
[ 4] 9.00-10.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.53 MBytes


[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 11.5 GBytes 9.90 Gbits/sec 14 sender
Sent 11.5 GByte / 36.4 GByte (31%) of The_Lord_of_the_Rings_The_Return_of_the_King_2003_Extended_1080p.MKV
[ 4] 0.00-10.00 sec 11.5 GBytes 9.90 Gbits/sec receiver

iperf Done.

but copying the file starts at around 1.2 GiB/s and drops to 120 MiB/s, and sometimes I see dips below that. But generally it stays around 120 MiB/s. This is going from a Samsung 970 Pro NVMe to an Intel or Samsung SSD.

Is this not automatically negotiated? (https://www.samba.org/samba/docs/current/man-html/smb.conf.5.html#SERVERMAXPROTOCOL stating “Normally this option should not be set as the automatic negotiation phase in the SMB protocol takes care of choosing the appropriate protocol.”)

I see the following for version info but not sure if this is helpful at all

sudo smbstatus -v

using configfile = /etc/samba/smb.conf
Samba version 4.7.6-Ubuntu
Protocol NT1

Using scp to copy files does not make much of a difference, in that I still get around 120 MiB/s.


I don’t get it. Why can iperf do it?

Thanks. I also thought that I was reading something wrong but I checked again.

iperf3 tests produce 1.16 GBytes 9.93 Gbits/sec.
Copying the file starts at 1.2 GiB/s and drops to 120 MiB/s.

No?

I also took a benchmark of the SSD and NVMe. Maybe it is not realistic to expect higher speeds with an avg. SSD write of 320 MB/s?

SSD (approx. 320 MB/s)

NVME (approx. 3.5GB/s)

How large is the file? If the file is large enough, it’ll fill up the target SSD’s DRAM buffer/cache and then be stuck waiting until the drive has flushed that data to the flash cells. I’ve noticed this when copying files over 4GB generally.

Can you create a ramdisk? That behaviour sounds like the disk controller is throttling, maybe too full to use the non-tlc cache. Does it do the same going the other way?

Really dumb question, where are the SSDs plugged in? They aren’t on Sata1 ports? That would cap out at 150MBps, approx 130MiB after overhead losses.

And yes SMB should autonegotiate but Ubuntu is … Well Ubuntu. Try a proper distro. (Triggers many people :slight_smile:)

It’s not actually hitting the controller that hard, it’s just emulated load.

Try the ramdisk option and we can eliminate the network. Hopefully the cables weren’t too expensive
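For the ramdisk comparison, a minimal sketch that writes the same amount of data into two directories and reports MiB/s for each. It assumes /dev/shm is a RAM-backed tmpfs, which is the case on most Linux distros, and the SSD path in the usage comment is a placeholder. The function name and defaults are mine, so adjust the size to something larger than the drive's cache if you want to see the drop-off.

```python
import os
import tempfile
import time


def write_throughput_mib_s(directory, total_mib=1024, chunk_mib=4):
    """Write total_mib of random data into `directory` and return MiB/s.

    The temporary file is deleted automatically when the block exits.
    """
    chunk = os.urandom(chunk_mib * 1024 * 1024)
    n_chunks = total_mib // chunk_mib
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        start = time.perf_counter()
        for _ in range(n_chunks):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # make sure the data has left the page cache
        elapsed = time.perf_counter() - start
    return n_chunks * chunk_mib / elapsed


# Usage (the second path is a placeholder for a directory on the target SSD):
# print("ramdisk:", write_throughput_mib_s("/dev/shm"))
# print("ssd:    ", write_throughput_mib_s("/path/on/target/ssd", total_mib=8192))
```

If the ramdisk figure is high and the SSD figure settles near what the file copy shows, the disk is the limit rather than the network.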