Enabling jumbo frames causes system to crash

I have two computers, both running Ubuntu Server 16.04, which are connected via Chelsio 10Gb NICs. If I set the MTU for the NICs to 9000, one of the servers will crash. It happens after a few hours or even a couple of days.

I'm really not sure what the issue is or why only one of the systems does it. I'm thinking of reinstalling Ubuntu on the one that is crashing, but thought I'd ask here in case anyone had any ideas on what I could look at to try and fix it.

Did you check the drivers? Does using 9000 MTU crash one of the two servers "randomly", or are you seeing a pattern? Do they have the same hardware configuration, or are they different? Is it a direct connection, or does it go through a router? Are the cables all in good shape?

I'm using whatever drivers are in the kernel, but it happens with other cards, not just the one I'm currently using, so I don't think it's the driver or the hardware. It's pretty random: it usually happens after a few hours, sometimes during heavy load, other times when there's no load at all. The servers are connected to each other directly and the cable is fine. I don't seem to have any problems at all if I leave the MTU at 1500, but I would like to use jumbo frames for the added performance.

Do you actually get better performance with jumbo frames? It's been pretty hit-or-miss in my experience. What are you using to test the performance?

I might be wrong, but that looks to me like a problem with the NICs. How are you enabling 9000 MTU? Looking at the product pages for some adapters, none of them mention support for jumbo frames. Maybe the cards can keep up for a bit, but once all the buffers get saturated, the cards are overwhelmed and everything crashes.

Have you tried a slightly higher setting, like 4000? Also check /var/log/syslog for network logs; it might show something (or it might not).
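For anyone who wants to narrow down /var/log/syslog rather than read it end to end, here is a minimal sketch of a filter. The keyword list is an assumption on my part: "cxgb" is meant to match the in-kernel Chelsio driver names (cxgb3/cxgb4), and the rest are generic kernel trouble markers that may or may not appear on this machine.

```python
import re
from pathlib import Path

# Assumed keywords, not taken from the thread: "cxgb" targets the in-kernel
# Chelsio drivers; the others are common kernel warning/crash markers.
PATTERN = re.compile(
    r"cxgb|mtu|link is (up|down)|page allocation failure|"
    r"out of memory|oom-killer|call trace|bug:|netdev watchdog",
    re.IGNORECASE,
)


def interesting_lines(lines):
    """Return the log lines that match any of the assumed keywords."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


if __name__ == "__main__":
    log = Path("/var/log/syslog")
    try:
        # errors="replace" avoids failing on stray non-UTF-8 bytes in the log.
        with log.open(errors="replace") as fh:
            for line in interesting_lines(fh):
                print(line)
    except OSError as exc:
        # Reading syslog may need root or membership of the adm group.
        print(f"could not read {log}: {exc}")
```

After a hard crash the last few messages may never have reached the disk, so an empty result does not rule much out.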

I'm running iSCSI over it, and with 4 disks being accessed I get about 4gb/s with 9000 MTU vs. about 2.5 with 1500. I also see lower CPU overhead.
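To put rough numbers on what jumbo frames can buy, here is a small sketch of the per-frame arithmetic. It assumes plain Ethernet without a VLAN tag and IPv4/TCP without options; those are my assumptions, not details from the posts above.

```python
# Per-frame cost on the wire for standard Ethernet (no VLAN tag, assumed):
# 8 preamble/SFD + 14 header + 4 FCS + 12 inter-frame gap = 38 bytes.
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12
# IPv4 header (20) + TCP header (20), assuming no IP or TCP options.
IP_TCP_HEADERS = 20 + 20


def tcp_payload_efficiency(mtu):
    """Fraction of wire bytes that carry TCP payload for full-size frames."""
    payload = mtu - IP_TCP_HEADERS
    wire_bytes = mtu + ETHERNET_OVERHEAD
    return payload / wire_bytes


def frames_per_second(mtu, link_bits_per_second):
    """Full-size frames per second needed to keep the link saturated."""
    return link_bits_per_second / 8 / (mtu + ETHERNET_OVERHEAD)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        eff = tcp_payload_efficiency(mtu)
        fps = frames_per_second(mtu, 10e9)
        print(f"MTU {mtu}: {eff:.1%} payload, {fps:,.0f} frames/s at 10 Gb/s")
```

Under these assumptions the wire efficiency only moves from roughly 95% to roughly 99%, while the frame rate needed to fill the link drops by almost a factor of six. That would be consistent with the lower CPU overhead reported here, though it does not by itself explain the full throughput difference.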

They both support up to 9600 MTU (which I've tried). I haven't tested lower MTUs on this card, but I have with the other cards I've used, and I get the same issues. I think the problem is with the computer and not the NIC, as it has happened with other cards.

I'm enabling jumbo frames using ifconfig or by adding the setting to /etc/network/interfaces.
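Whichever way the MTU gets set, it may be worth confirming what the kernel actually reports on both ends. A minimal sketch, assuming a Linux host that exposes the value under /sys/class/net; the interface name "eth0" is a placeholder, not the name used on these servers.

```python
from pathlib import Path

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def read_mtu(iface, sysfs_root="/sys/class/net"):
    """Return the MTU the kernel reports for iface, or None if unavailable."""
    try:
        return int((Path(sysfs_root) / iface / "mtu").read_text().strip())
    except (OSError, ValueError):
        return None


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


if __name__ == "__main__":
    iface = "eth0"  # hypothetical interface name; substitute the real one
    mtu = read_mtu(iface)
    if mtu is None:
        print(f"could not read MTU for {iface}")
    else:
        size = max_ping_payload(mtu)
        print(f"{iface}: MTU {mtu}; test with: ping -M do -s {size} <peer>")
```

With an MTU of 9000 the suggested payload is 8972 bytes. If a ping of that size with fragmentation prohibited fails while smaller ones succeed, one side is probably not really running at the MTU you set.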

I've tried that on a different card but get the same result. I've had a look at the logs but don't see anything; next time it crashes I'll try to take a closer look.

That's a different story then. You're getting crashes even with other cards. I'm starting to think that your machine might be underpowered and unable to handle high MTUs. Does that sound reasonable to you?

I don't think that's it; even if it were, it shouldn't cause a crash. It crashes irrespective of load, which makes me think it's a configuration error or a problem with the OS. I might try a reinstall tomorrow.

Before doing that, have you tried swapping the PCI slot (if you have more than one)? Are the machines rock solid without the NICs installed?

Only one slot. The other machine is fine and this one is fine unless I set the MTU to anything other than 1500.

What kernel are you running? Run uname -r in a terminal.

4.4, Ubuntu server 16.04

This may be related, perhaps not strictly, but you never know.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1528466

4.4.0-28.47 fixes this.
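A rough way to compare a machine against that version is to parse the string uname -r prints. This is only a sketch: the release string carries the ABI number (the 28) but not the upload number (the .47), so the check treats ABI 28 or later as "probably includes the fix", which is an approximation on my part.

```python
import platform
import re

# Version named in the thread is 4.4.0-28.47; only 4.4.0-28 is visible in
# the uname -r string, so the comparison below is approximate by design.
FIXED_IN = (4, 4, 0, 28)


def parse_release(release):
    """Turn '4.4.0-28-generic' into (4, 4, 0, 28); None if not in that form."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def has_fix(release):
    """True/False for Ubuntu-style release strings, None if unparseable."""
    parsed = parse_release(release)
    if parsed is None:
        return None
    return parsed >= FIXED_IN


if __name__ == "__main__":
    release = platform.release()  # same string that uname -r prints
    print(f"{release}: fix likely present = {has_fix(release)}")
```

Running it on both servers would also show whether they really are on the same kernel build.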

I don't think that's the problem, and it doesn't affect the other machine which is running the same version of Ubuntu.

How have your updates been applied on each server?

By using apt. It's up to date if that's what you're asking.