FreeNAS Troubleshooting

Fuck me.

I did some stupid bullshit creating bridges on my FreeNAS box and took my whole network down with a loop.

Long story short, I had to nuke my entire config and wipe the database. Thank god I had taken a full config backup earlier today when I upgraded to 11.3-U1. My VMs are still fucked up, but at least I didn’t lose any of my data.

God damn, I was scared for a minute there. I was not expecting to do a full disaster recovery tonight.

@freqlabs, off the top of your head, would you know why all VMs on FreeNAS fail to connect to the network after an upgrade of FreeNAS 11.3 -> 11.3-U1?

All their link states are up, but there is no connection to the outside.

I have them all routed through a failover lagg0. It worked fine before the upgrade, but afterwards they stopped working. See above for how my night went lol.

Show ifconfig

I stepped away from it tonight. I’ll post it tomorrow.

But I access the web GUI from the same IP on lagg0, and I have it set up as static, so I can confirm it’s working.

I suspect it’s something underneath, since my VMs don’t share the host’s IP; they have their own.

I recently had network problems in VMs with pfSense related to TCP checksum offload (I think) and other network adapter hardware acceleration options (inside the guest).

Worth checking to see if that sort of thing is turned on, and turning it off if it is.

The benefit just isn’t there any more anyway.

Yep, and LRO.

They are actually pretty beneficial features, but the bridge interface is just not compatible with them. It should disable them automatically BUT some things can prevent that, like VLANs for example. The VLAN interface prevents the bridge from disabling the capabilities on the physical NIC. This is a “security feature” I haven’t got around to fixing yet.
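
To see which of these capabilities are still enabled on a host NIC, the `options=` line of `ifconfig` can be scanned. This is only a rough sketch: the sample text is trimmed from the `ifconfig` output posted later in this thread, and the set of capability names treated as "offload" features is an assumption for illustration, not an authoritative list of what the bridge needs disabled.

```python
import re

# Capability names treated as offload features here. This set is an
# assumption for illustration; adjust it to what you care about.
OFFLOAD_CAPS = {
    "RXCSUM", "TXCSUM", "RXCSUM_IPV6", "TXCSUM_IPV6",
    "TSO4", "TSO6", "LRO",
}

# Trimmed sample in the format FreeBSD's ifconfig prints.
SAMPLE = """\
igb0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=2400b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,RXCSUM_IPV6>
        nd6 options=9<PERFORMNUD,IFDISABLED>
igb2: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
"""


def offloads_by_interface(ifconfig_text):
    """Map interface name -> sorted offload capabilities in its options= line."""
    result = {}
    current = None
    for line in ifconfig_text.splitlines():
        # Interface headers start in column 0, e.g. "igb0: flags=...".
        header = re.match(r"^(\S+): flags=", line)
        if header:
            current = header.group(1)
            result[current] = []
            continue
        # Only the interface-level options= line; "nd6 options=" does not match.
        opts = re.match(r"^\s+options=[0-9a-f]+<([^>]*)>", line)
        if opts and current is not None:
            caps = opts.group(1).split(",")
            result[current] = sorted(c for c in caps if c in OFFLOAD_CAPS)
    return result


if __name__ == "__main__":
    for name, caps in offloads_by_interface(SAMPLE).items():
        print(name, caps)
```

In this sample the lagg member igb0 has lost TXCSUM and TSO while the unused igb2 still has them. On FreeBSD, individual capabilities can be switched off with `ifconfig` flags such as `-txcsum`, `-tso` and `-lro`.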

Oh I’m talking about the host.

I don’t think that sort of thing is turned on by default in CentOS 8.1?

I’ll dig into it more tomorrow. Thanks.

I will have to read up on this.

Ah yeah that makes sense. Still a weird thing to look at but I can see its use.

As requested.

root@freenas[~]# ifconfig
igb0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: member of lagg0
        options=2400b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,RXCSUM_IPV6>
        ether 00:25:90:86:8e:f8
        hwaddr 00:25:90:86:8e:f8
        nd6 options=9<PERFORMNUD,IFDISABLED>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
igb1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: member of lagg0
        options=2400b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,RXCSUM_IPV6>
        ether 00:25:90:86:8e:f8
        hwaddr 00:25:90:86:8e:f9
        nd6 options=9<PERFORMNUD,IFDISABLED>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
igb2: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:25:90:86:8e:fa
        hwaddr 00:25:90:86:8e:fa
        nd6 options=1<PERFORMNUD>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
igb3: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:25:90:86:8e:fb
        hwaddr 00:25:90:86:8e:fb
        nd6 options=1<PERFORMNUD>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: lo
lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: lagg0
        options=2400b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,RXCSUM_IPV6>
        ether 00:25:90:86:8e:f8
        inet 192.168.1.10 netmask 0xffffff00 broadcast 192.168.1.255
        nd6 options=9<PERFORMNUD,IFDISABLED>
        media: Ethernet autoselect
        status: active
        groups: lagg
        laggproto failover lagghash l2,l3,l4
        laggport: igb0 flags=5<MASTER,ACTIVE>
        laggport: igb1 flags=0<>
tap0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: Attached to Base_clone2
        options=80000<LINKSTATE>
        ether 00:bd:76:57:f8:00
        hwaddr 00:bd:76:57:f8:00
        nd6 options=1<PERFORMNUD>
        media: Ethernet autoselect
        status: active
        groups: tap
        Opened by PID 1982
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 02:84:f1:56:a9:00
        nd6 options=1<PERFORMNUD>
        groups: bridge
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: vnet0.2 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 13 priority 128 path cost 2000
        member: tap3 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 12 priority 128 path cost 2000000
        member: tap2 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 11 priority 128 path cost 2000000
        member: vnet0.1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 10 priority 128 path cost 2000
        member: tap1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 9 priority 128 path cost 2000000
        member: lagg0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 6 priority 128 path cost 20000
        member: tap0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 7 priority 128 path cost 2000000
tap1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: Attached to Base_clone3
        options=80000<LINKSTATE>
        ether 00:bd:d2:69:f8:01
        hwaddr 00:bd:d2:69:f8:01
        nd6 options=1<PERFORMNUD>
        media: Ethernet autoselect
        status: active
        groups: tap
        Opened by PID 2436
vnet0.1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: associated with jail: nextcloud_1 as nic: epair0b
        options=8<VLAN_MTU>
        ether 00:25:90:dd:f1:90
        hwaddr 02:7b:d0:00:0a:0a
        nd6 options=1<PERFORMNUD>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        groups: epair
tap2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: Attached to Cockpit
        options=80000<LINKSTATE>
        ether 00:bd:c0:7b:f8:02
        hwaddr 00:bd:c0:7b:f8:02
        nd6 options=1<PERFORMNUD>
        media: Ethernet autoselect
        status: active
        groups: tap
        Opened by PID 2710
tap3: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: Attached to Nginx
        options=80000<LINKSTATE>
        ether 00:bd:14:8e:f8:03
        hwaddr 00:bd:14:8e:f8:03
        nd6 options=1<PERFORMNUD>
        media: Ethernet autoselect
        status: active
        groups: tap
        Opened by PID 2937
vnet0.2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: associated with jail: plex_1 as nic: epair0b
        options=8<VLAN_MTU>
        ether 00:25:90:d5:36:4f
        hwaddr 02:7b:d0:00:0d:0a
        nd6 options=1<PERFORMNUD>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        groups: epair

Ok that all looks fine. I take it the network in the VMs is fine?
How about ifconfig in a jail. Are they set to use DHCP or static addresses?

Everything is set to DHCP, VMs and jails. I set the IPs statically in pfSense.

The VMs’ Ethernet adapters are set to up.

The jails work, however; it’s just the VMs.

The VMs use the virtio driver.

Oh. Do the MACs in the VMs match the taps in the host?

I do not know, I shall check.

What is the relation to the taps?

The tap is the interface for the device bhyve uses on the host side. A tap is a network interface that is added to the bridge, and it’s also a device in /dev/ that bhyve opens for the VM.

I just double-checked though, and my VMs don’t actually match anyway, so never mind.
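
To make the tap/bridge relationship concrete, here is a small sketch that reads `ifconfig` output and reports any tap interface that is not a member of the bridge. The sample is trimmed from the output posted earlier in the thread; the function names and the default bridge name `bridge0` are assumptions for illustration.

```python
import re

# Trimmed sample in the format FreeBSD's ifconfig prints.
SAMPLE = """\
tap0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: Attached to Base_clone2
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        member: tap1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 9 priority 128 path cost 2000000
        member: lagg0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 6 priority 128 path cost 20000
        member: tap0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 7 priority 128 path cost 2000000
tap1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: Attached to Base_clone3
"""


def bridge_members(ifconfig_text, bridge="bridge0"):
    """Return the member interfaces listed under the given bridge, in order."""
    members = []
    in_bridge = False
    for line in ifconfig_text.splitlines():
        # A new interface section starts in column 0.
        header = re.match(r"^(\S+): flags=", line)
        if header:
            in_bridge = header.group(1) == bridge
            continue
        member = re.match(r"^\s+member: (\S+) ", line)
        if in_bridge and member:
            members.append(member.group(1))
    return members


def taps_not_bridged(ifconfig_text, bridge="bridge0"):
    """Return tap interfaces that exist but are not members of the bridge."""
    taps = re.findall(r"^(tap\d+): flags=", ifconfig_text, flags=re.MULTILINE)
    members = set(bridge_members(ifconfig_text, bridge))
    return [tap for tap in taps if tap not in members]


if __name__ == "__main__":
    print(bridge_members(SAMPLE))
    print(taps_not_bridged(SAMPLE))
```

An empty result from `taps_not_bridged` only says that every tap is attached to the bridge, which matches the output in this thread; it does not prove that traffic actually passes.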

Any error messages? :smiley:

Found something.

I got around the networking issue by removing lagg0 and just setting up all the interfaces individually. I have attached the igb1 interface to this Nginx VM, but when I went to boot it, the VM still has the IP of lagg0 (192.168.1.10) in the log file. So this in turn causes bhyve VNC to fail, and I cannot ping the VM.

[2020-02-29 13:40:55,094] (DEBUG) VMService.vm_21.run():179 - ====> NIC_ATTACH: igb1
[2020-02-29 13:40:55,145] (DEBUG) VMService.vm_21.run():291 - Starting bhyve: bhyve -A -H -w -c 2 -m 1024 -s 0:0,hostbridge -s 31,lpc -l com1,/dev/nmdm21A -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd -s 4,virtio-net,tap3,mac=00:a0:98:01:57:28 -s 29,fbuf,vncserver,tcp=192.168.1.10:5987,w=1024,h=768,, -s 30,xhci,tablet -s 3:0,virtio-blk,/dev/zvol/tank/vms/Nginx-e6uvt 21_Nginx
[2020-02-29 13:40:55,196] (DEBUG) VMService.vm_21.run():304 - ==> Start WEBVNC at port 5887 with pid number 7623
[2020-02-29 13:40:55,197] (DEBUG) VMService.vm_21.run():313 - Nginx: 29/02/2020 13:40:55 ListenOnTCPPort: Can't assign requested address
[2020-02-29 13:41:02,839] (DEBUG) VMService.vm_21.run():313 - Nginx: rdmsr to register 0x140 on vcpu 0
[2020-02-29 13:41:02,840] (DEBUG) VMService.vm_21.run():313 - Nginx: wrmsr to register 0x140(0) on vcpu 0
[2020-02-29 13:41:02,949] (DEBUG) VMService.vm_21.run():313 - Nginx: rdmsr to register 0x140 on vcpu 1
[2020-02-29 13:41:02,950] (DEBUG) VMService.vm_21.run():313 - Nginx: wrmsr to register 0x140(0) on vcpu 1
[2020-02-29 13:41:03,936] (DEBUG) VMService.vm_21.run():313 - Nginx: rdmsr to register 0x34 on vcpu 0
[2020-02-29 13:41:04,299] (DEBUG) VMService.vm_21.run():313 - Nginx: Unhandled ps2 mouse command 0xe1
[2020-02-29 13:41:04,805] (DEBUG) VMService.vm_21.run():313 - Nginx: Unhandled ps2 mouse command 0x88
[2020-02-29 13:41:13,302] (DEBUG) VMService.vm_21.run():313 - Nginx: rdmsr to register 0x17f on vcpu 0
[2020-02-29 13:41:13,305] (DEBUG) VMService.vm_21.run():313 - Nginx: wrmsr to register 0x17f(0x2) on vcpu 0
[2020-02-29 13:41:13,305] (DEBUG) VMService.vm_21.run():313 - Nginx: rdmsr to register 0x17f on vcpu 0
[2020-02-29 13:41:13,306] (DEBUG) VMService.vm_21.run():313 - Nginx: rdmsr to register 0x17f on vcpu 1
[2020-02-29 13:41:13,306] (DEBUG) VMService.vm_21.run():313 - Nginx: wrmsr to register 0x17f(0x2) on vcpu 1
[2020-02-29 13:41:13,307] (DEBUG) VMService.vm_21.run():313 - Nginx: rdmsr to register 0x17f on vcpu 1
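
"Can't assign requested address" is the usual message when a program tries to bind to an IP that is not configured on any interface of the host. A quick way to check for that, sketched here, is to pull the VNC bind address out of the bhyve command line in the log and compare it against the host's addresses. The command line is shortened from the log above; the host `ifconfig` text and its 192.168.1.20 address are hypothetical, since the post-lagg configuration was not posted.

```python
import re

# Shortened from the "Starting bhyve:" log line above.
BHYVE_CMD = (
    "bhyve -A -H -w -c 2 -m 1024 -s 0:0,hostbridge -s 31,lpc "
    "-s 4,virtio-net,tap3,mac=00:a0:98:01:57:28 "
    "-s 29,fbuf,vncserver,tcp=192.168.1.10:5987,w=1024,h=768,, "
    "-s 30,xhci,tablet 21_Nginx"
)

# Hypothetical host state after lagg0 was removed (not from the thread).
HOST_IFCONFIG = """\
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        inet 192.168.1.20 netmask 0xffffff00 broadcast 192.168.1.255
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        inet 127.0.0.1 netmask 0xff000000
"""


def vnc_bind_address(bhyve_cmdline):
    """Return (ip, port) from the fbuf device's tcp= option, or None."""
    match = re.search(r"fbuf,\S*?tcp=(\d+\.\d+\.\d+\.\d+):(\d+)", bhyve_cmdline)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def host_ipv4_addresses(ifconfig_text):
    """Return every IPv4 address found on an 'inet' line."""
    return re.findall(
        r"^\s+inet (\d+\.\d+\.\d+\.\d+) ", ifconfig_text, flags=re.MULTILINE
    )


def vnc_address_is_stale(bhyve_cmdline, ifconfig_text):
    """True when the VNC bind IP is not assigned to any host interface."""
    bind = vnc_bind_address(bhyve_cmdline)
    if bind is None:
        return False
    return bind[0] not in host_ipv4_addresses(ifconfig_text)


if __name__ == "__main__":
    print(vnc_bind_address(BHYVE_CMD))
    print(vnc_address_is_stale(BHYVE_CMD, HOST_IFCONFIG))
```

If the check reports a stale address, the VM's VNC device is still configured with the old lagg0 IP, which would explain the bind failure in the log.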

I had an issue in FreeNAS once where my lagg MAC address changed after an update. In general, laggs take some additional care and are probably not too thoroughly tested between updates.

Can you remove and re-add the console interface? I don’t remember if that’s possible through FreeNAS…

Yeah, it broke, so I removed it, but now it’s hardcoded somewhere and FreeNAS still keeps wanting to use it. The only thing I can think of is to destroy the VM and fully recreate it, which will be a huge pain in the ass because I have to do this for 2 VMs and 2 jails.

No, I don’t think so either.

Can you revert or did you upgrade your pool?

I went from 11.3 -> 11.3-U1.

The pool version did not change this time. Everything was stable when I upgraded from 11.2-U7 -> 11.3, and for that one I did have to upgrade the pool.

I honestly didn’t expect this much trouble from a minor release.
