Very interested to see how they respond to that; I was looking to disable bonding and was a little annoyed that the option was missing for me as well.
Anyway, I thought I might document my experiences with the X470D4U and maybe get some input.
Started off with the hardware:
X470D4U
AMD R5 3600
64 GB DDR4, Crucial CT16G4DFD8266.C16FD1 (from the QVL)
AQN-107 10GbE NIC
LSI 9211-8i, in IT mode
2x NVMe M.2 1 TB drives
Right off the bat I had some interesting instability with the IPMI/BMC controller: the board would not boot, and the BMC would lock up and become unresponsive (in some cases it would reset itself 45 seconds later, but not often).
The only boot device in use was either a memtest86 USB or the Unraid USB.
After a whole lot of debugging and testing, I could almost always reproduce these failures by having the remote control KVM window open while the system attempted to boot or reboot.
I contacted ASRock Rack support and went through various fix and debug attempts with no effect. I was sent a replacement BMC flash chip to test whether it was a corruption issue, and it did not resolve the problem. I even sent in the board for them to investigate and try to reproduce the issue, to no avail.
I can only hazard a guess that there is some instability with USB devices and the BMC during booting if you use the KVM features at the same time.
I decided to mostly give up on the issue and work around it by ignoring it, and since then everything has been stable, with great performance and minimal power consumption.
…
Months went by without issue, then one day Unraid had pretty much hard locked and was only partially responsive. I was unable to get the system to unmount the array and drives, and was forced to power cycle it manually.
There was a series of errors in the dmesg output, such as some PCI-E errors like:
[1100211.258510] pcieport 0000:00:01.1: AER: Corrected error received: 0000:01:00.0
[1100211.258516] atlantic 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[1100211.258518] atlantic 0000:01:00.0: device [1d6a:07b1] error status/mask=00000001/0000a000
[1100211.258519] atlantic 0000:01:00.0: [ 0] RxErr (First)
[1302904.252985] pcieport 0000:00:01.1: AER: Multiple Corrected error received: 0000:01:00.0
[1302904.252993] pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[1302904.252995] pcieport 0000:00:01.1: device [1022:1483] error status/mask=00001000/00006000
[1302904.252997] pcieport 0000:00:01.1: [12] Timeout
[1302904.253000] atlantic 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Transmitter ID)
[1302904.253002] atlantic 0000:01:00.0: device [1d6a:07b1] error status/mask=000010c1/0000a000
[1302904.253003] atlantic 0000:01:00.0: [ 0] RxErr (First)
[1302904.253005] atlantic 0000:01:00.0: [ 6] BadTLP
[1302904.253006] atlantic 0000:01:00.0: [ 7] BadDLLP
[1302904.253007] atlantic 0000:01:00.0: [12] Timeout
[1302904.253008] atlantic 0000:01:00.0: Error of this Agent is reported first
[1342611.911300] pcieport 0000:00:01.1: AER: Corrected error received: 0000:01:00.0
[1342611.911307] atlantic 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[1342611.911308] atlantic 0000:01:00.0: device [1d6a:07b1] error status/mask=00000001/0000a000
[1342611.911310] atlantic 0000:01:00.0: [ 0] RxErr (First)
[1431745.380984] pcieport 0000:00:01.1: AER: Multiple Corrected error received: 0000:01:00.0
[1431745.380992] atlantic 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[1431745.380994] atlantic 0000:01:00.0: device [1d6a:07b1] error status/mask=00000041/0000a000
[1431745.380995] atlantic 0000:01:00.0: [ 0] RxErr (First)
[1431745.380996] atlantic 0000:01:00.0: [ 6] BadTLP
That points to the AQN-107 NIC; these are similar PCI-E errors to those seen by the user @nx2l.
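For anyone trying to read the "error status/mask" values in those lines, here is a rough Python sketch that turns the hex status into the short names the kernel prints underneath it. The bit table is my reading of the PCIe AER Correctable Error Status register layout, and the function name is just something I made up for illustration, so treat it as an aid rather than a reference.

```python
from __future__ import annotations

# Bit positions in the PCIe AER Correctable Error Status register,
# labelled with the short names the Linux kernel prints in dmesg.
# (Assumed layout; bits not listed here are reported as "bitN".)
AER_CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}


def decode_aer_status(status_hex: str) -> list[str]:
    """Return the names of the bits set in a hex status value, lowest bit first."""
    value = int(status_hex, 16)
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(AER_CORRECTABLE_BITS.get(bit, f"bit{bit}"))
    return names


if __name__ == "__main__":
    # Status values copied from the dmesg lines above.
    for status in ("00000001", "00000041", "00001000", "000010c1"):
        print(status, decode_aer_status(status))
```

The decoded names line up with the per-bit lines in the log (for example 000010c1 gives RxErr, BadTLP, BadDLLP and Timeout), which are all link-level corrected errors rather than anything fatal.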
At the end of the dmesg output there was also a btrfs error:
[1578930.466121] ------------[ cut here ]------------
[1578930.470635] kernel BUG at fs/btrfs/ctree.c:3242!
[1578930.475125] invalid opcode: 0000 [#2] SMP NOPTI
[1578930.479513] CPU: 11 PID: 30752 Comm: kworker/u64:7 Tainted: G D W O 4.19.107-Unraid #1
[1578930.484007] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X470D4U, BIOS P3.30 11/04/2019
[1578930.488543] Workqueue: btrfs-endio-write btrfs_endio_write_helper
[1578930.493064] RIP: 0010:btrfs_set_item_key_safe+0xc0/0x136
[1578930.497604] Code: 00 4c 89 ef 48 8d 74 24 07 48 63 d2 48 6b d2 19 48 83 c2 65 e8 81 17 04 00 48 89 de 48 8d 7c 24 07 e8 95 f4 ff ff 85 c0 7f 02 <0f> 0b 48 8b 43 09 49 63 d4 b9 11 00 00 00 4c 89 ef 48 6b d2 19 48
[1578930.507253] RSP: 0018:ffffc9001b68bbc0 EFLAGS: 00010246
[1578930.512121] RAX: 0000000000000000 RBX: ffffc9001b68bca5 RCX: 000000000000006c
[1578930.517071] RDX: 0000000000000000 RSI: ffffc9001b68bca5 RDI: ffffc9001b68bb9f
[1578930.522017] RBP: ffff8884c44202a0 R08: 0000000000001000 R09: 0000160000000000
[1578930.527045] R10: ffff888000000000 R11: 0000000000000000 R12: 000000000000005a
[1578930.532082] R13: ffff888646607810 R14: 0000000000002d29 R15: ffff888f816bb800
[1578930.537128] FS: 0000000000000000(0000) GS:ffff888fce8c0000(0000) knlGS:0000000000000000
[1578930.542302] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1578930.547485] CR2: 0000000002f65000 CR3: 00000003abc7a000 CR4: 0000000000340ee0
[1578930.552733] Call Trace:
[1578930.557975] __btrfs_drop_extents+0x5e2/0xb12
[1578930.563263] insert_reserved_file_extent.constprop.0+0x98/0x2cc
[1578930.568469] btrfs_finish_ordered_io+0x317/0x5d2
[1578930.573540] ? __switch_to_asm+0x35/0x70
[1578930.578462] ? __switch_to_asm+0x41/0x70
[1578930.583237] ? __switch_to_asm+0x35/0x70
[1578930.587858] normal_work_helper+0xd0/0x1c7
[1578930.592309] process_one_work+0x16e/0x24f
[1578930.596748] worker_thread+0x1e2/0x2b8
[1578930.601164] ? rescuer_thread+0x2a7/0x2a7
[1578930.605469] kthread+0x10c/0x114
[1578930.609594] ? kthread_park+0x89/0x89
[1578930.613652] ret_from_fork+0x22/0x40
[1578930.617653] Modules linked in: ext4 mbcache jbd2 macvlan xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 xt_nat iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost tap veth ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod nct6775 hwmon_vid k10temp bonding atlantic igb(O) edac_mce_amd kvm_amd ipmi_ssif kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd mpt3sas i2c_piix4 nvme i2c_core ccp ahci wmi_bmof raid_class glue_helper scsi_transport_sas nvme_core libahci wmi button pcc_cpufreq ipmi_si acpi_cpufreq [last unloaded: atlantic]
[1578930.653701] ---[ end trace 2b07a24045c31257 ]---
[1578930.658298] RIP: 0010:__x86_indirect_thunk_rax+0x3/0x20
[1578930.662875] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f ae e8 <ff> e0 0f 1f 84 00 00 00 00 00 0f 1f 40 00 66 66 2e 0f 1f 84 00 00
[1578930.672370] RSP: 0018:ffffc90006883c28 EFLAGS: 00010202
[1578930.677088] RAX: 0000ac00e82e382c RBX: ffff888146124240 RCX: 0000000000000000
[1578930.681847] RDX: ffff888866b1a830 RSI: ffff888866b1a700 RDI: ffff88814b0f9bc0
[1578930.686550] RBP: ffff88814b0f9bc0 R08: 0000000000000001 R09: 0000000000000000
[1578930.691226] R10: 0000000000000001 R11: ffff888fce69fb40 R12: ffff88814b0f9c18
[1578930.695916] R13: ffff88814b0f9bc0 R14: ffffc90006883c98 R15: ffff888146124240
[1578930.700609] FS: 0000000000000000(0000) GS:ffff888fce8c0000(0000) knlGS:0000000000000000
[1578930.705373] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1578930.710144] CR2: 0000000002f65000 CR3: 0000000004e0a000 CR4: 0000000000340ee0
After the restart I continued to get the PCI-E errors with the AQN-107, so I am unsure whether the card is actually having issues or this is just more odd behavior from the board.
I re-formatted the cache drives, which had btrfs uncorrectable errors, rebuilt the docker image file, which had become read-only, and pulled out the AQN-107 NIC for now to test. I hope it continues to be stable.
Overall I really want to love this board, but it has been one of the most peculiar bits of hardware I have worked with in a while.
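To keep an eye on whether the errors follow the card or stay with the slot, a small Python sketch like the one below could tally the per-error detail lines by driver, PCI address and error name. The regular expression is based only on the line format in my dmesg output above, and the names (`DETAIL_LINE`, `tally_aer_errors`, `SAMPLE`) are my own, so it may need adjusting for other kernels.

```python
import re
from collections import Counter

# Matches the per-error detail lines, e.g.
# "[1100211.258519] atlantic 0000:01:00.0: [ 0] RxErr (First)"
DETAIL_LINE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<driver>\S+)\s+"
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+"
    r"\[\s*(?P<bit>\d+)\]\s+(?P<name>\w+)"
)

# A few lines copied from the log above, used as demo input.
SAMPLE = """\
[1302904.252997] pcieport 0000:00:01.1: [12] Timeout
[1302904.253003] atlantic 0000:01:00.0: [ 0] RxErr (First)
[1302904.253005] atlantic 0000:01:00.0: [ 6] BadTLP
[1431745.380992] atlantic 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[1431745.380995] atlantic 0000:01:00.0: [ 0] RxErr (First)
[1431745.380996] atlantic 0000:01:00.0: [ 6] BadTLP
"""


def tally_aer_errors(dmesg_text):
    """Count AER detail lines per (driver, PCI address, error name)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = DETAIL_LINE.match(line.strip())
        if match:
            key = (match["driver"], match["bdf"], match["name"])
            counts[key] += 1
    return counts


if __name__ == "__main__":
    for (driver, bdf, name), count in sorted(tally_aer_errors(SAMPLE).items()):
        print(f"{driver} {bdf} {name}: {count}")
```

Running the same function over saved dmesg text before and after moving the NIC to another slot (or pulling it) would show whether the counts move with the card's address or stay on the 0000:00:01.1 port.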