SOLVED: Gigabyte X670 Aorus Elite AX (rev 1.0) - a nightmare MB

Solution:

Described in this reply: SOLVED: Gigabyte X670 Aorus Elite AX (rev 1.0) - a nightmare MB - #12 by EfPe

Original post:


Greetings,
I have foolishly built a system using this motherboard. It had some problems from the start (I do not exactly remember, since it was a year ago), but recently, it just continues to drive me mad and consume a lot of time to do anything with it.

If anyone has experienced similar issues and managed to solve them - please let me know how. It seems like my sanity is slowly deteriorating.

Hardware config:

  • 64GB ram - chosen from “supported dram” list to avoid problems - 2x G-SKILL F5-6000J3040G32G
  • CPU: Ryzen 7900 (no X)
  • GPU: XFX Radeon 6700 XT
  • A good PSU (850 Watts, quality brand, don’t remember which, difficult to check right now)
  • NVME 1TB Western Digital - WDS100T3XHC-00SJG0 - for Windows 11, in slot “C”
  • NVME 1TB Samsung 980 - for use with QubesOS, in slot “A”
  • NVME 1TB Samsung 990 Pro - for use with QubesOS, in slot “B”
  • MS Sculpt ergo keyboard + mouse
  • SVM, IOMMU: Enabled

Problems:

  • Sometimes it just doesn’t start (hangs at logo screen / text screen when logo is turned off)
  • Sometimes it starts, but does not detect M2 drive in slot C
  • Sometimes it starts, but keyboard and mouse do not work (for the purpose of bios update) - unable to select boot dev or enter bios
  • Sometimes it starts, but no USB flash drives are found (for the purpose of bios update)
  • Most of the time mouse works especially slowly in BIOS
  • Most of the time BIOS works especially slowly overall
  • Upon connecting SAS controller it takes ages to start booting (LSI SAS 9217) - currently disconnected.
  • Qubes OS fails to boot most of the time (it just reboots).

(When mentioning slow boot, I do not mean the “ram training” delay - I mean the amount of time it takes just to show a logo)

Attempted remedies:

  • Various BIOS versions from F11 to F32g
  • Disabling secure boot
  • Disabling PCIe power management
  • Removing LSI controller
  • Turning on “fast boot”
  • Turning off “fast boot”
  • Turning EXPO on/off - no meaningful difference
  • Moving M2 drive used by win 11 from slot D to slot C (this has somewhat improved my chances of booting into windows)

The funny thing is, that when it manages to boot - it works! It is stable, nothing crashes, can sustain stress tests like prime95 or furmark with no issue.
For this reason I mainly use “sleep” function… dreading the time when winupdate comes my way, or power fails.

Shall I get a new MB?


I am having a hard time with x670-e proart and what I learned so far about x670 motherboards is:

(I so regret that I could not find out before buying, though I was looking for some opinions…)

  • The x670 have problems when using more than 2 m.2 drives
  • I’m having boot issues which are intermittent, and I cannot identify why this mobo randomly fails to boot
  • Once booted, everything works rock solid, same as yours
  • I have a theory that at least in my case (asus) the latest bios might have fried something for good, because after trying out a brand new board on the OEM-flashed bios it was stable; when I flashed the latest one it turned into garbage. (even rollback doesn’t help, it’s still trash)
  • Imo this chipset is a disease and people that have no issues with it are either insanely lucky having configured their rigs the way that x670 accepts it or just won a lottery.
  • I also suspect that x670 has some problems with graphics cards / display output. But this is a 1% shot.

I know I ended up with troubles with a psu… BUT, I also had an issue with a usb device. In particular a Corsair Keyboard.

To this day, sometimes I can’t get to the BIOS screen. It boots and is in the BIOS, but the screen is blank.

I have a Samsung G80 OLED. It’s slow to turn on and causes some white BIOS check lights sometimes.

My solution for a smoother BIOS was to connect a dirt cheap non-LED keyboard and nothing else.

As for slow boot times… x670 by default does a memory map every boot, but it can be turned off after you have a stable overclock, if you are going that route, and should still be stable. I know you said that’s not it, but it did cut out that delay you were talking about after the ram mapping.

Dunno if it’ll help, but it has solved all of my issues.


Interestingly enough, the assumption that every x670-based board maps memory on every boot may be false.
To prove my theory, I run bios on stock settings (with mem training enabled) and when I boot, the orange q-led (dram) flashes for barely half a second. The whole boot takes no more than 5 seconds until the POST. In around 20 seconds I’m in windows ready to go.

I have 64 GB of ram = 2 sticks of 32GB, rated at 5200 MHz but running at the mobo’s stock 4800 MHz.
And I consistently achieve these times with fast boot disabled.

I heard that people are having some issues with mem training every boot, but it cannot be the rule of thumb for every board out there. :smiley:


True, but I have my 64gb (dual 32gb) at 6000 expo working properly. It takes me less than 10 seconds from button push to login screen. That used to take 45-60 seconds with mem training on. Also fast boot disabled.

Just how I got it working once my voltages were in better control with the newest bios approved by ASUS.

Thanks for all the suggestions.
I would like the issue of ram training to be excluded from here - since I know this is not what fails.

The x670 have problems when using more than 2 m.2 drives

Great, I bought this board specifically to use more than 2x m.2.

Imo this chipset is a disease and people that have no issues with it are either insanely lucky having configured their rigs the way that x670 accepts it or just won a lottery.

Does x870 behave any better?
I could go threadripper, but seems like no threadripper for Zen5, and in general seems like low-core-count threadrippers are just not made anymore.

Presently, with no additional HBA, it boots correctly ~80% of the time.

However, if I connect:

  • LSI HBA (no HDDs connected for now)
  • Additional low-power GPU (to later passthrough/vfio main GPU to gaming VM when I’m finished with qubes os setup)

Then it behaves like this:

  1. DRAM training (sometimes) - this either occurs with a blank screen, or with a single cursor somewhere in the upper-ish left corner. This is not a problem for me. During this, the “DRAM” LED stays on.
  2. “DRAM” led is now off, “VGA” led turns on and stays on despite GPU obviously working. It does so regardless of iGPU being enabled or not, or 2nd dGPU being present or not.
  3. Then a MB logo with del / F12 / end prompt appears. This is where things fail usually.
  4. If I try to enter bios then it works like 15% of the time, same with boot menu. Sometimes it takes 5-10 minutes to enter BIOS/boot menu.
  5. Disabling “show logo” gave me additional info on whether F12 / DEL was registered or not. Most of the time it registers and shows text “Entering setup” or “Entering boot menu”. However such text might be shown for up to 5-10 minutes. Sometimes it takes 60 seconds, sometimes 10 minutes. There is no rule for this.

I guess it tries to detect devices somewhere. But it does not report what exactly it tries to do… I tried disabling everything I knew I do not need, but there is so much undocumented stuff in this bios… Is there a place where all the acronyms used under “Settings” are explained?
If it gets confused by HBA, then maybe there is an option to disable searching for additional boot devices?

I mean, there is obviously a large effort to make the bios more colorful and animated. Things like a good help system or the ability to load a profile after a bios update are just ignored.

Funny world, I remember BIOSes for 386 / 486 were easier to “backup” than this uefi circus. If there was printer connected on LPT, then pressing “print screen” printed the actual screen on the actual printer :slight_smile:

wired keyboard advice is sound :slight_smile:

While it sucks to have issues, there are a few things I do get a bit surprised by.

64GB ram - chosen from “supported dram” list to avoid problems - 2x G-SKILL F5-6000J3040G32G

This is what it says on that page (at the bottom):

Yes, going out of spec is more likely to cause issues than staying within spec.
That being said, I’ve honestly never paid attention to QVL as far as memory goes because most isn’t even obtainable or doesn’t fit my requirements (mainly size). I’ve always gone by JEDEC specs (especially voltage) and I’ve never been bitten by it. I will admit that I’m a bit conservative about brands, but as long as you go with a major brand that follows JEDEC specs you’re very likely to be fine. On top of that you have to take into account the specs of the memory controller, which may be different depending on memory module setup.

Memory issues can manifest in many different ways, but I do recall at least someone saying that devices would randomly disappear during operation and boot on the X670E ProArt when the overclock was unstable. I would suggest, if your modules support it, running at 5600 with SPD settings to see if any of the issues go away.

You might also want to consider enabling memory training for each boot, but that’ll hurt boot up times a bit. I’d also recommend turning off any kind of fast boot or similar, as hardware needs time to properly initialize, especially if you have many devices.

Some USB HIDs work poorly in some specific hardware combinations and/or via hubs. Have you tried switching USB ports (controllers) and/or wiring them via your monitor/a hub? Backfeeding can also cause interesting issues…

I’m not sure what “slow” means in terms of BIOS performance; most graphical ones are a bit sluggish and may become even more so if you have a high res monitor, while some other brands cap output resolution.

Integration with certain hardware such as storage controllers can lead to slow boots and/or other BIOS related issues. That being said, workstation/server grade hardware in most cases isn’t fast to initialize, and fast boot is not something that’s prioritized, as the target hardware is more or less supposed to stay always on.

Have no idea about Qubes OS, but you’re kind of on bleeding-edge hardware (as far as open source OSes/distros are concerned), so issues are to be expected. If you want “rock solid”, you’re probably looking at ~2+ year old hardware, especially graphics. Given that Qubes OS 4.2.3 seems to target kernel 6.10.X, you’re likely going to run into more issues than with 6.12, which is LTS and newer; however, I doubt it’ll be flawless.

I’m not sure about the M.2 claims, as many people run more than 2 NVME devices just fine? I mean… the host cards themselves do 4x NVME (bifurcation) in addition to the existing ones.

As far as your choice of motherboards goes, I’m personally not a fan of Gigabyte, since many motherboards, especially workstation ones, seem to be rushed and poorly tested. I don’t have a solid choice to give you; however, I personally do have a few requirements which generally narrow down the selection quite a bit. As far as the AM5 platform goes, I’ve only used a few Asus boards (both B650E and X670E based) and they’ve all been fine, including the X670E ProArt mentioned earlier.

I have no idea if it’s your motherboard to blame or memory, or the combination of both including your CPU. Latest BIOS is likely the best way to go in terms of troubleshooting.


Thanks for advice, help and attention.

I am running without EXPO, at JEDEC speeds (around 4800 MT/s) - just to exclude issues related to memory.
All the mentioned scenarios were conducted in non-EXPO mode, after loading “optimized defaults” (as advised by bios), or without loading these - same problems.
I do not think that memory is the issue here.

You might also want to consider enabling memory training for each boot, but that’ll hurt boot up times a bit. I’d also recommend turning off any kind of fast boot or similar, as hardware needs time to properly initialize, especially if you have many devices.

I agree, I usually run with full-on “slow boot” options. I only tried to enable “fast” boot here to see if it helps in some way - it did not.

Have you tried switching USB ports (controllers)

I use ports on the front panel of my case (Fractal XL or something like that, a big hefty case). AFAIK there is no additional USB hub in the way.
However, I have not tried to switch directly to backpanel connections yet.

I’m not sure what “slow” means in terms of BIOS performance; most graphical ones are a bit sluggish and may become even more so if you have a high res monitor, while some other brands cap output resolution.

Moving the mouse takes ages, it runs at like 5 fps. Same goes for keyboard-based interaction. It seems like the bios itself “draws” slowly. However this is also random - sometimes it works ok and snappy. Furthermore, while updating BIOS when in “slow” mode, every step of the update takes much more time (2-5x slower): verification, uploading to “Secure flash”, etc.

Integration with certain hardware such as storage controllers can lead to slow boots

Yes I know, it takes ages for a dell server to boot. However it does so in a reproducible way. This MB does not. Sometimes it boots ok with HBA, sometimes takes additional minutes. Sometimes just “hangs” on logo screen.

I’m not sure about the M.2 claims, as many people run more than 2 NVME devices just fine?

NVME slots A & B are connected directly to the CPU. Slot C is connected to the “first” chipset. Slot “D” is connected to the “second” chipset.
My chances of successful boot greatly increased after abandoning slot D. Here is a block diagram of this chipset-centipede:

Gigabyte X670 Aorus Elite AX block diagram
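
For readers without the diagram, the slot wiring described above can be written down as a small Python sketch. This is only an illustration of the description in this post: the dictionary, the chipset labels and the function names are made up here and do not come from any Gigabyte or AMD tool.

```python
# Hypothetical model of the M.2 wiring described in this post.
# Each slot maps to the chain of chipsets between it and the CPU.
M2_SLOT_PATH = {
    "A": [],                          # wired directly to the CPU
    "B": [],                          # wired directly to the CPU
    "C": ["chipset_1"],               # behind the "first" chipset
    "D": ["chipset_1", "chipset_2"],  # behind both daisy-chained chipsets
}


def chipset_hops(slot: str) -> int:
    """Return how many chipsets sit between the slot and the CPU."""
    return len(M2_SLOT_PATH[slot])


def slots_by_hops() -> list[str]:
    """List slots from fewest to most chipset hops."""
    return sorted(M2_SLOT_PATH, key=chipset_hops)


if __name__ == "__main__":
    for slot in slots_by_hops():
        print(f"slot {slot}: {chipset_hops(slot)} chipset hop(s)")
```

Slot D is the only one with two hops, which is why it was the first candidate to abandon.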

As far as your choice of motherboards goes, I’m personally not a fan of Gigabyte, since many motherboards, especially workstation ones, seem to be rushed and poorly tested.

The world seems to switch preference between GB and Asus from time to time. I chose this MB based on a recommendation on level1 YT, which praised its IOMMU layout and linux compat.
I have a threadripper-based GB X399 Designare EX at work - never had problems with it. Never failed to boot, not even once.

I have no idea if it’s your motherboard to blame or memory, or the combination of both including your CPU. Latest BIOS is likely the best way to go in terms of troubleshooting.

I don’t think it’s a memory problem. I used latest BIOS, and older bioses too.


I recall some options regarding PCIE speed in bios.
Block diagram in my prev. reply seems to indicate that one of the pcie slots on the double-centipeded chipset was downgraded to 3.0 in the latest HW revision. I will try to attempt this downgrade via BIOS (if possible).
Will report back, but can’t promise any deadlines.

Still, if someone has access to “real” manual / docs for this bios (one which explains all the options under “settings” - I do not care about voltages and such) and maybe even interactions between them, then please let me know.

I made some significant progress.

  • Everything works much better, faster, and “snappier” now.
  • No problems in BIOS too - speed is fine, detects all devices all the time, etc.
  • The system overall is now deterministic.

I will run this for several days, see if it is not a coincidence, then post instructions on what to set in BIOS.

My system is currently stable and working better than ever.
Currently I have the following devices connected to the MB:

  • Main GPU
  • Secondary GPU
  • 3x 1TB M.2 drives in slots A-C
  • LSI SAS controller

None of these spontaneously disappear anymore, nor do I experience any problems with boot devices missing / USB flash drives not detected.

Please let me know if these settings helped you.

Here are the changes I made to get this MB to work properly:

  • Install BIOS version F31 - I know that there are a bunch of F32X available - but these have letter suffix, GB will replace them over time. I consider those “unstable”. Wanted to use a “stable” version to limit confounding factors.

  • Move M.2 drive from slot “D” to slot “C” (since slot “D” is double-centipeded through 2 chipsets)

  • Adjust BIOS settings:

    • Disable iGPU - I don’t use it
    • Disable EXPO and “DDR auto-boost” - wanted to get a stable system first
    • Disable PCIE power management / ASPM
    • Disable “Spread spectrum” - I do not trust it. Caused me problems long time ago. IIRC It is only relevant when one is faced with regulatory compliance regarding EM spectrum
    • Set PCIE gen to 3 on everything other than GFX card
    • Set 2nd PROM21 chipset to a fixed port number (Port 8)
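
On Linux, one way to double-check that a forced PCIe generation actually took effect is to look at the `LnkSta` lines that `lspci -vv` prints for each device (root is usually needed to see them). Below is a minimal Python sketch that parses such output after it has been saved to text. The sample text and device names are invented for illustration and are not from my system; the GT/s to generation mapping is the standard PCIe one.

```python
import re

# Standard PCIe per-lane transfer rates (GT/s) and the generation they imply.
GTS_TO_GEN = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5}

# Matches e.g. "LnkSta: Speed 8GT/s (downgraded), Width x4 (ok)".
LNKSTA = re.compile(
    r"LnkSta:\s+Speed\s+(\d+(?:\.\d+)?)GT/s(?:\s+\(\w+\))?,\s+Width\s+x(\d+)"
)


def parse_link_status(lspci_text: str) -> dict[str, tuple[int, int]]:
    """Map each device address to (pcie_generation, lane_width)."""
    links = {}
    current = None
    for line in lspci_text.splitlines():
        # Device header lines are not indented; detail lines are.
        if line and not line[0].isspace():
            current = line.split()[0]
            continue
        match = LNKSTA.search(line)
        if match and current is not None:
            speed, width = match.groups()
            # Unknown speeds are reported as generation 0.
            links[current] = (GTS_TO_GEN.get(float(speed), 0), int(width))
    return links


# Invented sample in the shape of `lspci -vv` output.
SAMPLE = """\
01:00.0 VGA compatible controller: Example GPU
\tLnkSta:\tSpeed 16GT/s, Width x16
02:00.0 Non-Volatile memory controller: Example NVMe drive
\tLnkSta:\tSpeed 8GT/s (downgraded), Width x4 (ok)
"""

if __name__ == "__main__":
    for address, (gen, width) in parse_link_status(SAMPLE).items():
        print(f"{address}: Gen {gen} x{width}")
```

With real data, the text would come from something like `sudo lspci -vv > lspci.txt`; any device other than the GFX card that still reports above Gen 3 would suggest the BIOS setting did not apply to it.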

Problems for which I do not care:

  • Built-in wifi has stopped working (“Device cannot start”)

BIOS Screenshots:

Tweaker Main menu

Settings -> PROM21 Chipset common options

Settings -> NBIO Common Options

Settings -> IO

Settings -> Devices


I’m glad you got your system stable! But I’m curious if you’ve considered “un-disabling” some settings one by one to optimize the performance and narrow down the root cause?

It’s sad to me that so many things had to be changed from optimal values, e.g. ASPM disabled, NVMe drives forced to Gen 3, one NVMe slot cannot be used, etc.


Yes I did consider it, but after a few days of mostly staring at boot screen, I need a break :wink:

I am probably gonna RMA this board though.

Fair! Which board are you considering to replace it with?
I’m currently considering the ASUS ProArt X870E but after hearing a lot about issues with ASPM, multiple NVMe drives, etc on AMD X670E/X870E platforms I’m a bit scared lol.

I honestly do not understand MB market anymore. I see no reason to pay ~$200 more for “pro art” versus cheaper “ASUS PRIME X870-P WIFI”. (at least in Poland this is how these are priced).
This is not a recommendation - I only found this “prime” via sorting by price and picking maybe 6th cheapest.
What is the point of paying more for “creator” stuff, if it uses the same (possibly buggy) chipset and has, essentially, same IO capabilities?
I do not care about overclocking, I care about IO and stability. Maybe also about bios settings being persistent through updates (GB cannot do that).
I also do not care how pretty MB is. I do not intend to stare at it.
Seems like whole MB market migrated to spending time and money on pointless stuff instead of essentials.

Of course, I don’t yet know how IOMMU is laid out on “prime”. What type of eth it has, how said eth works under linux… same for BT/WIFI.


The Prime is really limited in terms of I/O. It is probably fine for a lot of people. But I would like to have as much flexibility to install NVMe SSDs as we used to have with SATA disks when boards gave you more than enough ports. It’s 2025… it shouldn’t be too much to ask?

The I/O on the Prime works like this:
PCIe expansion:

  • 1 x PCIe 5 x16 (from CPU)
  • 2 x PCIe 4 x16 (x1, from chipset, disabled if M.2_3 is used)

NVMe:

  • M.2_1 (Gen 5 x4, from CPU)
  • M.2_2 (Gen 4 x4, from chipset)
  • M.2_3 (Gen 4 x4, from chipset, disables PCIe slots 2, 3)
  • M.2_4 (Gen 3 x2, from chipset, disabled if you use the SATA ports)

So if you want to have 4 NVMe SSDs, forget about installing a second add-in card. That’s like a mini ITX system at that point!
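
The sharing rules listed above for the Prime can be expressed as a tiny Python check. This is only a sketch of my summary, so it inherits any mistakes in it; the connector labels (`PCIe_2`, `PCIe_3`, `SATA`) and the function name are made up for illustration.

```python
def prime_conflicts(in_use: set[str]) -> list[str]:
    """Return conflicts for a set of populated connectors on the Prime.

    Encodes only the two sharing rules from the summary above:
    M.2_3 disables PCIe slots 2 and 3, and M.2_4 is disabled by SATA use.
    """
    conflicts = []
    if "M.2_3" in in_use:
        for slot in ("PCIe_2", "PCIe_3"):
            if slot in in_use:
                conflicts.append(f"{slot} is disabled while M.2_3 is populated")
    if "M.2_4" in in_use and "SATA" in in_use:
        conflicts.append("M.2_4 is disabled while the SATA ports are in use")
    return conflicts


if __name__ == "__main__":
    # Four NVMe drives plus one add-in card in the second slot.
    build = {"M.2_1", "M.2_2", "M.2_3", "M.2_4", "PCIe_2"}
    for problem in prime_conflicts(build):
        print(problem)
```
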

The ProArt is not amazing either but it has more flexibility:
PCIe expansion:

  • 2 x PCIe 5 x16 (from CPU, x16/x0 or x8/x8 or x8/x4/x4 bifurcation)
  • 1 x PCIe 4 x16 (x4, from chipset)

NVMe:

  • M.2_1 (Gen 5 x4, from CPU)
  • M.2_2 (Gen 5 x4, from CPU, drops your PCIe slots 1, 2 to x8, x4 if used)
  • M.2_3 (Gen 4 x4, from chipset)
  • M.2_4 (Gen 4 x4, from chipset)

Hopefully I didn’t mess up the specs trying to summarize it.

The ProArt seems more functional if you’re building a “prosumer workstation”. You also get 10 GbE + 2.5 GbE networking, unlike on the Prime, where you might not even be able to add a NIC if you want to use all 4 NVMe slots!

Well, that is assuming it all works as intended - my worry is that if one were to utilize all the hardware, there would be weird bugs as people have been reporting occasionally…


Thanks, awesome breakdown!

I am however still not ready to get into details of all this mentally.

EDIT: I wish there was a low-core-count threadripper, maybe 16, and a MB priced similar to ProArt. That would make sense to me.

Threadripper or Threadripper Pro?

  • 7945wx, 5945wx, and 3945wx are 12 core
  • 7955wx, 5955wx, and 3955wx are 16 core

Zen and Zen+ Threadrippers were also available with eight cores.

Motherboard pricing’s around 3x upper end desktop (e.g. ProArt), though. Similar pricing’s not happening due to the additional layers needed to route the board, cost of the dielectrics needed to support PCIe 4.0 and 5.0 speeds, the redrivers/retimers/switches needed to get PCIe 5.0 across an ATX board, and higher per board NRE due to greater design complexity on a smaller addressable market.


Thanks for models and explanation as to why it is so expensive.
