:( Motherboard: some RAM no longer detected, thinking of replacing the motherboard (bought 2017)

Hi all,

Any comments welcome!

I write with the sad news that my “finally ready” FreeNAS box has a RAM issue. I’ve been happily learning FreeNAS for the last 9 months or so and was just about to put it into use (personal/work). And then… I saw a huge number of RAM errors on the terminal :frowning:

Red indicates populated slots; the inner ones (closest to the CPU) had the 8GB sticks.

“Bank 7” was the only hint I saw, so I took out what I assumed was bank 7 (2nd out from the right of the CPU) and restarted the machine. At this point, 3 RAM sticks on the other side (blue area) were not being detected by the UEFI or FreeNAS :frowning: (no photos sadly)

The only recent change, made around 8 hours earlier, was to change the power C-state settings in an attempt to lower the power consumption (which worked: 140W down to 110W). Surely doing that didn’t damage the RAM slots?!?!

For diagnostics, I took the sticks that weren’t being detected and put them in the working slots, and all was well. The RAM slots are split into 2 banks, one either side of the CPU as shown above, and the 3 populated slots that stopped working were all on one side (left side, blue area). Interestingly, I found that the one unpopulated slot on that side appears to work. It’s as if one side of the RAM slots went pop, but only the populated ones.
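The swap test above boils down to a simple rule: a stick that works in one slot but not in another points at that slot, and a slot that works with one stick but not another points at that stick. Here is a minimal Python sketch of that bookkeeping, assuming hypothetical slot labels like "L1" (left bank) and stick labels like "A"; neither comes from the board manual, and the sample observations are illustrative only.

```python
from collections import defaultdict

# Each observation: (stick, slot, detected). Labels are hypothetical.
observations = [
    ("A", "L1", False),  # stick A not detected in left slot 1...
    ("A", "R2", True),   # ...but fine in right slot 2
    ("B", "L2", False),
    ("B", "R3", True),
    ("C", "L4", True),   # the previously empty left slot works
]


def classify(observations):
    """Return (suspect_slots, suspect_sticks) from swap-test results."""
    stick_ok = defaultdict(bool)  # stick was detected in at least one slot
    slot_ok = defaultdict(bool)   # slot detected at least one stick
    for stick, slot, detected in observations:
        stick_ok[stick] |= detected
        slot_ok[slot] |= detected

    suspect_slots, suspect_sticks = set(), set()
    for stick, slot, detected in observations:
        if detected:
            continue
        # A known-good stick failing here points at the slot...
        if stick_ok[stick]:
            suspect_slots.add(slot)
        # ...and a known-good slot failing points at the stick.
        if slot_ok[slot]:
            suspect_sticks.add(stick)
    return suspect_slots, suspect_sticks


print(classify(observations))
```

With these sample observations the suspect slots are L1 and L2 and no stick is flagged, which is the same conclusion reached by hand above.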

This machine originally had 2 x 8GB ECC REG 2400MHz.

3 months ago I bought another 2 x 16GB ECC REG 2133MHz.

2 weeks ago I got an additional 2 x 16GB ECC REG 2133MHz.
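Adding the three purchases up as a quick Python check: the total matches the 80GB mentioned later in the thread. The speed line is an assumption, based on the common behaviour of running mixed modules at the slowest module's rated speed, and has not been confirmed from this board's BIOS.

```python
# RAM kits as listed in the post: (number of sticks, size in GB, rated speed in MHz)
kits = [
    (2, 8, 2400),   # original pair
    (2, 16, 2133),  # bought 3 months ago
    (2, 16, 2133),  # bought 2 weeks ago
]

total_gb = sum(count * size for count, size, _ in kits)
dimms = sum(count for count, _, _ in kits)

# Assumption: with mixed kits the board normally runs every stick at the
# slowest rated speed present, so the 2400MHz pair would drop to 2133MHz.
effective_mhz = min(speed for _, _, speed in kits)

print(f"{dimms} DIMMs, {total_gb}GB total, likely running at {effective_mhz}MHz")
```

That is 6 of the board's 8 slots populated, which fits the one empty slot on the left-hand side.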

Each time I buy RAM, I run MemTest on the new sticks alone for 5-7 days, 24 hours a day, and it has never reported any errors.

Just had a thought: could it be the PSU? It’s as old as the motherboard (around 3 years).

So I’m thinking I need a new board (this board is known for manufacturing issues), as there’s no knowing when the other slots are going to fail! Also, with these faults, quad-channel memory won’t work. I’m looking at Supermicro boards, as I’ve already got one and it’s superb. So far I’ve found the following.

All have 8 memory slots and take registered RAM up to the speed of the RAM I have. All have 10 SATA ports, which is perfect, and all have dual NICs (Intel ones too, which FreeNAS likes):

  • X10SRL-F
  • X10SRi-F (IPMI, that’s nice)
  • X10SRH-CF

So far I’m pitching for the middle one, but I very much welcome any thoughts or comments. I saw a quad NIC port version, but it was crazy money (close to 1000 dollars).

Thanks in advance! :+1:

Current board: Asus X99-E WS
CPU: Intel Xeon E5-1650 v4 (6C/12T)
RAM:
4 x 16GB ECC REG 2133MHz
2 x 8GB ECC REG 2400MHz
PSU: Corsair HX750i

Well, another day, another dollar, and the left-hand side has come back! I left it turned off and unplugged overnight, but life should not really be like this!

The outer 2 are detected, but not the inner 1 :frowning:

Least useful picture, but just for fun :roll_eyes:

No, the only ways I can think of to damage a RAM slot are to physically break it or to put too much current through it (a short).
So if you didn’t have “magic blue smoke”, then your slots are fine.

And from your later description, I would suspect that lowering the voltage through the C-state setting may be bugging out the memory controller in the CPU.

The random nature of what you’re seeing suggests that not all the clickers* in the CPU are clicking properly when they should. And they are known to do that when they don’t get enough voltage.

Also, 140W idle at stock on an E5?? Seems way too high to me. Are you measuring the 12V rail?

*clicker - technical term for semiconductor switch, also known as transistor :wink:

These big-socket CPUs are always sensitive to vibration and to the computer being moved around. You can try reseating the processor and RAM; the most consistent results will be when the motherboard is lying parallel with the table rather than standing perpendicular to it.

Phew, well I’m glad about that. No magic blue smoke either, double phew.

I reset the BIOS as soon as funky things happened, but it seems the issue persists - though it has returned a few slots to me now, which is nice…still, I like the 80GB I did have :frowning:

Ah, those pesky clickers (thanks for the hint about what that means!)

Yeah, power is a bit high, but I always thought that was normal. I’m measuring with a plug-in wall doo-dad that shows Watts, Volts and lots of other things. I always thought it was reliable; maybe I can test it on a light bulb or something!

Hmm, great advice. I don’t recall it getting any knocks; in fact it hadn’t been touched all day, and no one had even been in the building! I’ve reseated the RAM more times than I can count now, but I will definitely reseat the CPU if needs be. In fairness, it’s been in that socket for 3 years, so maybe it’s due for more thermal paste.

Thank you for your comments though, really appreciated :+1: Oh, if I laid the board flat, do you think it’s OK to have the trayless hard drives sideways? Might they flop about? I’ve only ever had them horizontal.

https://www.scan.co.uk/products/startech-525in-trayless-hdd-hot-swap-mobile-rack-for-35in-internal-sata-hard-drive

Thanks all!

Oh, and since this morning’s pics (showing the BIOS screen and the 8, 16, 8, 16GB sticks), I’ve changed the 8s for 16s… if I’m going to have less RAM, I might as well use the newest, biggest sticks!

Hello all,

thanks again for looking at this, here’s a video that might be of some help…or just the lowest form of entertainment! Confirms the power usage @misiektw :slight_smile:

Yes, that may be a possible solution as well: one of the connections on the CPU might not be making good contact and is dangling freely-neely ;).
It doesn’t have to be knocked for that. A few disks, fans: all of that vibrates the case to some degree.

Yeah, now its clear :slight_smile:
Don’t worry about wattage I mentioned, since you’re measuring that from the wall.
140W from the wall is around 120W delivered by the PSU itself (assuming around 80-90% PSU efficiency).
And since you have “a few” disks and some other stuff, I would reckon your processor takes around 20-50W at idle, which is fine. All ballpark numbers, of course.
You can try stressing the CPU for a while (stress -c 12); you should see the power draw skyrocket to 200-300W. Thermal stress may also help “reseat” the CPU on its own. But watch the temps (watch -n 1 sensors). If the paste is dry, it will overheat and throttle or shut down.
I’m not sure if FreeNAS has those commands, but if not, there’s probably something similar.
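The wall-to-PSU conversion above is just the wall reading multiplied by the PSU’s efficiency. A minimal Python sketch of that arithmetic, assuming the same 80-90% efficiency range quoted above (not a measured figure for this particular PSU) and using both wall readings from this thread:

```python
def dc_load_watts(wall_watts: float, efficiency: float) -> float:
    """Estimate the DC power the PSU delivers from a wall-meter reading.

    efficiency is the PSU's AC-to-DC efficiency as a fraction (0.85 means 85%).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return wall_watts * efficiency


# Wall readings from this thread: before and after the C-state change.
for wall in (140, 110):
    estimates = [round(dc_load_watts(wall, eff)) for eff in (0.80, 0.85, 0.90)]
    print(f"{wall} W at the wall -> roughly {min(estimates)}-{max(estimates)} W DC")
```

For the 140W reading that gives roughly 112-126W, hence “around 120W”; everything past the PSU (disks, fans, board, CPU) shares that budget.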

Love the accent btw :slight_smile:

Edit: One more thing: since this board is from 2017, maybe you never updated its BIOS? They might have added some RAM compatibility patches since then. But keep the old BIOS version around so you can roll back, just in case :wink:

Ah, I’d kinda forgotten about all those vibratey things; if it persists, I’ll definitely reseat the ole’ CPU :+1:

Ah yeah (No. 2), there is some loss, isn’t there? Mind you, I think I’ve got a platinum jobbie in there, so it shouldn’t be too bad. I’ve gotta try that power state again, cos FreeNAS really just wants the full beans all the time; an idle of 20-50W would be my dream.

I will definitely try stressing it some time, though I think I did that when I first bought it and it did pretty well - when I had Windows on it, wattage was fine and dandy. The more you say, the more I reckon I will re-seat and paste it :+1:

Those commands aren’t familiar to me in FreeNAS, but I’ve found a few bits and pieces of commands that might work!

Glad you like the accent, I was born with it you know :slight_smile: I can also do Irish, French, Australian and South African :laughing:

I have kept an eye on BIOS updates, but the board seems largely forgotten; I will have another look, though. At the time it was advertised as an all-singing, all-dancing mobo, but considering the many threads out there complaining about it, it’s sadly not as good in practice! It also cost so much that I’m reluctant to replace it any time soon, but if there’s a chance it’ll affect my data, it’s outta here!

Thanks again pal :+1:

What is your SA voltage running at?

I’m not entirely sure; to be honest, I’m not entirely sure what that means, but from googling it seems to be to do with CPU overvoltage, maybe? As far as that’s concerned, it’s not overclocked in any way, as it’s not an overclockable CPU.

Corrections very welcome :+1:

SA (System Agent) voltage on Intel platforms affects the IMC (integrated memory controller).
On X99 at stock settings it sometimes runs at around 0.9V, I believe, on some boards,
depending on the board and BIOS version.
This could lead to issues when you populate all 8 memory slots,
because eight DIMMs put more stress on the IMC than just four DIMMs.
Maybe you could try setting it to around 1.1V and see if that solves your issues,
since you’re seeing memory slots drop out.

I’m not saying that this is likely the culprit.
But at least it might be worth checking out. :slight_smile:

Thank you very much for that, I’ll definitely give it a go, really appreciate your patience!

I had really never thought about the additional stress of using all the DIMM slots.

It’s certainly a peculiar one, as it was running happily with 6 for a fair few weeks, and I reset the bios - taking a pic of the things I’d changed of course!

Again, muchos thank you :+1:

citation needed :wink:

I agree with MA on most things, except that “stress” on the IMC seems pretty much gobbledygook to me, especially when adding more ranks. I’ve had this discussion before, and I couldn’t find out what this “stress” means, how you measure it, or how it works, assuming the IMC is properly designed.

So yes, voltages could drop if the memory VRM is not very good. But afaik the IMC doesn’t supply power to the memory. It just reads what is on the bus.

And yes, adding more memory sticks may cause a frequency drop, because more stuff (interleaving) is happening on the same bus.

Also yes, using more channels means that more of the IMC has to work. Yet I haven’t seen anybody recommending single-channel operation… because of IMC ‘stress’.

That leads us to ranks. Current IMCs are designed to support at least 4 ranks of memory on a single bus (channel). Of course, a given IMC may be designed poorly. But the only difference I can see between single- and multi-rank operation is that data transfers on one rank may happen while the controller waits on another rank.
So the IMC is doing exactly the same thing when there’s nothing else to do. It has to clock anyway, at the same speed, just with a different setting of clickers ;).

So as I understand it, to show extra loading on the IMC, one would have to measure that this specific part of the chip takes more power in the “do data” state than in the “ignore data” state. That may be problematic.

Yeah, it probably won’t help you much Chris, sorry :wink: But I’m just hoping that someone with EE background can shed some light on the issue, because I may be wrong :smiley:

Thanks for your 2 cents @misiektw …or should I say at least 20 bucks worth! I think I understand what you’re saying, what a pickle eh? :slight_smile: I guess it comes down to me having a slightly rubbish motherboard, which is a shame as it cost quite a bit of money - looks like you don’t get what you pay for after all, no surprise there! :cry:

I’m going to tinker with it again when I’ve got a moment, though I’m having to focus my efforts on digging holes (for a new office build) and on the drawing work I do for income at the moment; priorities and all that, the fun of getting ‘mature’ :laughing: :+1:
