I Did A Stupid. [Unraiding my NVME Raid]

Hi everyone, I’d like some advice.

Basically, I set up an NVMe RAID-0 using two 512GB Samsung 970 Pros, as outlined by Wendel in an old video [which I am not allowed to link], on an Asus Crosshair Hero VII

And have very happily used that as my daily-driver boot drive for five years.

…yeaaah…

Man, I’ve had some pretty good luck right?
When I told my friends about my slightly ridiculous raid-boot drive one cryptically said
“Someday god is going to punish you for your hubris”
Well, Five years down the line, that day has come.

After a random restart, something got corrupted and now my computer struggles to boot. It hangs and crashes on the little boot screen with the rolling dots. SOMEHOW, roughly 1 time in 20 it makes it through and boots into Windows. I’m messaging you from the computer right now, but I’ve been too scared to turn it off for two weeks for fear that’ll be it.

I guess my fun is over. :frowning:

Two questions,

In theory, I SHOULD be able to just image the RAID with something like Macrium Reflect, back that up to a different NVMe drive, and just boot off that like nothing happened, right?

Also, there should be some kind of way to fix whatever is going on with my current system with a Macrium Reflect recovery drive, right? I’m told it’s got some way of fixing issues that can prevent Windows from booting.

I should just reformat it, but don’t want to. Y’know how a computer you’ve used for a really long time gets… sort of comfortable? It’s like a well worn couch that fits your arse juuuust right. I’m not quite ready to move yet.

At the very least, I’ve got all my files backed up. I just need to save my Windows install somehow!

Thanks in advance!

I probably won’t be able to help too much here but just to clarify for others who might see this, is there any concrete evidence that the problem is with your boot drives? Could it be some other hardware issue with your motherboard?

Yes, however I wouldn’t be so sure it’s the RAID array causing your problems. If it was corrupted it wouldn’t let you into Windows at all… which makes me think the problem could be coming from somewhere else, bad memory being really high up on the list of suspects.

You could run chkdsk at this point to find corruption, but it might be more prudent to attempt a backup first.
I can’t remember which backup utility I was using (might have been R-Drive), but it actually caught corruption that my RAM was introducing to the backup as I was backing up, which clued me into my memory failing. Up to that point I thought my SSD was failing, so backing up could actually be a troubleshooting step for you.
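
To make the "backing up as a troubleshooting step" idea concrete, here is a minimal sketch of copy-then-verify in Python. This is not how R-Drive or Macrium Reflect do it internally; `sha256_of` and `copy_and_verify` are made-up helper names for illustration. The idea is just to hash the source and the copy and compare.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so a large image does not need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, destination: Path) -> bool:
    """Copy source to destination, then re-read both files and compare hashes."""
    shutil.copyfile(source, destination)
    return sha256_of(source) == sha256_of(destination)
```

A mismatch only tells you that something between the two reads flipped bits. It could be RAM, the drive, or the controller, so treat it as a hint to go test memory rather than as proof. A match is also not a clean bill of health, since the re-read may be served from the OS cache.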

That’s… a horrific possibility I hadn’t considered before.
But that being said, I’ve been using the machine stably for the last two weeks. Wouldn’t an issue with memory or somewhere else on the board have revealed itself by now?

Not necessarily. I had my stability problems for more than 4 months before realizing it was memory. The majority of that time I thought my SSD was dying because I kind of abused it plotting chia.

Luckily you can mostly rule out corruption pretty fast with chkdsk and then move on to more difficult to diagnose problems.

…How Horrifying…

In that case, I’ll try to diagnose further and report back.
Thank you for the help!

Memory tests are an easy enough thing to do. Also check the motherboard for leaking caps.

If it does restart, try keeping it off, unplugging it from power, then pressing the power button 10 times. Leave it for a minute, then try again, but only as a last resort because we still don’t know what is wrong.

I can say I recently had a x570 machine give all sorts of weird hd issues, even after replacing drives, cables, and power supplies. ECC memory and memtest checked out. Moved them to a different machine and everything worked great. Still have no idea what’s wrong with that box, it’s reliably been a proxmox machine for a few years.

Hello again!

It’s been some time. Y’see I was able to keep going along for another month using my favorite strategy; “The Ostrich Gambit”

Can it be said you really have a problem if you bury your head in the ground and pretend it isn’t there?

Well. Yes, …for a little while. Unfortunately, a few weeks ago I wound up restarting (while trying to configure Windows to NOT restart) and here I am, locked out again.

First thing I tried was making a Windows To Go drive with Rufus, but weirdly enough THAT refused to boot up too. Which is… weird.

Next, I tried unplugging the RAM. I tried a few different positions with some of the different sticks. Still won’t boot.
Which would clear the RAM as a suspect, wouldn’t it?

Which… seems to imply some sort of insidious Motherboard issue right?

If you have any suggestions, I’m all ears. I’m slightly baffled by whatever appears to be going on here. My next and possibly last attempt at a solution will be unplugging my beloved NVMe drives to see if they were, in fact, causing the issue somehow.

A silly question: pulling them is basically KAPUT for the NVMe RAID, right? No way to get it back after that?

Thank you for your help. I’ll let you know when I figure out what’s going on.

Removing your drives for a short time shouldn’t harm them, but putting them back in the wrong order could very well throw a wrench into the gears.
I remember my old dinosaur server (SCSI drives).
I removed the hot-swap drives for maintenance cleaning and a fan replacement, and the system re-initialized the RAID.
:face_with_symbols_over_mouth::face_with_symbols_over_mouth::face_with_symbols_over_mouth::face_with_symbols_over_mouth::face_with_symbols_over_mouth::face_with_symbols_over_mouth::face_with_symbols_over_mouth::face_with_symbols_over_mouth::face_with_symbols_over_mouth::face_with_symbols_over_mouth::face_with_symbols_over_mouth::face_with_symbols_over_mouth:
Lost a very well customized Gentoo installation I hadn’t had the chance to make a backup of.
But NVMe drives, I’m not sure of? (That’s right, I’m running a dinosaur ranch😄)
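
For anyone wondering why drive order could matter at all, here is a toy model of RAID-0 striping in Python. It is a sketch under loud assumptions: `stripe` and `reassemble` are made-up names, the stripe size is tiny, and member order is decided purely by list position. Real controllers may record member order in on-disk metadata, in which case a physical slot swap might be harmless, so this shows the failure mode rather than predicting what any particular board will do.

```python
def stripe(data: bytes, drive_count: int, stripe_size: int) -> list[bytes]:
    """Deal fixed-size chunks of data round-robin across the drives."""
    drives = [bytearray() for _ in range(drive_count)]
    for index, offset in enumerate(range(0, len(data), stripe_size)):
        drives[index % drive_count] += data[offset:offset + stripe_size]
    return [bytes(drive) for drive in drives]


def reassemble(drives: list[bytes], stripe_size: int) -> bytes:
    """Read one chunk from each drive in turn, in the order given."""
    out = bytearray()
    offset = 0
    while any(offset < len(drive) for drive in drives):
        for drive in drives:
            out += drive[offset:offset + stripe_size]
        offset += stripe_size
    return bytes(out)


data = b"AAAABBBBCCCCDDDD"
drives = stripe(data, drive_count=2, stripe_size=4)
print(reassemble(drives, 4))        # correct order: original data
print(reassemble(drives[::-1], 4))  # swapped order: chunks interleaved wrongly
```

With the members swapped, every chunk is still intact but lands in the wrong place, which is why a naive rebuild in the wrong order reads as garbage even though neither drive is damaged.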

Anyhow, for those of us who’ve been techs for a long time, understanding the complexity levels of the different classes of machines is an understated gift.
The average PC user does not have the in-depth training and often gets in way over their head.
The difference, however, is that when they pick our brains and learn from it, they retain what they learn.

But on the other hand you’ll have a few who will pay as much attention to you as a dog in a park full of squirrels.:chipmunk::chipmunk::chipmunk::chipmunk::chipmunk::chipmunk::chipmunk::chipmunk:
(as in, you clean up my messes and I’ll keep wading in the :poop: you tell me to avoid)
Lord knows I’ve seen way too many of those people👿

So basically make very very certain I put the drives back in the right spots.
Got it. I’ll pretend that I’m the Count on Sesame Street.
ONE, Ah ah ah, TWO, Ah ah ah, DRIVES ONE AND TWO, AH AH AH.

Lamentably, I might be one of the squirrely types you mention.
An “enthusiast” who knows just enough to get into trouble, but not enough to get himself out of it again.

Hello again! I figured I’d let everyone know I found the culprit;

Faulty CPU!
I googled “5950X die” because I wanted to see what it actually looked like under the IHS. The first result was a Reddit user whose CPU… actually died.

An old 2700X works fine. My 5950X hangs on the rolling dots.

In a weird way this is the best possible scenario because I have two months left on my RMA window!
In the end, I was so convinced it was the motherboard that I replaced my old board with a midrange X570 board.

Maybe I’ll try to get my old raid back at some point in the future. If so, I’ll need your help again!

Thanks for all the help you all gave me. May you be blessed for seven generations. :blush: