Determining cause of critical hardware failure

I'm having a hardware issue that I don't know how to debug further. The machine POSTs and boots. In Linux it gets through login and network connection and starts updating, then crashes; in Windows it crashes during login. Poorly taken screenshots here: https://imgur.com/a/dBm2D

It fully passed Memtest86+ v5.01.
The Windows and Linux systems are on separate disks.

Have you tried doing a memtest?

Memtest86+ v5.01 fully passed and was stable.

It says memory read error on DRAM DIMM#0. Have you tried different RAM configurations?

Only when running through Memtest86+.

This also appears consistent across 2 motherboards. I'm worried that I fucked up somewhere and caused more problems.

It could be a CPU thing. You don't happen to have another one lying around do you?

Only if I absolutely have to -- I'd have to buy one, and it's an X99 system.

Yeah, that's a tough situation. I honestly don't really know what else I can say. I would've said maybe it's a chipset thing, but since it's consistent across 2 motherboards, that makes me think otherwise. If you can somehow get it booted into Windows or Fedora you could run the Intel Processor Diagnostic Tool and that'd probably tell you, but the odds of that happening look slim.

It seems to crash pretty consistently before that's possible. But it's very unlikely to be the RAM, very unlikely to be the disks or anything on the motherboard, and the video card seems OK. Then again, the CPU is a Xeon E5-2620 v3, so that shouldn't be it either, but I'm not sure what else it could be.

Yeah, I'm really not sure what else you can do, unless you have a friend with an X99-compatible CPU that you can test with.

Did you try to boot a live OS?
The next step, in my opinion, is to boot the system with just one stick of RAM, trying it in each slot in turn (if you have four slots, place the stick in slot 1 and boot, see what happens, and repeat for all the slots), and see if you get different results (always using the live OS and keeping the drives disconnected).
If nothing changes, do a BIOS reset and see if the issue still happens.
Try to get your hands on a different kit of RAM that's known to work (borrow it from a friend, maybe) and repeat the process I described above.
If nothing changes again, try swapping out the power supply, though that's something I would do just to be sure the power supply is healthy.
The last thing would be swapping out the CPU, which might be the issue (faulty memory controller).

I've spent enough time on this that I've just taken it to a shop.

Yeah man, I know how it feels to troubleshoot a system that's acting that weird. Where I live, shops are not reliable at all and I wouldn't trust anyone but myself and a friend to handle hardware I bought with hard-earned money. I hope you find a solution.

It's an ECC exception. MCE 5 means "Internal Parity Error" (see Intel® 64 and IA-32 Architectures Software Developer’s Manual, volume 3, chapter 15.9.1, table 15.8). It's "recoverable", i.e. a correctable ECC error. Not sure why Windows wants to reboot, unless there are uncorrectable errors as well.
Swap the memory modules between channel 3 DIMM 0 and some other slot (on another channel). If the error moves with the memory module, it's the memory module's problem. If it stays, it's the memory controller's (i.e. the CPU's) problem.
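
The swap test above boils down to a small decision rule. Here is a minimal Python sketch of it; the function name and its arguments are made up for illustration, and it assumes you can read the failing channel from the error report each time.

```python
def interpret_swap(channel_before, channel_after, module_moved_to):
    """Interpret one DIMM swap (hypothetical helper, not part of any tool).

    channel_before  -- channel that reported errors before the swap
    channel_after   -- channel that reports errors after the swap,
                       or None if no error showed up again
    module_moved_to -- channel the suspect module now sits in
    """
    if channel_after is None:
        return "inconclusive: error did not reproduce"
    if channel_after == module_moved_to:
        # The error followed the stick to its new channel.
        return "memory module suspect"
    if channel_after == channel_before:
        # The error stayed on the original channel with a different stick in it.
        return "memory controller (CPU) suspect"
    return "inconclusive: error appeared on an unrelated channel"


# Suspect stick moved from channel 3 to channel 1, error still on channel 3.
print(interpret_swap(3, 3, 1))
```

One swap is a single data point, so it is worth repeating with a second pair of slots before buying anything.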

Edit: I wouldn't trust memtest in this case as I'm not sure if it fully supports ECC.

If it's recoverable, wouldn't both OSes compensate and throw a non-fatal error?

They would, and I'm an idiot. I saw the "recoverable" and thought "yeah, it's totally a correctable error", being too lazy to follow through with proper diagnostics. Call it professional deformation: if a customer says "my server rebooted/froze" and there are ECC errors in the log, you don't dig deep. As TAC/outsource/support usually goes, it's safe to assume that either the uncorrectable error wasn't logged or a correctable error happened in some sensitive RAM area (ECC takes time), so the priority is to quickly check which component is at fault and simply replace it.
Aaanyway.
Took this from the screenshot:

CPU 1: Machine Check Exception: 5 Bank 7: fe04284000010093
TSC 978eeb92c0 ADDR 104f0e11c0 MISC 44022a286
PROCESSOR 0:306f2 TIME 1478378763 SOCKET 0 APIC 6 microcode 38

and fed it to mcelog --ascii:

Hardware event. This is not a software error.
CPU 1 BANK 7 TSC 978eeb92c0 
MISC 44022a286 ADDR 104f0e11c0 
TIME 1478378763 Sat Nov  5 23:46:03 2016
MCG status:RIPV MCIP 
MCi status:
Error overflow
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Transaction: Memory read error
STATUS fe04284000010093 MCGSTATUS 5
CPUID Vendor Intel Family 6 Model 63
SOCKET 0 APIC 6 microcode 38
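
For anyone who wants to check the decode by hand, here is a minimal Python sketch of how the STATUS and MCGSTATUS values map to the lines mcelog printed. It covers only the architectural flag bits and the memory-controller error-code format as I understand them from the Intel SDM machine-check chapter; the table and function names are my own, and the model-specific bits in the middle of the register are ignored.

```python
# Architectural flag bits of IA32_MCi_STATUS, highest bit first.
STATUS_FLAGS = {
    63: "register valid",
    62: "error overflow",
    61: "uncorrected error",
    60: "error enabled",
    59: "MCi_MISC register valid",
    58: "MCi_ADDR register valid",
    57: "processor context corrupt",
}

# Bits of IA32_MCG_STATUS, lowest bit first.
MCG_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}

# MMM field of a memory-controller error code (0000 0000 1MMM CCCC).
MEM_TRANSACTIONS = {
    0: "generic undefined request",
    1: "memory read error",
    2: "memory write error",
    3: "address/command error",
    4: "memory scrubbing error",
}


def decode_flags(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in table.items() if (value >> bit) & 1]


def decode_mca_code(status):
    """Decode the low 16 bits of MCi_STATUS if they are a memory-controller code."""
    code = status & 0xFFFF
    # Memory-controller codes have bit 7 set and bits 15:8 clear
    # (bit 12 is a filtering flag, so it is masked out of the check).
    if (code & 0xEF80) != 0x0080:
        return "not a memory-controller code: 0x%04x" % code
    transaction = MEM_TRANSACTIONS.get((code >> 4) & 0x7, "reserved")
    channel = code & 0xF
    where = "channel not specified" if channel == 0xF else "channel %d" % channel
    return "memory controller: %s, %s" % (transaction, where)


status = 0xFE04284000010093  # STATUS value from the crash screen
mcgstatus = 0x5              # MCGSTATUS value from the crash screen

print(decode_flags(mcgstatus, MCG_FLAGS))
for flag in decode_flags(status, STATUS_FLAGS):
    print(flag)
print(decode_mca_code(status))
```

The low 16 bits, 0x0093, come out as a memory read error on channel 3, and MCGSTATUS 5 is just RIPV plus MCIP. That suggests the "5" on the crash screen is the global status value rather than an error code of its own.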

Well, I wasn't entirely wrong. There is an ECC syndrome set from previous errors, which means it was a single-bit, correctable error. But it becomes uncorrectable because of this bitch:

Processor context corrupt

When this happens, it's game over, ECC or no ECC.

Anyway, advice remains the same: swap sticks between different channels and see if the error moves. Also, install mcelog and feed it your whole syslog/dmesg, if you can (I may have missed something important because, again, lazy).
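
If the log is large, a quick pre-filter helps find the machine-check lines worth feeding to mcelog. Below is a minimal Python sketch that is not a replacement for mcelog: the regex is written against the line format seen on the crash screen above, and it assumes your log uses the same wording (real logs may add timestamps or other prefixes, which `search` tolerates). The function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines like:
#   CPU 1: Machine Check Exception: 5 Bank 7: fe04284000010093
MCE_RE = re.compile(
    r"CPU (?P<cpu>\d+): Machine Check(?: Exception)?: (?P<mcg>[0-9a-fA-F]+) "
    r"Bank (?P<bank>\d+): (?P<status>[0-9a-fA-F]{16})"
)


def find_machine_checks(log_text):
    """Return one dict per machine-check line found in `log_text`."""
    events = []
    for line in log_text.splitlines():
        match = MCE_RE.search(line)
        if match is None:
            continue
        status = int(match.group("status"), 16)
        events.append({
            "cpu": int(match.group("cpu")),
            "bank": int(match.group("bank")),
            "mcgstatus": int(match.group("mcg"), 16),
            "status": status,
            # Bit 61 of MCi_STATUS is the "uncorrected error" flag.
            "uncorrected": bool((status >> 61) & 1),
        })
    return events


sample = """\
some unrelated line
CPU 1: Machine Check Exception: 5 Bank 7: fe04284000010093
TSC 978eeb92c0 ADDR 104f0e11c0 MISC 44022a286
"""

events = find_machine_checks(sample)
# Count events per bank to see whether one bank dominates.
print(Counter(event["bank"] for event in events))
```

If every hit lands in the same bank with the same low status bits, that is a hint the fault is in one place rather than scattered.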

OK. I've never read or encountered this kind of error before, so this is useful to know. Thanks!