Threadripper virtualisation + storage homeserver - now a build blog

Hello. I want to upgrade my homelab storage / virtualisation server to a system that supports many PCIe lanes, but I want to use the cheapest newish Threadripper platform possible.

What are your opinions?

Usage:
This server is mostly used for cold media storage (CD images, movies, backups), local file sharing and virtualisation with one GPU passed through, and it should feature hot-swappable drive cages. I don’t need many cores, but I want the option to throw in PCIe carrier cards with M.2 SSDs. I would also like to have 10GbE NICs and IPMI. The server will be switched off when not in use, so a decent boot time or remote management to see what’s going on would be nice. I’ll run Proxmox as the hypervisor with a virtualised TrueNAS Core.

I want to be as cost-effective as possible. I looked around various marketplaces for old workstation and server hardware, but most options are really old or dodgy. (I simply don’t trust sellers when it comes to handling EPYC / TR CPU installations, and I have already spotted bent motherboard pins in some of the “it works great” photos.) A lot of the hardware is outclassed by current-gen desktop processors or simply overpriced for its age. In the end I want stuff to “simply work” and not worry about what the previous owner did with it or failed to do.

Case: SilverStone Technology RM43-320-RS - 700 Euro
CPU: AMD Ryzen Threadripper PRO 3945WX - I can get it new for 150 Euro
PSU: Be Quiet! 12 850W Pure Power (already have this)
Mainboard: ASRock WRX80 Creator R2.0 - 600 Euro
RAM: settled for Samsung UDIMM 32GB, M378A4G43AB2-CWE - yeah, it’s not ECC. I will most likely populate all slots (256GB in total). I just ordered 2 to complete my test build and to get my feet wet.
Cooler: went for an ARCTIC Freezer 4U-M
HDD: currently I have a couple of WD Red 8TB disks and a couple of Toshiba enterprise 12TB disks in a mirrored ZFS vdev configuration with a usable space of 30TB, plus 2x Toshiba 20TB enterprise HDDs that I rotate as backup. I would bring these over to the new server and replace the disks with higher-capacity ones later.
Possible GPUs to use: AMD RX 6900 XT and AMD RX 6700 (already have these)
As Mini-SAS Interface Card I would use these: 2x Supermicro AOC-S3008L-L8E SAS 8-Port 12Gb/s PCIe HBA IT-Mode w/ Bracket
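
For a rough sanity check of the lane budget, here is a minimal Python sketch. The 128-lane figure is the CPU's total for the Threadripper PRO 3000WX series; how many of those the board routes to slots versus onboard devices is not modelled, and the M.2 carrier entry is an assumption (a hypothetical card with four x4 SSDs), since I have not picked one yet.

```python
# Rough PCIe lane budget for the planned add-in cards.
# Assumption: all 128 CPU lanes are counted; onboard NICs, M.2 slots
# and other devices on the board are NOT subtracted here.
CPU_LANES = 128

planned_cards = {
    "GPU for passthrough (x16)": 16,
    "HBA #1, AOC-S3008L-L8E (x8)": 8,
    "HBA #2, AOC-S3008L-L8E (x8)": 8,
    # Hypothetical carrier card: 4 M.2 SSDs at x4 each
    "M.2 carrier card (x16, assumed)": 16,
}


def lane_budget(cards, total=CPU_LANES):
    """Return (lanes used, lanes left) for a dict of card -> lane count."""
    used = sum(cards.values())
    return used, total - used


used, free = lane_budget(planned_cards)
print(f"used {used} of {CPU_LANES} lanes, {free} left")
```

Even with a second carrier card there is plenty of headroom, which is the whole reason for going Threadripper PRO instead of a desktop platform.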

Well, fuck me. I assembled CPU, mainboard, cooler and RAM, tried a test run, and the system won’t even turn on. I used the internal power switch on the board. It behaves as if it gets no power at all: no fans spinning, not even for a split second. I would assume that even with misbehaving RAM or a badly seated CPU it would at least show a code on its debug LED (or try to spin up the fans for a split second). I already tried another PSU (850W) as well as the PSU I bought for this build, a be quiet! Dark Power Pro 13 (1600W), and different outlets. The PSUs work fine with other desktop systems.

During installation I had to put a lot of downward force on the CPU cooler mounting screws to get them to grab the threads, but that seems to be par for the course with the ARCTIC Freezer 4U-M. I also checked every cable (power, CPU power 1 & 2) multiple times. Do you have any suggestions on how to troubleshoot further? Tomorrow I will try reseating the CPU and inspecting the pins.

Edit: Checked the pins, they seem fine.



Reseated the CPU and memory and tried running it with one stick in the correct slot. This didn’t change anything. My two sticks are on the QVL. (I bought them for the test - the plan was to repurpose the 64GB and order a full 256GB memory kit later - thankfully I didn’t).

@wendell
Can you tell me what a proper start of this system looks like?
Before power-on: should I expect some LEDs on the mainboard or the internal power switch to light up when the board gets power but is not switched “on” yet?
What happens when you press the power button: does the debug LED immediately come to life, and do the internal NVMe / VRM fans spin up? I am currently at a loss here.

After powering on the PSU, the green BMC_LED1 located between PCIE3&PCIE4 should light up and begin blinking within one minute.
If this doesn’t happen, it likely indicates a problem with the board.
Also, yes: a first power-up should always result in the debug LED showing a code and the fans spinning, regardless of which connector is used.

Thank you, then it might be DOA. Well, off to vendor support I go. Next time I’ll do a pre-power-on test before I struggle with the CPU cooler :stuck_out_tongue:

My other fear is that these cheap 150 Euro tray Threadripper 3945WX CPUs might be fake. But I bought from a big server parts vendor in the EU, and the photo comparisons do check out. (On Reddit somebody had the same issue with a current-gen 64-core TR, and it turned out that somebody had done a CPU swap and he got a first-gen TR with a current-gen heat spreader. But this wouldn’t make sense for a 150 Euro part.)

Sent the board back; the return was very smooth. One email out, describing the issue and what I tried to solve it, and one email back with a return label. Let’s see how it works out. I am still waiting for my torque tool to arrive, so that I don’t have to guess the force to apply to the CPU bracket screws.

Is the motherboard used?

Nope, it’s new old stock.

So, I got a refund for the old board and ordered a new one (100 Euro cheaper). In the meantime I got a torque screwdriver and the (hopefully) official TR torque tool.

So, I got a new board. Assembly went much better, and the BMC works and is accessible on the network.
But now I am stuck with a 00 error (CPU not recognized), and the PC switches itself off 1-2 seconds after start.

I am starting to get tired of this platform, including the walking on eggshells while inserting and securing the CPU and cooler. I am only at the “mainboard on a table” stage and still stuck with an unbootable system. Man, it can’t be this hard to get a TR system up and running. I will try reseating the CPU tomorrow.

Can I do test runs (see if it POSTs and then switch it off) without the CPU cooler, or is the cooler’s mounting pressure kind of required?

If it is a used CPU (it seems kind of sketchy: I noticed cooler marks on top of the CPU heat spreader although it should be new) and it is locked to a Lenovo system, would I see the same behaviour, or would the boot process get further in that case?

I had that issue and needed to update the BIOS for the 9975WX on an ASRock WRX90 board, but you have an older CPU? Worth looking at.
Also, 00 for me meant that the BMC was online. What does yours say?

Interesting that this lack-of-power issue also affects the ASRock WRX80 Creator R2.0 motherboard for Threadripper, not just ASRock’s EPYC boards.

I have an ASRock ROMED8-2T rev1 that exhibits the same lack-of-power issue. The green BMC light blinks at 1Hz, but no power is delivered to the IPMI network interface and the power-on header does nothing. I tried EVGA and Corsair power supplies from two working systems (24+8+4+4 (PCIe power) were connected) with no difference.

Tried with and without CPU/memory (IPMI should still get power without CPU/memory installed). I am assuming some SMD component is dead and needs to be replaced, which I can do; I just don’t know which component yet.

The AMD Ryzen Threadripper PRO 3945WX should be supported by default without updates according to ASRock. It’s the oldest CPU you can put into this system.

I couldn’t find anything in the BMC logs regarding the system start. The PC switches off as soon as I start it and flashes 00 twice. (According to the internet this usually means it can’t detect the CPU; I think the second flash is a retry.)
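
In case it helps anyone querying the BMC from another machine, here is a minimal Python sketch that wraps `ipmitool`. It assumes `ipmitool` is installed and IPMI over LAN is enabled on the BMC; the host address and user name are placeholders, and the helper names are hypothetical. `chassis status` and `sel elist` are standard `ipmitool` subcommands, but what actually ends up in the event log depends on the board firmware.

```python
import os
import subprocess


def build_ipmitool_cmd(host, user, subcommand):
    """Build an ipmitool command line for a remote BMC.

    -I lanplus  use the IPMI v2.0 LAN interface
    -E          read the password from the IPMI_PASSWORD environment variable
    """
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *subcommand]


def run_ipmitool(host, user, password, subcommand):
    """Run one ipmitool subcommand and return its stdout as text."""
    env = dict(os.environ, IPMI_PASSWORD=password)
    result = subprocess.run(
        build_ipmitool_cmd(host, user, subcommand),
        env=env,
        capture_output=True,
        text=True,
        check=False,  # a failing BMC query should not raise here
    )
    return result.stdout


# Example calls (placeholder address and credentials):
#   print(run_ipmitool("192.0.2.10", "admin", "secret", ["chassis", "status"]))
#   print(run_ipmitool("192.0.2.10", "admin", "secret", ["sel", "elist"]))
```

Passing the password through the environment keeps it out of the process list, which is the only design choice worth mentioning here.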

The other thing I noticed is that I can’t turn the torque screwdriver until it “ratchets”. I really have the feeling I will break the screws or threads if I force it (it’s already turning the whole board). I have both the official one and an aftermarket one, which I set to 1.5 Nano meter. According to some guides, only very little force should be required to tighten the CPU latch / holder.

I would not say the OEM AMD torque driver “ratchets” so much as it has a detent, which pushes out a spring-loaded bearing when you reach its minimum specified torque value.

There is a video (I can’t link URLs yet) on YouTube called:
Teardown: AMD Ryzen ThreadRipper Torque Screw Driver
which shows its internals.

If you don’t mind a few scratches, I would maybe recommend taking a pair of pliers/channel locks etc., grabbing the torque wrench shaft and turning the handle to see if it “clicks into detent”, which it should. That should hopefully give you an idea of how much torque is required. (In my experience the torque wrench does require some force, but I am not sure it should rotate the entire board.) I am also assuming you mean 1.5 Newton-meters, not Nano-meters, of torque?
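
For reference, a quick unit check as a minimal Python sketch. It only converts the 1.5 N·m figure mentioned above into inch-pounds and centinewton-metres for drivers that are marked in those units; the helper names are made up for illustration, and it says nothing about what torque the socket officially requires.

```python
# Torque unit conversion (a nanometre is a length, not a torque).
IN_LBF_PER_NM = 8.8507457676  # inch-pounds (in-lbf) per newton-metre


def nm_to_in_lbf(newton_metres):
    """Convert newton-metres to inch-pounds."""
    return newton_metres * IN_LBF_PER_NM


def nm_to_cnm(newton_metres):
    """Convert newton-metres to centinewton-metres (cN·m)."""
    return newton_metres * 100


torque_nm = 1.5
print(f"{torque_nm} N·m = {nm_to_in_lbf(torque_nm):.2f} in-lbf = {nm_to_cnm(torque_nm):.0f} cN·m")
```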

Yeah, thanks. I tried a pair of channel locks and it does indeed click; maybe I just needed a little more force than I anticipated.

It might be that I am turning the screws too slowly. I am usually too gentle with my hardware.

Edit: yes, newton-meters.

Nice, I would maybe try that if you are comfortable with it. If I had to guess, 1.5Nm is the lower bound of the OEM torque wrench, as the springs will vary in stiffness from wrench to wrench.

I would also maybe recommend getting the three screws started and then trying to wiggle the CPU around a bit to see if it is fully seated in its socket. I had an issue yesterday with my EPYC in its SP3 socket where the CPU and carrier were down and “latched”, but the carrier was caught on its frame, so the CPU was not fully seated until I wiggled it, which caused it to “drop” a little. Then I tightened the screws in a star pattern until almost snug and did the 1->2->3 torque spec with the wrench.

Edit: another potential problem I have seen (maybe not with new hardware) is that the plastic carrier for the CPU itself can be mounted backwards, which causes the wrong pins to line up with the socket. You might want to compare against a reference image from a review, just as a double check.

Good point about the carrier. I noticed that the CPU was partly unclipped from transport (just two clips); I already fixed this. I’ll check the orientation of the carrier as well.

So, I reseated the CPU, still no dice. I still get a 00 error, but it seems to live 0.5 seconds longer. The onboard fan spins a bit longer with a “whoosh”.

What I did so far:

  • used the official torque wrench till it clicked
  • checked MB pins, fine so far
  • checked carrier - CPU is seated fine and snaps in and is aligned to the rail
  • CPU lies flat and doesn’t drop onto the pins when I try to wiggle it in its resting position (bracket closed)
  • checked carrier and CPU orientation, carrier and CPU orientation are the same (triangle on the CPU matches the triangle on the carrier)
  • checked all PSU connections, all cables are seated properly

What I will do

  • firmware / UEFI update, dunno if there are older / newer versions / revisions of this CPU
  • run one stick of RAM
  • try all RAM slots, sometimes the manual is wrong
  • check the underside of the CPU - I didn’t touch it, but maybe someone else did; it didn’t come with a plastic carrier and has marks on top, which is highly sus
  • order another CPU from another vendor, it’s the cheapest thing to test besides the RAM

BMC reports a CPU hang:

Hmm, the firmware update is worth a try for sure. I assume you have already tried clearing the CMOS after the CPU reseat?

The other possibility you mentioned above is that it could be vendor-locked, but I would expect it to at least go through some of the POST codes before failing. I am unsure, though, whether Threadripper behaves differently from normal Ryzen here.

OEM Threadrippers are vendor-locked. Lenovo, Dell, etc…for “customer safety” reasons :wink:
Threadripper is basically a modified EPYC, certainly not similar to Ryzen except for the cores themselves.