Disabling ASLR is working a little too well, which is odd given that it did next to nothing for the ryzen-kill GCC builds. Usually it failed at around 2/6 or 28 hours.
But I fear it’s only moved the criteria for a problem to occur further away.
But yeah. It’s usually the Instruction Pointer that at some point goes wildly off track.
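For anyone wanting to double-check what ASLR mode a box is actually running in before comparing results, here is a minimal Python sketch. It only reads the standard Linux sysctl file `/proc/sys/kernel/randomize_va_space`; the helper name `aslr_mode` is made up for illustration, and nothing here is taken from the ryzen-kill script itself.

```python
from pathlib import Path

# Standard Linux sysctl file that holds the ASLR setting.
ASLR_SYSCTL = Path("/proc/sys/kernel/randomize_va_space")

# Meaning of the values as documented for the Linux kernel.
ASLR_MODES = {
    0: "disabled",
    1: "partial (stack, mmap base, vDSO randomized)",
    2: "full (also randomizes the heap/brk)",
}


def aslr_mode(path=ASLR_SYSCTL):
    """Return a human-readable description of the current ASLR setting.

    `aslr_mode` is a hypothetical helper name, not part of any existing tool.
    """
    value = int(Path(path).read_text().strip())
    return ASLR_MODES.get(value, f"unknown ({value})")


if __name__ == "__main__":
    # Only meaningful on Linux; skip quietly elsewhere.
    if ASLR_SYSCTL.exists():
        print("ASLR:", aslr_mode())
```

If it prints "disabled", the whole system has ASLR off. A per-process alternative is to launch the build under `setarch -R`, which leaves the system-wide setting alone.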
I should add. I have 2 Ryzen 1700X systems.
One of my CPUs has basically all of the problems: segfaults, MCEs, reboots, freezes, VTT drop. Whereas my other 1700X has none of those.
3 of those are solved by just manually setting RAM timings + Voltages to what the stock should be.
However, the segfaults and the VTT drop under load are something else.
So, I’m too tired/lazy/stupid to figure out the multithreaded way right now but I can recreate the error with Pamac even on those fixed voltages up to 1.375 outlined above. (I’m not gonna push 1.4v or more into a 65W TDP CPU) So yeah, gonna start the RMA.
Actually not the worst time to be honest. The rig is currently still open and I will take this chance to downgrade to a 1600X. I am hoping to get the same gaming + recording performance as with my 1700 when overclocked but with a little less heat.
@catsay I will gladly run your script when I’m not tired/lazy anymore.
Yeah I’ll get to it at some point. Probably tomorrow evening or thursday morning.
Currently tired as hell too and slogging through some obscure as hell CPU doc I found.
Hopefully AMD has an RMA chip for me at some point, but it seems like they are real busy there.
The instructions for Fedora on the nav panel are for 21-22. On Fedora 26, I get something similar to "iostream not found, fatal error". The dependencies do not compile successfully, so I can’t even try compiling Blender.
Blender is a huge project. Installing all of those dependencies to do the Ryzen test with is error prone and causes all sorts of other issues. For example…
Trying to install all the dependencies took a very long time (an hour or so). Some part of the dependency script updated the entire Fedora 26 RTM installation, downloading hundreds of MB from the interwebs while doing so, and even installed a new kernel. This is bad for people with terrible internet speeds and for those just wanting a small, reliable test. The kill-ryzen-sh GCC test already is that small, reliable test.
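| GCC Test # | Fedora 26 Kernel | RAM | Failure time (s) |
|---|---|---|---|
| 1 | Ubuntu VM | 24 GB | 380 |
| 2 | RTM | 32 GB | 240 |
| 3 | RTM | 32 GB | 240 |
| 4 | RTM | 32 GB | 280 |
| 5 | RTM | 32 GB | 42 |
| 6 | RTM | 32 GB | 239 |
| 7 | RTM | 32 GB | 242 |
| 8 | RTM | 16 GB | 298 |
| 9 | RTM | 16 GB | 288 |
| 10 | RTM | 32 GB | 239 |
| 11 | RTM | 32 GB | 239 |
| 12 | RTM | 16 GB | 180 |
| 13 | New | 32 GB | 948 |
| 14 | New | 32 GB | 1525 |
| 15 | RTM | 32 GB | 2842 |
| 16 | RTM | 32 GB | ~180 (Kernel Panic) |
| 17 | New | 32 GB | no crash (2 hrs) |
| 18 | RTM | 32 GB | 3638 (1 hr) |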
RTM means 4.11.8-300.fc26.x86_64
New means 4.12.9-300.fc26.x86_64
Ubuntu VM means on a Windows host
Conclusion: It should be pretty obvious when updates were applied by looking at the test #'s and failure times. Using a newer kernel/software only masks the problem. This is not a software issue. Mitigations at the software level can only prolong the expected time to failure. The chips are still defective and need to be RMA’d.
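For anyone who wants to crunch the numbers from the table instead of eyeballing them, here is a rough Python sketch that groups the failure times by kernel and takes the median. The data is copied from the table; test 16's "~180" is entered as 180, the run with no crash is counted separately and left out of the median, and `summarize` is just an illustrative helper name.

```python
from statistics import median

# (test #, kernel label, failure time in seconds); None = no crash observed.
# Values copied from the table; test 16 ("~180", kernel panic) is taken as 180.
RUNS = [
    (1, "Ubuntu VM", 380),
    (2, "RTM", 240), (3, "RTM", 240), (4, "RTM", 280), (5, "RTM", 42),
    (6, "RTM", 239), (7, "RTM", 242), (8, "RTM", 298), (9, "RTM", 288),
    (10, "RTM", 239), (11, "RTM", 239), (12, "RTM", 180),
    (13, "New", 948), (14, "New", 1525),
    (15, "RTM", 2842), (16, "RTM", 180),
    (17, "New", None),  # ran for 2 hours without crashing
    (18, "RTM", 3638),
]


def summarize(runs):
    """Group runs by kernel label and report count, median time, and no-crash runs."""
    grouped = {}
    for _, kernel, seconds in runs:
        entry = grouped.setdefault(kernel, {"times": [], "no_crash": 0})
        if seconds is None:
            entry["no_crash"] += 1
        else:
            entry["times"].append(seconds)
    return {
        kernel: {
            "runs": len(e["times"]) + e["no_crash"],
            "median_s": median(e["times"]) if e["times"] else None,
            "no_crash": e["no_crash"],
        }
        for kernel, e in grouped.items()
    }


if __name__ == "__main__":
    for kernel, stats in summarize(RUNS).items():
        print(kernel, stats)
```

Keep in mind the median hides the two late, long RTM-kernel runs (tests 15 and 18), so it is worth looking at the individual rows as well.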
If Blender requires or defaults to a newer kernel/updated packages, that might be why the GCC test does not fail in a timely manner. I would advise against creating a kill-ryzen-sh script based on Blender, since installing the dependencies takes a long time, especially for users with poor net speeds, and the newer kernels/software (installed by default) only mask the issue.
If you cannot get the GCC test to “work”, try using a fresh install, no updates.
Just checked a picture I took when I got my Ryzen and it turns out it might be a faulty one.
Any idea how long the RMA process takes? (I haven’t tested it yet though.)
Can confirm, I just used the Manjaro GUI package manager to start the build. On fixed settings I get Error 2 or a segfault at around 5-7% of the process. And that is with all dependencies already done. Just for lolz I did the same thing on my Core m Skylake notebook and it went through like a charm.
I intend to. Even though I don’t have any problems now, I’d rather RMA my CPU while I still can, before I’m facing any issues.
That being said, is there some kind of cutoff date until which RMAs of this kind are accepted? Or does it end when the warranty expires?
I specifically cited this issue, and my batch number, and the RMA was approved “just like that.” Also, my replacement was shipped as soon as I dropped off the box and gave them the FedEx tracking number (or so I was told).
However, from what I understand, none of this is “official.” Not sure if even the defect itself is “official.”
You’d probably have a much harder time of it if you waited so long. Personally, I’d return it now; but up to you to decide if you’d rather leave well-enough alone.
I’m aware of that. Besides, I wouldn’t want to replace a working CPU.
I tried to, but as it turns out there might be some problems…
Edit: You mentioned something about testing it in a VM. Is it enough to download and install Fedora 26 in VirtualBox, grant the guest all cores, and run the above test?
I had something like that once, too. But I was testing out settings for CPU and memory so I just thought whatever and hit reset. Never saw it again but it might be an indicator… maybe?
Thought I’d ask the obvious question here, since I read somewhere that Threadripper is based on multiple 1800X cores ‘glued’ together (I may be a tad off there). In any case, has anyone tried similar testing on the 1920X/1950X?
Clarification: what I meant to say was, has anyone seen segfaults in TR similar to this, irrespective of fabrication date?
Well, I didn’t really change anything to begin with, except enabling virtualisation support (SMD? something like that?). I don’t know why ASRock keeps disabling it, I need to enable it after every BIOS update.