Plex transcoding working awfully slow

1ncanus · October 18, 2018, 11:00pm

That’s very strange. I would almost suggest submitting that as a bug (if it hasn’t been submitted already…).

I usually let mine run in the background, which is probably why I never noticed the lack of hyperthreading use. I looked a bit and someone suggested running multiple instances in parallel, but I feel like it should be handled by the package maintainer. Hm. A project for another day, I suppose.

Glad you got it working!

Bitals · October 18, 2018, 11:08pm

Lolno. I spoke too soon. You won’t believe it, but again, without changing anything, Plex transcoding is now “usable” and better than before, but still awfully slow (up to a minute before playback starts, and then waiting for the transcoder to catch up every minute or so).
ffmpeg now gives me 15 fps.

Any other ideas about what the hell is happening?

1ncanus · October 18, 2018, 11:22pm

My only remaining thought is that, when you get higher fps, the data is cached in memory. Ffmpeg will happily work with something in memory after it has recently been moved or copied.

That said, the only way to really test it would be to sync the data after moving it (if you are moving it), unmount the drive, remount it, and then run ffmpeg. You will get slower speeds here, because nothing is cached yet.

Then copy the data and try again. You will get significantly higher speeds here. I would compare those numbers to what you’re seeing. If they match, you just have a bottleneck somewhere, possibly the source drive (it is a rotational drive, after all).
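The cold-versus-warm comparison described above can be roughly approximated without unmounting anything. The sketch below is only an illustration: `timed_read` and `mb_per_s` are hypothetical helper names, and it swaps the unmount/remount step for `os.posix_fadvise(..., POSIX_FADV_DONTNEED)`, which merely asks the kernel to drop the file's cached pages and may be ignored.

```python
import os
import time


def timed_read(path, drop_cache=False, chunk_size=1 << 20):
    """Read the whole file once and return (bytes_read, seconds)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if drop_cache and hasattr(os, "posix_fadvise"):
            # Advisory only: ask the kernel to evict this file's cached pages.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        total = 0
        start = time.perf_counter()
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
        return total, time.perf_counter() - start
    finally:
        os.close(fd)


def mb_per_s(nbytes, seconds):
    """Throughput in MB/s (10^6 bytes), guarding against a zero interval."""
    return nbytes / 1e6 / seconds if seconds > 0 else float("inf")
```

Calling `timed_read(path, drop_cache=True)` and then `timed_read(path)` on the source video gives one "cold" and one "warm" figure. If the two are close, the page cache is probably not what separates the fast runs from the slow ones.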

Ruffalo · October 18, 2018, 11:27pm

Check NUMA! Multi-socket machine, pretty unusual.
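For anyone following along, one way to see how the kernel has laid out the sockets is to read the per-node CPU lists from sysfs. This is a minimal sketch, assuming a Linux system that exposes `/sys/devices/system/node`; the function names are made up for illustration.

```python
import glob
import os


def parse_cpulist(text):
    """Expand a kernel CPU list such as '0-7,16-23' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Map NUMA node id -> list of CPU ids; {} if no node info is exposed."""
    nodes = {}
    for node_dir in glob.glob(os.path.join(sysfs_root, "node[0-9]*")):
        node_id = int(os.path.basename(node_dir)[len("node"):])
        with open(os.path.join(node_dir, "cpulist")) as f:
            nodes[node_id] = parse_cpulist(f.read())
    return nodes
```

On a two-socket board with NUMA enabled, `numa_nodes()` would be expected to return two entries. A single node usually means NUMA is disabled or interleaved in the firmware setup.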

Bitals · October 18, 2018, 11:27pm

The file stayed in a single location the whole time. And I don’t believe a new 7.2k rpm drive would fail to deliver enough read speed for 24 fps when it sustained 200 MB/s a few days ago while I was copying data from it to my PC.

Bitals · October 18, 2018, 11:48pm

So far so good, but as life has taught me in the last 2 days, it’s too early to be happy. I will watch until tomorrow.

Bitals · October 19, 2018, 1:40am

Not really. Transcoding slowed down again.

nx2l · October 19, 2018, 2:36am

my guess was going to be the cpu clocking down from overheating… maybe worth checking…
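A quick way to check for clocking down is to compare each core's current frequency with its maximum while a transcode is running. The sketch below assumes a Linux cpufreq driver that exposes `scaling_cur_freq` and `cpuinfo_max_freq` (both in kHz) under sysfs; the function names and the 50% threshold are arbitrary choices for illustration. Idle cores clock down legitimately, so the result only means something under load.

```python
import glob
import os


def read_khz(path):
    """Read a single integer (kHz) from a cpufreq sysfs file."""
    with open(path) as f:
        return int(f.read().strip())


def throttled_cpus(threshold=0.5, root="/sys/devices/system/cpu"):
    """Return {cpu_name: (cur_khz, max_khz)} for cores below threshold * max."""
    slow = {}
    for d in glob.glob(os.path.join(root, "cpu[0-9]*", "cpufreq")):
        try:
            cur = read_khz(os.path.join(d, "scaling_cur_freq"))
            top = read_khz(os.path.join(d, "cpuinfo_max_freq"))
        except (OSError, ValueError):
            continue  # core offline or driver does not expose these files
        if cur < threshold * top:
            slow[os.path.basename(os.path.dirname(d))] = (cur, top)
    return slow
```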

Bitals · October 19, 2018, 9:38am

Definitely not the case, hottest cpu core during stress-test is 55C.

nx2l · October 19, 2018, 2:49pm

try running iostat every second or two while it’s transcoding…
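If iostat is not installed, roughly the same numbers can be derived from two snapshots of `/proc/diskstats`. This is a minimal sketch, not a replacement for iostat: the function names are made up, and it reports only read MB/s, write MB/s and percent busy per device.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors


def parse_diskstats(text):
    """Return {device: (sectors_read, sectors_written, ms_doing_io)}."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue
        stats[f[2]] = (int(f[5]), int(f[9]), int(f[12]))
    return stats


def rates(before, after, interval_s):
    """Per-device (read MB/s, write MB/s, % busy) between two snapshots."""
    out = {}
    for dev, (r1, w1, t1) in after.items():
        if dev not in before:
            continue
        r0, w0, t0 = before[dev]
        out[dev] = (
            (r1 - r0) * SECTOR_BYTES / 1e6 / interval_s,
            (w1 - w0) * SECTOR_BYTES / 1e6 / interval_s,
            100.0 * (t1 - t0) / (interval_s * 1000.0),
        )
    return out


def sample(interval_s=1.0, path="/proc/diskstats"):
    """Take two snapshots interval_s apart and return the computed rates."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return rates(before, after, interval_s)
```

A source drive that sits near 100% busy while delivering only a few MB/s during a slow transcode would point at storage. A mostly idle drive would point elsewhere.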

Bitals · October 19, 2018, 3:35pm

Now I am pretty sure it’s a memory-related issue. I’ve had problems with ECC correctable errors previously, but they looked like a bug to me, because I swapped several RAM modules and memtest86 gave me 0 errors after roughly 24 hours of testing. But they continued to appear in syslog, so I changed the memory mode to mirroring and lived with it, thinking I was safe, because there were no system hangs anymore (I actually have 32 GB of RAM).
Now I’ve found out that the first OVERFLOW area:DRAM err_code:0001:0092 error in syslog after a reboot and the transcoding slowdown happen pretty close together in time, so transcoding likely slows down due to memory error correction.

Lol, does anyone have any ideas how to fix this error without changing RAM, CPUs and motherboard?
Google gave me several results about a similar-looking bug with Supermicro and Dell motherboards on RedHat/CentOS. I have Debian 9.5 and an Asus Z9PE-D16. I bought it completely unused and sealed, but it is still 5-6 years old, so it’s unlikely to be broken, but obviously there is no warranty. And it is pretty expensive; I am not really willing to spend another 15-20k rub ($220-290) on the same or an equivalent motherboard without any confidence in success.

Errors show up a lot when the system is idling, so it might be bit flips. I am pretty frustrated right now.
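To check whether the errors and the slowdowns really line up in time, the log lines can be bucketed per minute and compared against when transcoding stalled. This is a rough sketch under two assumptions: the log uses classic syslog timestamps ("Oct 19 15:20:01 ..."), and the interesting lines contain either "err_code:" or "hardware error". `errors_per_minute` is a hypothetical helper name.

```python
import re
from collections import Counter

# Patterns taken from the error strings quoted in this thread.
ERROR_RE = re.compile(r"err_code:|hardware error", re.IGNORECASE)


def errors_per_minute(lines):
    """Count matching lines per 'Mon DD HH:MM' bucket."""
    buckets = Counter()
    for line in lines:
        if ERROR_RE.search(line):
            # The first 12 characters of a classic syslog line are 'Mon DD HH:MM'.
            buckets[line[:12]] += 1
    return buckets
```

Feeding it an open log file, for example `errors_per_minute(open("/var/log/syslog"))` (the path is an assumption about the Debian setup), gives a per-minute count that can be set next to the times playback stalled.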

Ruffalo · October 19, 2018, 3:52pm

You can try underclocking your memory and/or increasing its voltage. But really, that memory is probably junk.

Bitals · October 19, 2018, 3:58pm

I’ve got a total of 48 gigs lying around and swapped out the reported DIMMs, but the same issue is still reported. This motherboard does not allow tweaking any voltages; I can only go down to the 1066 preset. Will try this.
Modules are Hynix HMT351V7BMR4C.

Bitals · October 19, 2018, 4:23pm

Official response from Dell in one of the threads:

The reason this occurs (and why blacklisting EDAC “resolves” the problem) is due to a bug in the EDAC module present in all major Linux distributions at this time. EDAC does not communicate properly with the Intel Node Manager on the latest generation Intel processors; this causes false error reporting whenever a variety of status triggers are met. Any time the processor increases or decreases clock speed or voltage to meet demands due to different loads, any thermal sensor check, HT being turned on or off, or several other things of this nature will cause this to occur.

We need to disable the edac modules to stop it from attempting to take over the hardware management features of the Lifecycle Controller and the BMC

Will try this as well.
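As a sketch of what blacklisting might look like in practice: the helper below lists the currently loaded modules with "edac" in their name (from the text of `/proc/modules`) and prints matching `blacklist` lines in modprobe.d syntax. The function names are hypothetical, and which modules actually show up depends on the kernel and CPU; the thread does not name them.

```python
def loaded_edac_modules(proc_modules_text):
    """Return sorted names of loaded modules whose name contains 'edac'."""
    names = [
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    ]
    return sorted(n for n in names if "edac" in n)


def blacklist_lines(modules):
    """Render modprobe.d 'blacklist' directives, one per module."""
    return "".join(f"blacklist {name}\n" for name in modules)
```

Running `print(blacklist_lines(loaded_edac_modules(open("/proc/modules").read())))` produces text that could be saved as a file under `/etc/modprobe.d/` (the file name is up to you), followed by a reboot. Note that `blacklist` only stops automatic loading by alias; a module can still be loaded explicitly or pulled in as a dependency.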

Bitals · October 19, 2018, 6:00pm

Ok. 1066 freq, NUMA on, blacklisted edac modules.
2 hours of idling, everything looks good, transcoding works great. Will wait and watch, hopefully this is it. And hopefully the system won’t crash randomly, proving edac was right about the errors. I really hope this is over after all my “fixing”. ROFL in tears.

Ruffalo · October 19, 2018, 8:12pm

But we don’t know which one fixed your problem!

Actually the EDAC stuff was just cosmetic, so it shouldn’t be that. Either dropping the frequency or turning on NUMA did it.

Bitals · October 19, 2018, 11:16pm

NUMA has been on since the beginning; I turned it off and it changed nothing. I don’t think you can really fix anything by turning NUMA on, only by turning it OFF.
Either frequency or EDAC solved the issue. Blacklisting EDAC was reported to fix this for several guys and was suggested by Dell. According to info online, EDAC tries to manage the memory controller and some other parts of the former north bridge, now integrated into the CPU, and fails at it on Sandy Bridge. This was reported back when these CPUs were current, and it looks like it still hasn’t been fixed.

Bitals · October 20, 2018, 5:43am

And it crashed overnight.
hardware error from apei generic hardware error source 1

Started memtest86 for 10 passes and will be away for a day. Will see what it gives me.

Bitals · October 22, 2018, 1:44am

Changed 4 modules again. Ran memtest86 10 times, 0 errors. Ran memtester under Debian, 0 errors.
Will see if it survives another night.

Bitals · October 22, 2018, 10:45am

And it did, finally.
16h uptime.

But Docker was stopped. Will start it now and wait again.