Restart Ethereum miner after error without exit

I'm running the Genoil cpp-ethereum miner (https://github.com/Genoil/cpp-ethereum) on a VM with GPU passthrough and overclocking. It runs mostly stable with a memory clock offset of 1500 MHz (this yields 32 MH/s on a GTX 1070), but occasionally it outputs the error below and stops hashing without exiting.

Cuda error in func 'search' at line 346 : an illegal memory access was encountered.

I'm wondering what would be a good approach here. I can check the output for that string, kill the process and start over, or I can check the GPU and restart the miner if utilization stays low over several consecutive checks. The miner outputs about 5 lines every second.
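To make the second option concrete, here is a minimal Python sketch of the "utilization drops in consecutive tests" check. It assumes `nvidia-smi` is on the PATH inside the VM; the function names, the 20% threshold, the three-sample window and the 10-second interval are all illustrative guesses, not values taken from the miner.

```python
import subprocess
import time


def read_gpu_utilization(gpu_index=0):
    """Return the GPU utilization in percent as reported by nvidia-smi."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            f"--id={gpu_index}",
            "--query-gpu=utilization.gpu",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return int(out.strip())


def is_stalled(samples, threshold=20, required=3):
    """True if the last `required` samples are all below `threshold` percent.

    Requiring several low readings in a row avoids reacting to one brief dip.
    """
    if len(samples) < required:
        return False
    return all(s < threshold for s in samples[-required:])


def wait_for_stall(interval=10.0, threshold=20, required=3):
    """Block until utilization has been low for `required` consecutive samples."""
    samples = []
    while not is_stalled(samples, threshold, required):
        time.sleep(interval)
        samples.append(read_gpu_utilization())
        samples = samples[-required:]  # keep only the window we test
```

A longer window makes the check slower to react but less likely to fire on a short network hiccup; it still cannot tell a crashed kernel from a miner that is idle for some other reason.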

Anyway, I guess I almost answered my own question here, but I don't like either option: checking the output means doing 5 checks every second, and the GPU check could also trigger on a network failure and probably for other reasons. So I'm asking whether people have any suggestions for doing this more elegantly.

I usually run the miner in screen, since I like being able to bring it back up and have the highlighting work, but I guess that's a luxury I could drop.

Have you tried dropping the clocks on the VRAM? I mined litecoin back in the day and remember that my mining clients were very sensitive to VRAM and RAM clock speeds.

I could probably get it more stable, but a few more MH/s is worth a crash per day as long as the process gets going again shortly after it stops.

I haven't done anything with the RAM clock and probably can't do much there either; it's a server/workstation motherboard with ECC memory and a Xeon CPU.

If you have a distro with systemd and can tell the program to quit on failure/error, then you could use systemd's restart functionality:

Restart=: This indicates the circumstances under which systemd will attempt to automatically restart the service. This can be set to values like “always”, “on-success”, “on-failure”, “on-abnormal”, “on-abort”, or “on-watchdog”. These will trigger a restart according to the way that the service was stopped.

Source

If the software does not support exiting on CUDA errors then ask for the feature in their issue tracker.


Thanks, it's an interesting suggestion. I'd have to put in a feature request or find the spot in the code and change it myself, so for now I'll go with a wrapper script. I'll post it here for reference if it works.
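For reference, a wrapper along these lines could look roughly like the Python sketch below. It is an untested-on-the-miner outline under a few assumptions: the error line contains the text `Cuda error in func` (taken from the message quoted above), the miner writes it to stdout or stderr, and the names `run_until_error` and `supervise` plus the 5-second restart delay are made up for illustration. Reading the pipe line by line blocks until the miner prints something, so there is no separate polling loop running 5 times a second.

```python
import subprocess
import sys
import time

# Substring of the error quoted in this thread; adjust if the wording differs.
ERROR_MARKER = "Cuda error in func"


def run_until_error(cmd, marker=ERROR_MARKER, echo=True):
    """Run cmd, pass its output through, and kill it once marker appears.

    Returns True if the marker was seen, False if the process ended by itself.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # the error may be printed on stderr
        text=True,
        errors="replace",
        bufsize=1,
    )
    seen = False
    try:
        # Iterating blocks until the next line arrives; no busy polling.
        for line in proc.stdout:
            if echo:
                sys.stdout.write(line)
            if marker in line:
                seen = True
                break
    finally:
        if proc.poll() is None:
            proc.kill()  # the miner hangs instead of exiting, so force it
        proc.wait()
        proc.stdout.close()
    return seen


def supervise(cmd, restart_delay=5.0):
    """Restart cmd forever, whether it hit the error or exited on its own."""
    while True:
        hit = run_until_error(cmd)
        print(f"miner stopped (error seen: {hit}); restarting in {restart_delay}s")
        time.sleep(restart_delay)
```

It would be started with something like `supervise(["./ethminer", ...])`, using whatever path and arguments the miner is normally launched with, and the wrapper itself can still run inside screen. Two caveats: a program writing to a pipe instead of a terminal may drop its colour highlighting, and it may buffer its output so the error line arrives late.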