I'm running the Genoil cpp-ethereum miner (https://github.com/Genoil/cpp-ethereum) on a VM with GPU passthrough and overclocking. It runs mostly stable with a memory clock offset of 1500 MHz (this yields 32 MH/s on a GTX 1070), but occasionally it prints an error and stops hashing without exiting:
Cuda error in func 'search' at line 346 : an illegal memory access was encountered.
I'm wondering what a good approach would be here. I could check the output for that string, then kill the process and start over, or I could check GPU utilization and restart if it drops in consecutive tests. The miner outputs about 5 lines every second.
Anyway, I guess I've almost answered my own question, but I don't like doing 5 checks every second, and the GPU check could also trigger on a network failure and probably for other reasons. So I guess I'm asking whether people have any suggestions for doing this more elegantly.
I usually run the miner in screen, as I like to be able to bring it back up and have the highlighting work, but I guess that's a luxury I could drop.
Have you tried dropping the VRAM clocks? I mined Litecoin back in the day and remember that my mining clients were very sensitive to VRAM and RAM clock speeds.
If you have a distro with systemd and can tell the program to quit on failure/error, then you could use systemd's restart functionality:
Restart=: This indicates the circumstances under which systemd will attempt to automatically restart the service. This can be set to values like “always”, “on-success”, “on-failure”, “on-abnormal”, “on-abort”, or “on-watchdog”. These will trigger a restart according to the way that the service was stopped.
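As a rough illustration of the setting described above, a unit file could look something like the fragment below. This is only a sketch: the unit description, the install path /usr/local/bin/ethminer and the 10-second delay are assumptions, not taken from the question. It also only helps once the miner actually exits with an error, which it currently does not do.

```ini
[Unit]
Description=Ethereum miner (hypothetical example unit)
After=network-online.target

[Service]
# Hypothetical install path; append your usual miner options here.
ExecStart=/usr/local/bin/ethminer
# Restart when the process exits with a non-zero code or is killed by a signal.
Restart=on-failure
# Wait a few seconds before restarting (assumed value).
RestartSec=10

[Install]
WantedBy=multi-user.target
```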
Thanks, it's an interesting suggestion. I'd have to put in a feature request or find it in the code and change it, but for now I'll go with a wrapper script. I'll post it here for reference if it works.
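For reference, a wrapper along those lines might look roughly like the Python sketch below. It is not tested against the real miner. It assumes the error text from the question appears on a single output line, and the names `run_until_error`, `supervise` and `ERROR_MARKER` are made up for this example. Reading the pipe line by line blocks until the miner prints something, so there is no need to poll several times a second.

```python
import subprocess
import sys
import time

# Substring taken from the error message quoted in the question.
ERROR_MARKER = "an illegal memory access was encountered"


def run_until_error(cmd, marker=ERROR_MARKER):
    """Run cmd, echo its output, and kill it as soon as marker appears.

    Returns True if the marker was seen (the process is then killed),
    False if the process ended on its own.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr in case errors are logged there
        text=True,
        bufsize=1,
    )
    try:
        for line in proc.stdout:  # blocks until a full line arrives
            sys.stdout.write(line)
            if marker in line:
                return True
        return False
    finally:
        if proc.poll() is None:
            proc.kill()
        proc.wait()
        proc.stdout.close()


def supervise(cmd, marker=ERROR_MARKER, delay=5.0, max_restarts=None):
    """Keep restarting cmd; return the number of restarts performed."""
    restarts = 0
    while True:
        run_until_error(cmd, marker)
        if max_restarts is not None and restarts >= max_restarts:
            return restarts
        restarts += 1
        time.sleep(delay)  # short pause before starting the miner again


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python watchdog.py <miner command and its options>
    supervise(sys.argv[1:])
```

Two caveats. If the miner hangs without printing anything at all, the blocking read never returns, so a hang with no output would still need something like the GPU utilization check. And because the miner's output goes through a pipe instead of a terminal, its highlighting may be lost and its output may arrive in larger buffered chunks.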