
GPUGrid bad batch of jobs

Posted: Wed Oct 07, 2020 1:56 pm
by chriscambridge
In the last half hour every host has been getting computation errors, so I would say that a bad batch of jobs has been added to the GPUGrid queue.

If you are running GPUGrid at this time you may want to check your hosts.

Re: GPUGrid bad batch of jobs

Posted: Wed Oct 07, 2020 2:21 pm
by chriscambridge
Other people are also now getting these failed tasks:

https://www.gpugrid.net/forum_thread.php?id=5179

Re: GPUGrid bad batch of jobs

Posted: Wed Oct 07, 2020 3:58 pm
by Woodles
Thanks Chris. I only had one GPU on it but failing tasks here as well :(

Oh well, it's been a while since I did any Collatz :)

Re: GPUGrid bad batch of jobs

Posted: Thu Oct 08, 2020 4:43 pm
by chriscambridge
GPUGrid has now been temporarily suspended.

The project will be offline until the app can be rebuilt. However, uploads are still working for outstanding tasks that were completed before the issue.

https://www.gpugrid.net/forum_thread.php?id=5180#55462

From what I can gather, the ACEMD license wasn't renewed and this is causing all tasks to fail.

https://www.gpugrid.net/forum_thread.php?id=4970
https://www.gpugrid.net/forum_thread.php?id=5179

Re: GPUGrid bad batch of jobs

Posted: Fri Oct 09, 2020 9:31 pm
by chriscambridge
Dears, thanks for your patience.

I updated the acemd3 apps.

Also, I verified that there were very few results from apps built on the old CUDA version, so I won't be re-deploying them. In other words, apps now require CUDA 10 (Linux) and CUDA 10.1 (Windows). In terms of driver versions:

CUDA 10.1 (10.1.105) >= 418.39 for Windows
CUDA 10.0 (10.0.130) >= 410.48 for Linux

https://docs.nvidia.com/deploy/cuda-com ... index.html
https://www.gpugrid.net/forum_thread.php?id=5183#55495

Re: GPUGrid bad batch of jobs

Posted: Sat Oct 10, 2020 11:30 am
by Woodles
Thanks Chris, I'll check which drivers I'm using, they seemed to be alright before.

Re: GPUGrid bad batch of jobs

Posted: Sun Oct 11, 2020 12:20 pm
by chriscambridge
I can't remember if you're running Windows or Linux (or perhaps both), but the latest Nvidia Linux (Mint) driver is 450.

I'm running both 435 and 450 and they both work fine on GPUGrid.

Something I did notice in the latest Mint update was there are now 2 different nvidia drivers (per version):

e.g. nvidia-driver-450 and nvidia-driver-450-server

I did a quick Google but I couldn't really find anything that explained the difference between the two.

The server version was "recommended" in Mint, but knowing how Nvidia is with GTX/RTX drivers in servers/data centers, I avoided it and just stuck with the normal non-server version.

Re: GPUGrid bad batch of jobs

Posted: Sun Oct 11, 2020 1:22 pm
by Woodles
Mainly Linux with 418.56 up to 455.23 plus a few Windows on 430.86 to 445.75

Looks like I should be alright but I'll update them anyway once the sprint is over.
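
For anyone wanting to double-check their own hosts, here is a rough Python sketch of the comparison. It assumes the minimum driver versions are exactly as quoted earlier in the thread (410.48 for Linux, 418.39 for Windows); I haven't verified those against Nvidia's own table. The names MIN_DRIVER, parse_version and driver_ok are made up for this example.

```python
# Minimum driver versions as quoted in this thread (an assumption, not
# checked against Nvidia's release notes).
MIN_DRIVER = {
    "linux": "410.48",    # CUDA 10.0 (10.0.130)
    "windows": "418.39",  # CUDA 10.1 (10.1.105)
}


def parse_version(text):
    """Turn '418.56' into (418, 56) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_ok(platform, installed):
    """Return True if the installed driver meets the quoted minimum."""
    return parse_version(installed) >= parse_version(MIN_DRIVER[platform])


# Lowest and highest driver versions mentioned above for each platform.
hosts = [
    ("linux", "418.56"),
    ("linux", "455.23"),
    ("windows", "430.86"),
    ("windows", "445.75"),
]

for platform, version in hosts:
    status = "ok" if driver_ok(platform, version) else "too old"
    print(f"{platform:8} {version:8} {status}")
```

Comparing as tuples of integers rather than as strings or floats avoids mistakes such as treating 410.9 as newer than 410.48. On a host with the Nvidia driver installed, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` should print the installed version to feed into the check.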