In the last half hour every host has been getting computational errors, so I would say that a bad batch of jobs has been added to the GPUGrid queue.
If you are running GPUGrid at this time you may want to check your hosts.
GPUGrid bad batch of jobs
-
- UBT Contributor
- Posts: 2178
- Joined: Mon Aug 08, 2016 1:56 pm
- Location: UK
Last edited by chriscambridge on Wed Oct 07, 2020 4:38 pm, edited 1 time in total.
Re: GPUGrid bad batch of jobs
Other people are also now getting these failed tasks:
https://www.gpugrid.net/forum_thread.php?id=5179
Re: GPUGrid bad batch of jobs
Thanks Chris. I only had one GPU on it, but tasks are failing here as well.
Oh well, it's been a while since I did any Collatz

-
- UBT Contributor
- Posts: 2178
- Joined: Mon Aug 08, 2016 1:56 pm
- Location: UK
Re: GPUGrid bad batch of jobs
GPUGrid has now been temporarily suspended.
The project will be offline until the app can be rebuilt; however, uploads are still working for outstanding tasks that were completed before the issue.
https://www.gpugrid.net/forum_thread.php?id=5180#55462
From what I can gather, the ACEMD license wasn't renewed and this is causing all tasks to fail.
https://www.gpugrid.net/forum_thread.php?id=4970
https://www.gpugrid.net/forum_thread.php?id=5179
-
- UBT Contributor
- Posts: 2178
- Joined: Mon Aug 08, 2016 1:56 pm
- Location: UK
Re: GPUGrid bad batch of jobs
https://www.gpugrid.net/forum_thread.php?id=5183#55495
Dears, thanks for your patience.
I updated the acemd3 apps.
Also, I verified that there were very few results by apps from the old CUDA version, so I won't be re-deploying them. In other words, apps now require CUDA 10 (Linux) and CUDA 10.1 (Windows). In terms of drivers versions:
CUDA 10.1 (10.1.105) >= 418.39 for Windows
CUDA 10.0 (10.0.130) >= 410.48 for Linux
https://docs.nvidia.com/deploy/cuda-com ... index.html
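For anyone who wants to check a host against the minimums quoted above, here is a rough Python sketch. The function names and the MIN_DRIVER table are my own for illustration, the version numbers are copied straight from the quoted post and not independently verified, and it assumes nvidia-smi is installed and on the PATH.

```python
import subprocess

# Minimum driver versions exactly as quoted in the post above
# (copied from the quote, not independently verified).
MIN_DRIVER = {
    "linux": (410, 48),    # CUDA 10.0 (10.0.130)
    "windows": (418, 39),  # CUDA 10.1 (10.1.105)
}


def parse_version(text):
    """Turn a driver string such as '418.56' into a tuple of ints: (418, 56)."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(driver, platform):
    """Return True if the driver is at or above the quoted minimum for the platform."""
    return parse_version(driver) >= MIN_DRIVER[platform]


def installed_driver():
    """Ask nvidia-smi for the driver version of the first GPU it reports."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()[0].strip()
```

On a Linux host you would call meets_minimum(installed_driver(), "linux"); you can also pass a version string by hand, e.g. meets_minimum("418.56", "linux").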
Re: GPUGrid bad batch of jobs
Thanks Chris, I'll check which drivers I'm using, they seemed to be alright before.
-
- UBT Contributor
- Posts: 2178
- Joined: Mon Aug 08, 2016 1:56 pm
- Location: UK
Re: GPUGrid bad batch of jobs
I can't remember if you're running Windows or Linux (or perhaps both), but the latest Nvidia Linux (Mint) driver is 450.
I'm running both 435 and 450 and they both work fine on GPUGrid.
Something I did notice in the latest Mint update was there are now 2 different nvidia drivers (per version):
eg: nvidia-driver-450 and nvidia-driver-450-server
I did a quick Google but I couldn't really find anything that explained the difference between the two.
The server version was "recommended" in Mint, but knowing how Nvidia is with GTX/RTX drivers in servers/data centers, I avoided it and just stuck with the normal non-server version.
Re: GPUGrid bad batch of jobs
Mainly Linux with 418.56 up to 455.23 plus a few Windows on 430.86 to 445.75
Looks like I should be alright but I'll update them anyway once the sprint is over.