GPUGrid bad batch of jobs

Good at maths? Then why not figure on helping these projects - Gerasim, GPUGrid, Moo! Wrapper, NFS, NumberFields, ODLK, PrimeGrid, Rakesearch, SRBase
chriscambridge
Active UBT Contributor 1+ yr
Posts: 2178
Joined: Mon Aug 08, 2016 1:56 pm
Location: UK

GPUGrid bad batch of jobs

Post by chriscambridge »

In the last half hour every host has been getting computation errors, so it looks like a bad batch of jobs has been added to the GPUGrid queue.

If you are running GPUGrid at this time you may want to check your hosts.
Last edited by chriscambridge on Wed Oct 07, 2020 4:38 pm, edited 1 time in total.
chriscambridge
Active UBT Contributor 1+ yr
Posts: 2178
Joined: Mon Aug 08, 2016 1:56 pm
Location: UK

Re: GPUGrid bad batch of jobs

Post by chriscambridge »

Other people are also now getting these failed tasks:

https://www.gpugrid.net/forum_thread.php?id=5179
Woodles
UBT Contributor
Posts: 11757
Joined: Thu Dec 20, 2007 12:00 am
Location: Cambridgeshire

Re: GPUGrid bad batch of jobs

Post by Woodles »

Thanks Chris. I only had one GPU on it but failing tasks here as well :(

Oh well, it's been a while since I did any Collatz :)
chriscambridge
Active UBT Contributor 1+ yr
Posts: 2178
Joined: Mon Aug 08, 2016 1:56 pm
Location: UK

Re: GPUGrid bad batch of jobs

Post by chriscambridge »

GPUGrid has now been temporarily suspended.

The project will be offline until the app can be rebuilt. Uploads are still working for outstanding tasks that were completed before the issue.

https://www.gpugrid.net/forum_thread.php?id=5180#55462

From what I can gather, the ACEMD license wasn't renewed and this is causing all tasks to fail.

https://www.gpugrid.net/forum_thread.php?id=4970
https://www.gpugrid.net/forum_thread.php?id=5179
chriscambridge
Active UBT Contributor 1+ yr
Posts: 2178
Joined: Mon Aug 08, 2016 1:56 pm
Location: UK

Re: GPUGrid bad batch of jobs

Post by chriscambridge »

Dears, thanks for your patience.

I updated the acemd3 apps.

Also, I verified that there were very few results by apps from the old CUDA version, so I won't be re-deploying them. In other words, apps now require CUDA 10 (Linux) and CUDA 10.1 (Windows). In terms of driver versions:

CUDA 10.1 (10.1.105) >= 418.39 for Windows
CUDA 10.0 (10.0.130) >= 410.48 for Linux

https://docs.nvidia.com/deploy/cuda-com ... index.html
https://www.gpugrid.net/forum_thread.php?id=5183#55495
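
For anyone who wants to check a few hosts quickly, here is a rough sketch of the comparison in Python. It is not from the GPUGrid post: the minimum versions are simply copied from the list above and have not been verified independently, and the names MIN_DRIVER, parse_version and driver_ok are made up for this example.

```python
# Minimum driver versions as quoted in the post above.
# Assumption: copied from the thread, not verified against NVIDIA's docs.
MIN_DRIVER = {
    "linux": (410, 48),    # CUDA 10.0 (10.0.130)
    "windows": (418, 39),  # CUDA 10.1 (10.1.105)
}


def parse_version(text):
    """Turn a string like '418.56' into (418, 56) so it compares numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_ok(os_name, driver_version):
    """Return True if driver_version meets the quoted minimum for os_name."""
    return parse_version(driver_version) >= MIN_DRIVER[os_name.lower()]


if __name__ == "__main__":
    # Example driver versions mentioned elsewhere in this thread.
    for os_name, version in [("linux", "418.56"), ("windows", "430.86")]:
        print(os_name, version, driver_ok(os_name, version))
```

The installed driver version can usually be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and passed to driver_ok as a string.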
Woodles
UBT Contributor
Posts: 11757
Joined: Thu Dec 20, 2007 12:00 am
Location: Cambridgeshire

Re: GPUGrid bad batch of jobs

Post by Woodles »

Thanks Chris, I'll check which drivers I'm using, they seemed to be alright before.
chriscambridge
Active UBT Contributor 1+ yr
Posts: 2178
Joined: Mon Aug 08, 2016 1:56 pm
Location: UK

Re: GPUGrid bad batch of jobs

Post by chriscambridge »

I can't remember if you're running Windows or Linux (or perhaps both), but the latest Nvidia Linux (Mint) driver is 450.

I'm running both 435 and 450 and they both work fine on GPUGrid.

Something I did notice in the latest Mint update is that there are now 2 different Nvidia driver packages per version:

e.g. nvidia-driver-450 and nvidia-driver-450-server

I did a quick Google but I couldn't really find anything that explained the difference between the two.

The server version was "recommended" in Mint, but knowing how Nvidia is with GTX/RTX drivers in servers/data centers, I avoided it and just stuck with the normal non-server version.
Woodles
UBT Contributor
Posts: 11757
Joined: Thu Dec 20, 2007 12:00 am
Location: Cambridgeshire

Re: GPUGrid bad batch of jobs

Post by Woodles »

Mainly Linux with 418.56 up to 455.23, plus a few Windows on 430.86 to 445.75.

Looks like I should be alright but I'll update them anyway once the sprint is over.