Happy May Day. Unfortunately for us it's been "Mayday! Mayday!" At 4:43 (PDT) this morning, our science database machine, thumper, became hasenfeffer. It currently refuses to acknowledge that it has any disk drives. Since the controllers are attached to the motherboard, major repairs will probably be required. No work can be created until this machine gets fixed. We are on the phone with Sun now in hopes of securing repair or a replacement.
So 319,395 WUs to go
Massive SETI server failure - Now fixed
-
- Posts: 3790
- Joined: Mon Mar 13, 2006 12:00 am
Last edited by UBT - Halifax-lad on Mon May 14, 2007 5:56 pm, edited 3 times in total.
-
- Posts: 3790
- Joined: Mon Mar 13, 2006 12:00 am
More news, and the work is going fast. Note that the developers say we could be in for a long dry period.
This was one of those days. Sometime in the early morning MySQL on sidious crashed and rebooted itself. It had minor indigestion and restarted on its own just fine. Eric had to restart the BOINC projects to clean the pipes.
But when I came in I found Eric dissecting our master database server, thumper. That's never a good sign. He and Jeff informed me that it lost the ability to see any of its internal drives. Tests throughout the day confirmed that diagnosis - there's something dead between the power supply and the disk controllers so the drives don't even spin up. Booting from a DVD and an "fdisk" shows nothing. This system has a "preliminary" motherboard, which is one of the reasons we got it for free, but it has no hardware support.
Meanwhile I went ahead with the usual database backup/compression while we figured out what the heck we're gonna do. We're pretty confident the data is intact, and as long as some server somewhere can mount the 24 SATA drives that make up the database, the SETI@home science data will be perfectly intact. Failing that, we can recover from tape, but unfortunately we're at a bad point in the backup cycle, so the most recent tape is a week old.
Since data loss is most likely not an issue, the upshot of thumper being down is that we can't run the splitters or the assimilators. I just restarted the scheduler, but we only had about 300,000 results to process. I checked again just now and it's already down to about 281,000. Brace yourselves for a long outage. We placed many phone calls and asked for favors but so far no sure immediate solution presented itself. We have some leads, but does anybody have a 64-bit multiprocessor 8+GB system with 24 SATA drive bays they can loan us within the next 24-48 hours?
- Matt
-
- Posts: 3790
- Joined: Mon Mar 13, 2006 12:00 am
Astropulse is kicking out work too; it's just swamped with requests as well, seeing as it's on the same servers.
Word of warning: Astropulse runs slow on Windows and may miss deadlines, but Eric has said manual credits will be granted for any work that misses the deadline but completes afterwards.
There's still plenty of enhanced work on this project too, which is just the same as on the normal SETI project.
-
- UBT Forum Admin
- Posts: 9725
- Joined: Mon Mar 13, 2006 12:00 am
- Location: NW Midlands
- Contact:
Darren wrote: And the well has run dry....
Hi all,
Clearly, the SETI server problem is still ongoing and doesn't look like getting any better quickly, unless you have a Sun X4500 server lying around!!
These things normally cost $48,000 each (but are on special offer now for just $24,000 - see here: http://www.sun.com/servers/x64/x4500/).
In the meantime, may I be so bold as to suggest you consider crunching for one of the other projects?
See this link for the "Attach to project" URLs of various other worthy projects - http://forum.ukboincteam.com/viewtopic.php?t=1700
There's plenty to choose from and most have work available...
regards
Tim
-
- Posts: 3790
- Joined: Mon Mar 13, 2006 12:00 am
Great news! Sun Microsystems is coming to the rescue and will be replacing our inoperative science data base server. They are preparing the machine now and will be rushing it to us on Monday. Once we have the machine up and the database recovered, we can start sending work out again. Details on the server crash and our recovery from it can be found in Technical News
-
- UBT Forum Admin
- Posts: 9725
- Joined: Mon Mar 13, 2006 12:00 am
- Location: NW Midlands
- Contact:
UBT - Halifax-lad wrote: Great news! Sun Microsystems is coming to the rescue and will be replacing our inoperative science data base server. They are preparing the machine now and will be rushing it to us on Monday. Once we have the machine up and the database recovered, we can start sending work out again. Details on the server crash and our recovery from it can be found in Technical News
Congrats to "Sun" then for becoming the "hero" in this.
Think that's gotta be worth a link to them:
http://www.sun.com
regards,
Tim
Yep, well done Sun
Most of my machines have now run out of Seti WUs and have been moved over to either Einstein or WEP with a smattering of TMRL & Riesel.
As they reckon they won't get the new server till late Monday, I can't see us getting any new work till Tuesday at the earliest, and maybe even Wednesday.
Oh well, at least it gives the other projects a look in

-
- Posts: 3790
- Joined: Mon Mar 13, 2006 12:00 am
Update: Our science database server died on Tuesday, May 1st. We haven't been able to create new workunits since then (though we are still accepting completed results). Sun is graciously replacing this server. The bad news is, despite earlier claims, it won't be here until Friday the 11th, which means the earliest we'll be creating new work is Monday the 14th. Thank you for your continued patience! Updates, discussion, and more information about this and other server-related topics can be found in Technical News
-
- Posts: 3790
- Joined: Mon Mar 13, 2006 12:00 am
more or less fixed now
Update: We got the new server yesterday, inserted our old disks and booted it up. It came right up, but verifying the file systems took overnight. The work is being created, the splitters and assimilators are working. It will be a while before we catch up. Thank you for your continued patience and support.