Rosetta servers down?

Posted: Thu Dec 09, 2010 10:47 am
by Jeffers
I've noticed that the Rosetta servers look to be down. Completed tasks are failing to upload and I get a site not available message for boinc.bakerlab.org/rosetta

Anyone know what's happening there?

Posted: Thu Dec 09, 2010 5:36 pm
by Rotwang1985
No idea what's happening, mate. Rosetta have a Twitter account now but don't use it for service announcements, just publicity-type stuff - http://twitter.com/rosettaathome

The whole boinc.bakerlab site seems to be down.

It's annoying cos it's the only project I run, so I'm just wasting CPU cycles at the minute lol

Posted: Fri Dec 10, 2010 8:19 am
by Rotwang1985
Looks like she's up and running as of sometime after midnight:
    Thu  9 Dec 23:54:59 2010 | rosetta@home | Message from rosetta@home: Project is temporarily shut down for maintenance
    Fri 10 Dec 00:55:01 2010 | rosetta@home | Sending scheduler request: To report completed tasks.
    Fri 10 Dec 00:55:01 2010 | rosetta@home | Reporting 1 completed tasks, requesting new tasks for CPU
    Fri 10 Dec 00:55:04 2010 | rosetta@home | Scheduler request completed: got 1 new tasks
    Fri 10 Dec 00:55:06 2010 | rosetta@home | Started download of input_mem_prog_run05_centroid_round01_E_subrun_000002_yfsong.zip
Explanation from Rosetta@Home website:
News  
Dec 9, 2010
Today's heroes are Keith and Darwin, our systems administrators and hardware architects. Yesterday, our main filesystem crashed hard. There were warning lights flashing behind every disk on the SAN and it looked pretty grim. Thankfully, Keith and Darwin were able to pinpoint the problem to two redundant laser modules for the fiber optic loops (it was amazing and unlucky that both failed). The laser modules have since been replaced and the filesystem is back up. We'll be starting up the project again shortly. Thanks for your patience.
Check website for more info http://boinc.bakerlab.org/rosetta/
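
The gap between the maintenance message and the first completed scheduler request in the log above can be worked out with a few lines of Python. This is only a rough sketch: it assumes the message-log layout shown in this post (timestamp | project | message) and English day/month names, and `parse_line` is a made-up helper name, not anything provided by BOINC.

```python
from datetime import datetime

# Two lines copied from the message log quoted above.
LOG_LINES = [
    "Thu  9 Dec 23:54:59 2010 | rosetta@home | Message from rosetta@home: Project is temporarily shut down for maintenance",
    "Fri 10 Dec 00:55:04 2010 | rosetta@home | Scheduler request completed: got 1 new tasks",
]


def parse_line(line):
    """Split one log line into (timestamp, project, message). Hypothetical helper."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    # Collapse the padding in "Thu  9 Dec" before parsing the timestamp.
    when = datetime.strptime(" ".join(stamp.split()), "%a %d %b %H:%M:%S %Y")
    return when, project, message


first = parse_line(LOG_LINES[0])
last = parse_line(LOG_LINES[1])

# Time between the "shut down for maintenance" message and the first
# completed scheduler request, as seen from this one client.
gap = last[0] - first[0]
print(gap)
```

It only measures what one client saw, so it says nothing about how long the servers themselves were actually offline.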

Posted: Fri Dec 10, 2010 9:19 am
by Jeffers
Yep, I've been able to upload the outstanding completed tasks. No new tasks yet, though....

Posted: Tue Jan 04, 2011 10:47 am
by matt40k
Is it down again?

Posted: Tue Jan 04, 2011 10:54 am
by Rotwang1985
Annoyingly it's down at the minute, I'm almost out of work ....

I've started using this site to answer that age-old question of 'Is it just me?'

http://www.isup.me
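
For anyone who would rather script a quick check, here is a minimal Python sketch using only the standard library. It is not the same test as the site above: it only tells you whether the server answers from your own machine, so it can't rule out a problem on your side. The function name `is_up` and the `opener` parameter are made up for illustration.

```python
import urllib.request


def is_up(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if `url` answers with a non-error status, else False.

    `opener` is only there so the function can be tried out without
    touching the network; by default it is urllib.request.urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses,
        # so DNS failures, refused connections and 4xx/5xx replies land here.
        return False


# Example call (does a real request, so it is left commented out):
# print(is_up("http://boinc.bakerlab.org/rosetta/"))
```

If this says the site is down but the site above says it is up, the problem is probably at your end.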

Posted: Tue Jan 04, 2011 5:16 pm
by Rotwang1985
Managed to report tasks and download new work.

Posted: Tue Jan 04, 2011 8:40 pm
by matt40k
Looks back up, no post about why it went down.

My stats haven't updated publicly, I've got over 60k on the Rosetta website, but only 40k publicly. Maybe they're having a laugh with me ;)

Posted: Tue Jan 04, 2011 8:47 pm
by Rotwang1985
It seems there are some problems with the various servers/systems that administer the Rosetta project.

http://boinc.bakerlab.org/rosetta/rah_status.php

I'll be honest, I love the Rosetta project and its goals, but every time there's a US school/public holiday it goes offline.

I imagine something along the lines of a gerbil-powered server that doesn't get fed when no one is there and dies, screwing the whole system, only to be replaced by another gerbil :S

As for someone telling the users there is a problem with the equipment, they never do until afterwards: "Sorry for outage, RAID Backup Widget server PSU failed" etc. They have a Twitter feed but only use it for publicity and good news. It would be nice if they had engineers just posting a quick note somewhere, or maybe they just want us to check the server status link above.

Keep Crunching !!!

Posted: Wed Jan 05, 2011 12:02 pm
by Jeffers
They look to be offline again at the moment. I've got completed WUs waiting to report but no current work from them.

Posted: Wed Jan 05, 2011 9:45 pm
by primalsole
Yes, same here. I hope they hurry up and get it sorted as they have been fairly unreliable over the last couple of weeks.

Posted: Wed Jan 05, 2011 11:17 pm
by matt40k
Looks like the kit also needs replacing. My work's PE2850s are over 4-5 years old and have been replaced, so the PE2650s they use must be WAAY out of date, unless they didn't keep that webpage updated?

Shame it's down, yet again. Nice project, just needs some more love from them.

Posted: Thu Jan 06, 2011 11:00 pm
by matt40k
Update from R@H:

The project's fileserver has crashed. We're working to get things back online as soon as possible. Thanks for your patience. -KEL 01/06/2011

(So that's 06/01/2011)

Posted: Thu Jan 06, 2011 11:06 pm
by Rotwang1985
I've shut down my headless Ubuntu cruncher until they get sorted, to save a few leccy pennies.

If the fileserver is being rebuilt from a backup it might be done in the morning or someone might have to run to [insert US PC World equivalent] to get some kit.

Posted: Fri Jan 07, 2011 7:25 pm
by matt40k
It's back :)

Posted: Fri Jan 07, 2011 7:32 pm
by Rotwang1985
My communications to the project, via BOINC Manager, are still being deferred by 30 mins ... :(

But the website is working and things would appear to be getting better.
http://boinc.bakerlab.org/rosetta/rah_status.php

Number 5 is alive!!!

Posted: Sat Jan 08, 2011 8:02 pm
by mik9dt
or rather Rosetta is a full strip of green at
http://boinc.bakerlab.org/rosetta/rah_status.php

Good news... :D

Posted: Sun Jan 09, 2011 7:53 am
by UBT - Rick Horn
New work has arrived, but all WUs say "download failed".  :(

Posted: Sun Jan 09, 2011 10:13 pm
by UBT - Rick Horn
New WUs are downloading and running normally.  :D

Posted: Sun Jan 09, 2011 10:20 pm
by Rotwang1985
I'm still struggling Rick, WUs ready to report but retrying in 10hrs ... grrr

Posted: Sun Jan 09, 2011 11:28 pm
by UBT - Rick Horn
Rotwang1985 wrote: I'm still struggling Rick, WUs ready to report but retrying in 10hrs ... grrr
At least things are improving slowly. I'm sure they will sort things out soon, we hope!  :roll:

Posted: Sun Jan 09, 2011 11:30 pm
by Jeffers
Still not working for me either.....

Posted: Mon Jan 10, 2011 5:45 pm
by Rotwang1985
From the Rosetta homepage:
Jan 7, 2011
Well, our luck ran out. The SAN controller that has been causing so much trouble in the last few months finally tipped over in a rather destructive fashion, corrupting the binary tree on which the filesystem is based. We're trying to rebuild the thing but the sheer number of files in the filesystem (> 10M files) makes this process very, very slow. We're bringing the project up from a recent backup (12/09/10) but the backup wasn't a perfect replica of the environment, so we're having to scramble to get all the parts working together again. We only need a few more weeks and then our new, next generation SAN will be ready to be put into place... I just thought the old one would last a few more weeks. I apologize for the hassle and appreciate your patience as we get things online again... KEL 01/07/11
http://boinc.bakerlab.org/rosetta/

Posted: Tue Jan 11, 2011 8:33 am
by UBT - Rick Horn
Just started running Rosetta after a break of nearly 5 years.
Compared to Rosetta, WCG is positively bountiful.  :roll:

Edit: Aborted the rest of them.

Posted: Wed Jan 12, 2011 12:32 pm
by matt40k
Still got a nice backlog of files to be uploaded, not uploading, no new data either, not good :(

EDIT:
http://boinc.bakerlab.org/rosetta/forum ... 5028#62879

Result = me gutted.

Posted: Wed Jan 12, 2011 5:04 pm
by UBT - Rick Horn
I had 3 WUs stuck as "uploading" for at least 4 days. I was tempted to delete them, but didn't.
This morning, they uploaded themselves, so don't be too disheartened, all may well be OK.

Posted: Thu Jan 13, 2011 8:37 pm
by Jeffers
They seem to be getting there, if rather slowly!
The completed WUs I had waiting have gone, but I've not had any new work from them.....

Posted: Fri Jan 14, 2011 6:15 pm
by matt40k
Seems to have uploaded the backlog, new work coming down.