GPU issues

Joshrandom
Posts: 5602
Joined: Sat Jun 23, 2007 1:00 am


Post by Joshrandom »

Over the past few weeks one of the cores on one of my HD5970s has been producing the occasional invalid result on Collatz. I was initially able to get around this by reducing the clock speeds to default, but now the problem seems to have come back in a big way, and every result is being rejected. Reducing the clocks to the lowest standard setting doesn't make any difference, nor does installing the latest Catalyst software. The card appears to be running in the same mid-70s temperature range that it always has, and the software shows the fan running at 34% (and yes, there is hot air blowing out of the exhaust). There are a few more things I can try, but before I do I thought I'd ask whether anyone here has any constructive advice to offer first.

The card is used in my Q6600 machine, which runs 64-bit Ubuntu (Lucid) exclusively. I'm now using version 11.5 of the Catalyst driver software, and the BOINC Manager version is 6.10.58.

The output of the other core on the card seems to be unaffected by whatever is happening with its twin. All the fans are working and there's nothing to suggest that the card has been overheating. For the time being I've told BOINC not to use the affected core, which has stopped it generating any new invalid results, but if I had wanted an HD5850 then that's what I would have bought in the first place, lol.

Oh, and naturally this is the card that isn't covered by a valid warranty.  :roll:
Zydor
Posts: 437
Joined: Mon Nov 01, 2010 12:00 am

Post by Zydor »

Try the 11.5b Hotfix driver - it was posted up at AMD 15 mins after 11.5 was put up - arguably the fastest upgrade on record (!)

http://support.amd.com/us/kbarticles/Pa ... otfix.aspx

Clean out the old driver fully with Driver Cleaner; don't risk leaving bits of an old install that may or may not get overwritten.

If there are still issues, try running it on other pure maths projects (such as Milkyway, DNETC, Moo! Wrapper and PrimeGrid PPS Sieve).

If it misbehaves on all of those as well with 11.5b installed..... it's time for the dreaded card test, provided you're sure the main memory is OK after running a memory test.

Regards
Zy
Joshrandom
Posts: 5602
Joined: Sat Jun 23, 2007 1:00 am

Post by Joshrandom »

Thanks for the advice Zydor, but I'm pretty sure that the hotfix driver is only available for Windows, whereas I'm running Ubuntu on this particular machine (and I suspect the same goes for Driver Cleaner). :)

I've thought about running it on Milkyway, and I may try that tomorrow. As for DNETC and Moo, well since they use a quorum of one I'm not sure that crunching their work units would really prove a great deal (and I don't like returning duff results, even if I do get credit for them). As for PG, well so far I've been shying away from experimenting with OpenCL under Linux, and this really doesn't seem like the right moment to start messing with it, lol, so I'll give that a miss for now.

I'll report back on the Milkyway tests when the results are in, then if the news continues to be bad I'll start thinking about taking more drastic measures.  :shock:
Zydor
Posts: 437
Joined: Mon Nov 01, 2010 12:00 am

Post by Zydor »

Ahhh... I hadn't twigged before that you're on Linux.

Fingers crossed for you on the MW run and possible follow on test ..... I run 5970s and am well aware that this one could hurt ....  :(   

... good luck  :)

Regards
Zy
Joshrandom
Posts: 5602
Joined: Sat Jun 23, 2007 1:00 am

Post by Joshrandom »

Well, I tested the problem system on Milkyway just before the project went down (I don't think there's a connection, lol), and there were no invalid results. The only odd thing I did notice was that core 0 (the one producing the invalid results on Collatz) only seemed to be running at around 35% load, whereas core 1 was running at 96% despite having the same clock settings. However, when I set up my other HD5970-powered PC (this GPU being both fully functional and fully warranted) to run Ubuntu, it produced similar figures, so I guess I'm not really much further forward.  :scratch:

I guess the next thing to try is a straight swap of the HD5970s, leaving everything else the same, to see if the problem moves with the GPU. However, I really don't want to disturb the fully functioning system if I don't have to (I had quite a job seating the GPU in there in the first place, and I imagine it'll be quite an ordeal trying to extract it again), so if anyone has any other suggestions as to what else I could try, I'd love to hear them.  :?
Zydor
Posts: 437
Joined: Mon Nov 01, 2010 12:00 am

Post by Zydor »

This is a real longshot...... but it has happened enough times to warrant thought and action. It could get messy with driver loading and all the drivers' fickle temperament :)

When Windows installs, it creates devices for hundreds of things and stores the details in the registry. One of those things is the card(s) you use. The information recorded is specific to Windows, and it feeds the graphics card driver we all know and battle with the details of the Windows installation it needs to do its thing.

The information is in strings of zeros and ones, total gobbledegook to us. If that information in the registry gets corrupted in any way, all sorts of weirdness can ensue. It's possible that one or more of the Windows devices needed by the card driver has become corrupt.

To sort it out (if that is indeed the case, and it's impossible to know whether it is other than by repairing it), go to Device Manager, right-click the card device, choose "Uninstall", and ignore the Windows death threats (and possible blue screens). That will get rid of all the graphics card device driver entries in the registry relating to Windows itself. For the AMD/NVidia driver, see below...

Then go to Control Panel and uninstall the graphics card drivers. Reboot, hit Safe Mode and run Driver Sweeper inside Safe Mode. Once you have rebooted again, you should be at the desktop with a weird 640x480 display and Windows yelling to install AMD/NVidia drivers; don't let it.

At that point check that all the AMD/NVidia driver bits are cleaned out as appropriate, then reload the AMD/NVidia drivers. Once that's done, test out BOINC.

Doing all this forces Windows to go to its cache of system files and rebuild the device drivers for the cards on reboot. When the AMD/NVidia drivers are loaded, they are loaded against a clean set of Windows device drivers, so if the old ones had become corrupt, you will have replaced them with a fresh set. If they were the problem, all should now run OK..... if...... I can't guarantee this is going to work in your case; the device drivers may well be OK, in which case all that will have happened is replacing a good set with a good set....

Regards
Zy
Zydor
Posts: 437
Joined: Mon Nov 01, 2010 12:00 am

Post by Zydor »

Just had a thought.... could be way off, it doesn't often apply... but... what size is the PSU? 5970s are power-hungry beasts, particularly when overclocked. When a PSU is stressed near or at its maximum, weirdness ensues as its output voltage sags while it tries to keep up with the power demand.

A 5970 more than doubles its power draw when running at full stretch compared with idle or standard non-crunching use. As an overclock steadily increases, there comes a point where the power draw is enough to make a PSU groan in pain if it's marginal on capacity. A power budget check is worthwhile if the PSU is 500w or less.
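As a rough illustration of that power budget check, here is a minimal Python sketch. Every wattage in it is a hypothetical placeholder, not a figure from this thread, and the 80% headroom margin is an assumed rule of thumb rather than anything AMD or a PSU maker specifies; substitute the rated figures for your own components and PSU.

```python
def power_budget(components, psu_watts, headroom_pct=80):
    """Compare estimated full-load draw against a PSU's usable capacity.

    components   -- list of (name, watts) pairs, each an estimated full-load draw
    psu_watts    -- the PSU's rated output in watts
    headroom_pct -- assumed share of the rating you are willing to use (rule of thumb)

    Returns (total_draw, usable_watts, fits).
    """
    total = sum(watts for _, watts in components)
    usable = psu_watts * headroom_pct // 100  # integer watts, rounded down
    return total, usable, total <= usable


if __name__ == "__main__":
    # Hypothetical full-load estimates for a single-GPU crunching box.
    parts = [
        ("GPU at full crunching load", 300),
        ("CPU", 105),
        ("motherboard, drives, fans", 75),
    ]
    for psu in (500, 850):
        total, usable, fits = power_budget(parts, psu)
        verdict = "OK" if fits else "marginal"
        print(f"{psu}w PSU: {total}w estimated draw vs {usable}w usable -> {verdict}")
```

With those placeholder numbers the 500w case comes out marginal (480w against 400w usable) while the 850w case fits comfortably, which is the general point being made above.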

Regards
Zy
Joshrandom
Posts: 5602
Joined: Sat Jun 23, 2007 1:00 am

Post by Joshrandom »

You could be right about the driver issues Zydor, and while I would've tried what you suggest if I were running Windows, I'm not quite sure how to go about doing the Linux equivalent. That said though, the nature of the fault has been a gradual increase in error rate over the course of a few weeks, which isn't the sort of thing I'd expect to see with a driver issue, not that that rules it out though.

As for the PSU, well you're quite right about the need for plenty of power to keep these GPUs fed. When I installed the 5970 I also installed an 850w Antec power supply to go with it, a PSU which should be more than man enough for the job. Still the symptoms could indicate that it's the PSU that's struggling, so it's definitely something for me to keep in mind.

If I have time tomorrow I'll power my systems down, pull them apart (they're due a good clean anyway) and reseat everything to see if that makes a difference. If not, then I'll just have to risk swapping over the 5970s and see what that tells me.  :shock:
Zydor
Posts: 437
Joined: Mon Nov 01, 2010 12:00 am

Post by Zydor »

Keep forgetting you have Linux, sorry about that...

An 850w should be fine; that's what I have in my main box, and that's with 2x5970s. I am power limited and can't push both right to the edge; really I should have a 1000w in there, not that I would want to push them that far frankly, given the cost of replacing burnt ones. Nonetheless, if an 850w copes with two of them at half to two thirds of full power, you should not have power issues with an 850w running one, even with another card of a different type in there.

You may be right in taking them apart. crunch3r does that to his cards once a year because of the gunge build-up he finds inside when taking off the fan unit. By all accounts he goes nuts with a toothbrush and soap and water, rofl, then replaces the thermal paste on the heatsink. Braver man than I, Gunga Din.... I had nightmares just sticking on the aftermarket coolers; dunking it all in soap and water is beyond my zone of courage by a mile  :wink:

Out of ideas now to be honest, all I can say is I feel your pain, having one of those beasts acting up is no fun.

Regards
Zy
Joshrandom
Posts: 5602
Joined: Sat Jun 23, 2007 1:00 am

Post by Joshrandom »

Ha, there is no way that I would risk dismantling either of my HD5970s (or even my trusty old HD3850) unless all possible hope of getting them working again had ended. As for soap and water, well I think I'll give that one a miss too, lol.

Anyway, I finally plucked up the courage to extract my working, and still under warranty, HD5970 from my other main cruncher and swap it for the suspect card in my Ubuntu machine. The results are pretty clear cut, the suspect HD5970 is definitely at fault.   :cry:

Oh well, I suppose I'll explore the possibility of repair, but I suspect that this is the end of the line for this card. Time to start saving up for a replacement or replacements.  :roll:
Zydor
Posts: 437
Joined: Mon Nov 01, 2010 12:00 am

Post by Zydor »

There are a number of test applications you can run on the suspect 5970. They're worth a try as a last resort.... at least you'll know for sure.

Re a replacement, keep an eye open for the 6870x2 and 6970x2; the former is due out by the end of July, the 6970x2 by around mid-to-end August.

The specs are very tasty, the 6970x2 will without doubt outperform the 6990, although I suspect price will also be tasty rofl.

However, it seems the 6870x2 will be a "reasonable" price and way outperform a 5970. Both are definitely worth keeping an eye on if you are still into high-end cards.

Regards
Zy
Joshrandom
Posts: 5602
Joined: Sat Jun 23, 2007 1:00 am

Post by Joshrandom »

Thanks for your advice Zydor. :)

In the last few days the computer with the suspect 5970 has become increasingly unstable, with lots of graphical artefacts dancing across the screen and frequent crashes. Since all of that has gone now that the known good 5970 is installed, there isn't really much doubt that the suspect 5970 has a problem. Still, it might be worth looking a little deeper into it; which applications would you recommend trying?

As for a replacement, well I was thinking about getting a couple of price reduced 5870s, and seeing where that took me, although the cards you mentioned do look kind of interesting. That said though, isn't the 6850 a single precision card, and therefore no use on Milkyway (should I want to run it), and isn't the 6990 already supposed to be a 6970x2, or have I missed something there?  :?
Zydor
Posts: 437
Joined: Mon Nov 01, 2010 12:00 am

Post by Zydor »

Joshrandom wrote: ....Still, it might be worth looking a little deeper into it, which applications would you recommend trying?
Furmark - and that may cause a "ripple" amongst some :)

Furmark is the toughest test there is, it hammers a GPU - safely, but ....

You have to be careful with Furmark and watch it like a hawk, because the temperatures will shoot up; if it hits 100C, get out of it. However, don't be put off: used sensibly it's fine. It's like anything else, abuse it and you're heading for trouble.

Its stability test is very good; just watch the temperatures, as it deliberately hits a card harder than would be the case in real life, hence the phrase "pass Furmark and it will do anything". The link below shows, at the bottom of the page, a salutary lesson in not watching it run, and a nice out for me, as you can't say I didn't warn you  :wink:

http://www.ozone3d.net/benchmarks/fur/

There are others in the Utilities Section of ozone3d you might want to play with as well, used properly they are all safe.
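The "watch it like a hawk and get out at 100C" rule can be sketched as a simple watch-and-abort loop. This is a minimal Python sketch that assumes you have some way of reading the GPU core temperature in Celsius; it is passed in as `read_temp_c`, a hypothetical placeholder, since nothing here names a monitoring tool. The function only reports when the limit is reached; actually stopping Furmark is left to you.

```python
import time


def watch_temperature(read_temp_c, limit_c=100.0, interval_s=1.0,
                      max_samples=None, sleep=time.sleep):
    """Poll a temperature source until it reaches limit_c.

    read_temp_c -- hypothetical zero-argument callable returning degrees C
    limit_c     -- abort threshold (100C, as suggested above)
    interval_s  -- seconds to wait between readings
    max_samples -- stop after this many readings (None means keep watching)
    sleep       -- injectable delay function, handy for testing

    Returns the reading that reached the limit, or None if max_samples
    readings were taken without reaching it.
    """
    samples = 0
    while max_samples is None or samples < max_samples:
        temp = read_temp_c()
        samples += 1
        if temp >= limit_c:
            return temp  # time to stop the stress test
        sleep(interval_s)
    return None


if __name__ == "__main__":
    # Simulated readings standing in for a real sensor.
    fake_readings = iter([72.0, 81.5, 93.0, 100.5])
    tripped = watch_temperature(lambda: next(fake_readings), sleep=lambda s: None)
    print(f"Limit reached at {tripped}C, stop the test now")
```

The delay function is injectable so the loop can be exercised with fake readings instead of waiting in real time.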
Joshrandom wrote: As for a replacement, well I was thinking about getting a couple of price reduced 5870s, and seeing where that took me, although the cards you mentioned do look kind of interesting. That said though, isn't the 6850 a single precision card, and therefore no use on Milkyway (should I want to run it), and isn't the 6990 already supposed to be a 6970x2, or have I missed something there?  :?
The 6990, yes it is, but, rather like NVidia with the 590, AMD had to squeeze a 32nm design onto a 40nm process because the 32nm manufacturing process for GPUs was cancelled. The latter (about 15 months ago) caused some hard swallowing in the Red and Green Teams. AMD coped better because their design is basically further advanced than NVidia's; however, both had to rush to market, the usual e-peen thing.

Now that time has passed, AMD are coming out with more stable, better-produced cards that will be very considerably better than the first 6XXX offerings. At the moment NVidia have no answer for the ones coming out in 6-8 weeks; their next offering will not be until Q2 next year, by which time AMD's 28nm cards will be out there. (Red V Green War continues, rofl)

The 6990 does have two GPUs on it, true; however, the engineering that has gone into refining a formal series of x2 cards has surpassed the 6990. I personally suspect that by the end of the year the 6990 will be replaced with an updated 6990 on 28nm.

However, right now the 6870x2 is the first out, followed by the 6970x2. A brief Guru3d article is linked below. You should keep Googling at intervals over the next four weeks; these two cards will create quite a stir when released, and NVidia will not be happy.

http://www.guru3d.com/news.php?cat=&per ... genumber=2
Some are being released as we type, so the best thing is to Google "HD6970x2" and "HD6870x2" periodically; I suspect the noise about them is about to become a clamour.

Re the 6870x2 and double precision, it's not clear yet, as the formal specs are still not out there; even that may have changed by the time you read this, as deployments are happening now and some are already "on the shelves". Probably not, but.... check the formal spec. The 6970x2 will do DP for sure.

Regards
Zy
Joshrandom
Posts: 5602
Joined: Sat Jun 23, 2007 1:00 am

Post by Joshrandom »

If I understand things correctly, running Furmark requires Windows, and as that means sticking the 5970 into the Windows machine I took the good 5970 out of, I'll hold off for now. (I had to break some of the plastic fittings off the motherboard when I originally installed the 5970 in that PC, which makes extracting the card again very tricky, an operation I don't want to do unless I really need to :wink:) I'm going to pop round to a local firm tomorrow that apparently has some experience repairing faulty graphics cards, and see what they think. *crosses fingers*

As for the 6970x2 and the possible replacement for the 6990, perhaps I would be better off waiting to see what's coming out in the next few months, but then again I guess there is always something better just around the corner, lol. Hmm, I'm clearly going to have to give this a lot more thought.  :D
Zydor
Posts: 437
Joined: Mon Nov 01, 2010 12:00 am

Post by Zydor »

Joshrandom wrote: If I understand things correctly, running furmark requires Windows, and as that means sticking the 5970 into the Windows machine that I took the good 5970 out of, I'll hold off for now (I had to break some of the plastic fittings off the motherboard when I originally installed the 5970 into the PC, and this makes extracting the card again very tricky, an operation that I don't want to do if I don't really need to :wink:). I'm going to pop round to a local firm tomorrow that apparently has some experience repairing faulty graphics cards, and see what they think. *crosses fingers* ...
Fingers crossed for you on the hopeful look see and maybe repair  :)

Joshrandom wrote: As for the 6970x2 and the possible replacement for the 6990, well perhaps I would be better off waiting to see what's going to be coming out in the next few months, but then again I guess that there is always something better just around the corner, lol. Hmm, I'm clearly going to have to give this a lot more thought.  :D
With you all the way re "something always around the corner", and that applies to the next generation of 28nm cards that will be on the shelves in Q2 next year: "Kepler" for NVidia and "Southern Islands" for AMD (Google "Kepler NVidia" and "Southern Islands AMD" for the latest). Those 28nm cards will be 2 to 4 times faster than cards at the equivalent price point in the current generation from NVidia & AMD. That particular "war" is going to be many, many times more vocal than the 590 V 6990 saga, as the 28nm deployments will set reputations in concrete for years; it's going to be very interesting. AMD are likely to come out in front, but NVidia did a lot of catching up with the 5XX after the 3XX & 4XX debacles, so you never know, the dark horse may come up the inside rail :)

NVidia are always 6 months later than their marketing claims, so all that ties in fine re Kepler, and AMD tend to like to throw a headline card out early, so a mega 28nm Southern Islands card at Xmas would be true to form. However, mainstream 28nm deployments are not until Q2 2012 (AMD have a 3-month booking at the 28nm fabrication plants starting at the end of Q1, so it will not be before then).

Regarding the stream of AMD x2 cards, they are out right now (eg 6870x2, 6950x2) or due in the next 6 weeks like the 6970x2, so it's a live decision point, not jam tomorrow. Really well worth focusing on while you do your final check on the suspect card, in case you do need to buy another.

The ASUS Mars II is an NVidia 580x2, out there now (and will be the only NVidia X2), and it's much better engineered than the 590; the price, however, is bonkers, just under £1,000..... (!).... and only 1,000 will be produced. That price and limited quantity reflect NVidia's biggest problem at present: their design is a real £$£"£^&!!"£ to make money from, as at present they have huge GPUs, far too big, that take up a lot of room on the PCB. It's badly limiting the numbers NVidia can produce and, effectively, subsidise, as they are made at a loss. 28nm next year should see a level playing field and NVidia will have no excuses; if they don't ante up in Q2 next year with a winner.... they will be in substantial trouble in the consumer market, as it's only past loyalty keeping their figures up right now.

Regards
Zy
Joshrandom
Posts: 5602
Joined: Sat Jun 23, 2007 1:00 am

Post by Joshrandom »

Well the company I took my 5970 to worked on it for around a week, but ultimately had to give up on it.  :cry: However, on the plus side, they didn't charge me anything for their time (which was rather generous of them I thought) so even though they couldn't help me, I'll post the link to their website here on the off chance that someone else might benefit from it. :wink:

With the 5970 now officially dead I decided to get those two 5870s I mentioned earlier, but it was only when I came to fit them that I remembered that when I built my main system I was an Nvidia fan, so needless to say my motherboard is SLI rather than Crossfire compatible. Still, I went ahead and fitted them anyway, jiggled around with the Catalyst Control Centre, and they are both now crunching just fine on Moo (I am still an idiot though).  :roll:

So the morals of the story are, don't buy a second graphics card just because it's cheap, and don't forget that you'll need a compatible motherboard if you want to use either Crossfire or SLI with your GPUs.  :oops: