Over the years I've sold and repaired
lots of computers and found that the most common failures are
optical drives, motherboards, and hard drives, in that order.
DVD and CD rewriters seem to fail very frequently, and I wonder
whether the circuitry in some models fails to turn off the laser,
or leaves it switched on whenever there's a disc present.
Motherboards generally fail because the decoupling capacitors fitted to smooth the low-voltage rail get quite warm, leak, and eventually develop a high resistance, allowing spikes to interfere with the working of the microprocessor.
The worst failures are, of course, hard drives. Apart from motor bearings simply wearing out (getting noisy, which is a warning that all is not well, then failing to spin up properly), I think the most common cause is bad design.
The first instance I came across that is worthy of note is a particular Fujitsu model of yesteryear. This used a complex chip supplied to the manufacturer by a sub-contractor. In turn, the sub-contractor used a specific type of glue in the manufacture of the chip that decomposed when hot into a highly corrosive chemical and ate away the internal bits of the chip. All the drives manufactured over a period of a year effectively had within them a time bomb that would cause the loss of all the user's data.
The latest hard drive problem I've met concerns Seagate drives. I'm not altogether sure of the whole story, but it goes something like this:
Built into a hard drive is a program, called "firmware", that determines how the drive stores and retrieves data and handles a few frills, such as monitoring and reporting errors.
As I understand it, the Seagate designers decided to change the design of their Barracuda drive and use some of the platter space (the platters are where your data gets stored) to carry some, but not all, of the firmware.
Most of the firmware is carried in a
chip on the circuit board attached to the drive, but because the
programmers ran out of space, a little is put onto the platters.
Ordinarily this is satisfactory, but if the hard drive for some
reason misbehaves then a circuit board change cannot fix the problem
because the firmware on the board and that on the platters do
not match up.
The problem with the Seagate Barracuda
drive was associated with a slip-up by the programmers who wrote
the firmware, in particular version "ST15".
In ST15 a rare combination of hard drive manipulations resulted
in a stray bit overflowing from one area to another. Unfortunately
the area to which it overflowed included the "Busy"
flag. When a hard drive does not want to be disturbed, for example,
when it's writing data to the platters, it puts up a "Busy"
flag and the computer will wait until this is reset before continuing
its dialogue with the hard drive.
In the fault condition, because the hard
drive wasn't actually busy, and didn't actually know the busy
flag was set, it was oblivious to the fact that the outside world
couldn't access it. In fact it was a stalemate. The computer could
not access the drive and arrange somehow to reset the busy bit,
and the hard drive was just sitting there waiting to be used,
but completely inaccessible.
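
The stalemate can be illustrated with a toy model. This is only a sketch of the handshake as described above, not Seagate's firmware: the class and function names are made up for illustration, and the only real-world detail assumed is that the busy flag is bit 7 (0x80) of the ATA status register.

```python
BSY = 0x80  # busy flag: bit 7 of the ATA status register


class ToyDrive:
    """Hypothetical stand-in for a drive; not real firmware."""

    def __init__(self):
        self.status = 0x00      # flag clear: the drive is idle
        self.pending_work = 0   # ticks of genuine work still to do

    def start_write(self, ticks):
        # The drive raises the flag itself when it starts real work.
        self.pending_work = ticks
        self.status |= BSY

    def tick(self):
        # The drive only clears the flag when work it knows about finishes.
        if self.pending_work > 0:
            self.pending_work -= 1
            if self.pending_work == 0:
                self.status &= ~BSY & 0xFF

    def stray_overflow(self):
        # Models the fault: the flag is set with no work behind it.
        self.status |= BSY


def host_wait_ready(drive, max_polls=1000):
    """Poll until the busy flag clears; give up after max_polls."""
    for _ in range(max_polls):
        if not drive.status & BSY:
            return True
        drive.tick()
    return False


healthy = ToyDrive()
healthy.start_write(5)
healthy_ready = host_wait_ready(healthy)   # flag clears once the write ends

faulty = ToyDrive()
faulty.stray_overflow()
faulty_ready = host_wait_ready(faulty)     # nothing will ever clear the flag
```

In the healthy case the flag drops as soon as the write completes. In the faulty case the drive has no outstanding work, so nothing inside it ever clears the flag, and the host can only give up.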
This inaccessibility extended even to the computer BIOS. At start-up
the computer will always interrogate any hard drives present and
either confirm the settings are correct, or revise the settings
if a new drive has been fitted.
Because the busy flag is set, the computer has no option but to
declare that there is no hard drive present. This being so, the
computer will fail to boot and display a message of some kind
about the fact that it cannot find an operating system.
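
A rough sketch of that start-up decision follows. It is not how any particular BIOS is written: `probe`, `boot`, the port names and the message text are all hypothetical, and the status values are simply typical ones (0x80 for busy, 0x50 for a drive that is ready).

```python
BSY = 0x80  # busy flag in the drive's status register


def probe(read_status, max_polls=100):
    """Return True if the drive on this port ever reports not-busy."""
    return any(not (read_status() & BSY) for _ in range(max_polls))


def boot(ports):
    """ports maps a port name to a function returning its status byte."""
    found = [name for name, read_status in ports.items() if probe(read_status)]
    if not found:
        # A drive that never stops being busy is treated as absent.
        return "No operating system found"
    return f"Booting from {found[0]}"


stuck_only = boot({"disk0": lambda: 0x80})
with_spare = boot({"disk0": lambda: 0x80, "disk1": lambda: 0x50})
```

With only the stuck drive attached, the machine behaves exactly as if no drive were fitted at all.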
When faced with this difficulty there is very little that can be done by an ordinary computer user.
A glance at Seagate's website indicated they were being very coy about the affair, but a search of the Internet revealed I wasn't the first to be aware of the Barracuda drive problem.
Most of the positive feedback pointed in the direction of a particular data recovery company that was offering some free software and a schematic for a special piece of hardware that could be connected to the factory setup pins of the hard drive and so reset the busy flag.
I downloaded the free software and purchased the parts for the special box of tricks.
Next I investigated the exact details of how to go about sorting out my particular Barracuda drive. Drat! The free software was designed to fix the previous version of the drive; it seems the bad firmware was used in successive versions of the Barracuda before being spotted. A call to the firm that supplied the free software quickly revealed that the version I needed wasn't free: it was $500. This included the special interface cable, so I wouldn't have had to make my own.
I didn't feel like parting with $500.
I returned to the Seagate website and
penned a message to them. The data is very, very important to my
customer, I explained. It certainly was, as he hadn't backed up
his prolific email exchanges.
After I acknowledged a return email asking if I was serious about
opening some dialogue with them, things happened very swiftly.
I received a phone call from TNT (the carriers) requesting me to print out an advice note (to be attached to an email to follow) for their driver, who was speeding towards me that very minute. I did so and hastily packaged the hard drive with an address supplied in the email from Seagate.
A short time later the TNT driver knocked on the door and accepted the package, remarking that the address I'd put on the box was wrong: not Seagate in the UK, but a firm in Holland.
The next day I got another email stating that my hard drive was fixed, and a day or two later the repaired drive was delivered by TNT; in fact it arrived so early I was still in bed and had to collect it from a neighbour.
Full of confidence, I fitted it to my customer's computer and, lo and behold, it fired up to the XP Desktop exactly as it was before the busy flag had been set. A quick check revealed that the firmware was now the latest version so, hopefully, the drive will be good for several years.
What about a RAID system, you might ask?
This would have a second hard drive carrying a mirror of the boot
drive so that failure of one drive would not result in loss of
data.
Well, funnily enough, the computer I supplied did have a RAID-1
configuration. I actually fitted two Seagate Barracuda drives,
but strangely the RAID setup had been disconnected somehow after
a month or so without the owner noticing. The other drive did
in fact take over when I'd changed the BIOS settings, but it stepped
back 6 months or so in time, to the day it had been disconnected.
I checked its firmware status and found it to be ST15. I therefore
ran a small firmware update program supplied by Seagate and updated
it before it too met the same fate as its partner.
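
Why the second drive "stepped back in time" is easy to see in a toy model of a mirror. This is a sketch under simple assumptions, not how any real RAID controller is implemented; `ToyMirror`, the drive names "A" and "B" and the sample data are all invented for illustration.

```python
class ToyMirror:
    """Hypothetical RAID-1 pair: every write goes to all attached members."""

    def __init__(self):
        self.members = {"A": {}, "B": {}}
        self.attached = {"A", "B"}

    def write(self, key, value):
        # Only drives still attached to the mirror receive the write.
        for name in self.attached:
            self.members[name][key] = value

    def detach(self, name):
        # The mirror is broken; nothing warns the user.
        self.attached.discard(name)


mirror = ToyMirror()
mirror.write("inbox", "January mail")   # both drives get this
mirror.detach("B")                      # mirror silently drops drive B
mirror.write("inbox", "July mail")      # only drive A sees this
```

Drive B still holds a perfectly good copy, but only of what existed on the day it dropped out of the mirror, which is why a broken mirror that nobody notices is no substitute for a backup.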
The final phase was to make a backup
of the repaired drive, just to be safe, then reformat the second
drive and wait till the RAID software finished its mirroring.
I suppose the moral is to never alter BIOS settings unless you
know exactly what you're doing, and NEVER carry out a motherboard
BIOS update, because this invariably resets the settings to their
defaults, which aren't compatible with a RAID setup.
It's a pity that Seagate didn't advise resellers, via suppliers, to update the firmware on any Barracuda drives they'd supplied.