The Trouble with (SMP) Tribbles - AKA Rant
{sigh} Where to begin?
Recently I've been trying to make my main home machine
(an Asus P2B-D, rev D06, with two 733MHz Pentium 3s) work
for more than 5 hours without a hard-lock.
No dice so far.
After exhausting all other options (cooling, new fans,
slockets, power supply, etc), I've finally decided that the
problem is with the board itself. Each CPU runs on a 133MHz
Front Side Bus (a purchasing decision I now regret, due to
the CPUs having their multipliers locked), which the board
can support. Unfortunately, it's only stable at 66/100MHz
FSB (which forces the clock speed down).
Arg.
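To put numbers on that clock-speed penalty, here is a rough
sketch of the arithmetic. It assumes the usual relationship
(core clock = FSB x multiplier) and infers the locked
multiplier from the rated 733MHz at a nominal 133.33MHz FSB.
The 5.5x figure and the exact FSB values are my assumptions,
not something read off the board.

```python
# Core clock = FSB x multiplier. With the multiplier locked,
# dropping the FSB is the only knob, and the clock drops with it.

RATED_CLOCK_MHZ = 733.0        # rated CPU speed, from the post
RATED_FSB_MHZ = 400.0 / 3      # nominal "133" FSB (assumed 133.33MHz)


def locked_multiplier(rated_clock_mhz, rated_fsb_mhz):
    """Infer the multiplier, rounded to the nearest half step."""
    return round(2 * rated_clock_mhz / rated_fsb_mhz) / 2


def core_clock_mhz(fsb_mhz, multiplier):
    """Core clock that results from a given FSB and multiplier."""
    return fsb_mhz * multiplier


MULTIPLIER = locked_multiplier(RATED_CLOCK_MHZ, RATED_FSB_MHZ)

if __name__ == "__main__":
    # Nominal FSB settings: "66" and "133" are really 66.67 and 133.33.
    for label, fsb in (("66", 200.0 / 3), ("100", 100.0), ("133", 400.0 / 3)):
        clock = core_clock_mhz(fsb, MULTIPLIER)
        print(f"FSB {label:>3}: {MULTIPLIER}x -> {clock:.0f}MHz")
```

In other words, backing off to 100 FSB leaves each CPU at
roughly 550MHz, and 66 FSB at roughly 367MHz.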
It's not the first time I've built an SMP system either. I
was also foolish enough to purchase the bastard of all SMP
motherboards (heh, excluding the Tyan Tiger 100 apparently):
The Abit BP6.
The motherboard manual explicitly stated the dual Socket 370
ability was "experimental" (a pseudo-legal disclaimer for
Abit's technical support). Run it at your own risk. As it
was one of the few SMP boards available at the time, most
people were happy to live with that.
Getting this board stable took three months of my life.
What started out as a simple upgrade from a P2/300 ended up
in the occasional fit of rage. I've outlined the bulk of my
experiences here.
In disgust, I canned the board (decided to keep it instead
of RMAing it, for some reason) and purchased my current
board, the Asus P2B-D. Finally, stability. Ran like a dream
(at Celeron FSB: 66MHz). Amazed by the kernel compilation
speed, MP3 ripping, etc. Decided never to buy another
Abit product again.
A few months later, I read about the "EC10" modification
that's reported on the bp6.com forums. Basically, someone
discovered that a recent revision of the BP6 board came with
a higher-rated capacitor (in the EC10 position). Wiring an
additional capacitor in parallel with the existing one
fixed the bulk of the BP6's stability issues
(especially voltage discrepancies).
Performed the EC10 fix on my BP6. Bang. I was able to do a
complete e2fsck. Repeatedly. And other things.
Standard things. It's been my file/print/other server
ever since. Renewed my decision to never buy another Abit
product.
As time went on, more demanding applications (cough,
games, cough) required more horsepower, so the current P3s
were purchased. To accommodate them, the slockets had to be
replaced, and PC-133 memory was required (due to the 'locked
multipliers' issue).
With both CPUs at 133MHz FSB, my box hard-locks (no response
from mouse, keyboard or anything else, with the sound stuck
repeating a 2-second loop) anywhere from 15 minutes to 5
hours after initial bootup. Which brings us back to the
present.
What is it about SMP boards? Why are they more prone to
problems than their solo-CPU counterparts? It
could be argued that all motherboards require the odd BIOS
update to fix ongoing issues (e.g. controller problems,
compatibility, etc), but SMP boards are notoriously
bad for having all kinds of configuration issues (e.g.
sufficient power supply, proper cooling, hardware/software
that's SMP capable) on top of all the other hassles that
are associated with solo-CPU boards.
Arg.
After a bit more tweaking, its uptime is now 4 hours and 55
minutes. Should it fall over again, I'll think about
acquiring a more capable SMP board. My choices are the Abit
VP6 (as yet unreleased and untested, wow, BP6 all over
again), a Tyan Tiger 133 or MSI 694D.
The VP6? Fool me once, shame on you. Fool me twice...