11 Oct 2000 crudman   » (Journeyer)

The Trouble with (SMP) Tribbles - AKA Rant

{sigh} Where to begin?

Recently I've been trying to make my main home machine (an Asus P2B-D, rev D06, with two 733 MHz Pentium IIIs) run for more than 5 hours without a hard-lock. No dice so far.

After exhausting all other options (cooling, new fans, slockets, power supply, etc.), I've finally decided that the problem is with the board itself. Each CPU runs on a 133 MHz Front Side Bus (a purchasing decision I now regret, since the CPUs have their multipliers locked), which the board can support. Unfortunately, it's only stable at 66/100 MHz FSB (which forces the clock speed down).
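For anyone wondering why a locked multiplier hurts so much, here's a rough sketch of the arithmetic. It assumes the usual nominal figures (a "733 MHz" part on a "133 MHz" bus is really 133.33 MHz times a fixed multiplier); the function name and constants are just made up for illustration.

```python
# Assumed nominal figures: "733 MHz" on a "133 MHz" bus is really 133.33 MHz x multiplier.
NOMINAL_CLOCK_MHZ = 733.33
NOMINAL_FSB_MHZ = 133.33

# The multiplier is locked inside the CPU, so it can be derived from the rated numbers.
MULTIPLIER = round(NOMINAL_CLOCK_MHZ / NOMINAL_FSB_MHZ, 1)


def core_clock_mhz(fsb_mhz, multiplier=MULTIPLIER):
    """Core clock of a multiplier-locked CPU at a given front side bus speed."""
    return fsb_mhz * multiplier


if __name__ == "__main__":
    # Compare the rated bus speed with the two speeds the board is stable at.
    for fsb in (133.33, 100.0, 66.67):
        print(f"FSB {fsb:6.2f} MHz -> core {core_clock_mhz(fsb):6.1f} MHz")
```

In other words, dropping the bus to 100 MHz leaves each CPU at roughly 550 MHz, and 66 MHz leaves it at about half its rated speed, with no way to bump the multiplier to compensate.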

Arg.

It's not the first time I've built an SMP system either. I was also foolish enough to purchase the bastard of all SMP motherboards (heh, excluding the Tyan Tiger 100 apparently): the Abit BP6.

The motherboard manual explicitly stated the dual Socket 370 ability was "experimental" (a pseudo-legal disclaimer for Abit's technical support). Run it at your own risk. Since it was one of the few SMP boards available at the time, most people were happy to live with that.

Getting this board stable took three months of my life. What started out as a simple upgrade from a P2/300 ended in the occasional fit of rage. I've outlined the bulk of my experiences here.

In disgust, I canned the board (decided to keep it instead of an RMA for some reason) and purchased my current board, the Asus P2B-D. Finally, stability. Ran like a dream (at Celeron FSB: 66 MHz). Amazed by the kernel compilation speed and MP3 ripping, etc, etc. Decided never to buy another Abit product again.

A few months later, I read about the "EC10" modification reported on the bp6.com forums. Basically, someone discovered that a recent revision of the BP6 board came with a higher-rated capacitor (in the EC10 position). The fix is to wire an additional capacitor in parallel with the existing one, which cures the bulk of the BP6's stability issues (especially voltage discrepancies).

Performed the EC10 fix on my BP6. Bang. I was able to do a complete e2fsck. Repeatedly. And other things. Standard things. It's now my file/print/other server to this day. Renewed my decision to never buy another Abit product.

As time went on, more demanding applications (cough, games, cough) required more horsepower, so the current P3s were purchased. To accommodate them, the slockets had to be replaced, and PC-133 memory was required (due to the 'locked multipliers' issue).

With both CPUs at 133 MHz FSB, my box hard-locks (no response from mouse, keyboard or otherwise, and the last 2 seconds of sound repeating) anywhere from 15 minutes to 5 hours after initial bootup. Which brings us back to the present.

What is it about SMP boards? Why are they more prone to problems than their solo-CPU counterparts? It could be argued that all motherboards require the odd BIOS update to fix ongoing issues (e.g. controller problems, compatibility, etc.), but SMP boards are notoriously bad for having all kinds of configuration issues (e.g. sufficient power supply, proper cooling, hardware/software that's SMP capable) on top of all the other hassles that come with solo-CPU boards.

Arg.

After a bit more tweaking, its uptime is now 4 hours and 55 minutes. Should it fall over again, I'll think about acquiring a more capable SMP board. My choices are the Abit VP6 (as yet unreleased and untested, wow, BP6 all over again), a Tyan Tiger 133 or MSI 694D.

The VP6? Fool me once, shame on you. Fool me twice...
