Recent blog entries for tmattox

21 Mar 2005 (updated 21 Mar 2005 at 15:08 UTC)

Cool, I remembered my advogato password ;-)

I am getting married in less than two weeks! My wife-to-be has kicked my butt to make progress on my Ph.D... Yes, I am STILL working on my Ph.D. I passed an important milestone a month ago, and I now see light at the end of the tunnel.

A quick open source software update... Last year I became a co-developer on the Warewulf cluster project. If you do, or are thinking about doing, Beowulf cluster work, check it out.

Back to the grindstone... I'll post again in a year or so I guess ;-)

Well, it's been a while since I've written anything here... over 3 years! And, yeah, I'm still working on my Ph.D. at UK (U. of KY). I'm at my sister's in Atlanta for the holidays.

The FNN stuff we (myself and my advisor, Hank) started developing in 2000 has led to all sorts of stuff. Too much to go into right now. As a side note, that little $10K cluster from Oct 2000 was called the KRAA Z-MP and actually turned out to be a real pain... Someday I might write up an article about what not to do when building a PC cluster... It's been decommissioned and several of the nodes are now workstations in one of our labs.

The cluster worth checking out is our new 128+4 node creation, KASY0, which we built this summer.

Hmmm, too much going on to post it all...

I'm presenting a paper at ALS2000 on Thursday morning. I still have some tweaking to do on my slides, but I've already got a full set done... just too many of 'em. My flight to Atlanta is in 22 hours.

SC2000 is approaching rapidly, and there is a LOT left to do before our research group will be ready to go. We are building a new cluster (yes, another one) for under $10K for the HPC.Games contest. It is going to be sweet when it's finished. However... Some PC-parts vendors really annoy me... out of five 256MB PC133 DIMMs delivered from someplace that will remain anonymous, only one survived more than 30 minutes of MemTest86 without errors. Don't companies do burn-in testing of anything anymore?

Well, that's all for now.

Hmmm, I seem to have been demoted from Apprentice to Observer. Not that I'd used my former status as Apprentice to make any article postings... I still find it odd, since I have several certifications as apprentice from others who are certified as apprentice or higher. Oh well for now.

On another note, I just spent about 2 or 3 hours reading some of the recent articles and discussions here on Advogato. I can remember when Slashdot was only half as good, and it's only gone downhill from there. Advogato sure has some very intelligent discussions relating to software development. I've picked up a few really good references to some tools and techniques that I hope to use in AFAPI related things.

Our SC2000 paper was revised, and hopefully will now be readable by most users of the proceedings later this year. We had a tough time getting a fully working PostScript document to convert to PDF without introducing errors. Computers are so advanced, yet it is still a major problem to distribute documents electronically in a way that is truly portable across platforms/OS. My advisor's solution of using troff works for him, but our collaborators on this project use LaTeX, and well, the two don't mix well. I guess it's a universal problem: the perfect "versioning control system", "project configure and make facility", "programming language", or "portable document format" just doesn't exist yet, and probably can't due to some yet undiscovered law of nature :-).

The UKLUG group has gotten a small subset of their machines set up and configured as a video wall with AFAPI and VWLib. For now (and probably for quite some time), their stuff is housed in the KAOS Lab. This has prompted me to look into some unfinished business with the merge of VWLib into the AFAPI distribution. But classes start this Wednesday, so who knows when I will get a chance to resolve the issues with the merger. I've got other higher priority things to work on.

Speaking of higher priority things, Hank & I are working this week on our final version of a paper to be presented at the Annual Linux Showcase in Atlanta on October 12. It's "yet another paper" about KLAT2 and its FNN.

Hmmm, what was I saying about having time to try doing an AltiVec version of our LINPACK optimizations??? Oh well. Anyone reading this know how to make human clones? I need two or three more of me :-)

Oh yeah, that takes funding... maybe next month :-)

Ack, what happened to July?

  • Hank and I hacked with Thomas on his CFD code on KLAT2 to get final performance numbers for the Gordon Bell prize entry.
  • Wasted a week with a misunderstanding of the -arp flag for /sbin/ifconfig. Long story short: -arp turns off everything related to ARP for that NIC, including use of any preloaded/constant entries in the ARP cache! What a useless option.
  • Hacked on the GA for finding/optimizing FNNs. We now have a better framework for specifying specific communication patterns for the GA to optimize for. Still not much closer to releasing the source :-(
  • I took a week's vacation to attend MacWorld Expo 2000 with my Dad. I promptly drooled on a G4 Cube. In a way, I wish I didn't already have a G4 Tower!
  • Upgraded Odie to gigahertz Athlons and ABIT KA7 motherboards that AMD donated. They are fast, but my 450 MHz G4 still cranks out a few more RC5 keys/sec.
  • Upgraded Opus to 500 MHz K6-2 CPUs that AMD donated. This made a very perceptible difference in the speed of the video wall, so I guess VWLib isn't memory bandwidth limited...
  • Built and tested one of the AFN000601 boards. Will need to recruit some students to help assemble 20 more to use on KLAT2. Still waiting on one critical part (74LS13 chips) to be delivered.
  • The University of Louisville's S+LUG visited the KAOS Lab to get a tour and some assistance with their cluster project.
  • The University of Kentucky's UKLUG has begun work on converting a pile of 486's in the KAOS Lab into a usable cluster (Galugtica) for learning/hacking purposes.

So, it was a busy month of July for me. The fall semester is rapidly approaching. But there are still a few weeks left of "no classes". MacWorld Expo 2000 was a fun vacation to take with my Dad (and Mom). While at MacWorld, I met and chatted extensively with Troy Benjegerdes about AltiVec, BlackLab/YellowDog Linux and parallel computing. Now that the Gordon Bell paper is done, I might have time to try our LINPACK optimizations out on my G4 with AltiVec.

Lots of news this time:

  • Ahhhh, the replacement switches for KLAT2 are installed. Hopefully they actually fixed the design flaw.
  • Ars Technica has posted our article about how we got 64 GFLOPS out of KLAT2.
  • The new AFN 000601 PCBs are finished, and should be in my hands by Wednesday. I envision lots of soldering in my near future... :-)
  • Our submission for a Gordon Bell price/performance award was accepted, so we are finalists for this prestigious award.
  • Our abstract/paper about KLAT2/FNNs was accepted at the Third Extreme Linux Workshop which will be held at the 4th Annual Linux Showcase in Atlanta.
  • We put together a KLAT2 In The News page that links in all the coverage that we have found so far.
  • Finally, our cluster work is linked-in on the "official" Beowulf web site.

The more I get done, the longer my TO-DO list gets! That's two papers that need mucho revisions, 21 AFN boards to assemble and solder, an update to AFAPI to handle more than 32 processors, and some major code/network tweaking to improve that price/performance ratio!

On a personal note, last weekend I might have gotten a few more people addicted to Settlers of Catan, a really cool board game. Yeah, no fancy 3D video card needed... much less a computer :-)

It's been a busy week and a half since my last entry.

  • On Friday, Hank and I finished writing an article about how we got over 64 GFLOPS on KLAT2. Hopefully it will be appearing on Ars Technica in the next week.
  • I sent off the PCB design files for a new AFN based on a revised PAPERS 960801 module. So, hopefully, in a few weeks, we can have an AFN up and working on KLAT2. The new design files will be posted once I've verified that none of my changes/tweaks messed up the functionality/reliability of the PAPERS 960801 design. I didn't strictly need to make revisions; however, since we needed to have a new run of PCBs made anyway, it was a good time to correct some annoyances with the old design.
  • And now for the ugly events of the past two weeks or so:

The 32-port Fast Ethernet switches that we purchased for KLAT2 had a design flaw, causing a 60% failure rate after a few weeks of use. The manufacturer says it is a latent thermal problem. I suspect that the failure rate will approach 100% within another month or two. The company is sending us replacements for the entire set, which will have a design revision that supposedly fixes the problem. Yet, they won't get here for another two weeks! I've set the thermostat in the lab down to 65 F, making it rather unpleasant for me to work in the room. Hopefully, the remaining 4 switches will keep working until the replacements arrive.

The other recent unpleasant event is our discovery that the "marketroids" have again redefined a technical term/phrase into oblivion. The phrase "wire-speed switching on all ports" used to have a technical meaning that the backplane bandwidth of a switch was large enough to handle all ports going at full speed in full duplex mode continuously, as long as the communication pattern was a permutation. The key here is that "wire-speed" should mean that as long as I am the only processor/NIC sending to another particular processor/NIC, I should have full wire-speed bandwidth available for my use, regardless of what other traffic is in the switch. The marketroids seem to have modified this definition to mean that for some permutations, you can achieve wire-speed, but not for all permutations. ACK!
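To make "permutation" concrete: a traffic pattern is a permutation when no two senders target the same receiver. A minimal Python sketch of that check follows; the function name and the dict-of-ports representation are my own illustration, not anything from our tools.

```python
def is_permutation_pattern(pattern):
    """Check whether a traffic pattern is a (partial) permutation.

    `pattern` maps each sending port to the one port it sends to.
    Using a dict already guarantees one destination per sender, so
    the only thing left to verify is that no receiver is shared.
    """
    receivers = list(pattern.values())
    return len(receivers) == len(set(receivers))


# Three ports passing traffic around in a cycle: every receiver is unique.
print(is_permutation_pattern({0: 1, 1: 2, 2: 0}))

# Two senders converging on port 2: not a permutation, so even a true
# wire-speed switch cannot give both of them full bandwidth.
print(is_permutation_pattern({0: 2, 1: 2}))
```

Under the old meaning of "wire-speed", any pattern that passes this check should run at full rate on every port.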

So, if we can get more specific details on the internal structure of common switches, we will try to modify our GA to accommodate the restrictions when designing an FNN. Most switches seem to be built with 8+1 switch-on-a-chip modules, where the +1 ports are tied together in a unidirectional ring of various bandwidths. The key is that, depending on how high the ring's bandwidth is, the overall switch cannot achieve wire-speed for permutations that must go almost all the way around the ring. This will also affect the observed latency of your connection patterns, possibly dramatically.
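Here is a rough Python sketch of why the ring hurts. It is a toy model under loud assumptions: 8 ports per module, ports numbered consecutively across modules, a strictly unidirectional ring, and ring capacity expressed as "how many full-rate port flows one ring link can carry". All names are hypothetical, and real switch silicon will differ.

```python
PORTS_PER_MODULE = 8  # the "8" in 8+1 (assumed layout: ports 0-7 on module 0, etc.)


def module_of(port):
    """Module index that a front-panel port lives on (assumed numbering)."""
    return port // PORTS_PER_MODULE


def ring_link_loads(pattern, num_modules):
    """Count the full-rate flows crossing each ring link.

    Link i carries traffic from module i to module (i + 1) % num_modules.
    Because the ring is unidirectional, a flow to the "previous" module
    has to travel almost all the way around.
    """
    loads = [0] * num_modules
    for src, dst in pattern.items():
        m, target = module_of(src), module_of(dst)
        while m != target:  # flows within one module never touch the ring
            loads[m] += 1
            m = (m + 1) % num_modules
    return loads


def achieves_wire_speed(pattern, num_modules, ring_capacity_in_ports):
    """True if no ring link is asked to carry more than its capacity."""
    return max(ring_link_loads(pattern, num_modules)) <= ring_capacity_in_ports


# A 32-port switch modeled as 4 modules on a ring.
forward = {p: (p + 8) % 32 for p in range(32)}   # each port sends one module ahead
backward = {p: (p - 8) % 32 for p in range(32)}  # each port sends one module behind

print(ring_link_loads(forward, 4))   # every link carries 8 flows
print(ring_link_loads(backward, 4))  # every link carries 24 flows
```

Both patterns are permutations, yet the "backward" one puts three times the load on every ring link, so a ring sized for the forward case falls well short of wire-speed on the backward one. The extra hops are also where the added latency comes from.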

P.S. - We did NOT want to know this. But too late now... What happened to crossbars, fat-trees, and star topologies for internal switch fabrics? (I know: economics...) Addendum: I just read through a document from Allayer, a switch-on-a-chip maker, that reasonably explains the choice of a ring.

Cool! KLAT2 has been hitting the press around the world! It's been reported in a Chinese newspaper's business section... I still need to find out "which" newspaper. A fairly well done article appeared in the EE Times under Technology News. The CNET story got reported at Tom's Hardware which has translated mirrors around the globe (German, Japanese, and Korean to name a few).

I'm putting the finishing touches on a new PCB for making an AFN. It's a tweaked version of the PAPERS 960801 board. Hopefully we can assemble and test an AFN on KLAT2 by the end of June. We could just use the old design, but we need 21 boards to build the AFN for KLAT2, so we had to get more PCBs fabbed anyway, which made it a good time to make a few fixes (and to update the URL on the PCB :-). Once the new board has been checked out, I'll post the design files and board masks.

P.S. - We now have access to a wave solder machine... cha-ching!

Patience Luke.. Patience...

What a difference a few days makes. CNET picked up the story today and even included a picture of KLAT2. And we discovered that a strange/funny rewording of our press release was on LinuxMall.com... I never thought moonshine would be associated with supercomputers. :-)

We've already had several people "chomping at the bit" to get a copy of our software to design and use Flat Neighborhood Networks. It'll take some time to clean up the code so that it can be used/understood by people other than the authors. But as soon as it's not embarrassing for others to see, it'll be posted and released into the Public Domain.

Hmmm, I guess a super-keen-neato-fast Linux Athlon cluster for real cheap isn't as newsworthy as we had thought. Is it just common knowledge that Athlons rock, or is it "Beowulf press release overload" recently? Anyway, we made it onto TechNN under "press releases" for a few hours... almost a day, Linux Today with 1500 "reads" or so, and KLAT2 was prominently mentioned on 3DNow.net.

A little 64 node Athlon cluster for under $42K just doesn't compete with a $15 million NOAA cluster for news coverage. Or am I just being impatient with the press...

Oh well, it's time to get back to the grind, and get the next software release out the door.
