21 Mar 2005 (updated 21 Mar 2005 at 15:08 UTC)
I am getting married in less than two weeks! My wife-to-be has kicked my butt into making progress on my Ph.D... Yes, I am STILL working on my Ph.D. I passed an important milestone a month ago, and I now see light at the end of the tunnel.
A quick open source software update... Last year I became a co-developer on the Warewulf cluster project. If you do Beowulf cluster work, or are thinking about it, check it out.
Back to the grindstone... I'll post again in a year or so I guess ;-)
The FNN stuff we (my advisor, Hank, and I) started developing in 2000 has led to all sorts of things. Too much to go into right now. As a side note, that little $10K cluster from Oct 2000 was called the KRAA Z-MP and actually turned out to be a real pain... Someday I might write up an article about what not to do when building a PC cluster... It's been decommissioned, and several of its nodes are now workstations in one of our labs.
The cluster worth checking out is our new 128+4 node creation KASY0 which we built this summer.
I'm presenting a paper at ALS2000 on Thursday morning. I still have some tweaking to do on my slides, but I've already got a full set done... just too many of 'em. My flight to Atlanta is in 22 hours.
SC2000 is approaching rapidly, and there is a LOT left to do before our research group is ready to go. We are building a new cluster (yes, another one) for under $10K for the HPC.Games contest. It is going to be sweet when it's finished. However... some PC parts vendors really annoy me... out of five 256MB PC133 DIMMs delivered from someplace that will remain anonymous, only one survived more than 30 minutes of MemTest86 without errors. Don't companies do burn-in testing of anything anymore?
Well, that's all for now.
On another note, I just spent about 2 or 3 hours reading some of the recent articles and discussions here on Advogato. I can remember when Slashdot was only half as good, and it's only gone downhill from there. Advogato sure has some very intelligent discussions relating to software development. I've picked up a few really good references to some tools and techniques that I hope to use in AFAPI-related things.
Our SC2000 paper was revised, and hopefully will now be readable by most users of the proceedings later this year. We had a tough time getting a fully working PostScript document to convert to PDF without introducing errors. Computers are so advanced, yet it is still a major problem to distribute documents electronically in a way that is truly portable across platforms and operating systems. My advisor's solution of using troff works for him, but our collaborators on this project use LaTeX, and, well, the two don't mix well. I guess it's a universal problem: the perfect "version control system", "project configure and make facility", "programming language", or "portable document format" just doesn't exist yet, and probably can't, due to some yet-undiscovered law of nature :-).
The UKLUG group has gotten a small subset of their machines set up and configured as a video wall with AFAPI and VWLib. For now (and probably for quite some time), their equipment is housed in the KAOS Lab. This has prompted me to look into some unfinished business with merging VWLib into the AFAPI distribution. But classes start this Wednesday, so who knows when I will get a chance to resolve the issues with the merger. I've got other, higher-priority things to work on.
Speaking of higher-priority things, Hank & I are working this week on the final version of a paper to be presented at the Annual Linux Showcase in Atlanta on October 12. It's "yet another paper" about KLAT2 and its FNN.
Hmmm, what was I saying about having time to try doing an AltiVec version of our LINPACK optimizations??? Oh well. Anyone reading this know how to make human clones? I need two or three more of me :-)
Oh yeah, that takes funding... maybe next month :-)
So, July was a busy month for me. The fall semester is rapidly approaching, but there are still a few weeks of "no classes" left. MacWorld Expo 2000 was a fun vacation to take with my Dad (and Mom). While at MacWorld, I met and chatted extensively with Troy Benjegerdes about AltiVec, BlackLab/YellowDog Linux and parallel computing. Now that the Gordon Bell paper is done, I might have time to try our LINPACK optimizations out on my G4 with AltiVec.
On a personal note, last weekend I might have gotten a few more people addicted to Settlers of Catan, a really cool board game. Yeah, no fancy 3D video card needed... much less a computer :-)
The 32-port Fast Ethernet switches that we purchased for KLAT2 had a design flaw that caused a 60% failure rate after a few weeks of use. The manufacturer says it is a latent thermal problem. I suspect that the failure rate will approach 100% within another month or two. The company is sending us replacements for the entire set, with a design revision that supposedly fixes the problem. Yet they won't get here for another two weeks! I've set the lab thermostat down to 65 F, making the room rather unpleasant to work in. Hopefully, the remaining 4 switches will keep working until the replacements arrive.
The other recent unpleasant event is our discovery that the "marketroids" have again redefined a technical term/phrase into oblivion. The phrase "wire-speed switching on all ports" used to have a specific technical meaning: the backplane bandwidth of a switch was large enough to handle all ports going at full speed in full-duplex mode continuously, as long as the communication pattern was a permutation. The key here is that "wire-speed" should mean that as long as I am the only processor/NIC sending to another particular processor/NIC, I should have full wire-speed bandwidth available for my use, regardless of what other traffic is in the switch. The marketroids seem to have modified this definition to mean that you can achieve wire-speed for some permutations, but not for all of them. ACK!
So, if we can get more specific details on the internal structure of common switches, we will try to modify our GA to accommodate these restrictions when designing an FNN. Most switches seem to be built from 8+1 switch-on-a-chip modules, where the +1 ports are tied together in a unidirectional ring of various bandwidths. The key is that, depending on how high the ring's bandwidth is, the overall switch cannot achieve wire-speed for permutations whose traffic must go almost all the way around the ring. This will also affect the observed latency of your connection patterns, possibly dramatically.
P.S. - We did NOT want to know this. But too late now... What happened to crossbars, fat-trees, and star topologies for internal switch fabrics? (I know: economics...) Addendum: I just read through a document from Allayer, a switch-on-a-chip maker, that reasonably explains the choice of a ring.
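To make the ring problem concrete, here is a minimal sketch under loudly stated assumptions: port p lives on module p // 8, the +1 ports form a unidirectional ring in which segment i carries traffic from module i to module i+1, and every flow runs at one port's full wire speed. The function names and the 32-port, 4-module example are hypothetical illustrations, not a model of any particular vendor's switch. Both shift patterns below are permutations (no receiver hears from more than one sender), yet one loads every ring segment three times as heavily as the other.

```python
from collections import Counter
from typing import Dict, List

# Assumption: 8 user ports per switch-on-a-chip module (the "8" in 8+1).
PORTS_PER_MODULE = 8


def is_permutation(pattern: Dict[int, int]) -> bool:
    """True if no receiving port has more than one sender.

    pattern maps each sending port to the single port it sends to.
    """
    fan_in = Counter(pattern.values())
    return all(count == 1 for count in fan_in.values())


def ring_segment_load(pattern: Dict[int, int], num_modules: int) -> List[int]:
    """Number of wire-speed flows crossing each segment of the ring.

    Segment i carries traffic from module i to module (i + 1) % num_modules.
    Traffic between ports on the same module never touches the ring.
    """
    load = [0] * num_modules
    for src, dst in pattern.items():
        here = src // PORTS_PER_MODULE
        target = dst // PORTS_PER_MODULE
        # Unidirectional ring: always walk forward until we reach the target.
        while here != target:
            load[here] += 1
            here = (here + 1) % num_modules
    return load


def runs_at_wire_speed(pattern: Dict[int, int], num_modules: int,
                       ring_capacity: int) -> bool:
    """ring_capacity is one segment's bandwidth in units of a port's wire speed."""
    return max(ring_segment_load(pattern, num_modules), default=0) <= ring_capacity


if __name__ == "__main__":
    PORTS, MODULES = 32, 4
    forward = {p: (p + 8) % PORTS for p in range(PORTS)}   # to the next module
    backward = {p: (p - 8) % PORTS for p in range(PORTS)}  # almost all the way around
    print(is_permutation(forward), is_permutation(backward))  # True True
    print(ring_segment_load(forward, MODULES))   # [8, 8, 8, 8]
    print(ring_segment_load(backward, MODULES))  # [24, 24, 24, 24]
    print(runs_at_wire_speed(forward, MODULES, ring_capacity=8))   # True
    print(runs_at_wire_speed(backward, MODULES, ring_capacity=8))  # False
```

With a hypothetical ring capacity of 8 port-speeds (say 8 x 100 Mb/s behind Fast Ethernet ports), the forward shift looks like "wire-speed" while the backward shift does not, which is exactly the "some permutations, but not all" behavior. The number of segments each flow crosses also hints at the extra latency a pattern will see.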
I'm putting the finishing touches on a new PCB for building an AFN. It's a tweaked version of the PAPERS 960801 board. Hopefully we can assemble and test an AFN on KLAT2 by the end of June. We could just use the old design, but we need 21 boards to build the AFN for KLAT2, so we had to get more PCBs fabbed anyway, which made it a good time to make a few fixes (and to update the URL on the PCB :-) Once the new board has been checked out, I'll post the design files and board masks.
P.S. - We now have access to a wave solder machine... cha-ching!
What a difference a few days makes. CNET picked up the story today and even included a picture of KLAT2. And we discovered that a strange/funny rewording of our press release was on LinuxMall.com... I never thought moonshine would be associated with supercomputers. :-)
We've already had several people "chomping at the bit" to get a copy of our software to design and use Flat Neighborhood Networks. It'll take some time to clean up the code so that it can be used and understood by people other than the authors. But as soon as it's not embarrassing for others to see, it'll be posted and released into the Public Domain.
A little 64-node Athlon cluster for under $42K just doesn't compete with a $15 million NOAA cluster for news coverage. Or am I just being impatient with the press?
Oh well, it's time to get back to the grind, and get the next software release out the door.
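For readers wondering what such design software has to check: the defining property of a Flat Neighborhood Network is that every pair of PCs shares at least one switch, so any two nodes can talk through a single switch, while no switch is asked to host more NICs than it has ports. The following is a minimal, hypothetical validity checker, not our actual FNN design code; the wiring representation (each PC mapped to the set of switch IDs its NICs plug into) and the function names are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set


def is_flat_neighborhood(wiring: Dict[str, Set[int]], ports_per_switch: int) -> bool:
    """Check a proposed FNN wiring.

    wiring maps each PC to the set of switch IDs its NICs are plugged into
    (hypothetical representation; at most one NIC per switch per PC).
    """
    # Port budget: count how many NICs land on each switch.
    ports_used = Counter(sw for switches in wiring.values() for sw in switches)
    if any(used > ports_per_switch for used in ports_used.values()):
        return False
    # Flat-neighborhood property: every pair of PCs shares at least one switch.
    return all(wiring[a] & wiring[b] for a, b in combinations(wiring, 2))


def shared_switch(wiring: Dict[str, Set[int]], a: str, b: str) -> int:
    """Pick one switch that PCs a and b both reach (lowest ID).

    Raises ValueError if the two PCs share no switch.
    """
    return min(wiring[a] & wiring[b])


if __name__ == "__main__":
    # Toy example: 4 PCs, 2 NICs each, three 3-port switches.
    wiring = {"A": {0, 1}, "B": {0, 2}, "C": {1, 2}, "D": {0, 1}}
    print(is_flat_neighborhood(wiring, ports_per_switch=3))  # True
    print(is_flat_neighborhood(wiring, ports_per_switch=2))  # False
    print(shared_switch(wiring, "B", "C"))                   # 2
```

The hard part of FNN design is the inverse problem, choosing the wiring in the first place; a checker like this could serve as the hard constraint inside a search such as our GA.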