Older blog entries for dreier (starting at number 28)

Materials from LinuxConf.eu RDMA tutorial (at last)

At long last, after several requests, I’ve posted the slides, notes, and client and server examples from the tutorial I gave at LinuxConf.eu 2007 in Cambridge back in September.  Hyper-observant readers will notice that the client program I posted does not match the listing in the notes I handed out; this is because I fixed a race condition in how completions are collected.

I’m not sure how useful all this is without me talking about it, but I guess every little bit helps.   And of course, if you have questions about RDMA or InfiniBand programming, come on over to the mailing list and fire away.

Syndicated 2007-10-10 21:38:24 from Roland's Blog

InfiniBand/RDMA in 2.6.23 and 2.6.24

With yesterday’s release of  kernel 2.6.23, I thought it might be a good time to look back at what significant changes are in 2.6.23, and what we have queued up for 2.6.24..

So first I looked at the kernel git log from the v2.6.22 tag to the v2.6.23 tag, and I was surprised to find that nothing really stood out.  We merged something like 158 patches that touched 123 files, but I couldn’t really find any headline-worthy new features in there.  There were just tons of fixes and cleanups all over, although mostly in the low-level hardware drivers.  For some reason, 2.6.23 was a pretty calm development cycle for InfiniBand and RDMA, which means that at least that part of 2.6.23 should be rock solid.

2.6.24 promises to be a somewhat more exciting release for us.  In my for-2.6.24 branch, in addition to the usual pile of fixes and cleanups, I have a couple of interesting changes queued up to merge as soon as Linus starts pulling things in:

  •  Sean Hefty’s quality-of-service support.  These changes allow administrators to configure the network to give different QoS parameters to different types of traffic (eg IPoIB, SRP, and so on).
  • A patch from me (based on Sean Hefty’s work) to handle  multiple P_Keys for userspace management applications.  This is one of the last pieces to make the InfiniBand stack support IB partitions fully.

Also, bonding support for IP-over-InfiniBand looks set to go in through Jeff Garzik’s tree.  This is something that I’ve been wanting to see for years now; the patches allow the standard bonding module to enslave IPoIB interfaces, which means that multiple IB ports can finally be used for IPoIB high-availability failover.  Moni Shoua and others did a lot of work and stuck with this for a long time, and the final set of patches turned out to be very clean and nice, so I’m really pleased to see this get merged.

Syndicated 2007-10-10 18:18:32 from Roland's Blog

Shape up or ship out

I know I’m not in the shape I once was, but it seems rather cruel of the Cantabrigians to send around huge signs on buses that say “Run, Fatboy, Run” when I go out for a morning jog.

Syndicated 2007-09-03 21:46:18 from Roland's Blog

Force of habit…

I went to the little mini post office in the grocery store today and bought some stamps.  The conversation went something like this:

Me: “I’d like a sheet of 41-cent stamps and two sheets of 2-cent stamps, please.”

Counter person: “That will be $9.  Do you need any stamps or postal supplies today?”

Me: “Um.  Just the stamps I already asked for, thanks.”

Syndicated 2007-08-16 03:48:00 from Roland's Blog

Lazyweb: best American plug to UK receptacle adapter?

As I mentioned in my previous post, I’ll be traveling to Cambridge next month.  I haven’t been to the UK in nearly 10 years, and so I’m in the market for an electrical adapter, since even after powertop’s best efforts, I still need to charge my laptop occasionally.  So I’m looking for something I can use between a North American plug and a UK receptacle.  Ideally the adapter would be neither impossible tight nor prone to coming out, and wouldn’t fall apart until after my trip.  I don’t need any gold-plated active phase skew compensation or anything like that, though.

Any suggestions?  Thanks….

Syndicated 2007-08-06 23:10:36 from Roland's Blog

Cambridge… England, that is

My tutorial Writing RDMA applications on Linux has been accepted at LinuxConf Europe 2007. I’ll try to give practical introduction to writing native RDMA applications on Linux — “native” meaning directly to RDMA verbs as opposed to using an additional library layer such as MPI or uDAPL.  I’m aiming to make it accessible to people who know nothing about RDMA, so if you read my blog you’re certainly qualified.  Start planning your trip now!

My presentation is on the morning of Monday, September 3, and I’m flying to England across 7 time zones on Sunday, September 2, so I hope I’m able to remain upright and somewhat coherent for the whole three hours I’m supposed to be speaking….

Syndicated 2007-08-06 16:04:31 from Roland's Blog

Do you feel lucky, punk?

Sun just introduced their Constellation supercomputer at ISC Dresden. They’ve managed to get a lot of hype out of this, including mentions in places like the New York Times. But the most interesting part to me is the 3,456-port “Magnum” InfiniBand switch. I haven’t seen many details about it and I couldn’t find anything about it on Sun’s Constellation web site.

However I’ve managed to piece together some info about the switch from the new stories as well as the pictures in this blog entry. Physically, this thing is huge–it looks like it’s about half a rack high and two racks wide. The number 3,456 gives a big clue as to the internal architecture: 3456 = 288 * 12. Current InfiniBand switch chips have 24 ports, and the biggest non-blocking switch one can build with two levels (spine and leaf) is 24 * 12 = 288 ports: 24 leaf switches each of which have 12 ports to the outside and 12 ports to the spines (one port to each of the 12 spine switches).

Then, using 12 288-port switches as spines, one can take 288 24-port leaf switches that each have 12 ports to the outside and end up with 288 * 12 = 3456 ports, just like Sun’s Magnum switch. From the pictures of the chassis, it looks like Magnum has the spine switches on cards on one side of the midplane and the leaf switches on the other side, using the cute trick of having one set of cards be vertical and one set horizontal to get all-to-all connections between spines and leaves without having too-long midplane traces.

All of this sounds quite reasonable until you start to consider putting all of this in one box. Each 288 port switch (which is on one card in this design!) has 36 switch chips on it. At about 30 Watts per switch chip, each of this cards is over 1 kilowatt, and there are 12 of these in a system. In fact, with 720 switch chips in the box, the total system is well over 20 kW!

It also seems that the switch is using proprietary high-density connectors that bring three IB ports out of each connector, which reduces the number of external connectors on the switch down to a mere 1152.

One other thing I noticed is that the Sun press stuff is billing the Constellation as running Solaris, while the actual TACC page about the Ranger system says the cluster will be running Linux. I’m inclined to believe TACC, since running Solaris for an InfiniBand cluster seems a little silly, given how far behind Solaris’s InfiniBand support is when compared to Linux, whose InfiniBand stack is lovingly maintained by yours truly.

Syndicated 2007-06-28 22:11:48 from Roland's Blog

Soon I will be invincible

I just read Aaron Grossman’s novel Soon I Will Be Invincible, and I heartily recommend it as an amusing summer read. It would be perfect for a long trip (sorry I didn’t write this post in time for everyone going to OLS). It’s set in a world where super-powers exist, and interleaves first person chapters narrated by Doctor Impossible (a super-villian, “The Smartest Man in The World”) and Fatale (a rookie superhero, “The Next Generation of Warfare”).

Grossman strikes a nice balance by taking everything seriously enough to capture what’s great about superhero comics, while slipping in enough sly jokes to keep things light. For example, the book starts with Doctor Impossible in jail, and it turns out that the authorities have decided he’s not an evil genius–he just suffers from Malign Hypercognition Disorder.

Syndicated 2007-06-26 03:43:49 from Roland's Blog

Enterprise distro kernels

Greg K-H wrote recently about kernels for “Enterprise Linux” distributions. I’m not sure I get the premise of the article; after all, the whole point of having more than one distro company is that they can compete on the basis of the differences in what they do. So it makes no sense to me to present this issue as something that Red Hat and Novell have to agree on (and it also leaves out Ubuntu’s “LTS” distribution, although I’m not sure if that has taken any of the “enterprise distro” market). Obviously Novell sees a reason for both openSUSE and SLES; why should SLES and RHEL have to be identical?

In fact (although Greg didn’t seem to realize it when he wrote his article), there are already significant differences between the SLES and RHEL kernel updates. SLES has relatively infrequent “SP” releases, where the kernel ABI is allowed to break, while RHEL has update releases roughly every quarter but aims to keep the kernel ABI stable through the life of a full major release.

Greg seems to favor the third proposal in his article, namely rebasing to the latest upstream kernel on every update. However, I don’t think that can work for enterprise distros, for a reason that DaveJ alluded to in his response:

W[ith] each upstream point revision, we fix x regressions, and introduce y new ones. This isn’t going to make enterprise customers paying lots of $ each year very happy.

For a lot of customers, the whole point of staying on an enterprise distro is to stick with something that works for them. No kernel is bug-free and every enterprise distro kernel surely has some awful bugs; what enterprise customers want to avoid are regressions. If SLES10 works for my app on my hardware, then SLES10SP1 better not keel over on the same app and the same hardware because of a broken SATA driver or something like that.

Of course customers often want crazy-sounding stuff, for example, “Give me the 2.6.9 kernel from RHEL4, except I want the InfiniBand drivers from 2.6.21.” (And yes, since I work on InfiniBand a lot, that is definitely a real example, and in fact a lot of effort goes into the “OpenFabrics Enterprise Distribution” (OFED) to make those customers happy) A kernel hacker’s first reaction to that request is most likely, “Then you should run just 2.6.21.” But if you think about what the customers are asking for some more, it starts to make sense. What they are really saying is that they need the latest and greatest IB features (maybe support for new hardware or a protocol that wasn’t implemented until long after the enterprise kernel was frozen), but they don’t want to risk some new glitch in a part of the kernel where RHEL4’s 2.6.9 is perfectly fine for them.

This is just a special case of Greg’s “support the latest toy” request, and if there were some technical solution for pulling just a subset of new features into an enterprise kernel then that would be great. But as I said before, without a major change in the upstream development process, rebasing enterprise kernels during the lifetime of a major release doesn’t seem to be what customers of enterprise distros want. And I agree with Linus when he says that you can’t slow down development without having people losing interest or going off onto a branch that’s too unstable for real users to test. So I don’t think we want to change our development process to be closer to an enterprise distro.

And given how new features often have dependencies on core kernel changes, I don’t see much hope of a technical solution for the “latest toy” problem. In fact the OFED solution of having the community that works on a particular class of new toys do the backporting seems to
be about the best we can do for now.

Syndicated 2007-06-24 16:21:45 from Roland's Blog

19 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!