Older blog entries for Zaitcev (starting at number 156)

My diary rating continues to climb for no reason, and it's about time to do something about it. To that end, here are my views on the war with Islamists.

The war is a perfect example of denying reality when it hurts too much. Leftists simply deny that they all will die prematurely if Islamists are not stopped, because most leftists are good people. They cannot comprehend the scale of murderous villainy which engulfed the Middle East. It is unreal to them; they do not believe that no matter how much we appease, help the poor, donate hospitals and schools, etc., it is all in vain. I think Raph is one of those good-natured and tragically misguided souls, and it's sad.

In other news, I am confronted with a port of rmap to s390. It may not be a big deal, but only if PMD entries form full pages on the platform (crossing fingers).

Val's re-emergence on linux-kernel is off to a bad start, pushing an agenda with which we are all too familiar. Honestly, I expected better.

Sparc saw tremendous progress over the weekend. I did not do much, only changed 3 or 4 lines in sunzilog.c, but this removed all the chocks from my wheels. It was a little tough to debug kernels without console output. I used a JavaStation (PCI based) to keep up with Linus' pace, but recent (~ 2.5.49) breakage in the NFS client made it problematic. And all SMP boxes are SBus based, anyhow.

Turned out we shipped a beta (Phoebe) with broken USB, but I think I fixed it, thanks to Bill Nottingham's help. Good thing it was a beta, and not a product! Now crossing fingers and barring gates against 11th hour vital features required by product management.

I did some other USB work, sent a patch to Vojtech. Hit CapsLock on a PS/2 keyboard, then plug a USB keyboard into the box. The light on the USB keyboard won't light. Hit NumLock. Now both will come on! Trivial, but apparently it annoys the hell out of KVM users.

Meanwhile, sparc languishes. I wish I had Alan's productivity. Or DaveM's.

I remember that some time ago I posted a diary entry "Jerks from Gentoo", where I vented against an article at "Gentoo Linux News", which took a dim view of my work, berating me for not submitting patches up to Marcelo fast enough. At that time I thought that "Gentoo" was an online rag. Turned out that they were a distro maker! Now it all falls into place.

    Daniel Robbins recently proposed a new kernel development strategy for Gentoo Linux, with the main goals being to improve hardware support and stability of the kernels used in the Gentoo project. As part of this strategy, Gentoo would leverage many of the hardware patches that make their way into the Red Hat kernel tree since most hardware vendors seek out Red Hat as their primary/only Linux partner. In addition to taking advantage of the improved hardware support in the Red Hat kernel source tree, Gentoo users would also benefit from additional features and functionality not normally found in the Red Hat kernel, including XFS, EVMS and Win4Lin, as well as others.

So, no volunteering to help me to split patches and merge with Marcelo, or is there? How quaint. They just realized that they do not have to pretend to be working for the community benefit. Why help to merge upstream when you can take the whole thing? Not that many believed them, anyway.

BTW, the idea to ship EVMS when its creators decided to switch to LVM2 smacks of an attempt to "differentiate" at any cost, without an attempt to help users at all. Apparently, nobody thought about what is going to happen to RAID arrays of Gentoo users in EVMS on-disk format in about a year, when kernel 2.6 rolls out.

Dunno what got me so wound up about this. I guess I care a lot about my and Red Hat's community role.

In kernel 2.4.18 or so, if two Token Ring boards are used in an SMP computer, a lock-up happens. IBM asked me to look at a patch by Denis J. Barlow which was said to fix the lockup. The patch was too big and touched too much code, so I figured out what it did to locking and wrote a small patchlet which fixed it. Today I received a mail from Arnaldo (the maintainer of weird networking stacks) which said that someone else did exactly the same thing for 2.5 already, so he is taking the patch and sending it to Marcelo. In a year or so we'll know if it actually worked...

I am a little sad to see all the rest of Denis' patch fall by the wayside, but he is stubbornly refusing to split it into manageable chunks.

Another moral of the story is: Avoid exotic locks, such as spin_lock_bh(). The "my" patch simply replaces spin_lock_bh with spin_lock_irqsave. Why did the author use spin_lock_bh in the first place? I place the blame squarely on premature optimization.

Somewhere in August, Robert Love wrote an article in Linux Journal about Linux locks. It was an honest article, which listed all lock varieties and even tried to explain how to use them. When I read it, I thought it was a wonderful article; the only omission I noticed was that he did not mention that semaphore down() cannot be used in an interrupt. But guess what, about a month ago, Grant Edwards, a comp.os.linux.development.system regular, made a posting where he said he was going to use exotic locking. He wrote that he went through RML's article, and so, everything was going to be ok, because he'll use those correctly. I had to step in to direct him to sane locking. Of course, his code would be correct - in that particular version of the kernel. One step left, one step right, and it bitrots. Just like Token Ring did.

I extend the KISS philosophy way beyond spin_lock_bh, into the realm of reader-writer locks. They can improve performance of contended locks a lot, but they are rife with pitfalls. In this regard Bill Irwin grieves me especially, because he advocates using reader-writer locks by default while being sharp and respected. He probably corrupted the minds of hundreds of newbie driver writers. He does not understand that they are not ready for the sophisticated stuff.

Newbies need something simple instead, which may not be the most scalable, but works robustly, like a set of these "Zaitcev's Commandments":

  • If you are in a process context (any syscall) and want to lock other processes out, use a semaphore. You can hold a semaphore and sleep (in copy_from_user or kmalloc(x,GFP_KERNEL)).
  • Otherwise (== data can be touched in an interrupt), use spin_lock_irqsave and spin_unlock_irqrestore.
  • Avoid holding a spinlock for more than 5 lines of code and across any function call (except accessors like readb).

That would be it. Just keep it simple. Any of these rules can be broken under some circumstances, but the problem is that newbies cannot judge adequately. This is why they are "Commandments", to prevent them from confusing themselves.
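To show why the second and third rules are hard to get wrong, here is a toy userspace model in C. It is a sketch under loud assumptions, not kernel code: every `model_*` name is made up for illustration, one global boolean stands in for the CPU interrupt-enable flag, and there is only one thread. In the real kernel, spin_lock_irqsave is a macro that takes `flags` by name rather than by pointer. The point of the model is only that the save/restore pair leaves the interrupt state exactly as the caller had it, so the same function is safe from a syscall and from an interrupt handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the CPU interrupt-enable flag (assumption). */
bool model_irqs_enabled = true;

/* Hypothetical single-CPU model of a spinlock. */
typedef struct { bool held; } model_spinlock_t;

/* Models spin_lock_irqsave(): save the current interrupt state,
 * disable interrupts, then take the lock. */
void model_lock_irqsave(model_spinlock_t *lock, unsigned long *flags)
{
    *flags = model_irqs_enabled ? 1UL : 0UL;
    model_irqs_enabled = false;
    assert(!lock->held);   /* taking it twice on one CPU would deadlock */
    lock->held = true;
}

/* Models spin_unlock_irqrestore(): drop the lock and restore whatever
 * interrupt state the caller had, enabled or not. */
void model_unlock_irqrestore(model_spinlock_t *lock, unsigned long flags)
{
    assert(lock->held);
    lock->held = false;
    model_irqs_enabled = (flags != 0);
}

/* Hypothetical driver state touched by both a syscall path and an
 * interrupt handler. */
struct model_dev {
    model_spinlock_t lock;
    unsigned int rx_count;
};

/* Short critical section with no function calls inside (third rule).
 * Callable with interrupts enabled or disabled (second rule). */
void model_dev_count_rx(struct model_dev *dev)
{
    unsigned long flags;

    model_lock_irqsave(&dev->lock, &flags);
    dev->rx_count++;
    model_unlock_irqrestore(&dev->lock, flags);
}
```

Calling `model_dev_count_rx()` with `model_irqs_enabled` set to true leaves it true, and calling it with the flag false leaves it false. A lock variant that unconditionally re-enabled something on unlock would only be correct for one of those callers, which is the kind of assumption that bitrots.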

Today, the Advogato article system was DoSed by our local DCE and Microsoft worshipper. It was pretty funny, because it reminded me about my buddy Sergey, who went to work for Microsoft and got assigned to support DCE. His comments about the code were quite entertaining, if a little harsh.

A couple of days ago I sent a patchlet for USB hub code to gregkh. The fix was about 10 lines big and it would not deserve mentioning but for its amusing work and social context.

It was known for a long time that the debouncing loop was broken from its conception, but nobody did anything to fix it. An incompetent programmer somewhere eventually came up with a patch, which does not fix the algorithm, but just short-circuits the loop instead, letting it proceed. This fixed it for people who were frustrated by the original (broken) loop implementation. The patch began to float around, and one day SuSE began to ship it. Eventually, a gentleman from Croatia came to linux-usb-devel and wrote "This patch fixes my problem, SuSE ships it, so be so kind and include it in the mainstream kernel at last."

This was sooo wrong on so many levels, that I felt compelled to fix the darn thing. Also, I got some negative lessons, which may be obvious to some, but perhaps my elitism and snobbery blinded me.
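For anyone who has not met the term, a debounce loop is supposed to keep sampling a noisy line until it has held one value for long enough, and to give up if that never happens. The sketch below is a generic illustration in plain C, not the hub code and not my patch: the function name, its parameters, and the idea of feeding it an array of pre-recorded samples are all assumptions made so that it runs without hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not kernel code.
 * Scans `n` samples of a line and returns the index of the sample at
 * which the line has held a single value for `need` consecutive
 * samples, storing that value in *stable_value.
 * Returns -1 if the line never settles within the samples given. */
int debounce(const int *samples, size_t n, size_t need, int *stable_value)
{
    size_t run = 0;   /* length of the current run of equal samples */
    int last = 0;     /* value of the current run */

    assert(stable_value != NULL);

    for (size_t i = 0; i < n; i++) {
        if (i > 0 && samples[i] == last) {
            run++;              /* line unchanged: extend the run */
        } else {
            run = 1;            /* line changed: start counting again */
            last = samples[i];
        }
        if (run >= need) {
            *stable_value = last;
            return (int)i;
        }
    }
    return -1;                  /* ran out of samples: still bouncing */
}
```

The important part is the reset of `run` whenever the line changes. Skipping out of the loop early removes exactly that guarantee, so the caller proceeds without knowing whether the line ever settled.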

  • Some time ago, if a person could formulate a patch with correct argument order for diff, I would assume he or she read the code, at least. It is NOT SO.
  • (Corollary: there is a swarm of garbage patches floating around).
  • Lots of people mastered the skill of downloading and applying patches without the foggiest understanding of the code.
  • Even big and respectable vendors do not read the crap which they ship with their kernels.

Regarding the last point, I'm not saying that SuSE is particularly bad, or Red Hat is good in this respect. We always review patches we ship. This should be sufficient, right? Well, in case of the debounce loop, the garbage patch floating on the 'Net was so small, that it provided no context. It was pretty hard to tell from three lines up and down what it actually did. I caught it because I "own" the USB subsystem in Red Hat, so I am somewhat familiar with the code. We also have owners of other major subsystems, such as scsi (Doug), IDE (Alan & Arjan), filesystems (SCT & Al), processes/signals (DaveM/Alan), scheduler (Mingo), networking (DaveM). These people zealously guard their territories from infiltration of rogue code. But what if someone posts a patch for IEEE-1394 which fixes a problem on someone's box? Or same for ISDN? Thankfully, we have a gatekeeper (Arjan), who is good at saying "no", but this depends on one personality. He may quit one day.

The answer is, of course, push everything through the community, and resist additional patches as much as possible. The pressure to include is strong, however. I remember a public outcry when we shipped 81 patch RPM for 5.2 or something like that. Despite Arjan's valiant efforts, we now ship about 300, and everyone appears resigned to it. But that's not the limit. United Linux pushes 800! Worse, Linus preaches vendor kernels in interviews. I dunno where this is going to end, I really don't.

Gloomy today, am I.

P.S. Do not download and apply crap patches without looking at the code they are touching.

I had to touch s390 again after a long break. Guess what I found. Marcelo 2.4.20 does not compile due to a syntax error in the chandev. And nobody posted anything. The stupid mainframe simply has no community. Nobody ever tests whatever is in the open. IBM produces "drops", which vendors incorporate in big chunks. The result is the proprietary software development model which happens to produce GPLed code.

Today I managed to do a little sparc work, applied Eric Brower's patch to support nodes with "interrupts" property, but without "intr" property. Also did other small things around serial ports, etc.

The incredible shrinking Red Hat Linux

To my amazement, I just found that RH8.0 takes about 27% less space than RH7.3 (860MB vs. 1200MB) with essentially identical installation on my laptop. Of course, I have only myself to blame for even paying attention to this. I partitioned the disk in a way which allows me to test betas painlessly and I thought that 1300MB system partitions would be plenty for the life time of the laptop. The 7.3 release got me thinking about repartitioning and moaning about bloat on mailing lists.

UML, what UML?

I found that we shipped a broken User-Mode kernel in RH8.0. The bug was pretty easy to hit, and known to the UML community. Jeff Dike told me what to fix, so the patch is in the CVS already, tested to work, and I'm running patched kernels already.

But this got me thinking. Nobody, not a single customer posted an issue to issue-tracker, or a bug to bugzilla. This can mean only one thing: nobody is using UML. Why is that?

If we consider rumors, the main reason is that the jail mode is slow, and "regular" mode provides no security. Thus, UML cannot be used by ISPs to hand out shells. Without the proper security, UML is nothing but a debugging tool for kernel hackers, and those just ignore what we ship and download all latest anyway. This is all speculation, of course. Perhaps we just have a solution in search of a problem here.

Back from the Red Hat World-wide Engineering Meeting. It was fun. Here's a couple of observations.

Our core development departments apparently follow a staffing model "hire superstars, and they'll work it out", which of course is helped by the overall state of the U.S. economy. The rest of the company gets to collect all the good folks who had enough of proprietary software at other places, in particular from DEC's campuses.

The corollary to this is that it's pretty hard to place even solid performers who I know personally, the competition is so stiff. I seriously fail to understand how I was hired in the first place. Also, it is very pleasant to work with people who, at minimum, do not need elementary things explained, and typically are better than that.

I found more pilots in the development department, although most struggle to stay current in respect to number of landings, BFR, and medical. Tough life...

Matt Jacob presented me with a SPARCserver-1000 (Scorpion), so now I have no excuse but to support sun4d. Uh-oh...

Many thanks to Matt!

I am off to the Red Hat Worldwide Engineering meeting.
