23 Sep 2004 mharris   » (Master)

Time for an update. This week I've been spending a lot of time in Red Hat bugzilla. For my first couple of years at Red Hat, I was the only X11 developer actively working on XFree86 development/bug fixing. As one might imagine, a single person could not possibly handle the number of incoming bugs that get filed against something as massive as XFree86, and so over time the number of open bugs against XFree86, and now X.Org X11, has steadily increased. The majority are video driver related bugs, X server crashes, and/or video corruption.

The sheer volume of bugs is just too staggering for one person to handle on their own, and so many just sit there forever simply because there aren't enough man-hours to go around to cover every single issue reported. I'm sure this situation is common to all software out there, and probably all developers, whether they're working for a Linux vendor or just an open source project. I think it's safe to state the following law about bug trackers: "The number of bugs open in a given project/vendor's bug tracker at any given point in time will exceed what the project/vendor's available man-hours can investigate".

I've thought about this over time, and like many, have had some weird utopian idea that "some day, the bug count will go down". Well, after the count hit 400 or so and kept climbing steadily from there, I realized that my utopian fantasy was just not realistic, so I gave up hope of ever seeing the total number of open bugs drop in any significant way. It is just the nature of computer software to be imperfect, and something the size of XFree86/X.Org is too massive to ever really experience a major decline in the number of overall bugs/problems. Since the number of Linux users steadily increases over time, it stands to reason that the number of people finding and reporting bugs will also increase, and so it seems natural that the bug tracker would have a net steady increase in open issues as well.

Several times when I've had some spare time, I did a massive "bug attack", where I tried to examine and triage as many issues as possible: find the bogus reports and ancient bugs, determine whether each was valid or invalid, and close out everything that seemed fixed by now, invalid, or deserving of some other resolution. That helped somewhat, but it was realistically a losing battle due to time constraints and the sheer number of bugs.

A year or so later, we hired a second X developer, in order to scale up the number of issues we could resolve in a given time frame. This turned out to be very helpful, as some of the issues reported in bugzilla can take a great many days or even a week or two to investigate, track down, and fix. He lessened the burden somewhat by taking over quite a number of these D-Day type bugs, in particular on the IA64 architecture, which is a very strange beast that tends to find more obscure problems in the X code than you can shake a stick at. Having a second person working on X was a definite blessing.

The number of bugs continued to outstrip the finite man-hours we had available to investigate and fix them, which is expected with something such as X. I doubt that *any* OS vendor out there has enough X11 engineers to effectively handle all bugs at the rate they're reported, although we've done a fairly reasonable job considering all factors.

Fast forward a year and a half or so after that, and the bug count for X and related technology has almost tripled. A lot of these bugs are now old and stale... but which ones? It takes a lot of man-hours just to read them all and find the ones to triage and close out. Not the best use of engineer time. So, they continue to pile up.

This year, we expanded our X Development team by an additional 3 members, and had one person switch out of X development into other areas of OS development, so we now have an X Development team of 4, which is really starting to get off the ground now. Having the larger team allows much more work to get done, but also allows a lot more discussion and team collaboration on issues. It allows multiple viewpoints to work together, and I believe the net result of the team is greater than the sum of the individual parts.

We are now looking into ways to take our team policies and procedures to "the next level", and we have a great many ideas that have come out of brainstorming sessions. Some of these we've implemented, some we're still working out the details of and experimenting with, and others are still in the idea stage. The future does look bright now though, as I can see the light at the end of the tunnel! ;o)

One of the things we've decided is of high priority is to try to get a handle on bugzilla. We're currently trying to focus on developing some standard policies and procedures for handling X related bug reports, and to come up with stock polite/proactive responses to give to people for common situations (much like the GNOME bug squad uses). We believe this will help us to be more productive, and will also be greatly appreciated by users and customers. We've also decided to start doing "bug days", where we sit in IRC for a full day (or more) and just triage bugs, add comments, and try to make a solid concrete decision about each bug, rather than having so many bugs in "limbo" states or falling between the cracks into the bugrot zone.

There are about 500 or more bugs currently to go through and triage. I believe once we pass through these, we can probably cut off about 100-200 easily, and work our way through another 100 or so. A lot of the others are issues that we probably won't ever have the resources to investigate, and should thus be reported upstream to X.Org or wherever. Personally, I believe that if we know 100% that we will never be able to schedule time for a particular issue, there's no sense in us leaving it open forever "until we get around to it", because quite frankly, that is not likely to happen. It is much better to come up with concrete DECISIONS on issues than to let them rot.

I've devised a method for analyzing/triaging bugs which I believe will help tremendously in this area. It is based on a decision making method called "The 4-D approach" in the book "The Power of Focus" by Jack Canfield. My method seems to have a lot of merit in theory, and we're going to try it out in the X Devel team now, see how it works in practice, and then hone it further. Hopefully this will reduce the overall bug count, and allow us to focus on _investigating_ and _fixing_ more bugs, and spend less time reading bug reports, many of which are bogus or useless.

Well, I think I've dumped my brain enough for one blog entry, so I'll give everyone's eyes a rest now. If I see enough interest expressed, I might post a future blog entry discussing my "4D decision making approach to handling bug reports", as I believe it may be very valuable for other projects out there, and for individual developers. Anyone interested can say so by emailing me at my personal email address (if they can figure out what that is). ;o)
