Older blog entries for mjg59 (starting at number 231)


If only eeepc-laptop sent standard keycodes, or something.

Oh, wait.

Writing a Linux distribution is hard. There's a huge range of interconnected dependencies. It takes a long time to learn how everything fits together, and fixing things properly rather than adding device-specific hacks often requires rewriting a lot of code. I'm sure Google will figure it out in time[1], and I'm also sure that the majority of their work is going into their UI rather than the underlying infrastructure. But even so, don't expect that you'll be able to install Chromium OS on a random piece of hardware and have it work as well as, say, Fedora in the near future.

[1] Based on that script, I'd say they're about equal to Xandros at the moment

Syndicated 2009-11-19 20:32:10 from Matthew Garrett

Legacy PC design misery

I've spent chunks of the last couple of days fighting a problem that's existed for about 25 years. The 8086 was a 16-bit processor with a 20-bit address space, limiting the maximum physical address that could be accessed to 1MB. However, quirks of the segmented memory system meant that addresses greater than 1MB could be constructed - these would wrap around to the bottom of memory. Because loading the segment registers was a time consuming operation, some programmers used this behaviour as a performance optimisation.
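As a rough illustration of that wraparound, here's a small Python sketch of the real-mode address calculation. The function name and the address_bits parameter are mine rather than anything from real firmware; it only models the segment * 16 + offset arithmetic and truncates the result to the width of the address bus.

```python
def real_mode_address(segment: int, offset: int, address_bits: int = 20) -> int:
    """Physical address for a 16-bit segment:offset pair, truncated to the bus width."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    return linear & ((1 << address_bits) - 1)


# FFFF:0010 is the first byte past 1MB. With 20 address lines it wraps to the
# bottom of memory.
wrapped = real_mode_address(0xFFFF, 0x0010)

# With 24 address lines (as on the 80286) the same pair reaches a real address.
unwrapped = real_mode_address(0xFFFF, 0x0010, address_bits=24)

print(hex(wrapped), hex(unwrapped))
```

The highest address you can build this way is FFFF:FFFF, which is a little under 64KB past the 1MB mark. That's the region that wraps on an 8086.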

The 80286 introduced a 24-bit address space. Unfortunately, this meant that the addresses that previously wrapped to the bottom of memory now pointed at real addresses - not ideal if you were expecting the old behaviour. IBM fixed this by tying the 21st address line (A20 - they're zero indexed) through an AND gate, with the default behaviour being to keep it tied at 0 and thus maintaining the old wraparound behaviour. Applications that wanted to access the full address space needed to enable the A20 logic gate. IBM didn't want to add any extra hardware to their system if they could avoid it, so tied the other side of the AND gate to a spare pin on the keyboard controller. By writing a couple of bytes to the keyboard controller, your PC-AT stopped pretending to be an XT and gave you access to all of the insanely expensive RAM it had stuffed in it. Hurray!
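The effect of the gate is easy to model. The following is a toy sketch, not a description of any real chipset: gated_address is a made-up helper that forces address bit 20 to zero whenever the gate is off, which is all the gate really does.

```python
A20_BIT = 1 << 20  # the 21st address line, counting from zero


def gated_address(address: int, a20_enabled: bool) -> int:
    """Model the A20 gate: with the gate off, address line 20 always reads as 0."""
    if a20_enabled:
        return address
    return address & ~A20_BIT


# With the gate off, the first byte past 1MB lands back at address 0.
print(hex(gated_address(0x100000, a20_enabled=False)))
# With the gate on, it is a real address.
print(hex(gated_address(0x100000, a20_enabled=True)))
```

Note that only one line is masked. With the gate off, every odd megabyte aliases the even megabyte below it, so the damage isn't limited to the region just above 1MB.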

PCs have been emulating this behaviour since the AT was first cloned. Of course, this being the PC industry, many have got it wrong. There's a set of approaches for controlling the A20 gate that may work, varying in terms of performance and desirability. Most hardware will give the desired result (ie, I have no desire to run DOS executables from 1982, make my A20 work damnit) using any of the various methods of A20 enabling. Some hardware doesn't. The most common method used in bootloaders (where we still have access to system BIOS services) is to call int 15h with an ax of 0x2401, which asks the BIOS to enable A20 for us. This isn't implemented on all hardware, but we should get a failure back that lets us go and bang on the keyboard controller in an attempt to get it to pay attention[1].

Enter the Kohjinsha SC3.

I picked this up second hand in Japan. It's a ridiculously cute little tablet, only slightly larger than hardware that's comfortably in the MID range. It booted a Fedora liveCD perfectly, though having GMA500 graphics meant that what appeared wasn't terribly attractive. Installation proceeded happily enough, followed by a reboot and... nothing. Grub loaded the kernel and initrd, jumped to the kernel and everything hung.

So, for the past couple of days, I was stepping through the kernel setup code, trying to work out where and why it was hanging. I'd got it narrowed down to the region where the kernel tried to free the memory used by the initramfs, but the failure hopped around depending on my kernel build. Something was clearly very wrong. The strangest thing about this was that if I booted the liveCD boot menu and selected "Boot from local drive", everything worked perfectly. isolinux was clearly doing something that grub wasn't, but there's rather a lot of code to step through there.

Things became a lot easier once I found that the OpenSuse version of grub worked. Their grub has a rather smaller set of patches than ours, and only a few looked even plausibly relevant. It only took ten minutes or so to figure out that it was one that altered the A20 code. Things became much clearer then.

The main functional difference between the Suse A20 implementation and the upstream one[2] is that the Suse one explicitly tests whether the A20 enabling worked by putting values at two different addresses that would be the same if A20 is disabled. By comparing them, we know whether A20 is working properly or not. If not it can then fall back to other mechanisms. The Fedora code trusted the BIOS's claim that the int 15 call had worked. The Kohjinsha's BIOS lied, A20 remained disabled, grub copied the kernel and initramfs to chunks of address space that contained lies rather than RAM and everything fell over horribly.
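A sketch of the paranoid approach, run against a toy machine rather than real hardware. Everything here (FakeMachine, the method names, the test addresses) is invented for illustration and is not the grub or Suse code. The fake BIOS call always claims success, like the Kohjinsha's, and the check catches it by looking for aliasing.

```python
class FakeMachine:
    """Toy model: RAM is a dict, and A20 off forces address bit 20 to zero."""

    def __init__(self, bios_works: bool):
        self.a20 = False
        self.bios_works = bios_works
        self.ram = {}

    def _phys(self, addr: int) -> int:
        return addr if self.a20 else addr & ~(1 << 20)

    def write(self, addr: int, value: int) -> None:
        self.ram[self._phys(addr)] = value

    def read(self, addr: int) -> int:
        return self.ram.get(self._phys(addr), 0)

    def bios_enable_a20(self) -> bool:
        if self.bios_works:
            self.a20 = True
        return True  # claims success whether or not anything happened

    def keyboard_controller_enable_a20(self) -> bool:
        self.a20 = True
        return True


def a20_is_enabled(machine, low=0x000500, high=0x100500) -> bool:
    """Write different values to two addresses that alias when A20 is off."""
    machine.write(low, 0x55)
    machine.write(high, 0xAA)
    # If the second write clobbered the first, the addresses are aliased.
    # (Real code would also save and restore the original memory contents.)
    return machine.read(low) == 0x55


def enable_a20(machine) -> str:
    """Try each method in turn and trust none of them without checking."""
    methods = (
        ("bios", machine.bios_enable_a20),
        ("keyboard controller", machine.keyboard_controller_enable_a20),
    )
    for name, method in methods:
        if method() and a20_is_enabled(machine):
            return name
    raise RuntimeError("A20 could not be enabled")


print(enable_a20(FakeMachine(bios_works=True)))
print(enable_a20(FakeMachine(bios_works=False)))
```

The important design point is that the return value of the enabling call is only a hint. The memory check is what decides whether to move on to the next method.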

Thankfully, not a difficult fix once the problem was identified. But seriously, people. How hard can it be not to screw this up?

(For an excruciatingly detailed analysis of how hard it can be not to screw this up, see here)

[1] the Intel Macs don't implement the int 15 approach, but return a failure. They also don't have a legacy keyboard controller, so attempting to hit that resulted in grub falling over. The magic IO port approach works. Another example of how the Intel Macs aren't really PCs...

[2] grub2 implements the more paranoid check

Syndicated 2009-11-13 14:32:35 from Matthew Garrett

The ACPI Embedded Controller

Of course, the event model I described before is far too simple to be worthy of a place in the ACPI spec. At the most basic level, there's more possible events than there are GPEs to attach them to, so there's a need for some further complexity. This manifests itself in the form of the ACPI embedded controller (EC).

The EC is typically a small microprocessor sitting on your motherboard, often implemented in the same hardware as the keyboard controller. It shares a lot in common with the keyboard controller - on PCs it'll usually appear in system io space, with one register for writing a command or reading a status, and a second register for passing data back and forth[1]. There's 256 registers available, so a typical interaction might be to write the READ command (0x80) to the command register, write the EC register address to the data register and then read back from the data register to get the EC register contents.
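That interaction can be sketched against a fake EC. FakeEC and ec_read are hypothetical names, and the model deliberately ignores the status bits and timing that a real driver has to care about. It only shows the order of operations described above.

```python
EC_CMD_READ = 0x80  # the READ command mentioned in the text


class FakeEC:
    """Toy embedded controller with 256 byte-wide registers."""

    def __init__(self, registers):
        self.registers = bytearray(registers)
        self._pending_command = None
        self._output = None

    def write_command(self, command: int) -> None:
        self._pending_command = command

    def write_data(self, value: int) -> None:
        # After a READ command, the next data byte is the register address.
        if self._pending_command == EC_CMD_READ:
            self._output = self.registers[value]
            self._pending_command = None

    def read_data(self) -> int:
        value, self._output = self._output, None
        return value


def ec_read(ec: FakeEC, address: int) -> int:
    """Read one EC register: command, then address, then read the result."""
    ec.write_command(EC_CMD_READ)
    ec.write_data(address)
    return ec.read_data()


registers = bytearray(256)
registers[0x10] = 0x2A
print(hex(ec_read(FakeEC(registers), 0x10)))
```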

The embedded controller will often be responsible for tracking information about the hardware, such as the temperature. Attempting to read the temperature through ACPI will execute an ACPI method - in the case of the temperature being monitored by the embedded controller, this method will attempt to read from an EC register. The EC driver then performs the read and returns the result, which gets converted into decidegrees kelvin and passed back to whatever made the temperature query.
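The unit conversion at the end is just arithmetic. A minimal sketch, assuming the value is in tenths of a kelvin and using 273.15 as the offset (real code may well round that differently); the function name is mine:

```python
def decikelvin_to_celsius(value: int) -> float:
    """Convert a temperature in tenths of a kelvin to degrees Celsius."""
    return value / 10 - 273.15


# A reading of 3032 is 303.2 K, or roughly 30 degrees Celsius.
print(round(decikelvin_to_celsius(3032), 2))
```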

But, as mentioned above, the EC also generates events. These may be in response to a user initiated event like a hotkey press, or may be triggered by some change in hardware state like a thermal trip point being passed. The embedded controller will then raise a GPE.

Unlike normal GPEs, the EC GPE is not handled by looking for a _Lxx or _Exx method. Instead, the ACPI tables provide information about the GPE that the EC is using. This may be in the form of a _GPE definition in the EC object in the main ACPI tables, or alternatively may be provided in an ECDT (Embedded Controller Descriptor Table), an optional table that provides all the EC information. In either case, the OS knows which GPE will be triggered by the EC. It then installs a handler that will be called whenever the EC raises that GPE.

Things get a touch confusing at this point. The first thing this handler does is read the command byte, which functions as a status byte on reads. It then checks whether the SCI_EVT bit is set. This informs the system that the GPE was in response to a hardware event, and so the EC handler writes a query command to the EC command register and then reads back a value between 0 and 255 from the data register. This is then mapped to a _Qxx method, with xx representing the number of the EC event read from the data register. Like the _Lxx and _Exx methods, the _Qxx method is then executed.
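In sketch form, the handler looks something like this. The bit position of SCI_EVT (bit 5) and the query command value (0x84) are from my reading of the ACPI spec rather than from the text above, so treat them as assumptions. The three callables stand in for whatever actually touches the EC ports.

```python
EC_STATUS_SCI_EVT = 0x20  # assumed: bit 5 of the status byte
EC_CMD_QUERY = 0x84       # assumed: the EC query command


def query_method_name(event: int) -> str:
    """Map an EC event number (0-255) to its _Qxx method name."""
    return f"_Q{event:02X}"


def handle_ec_gpe(read_status, write_command, read_data):
    """Return the _Qxx method to run, or None if no event is pending."""
    if not (read_status() & EC_STATUS_SCI_EVT):
        return None
    write_command(EC_CMD_QUERY)
    return query_method_name(read_data())


commands = []
print(handle_ec_gpe(lambda: 0x20, commands.append, lambda: 0x1C))
```

A real handler would then hand the method name to the ACPI interpreter for execution, which is well outside what a few lines of Python can show.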

The problem with all of this is that the EC isn't that fast. When a byte is written to it, it's necessary to read back the status byte and check whether the IBF bit is set. This is set when the OS writes a byte to the data register, and cleared once the EC has processed it. The straightforward way to deal with this is to poll the status byte until the bit is cleared, and then write the next byte, but polling is slow and wastes CPU time. The EC can instead be set to interrupt mode, where it'll fire a GPE when the IBF bit clears.
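The polling approach amounts to the loop below. The IBF bit position (bit 1) is my assumption from the ACPI spec, and wait_ibf_clear and max_polls are names I've made up. Returning the number of polls makes the cost visible.

```python
EC_STATUS_IBF = 0x02  # assumed: bit 1 of the status byte


def wait_ibf_clear(read_status, max_polls: int = 1000) -> int:
    """Poll the status byte until IBF clears; return how many reads it took."""
    for polls in range(1, max_polls + 1):
        if not (read_status() & EC_STATUS_IBF):
            return polls
    raise TimeoutError("EC never consumed the byte")


# A fake EC that stays busy for two reads before accepting the byte.
statuses = iter([0x02, 0x02, 0x00])
print(wait_ibf_clear(lambda: next(statuses)))
```

Every one of those reads is a slow io port access with the CPU doing nothing useful in between, which is why interrupt mode is worth having.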

The EC has one additional function. The ACPI spec allows for an i2c bus to be implemented through the EC, with EC registers mapping to i2c registers. The observant among you will realise that this means that there's an indexed access protocol being implemented on top of indexed access hardware, which is more layers of indirection than seem sane. For additional humour, this is usually only used to add support for ACPI smart batteries. ACPI batteries are generally abstracted behind a set of ACPI methods that provide information. Smart batteries instead speak i2c directly to the OS[2] for no real benefit. Linux handles these devices fine, and while the chances are you probably don't have one, the chances are also that if you do you haven't noticed.

The final quirk of ACPI events is that there's yet another means of delivering events. The term "fixed feature" is used to describe an ACPI device that isn't described in the ACPI tables. A power button may be implemented as a fixed feature device rather than a normal ("control method") device. This is indicated by a flag in the fixed feature block. Hitting a fixed feature power button will generate an ACPI interrupt, but no GPE. Instead the OS has to read the fixed feature block and note that the power button flag is set there. It then notifies userspace appropriately. Sleep buttons can also be implemented this way, but other devices will be in the normal ACPI tables and will generate either GPEs or EC events.
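A sketch of the fixed feature check, assuming the flags live in a 16 bit status register with the power button at bit 8 and the sleep button at bit 9. That's my recollection of the ACPI spec rather than something stated above, and fixed_events is a made-up name.

```python
PWRBTN_STS = 1 << 8  # assumed bit position for the power button flag
SLPBTN_STS = 1 << 9  # assumed bit position for the sleep button flag


def fixed_events(status: int) -> list:
    """Return the fixed feature buttons flagged in the status register."""
    events = []
    if status & PWRBTN_STS:
        events.append("power_button")
    if status & SLPBTN_STS:
        events.append("sleep_button")
    return events


print(fixed_events(0x0100))
```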

[1] On my laptop, these are ports 0x62 and 0x66 - compare to the keyboard controller's use of ports 0x60 and 0x64

[2] As directly as indirection via the EC can be...

Syndicated 2009-11-10 14:58:52 from Matthew Garrett

ACPI general purpose events

ACPI is a confusing place. It's often thought of as a suspend/resume thing, though if you're unlucky you've learned that it's also involved in boot-time configuration because it's screwed up your interrupts again. But ACPI's also heavily involved in the runtime management of the system, and it's necessary for there to be a mechanism for the hardware to alert the OS of events.

ACPI handles this case by providing a set of general purpose events (GPEs). The implementation of these is fairly straightforward - an ACPI table points at a defined system resource (typically an area of system io space, though in principle it could be something like mmio instead), and when the hardware fires an ACPI interrupt the kernel looks at this region to see which GPEs are flagged. Then things get more interesting.
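Working out which GPEs are flagged is a matter of walking the bits in that region. A minimal sketch, assuming the region has been read into a bytes object and that there's a matching set of enable bits (the status/enable split is my addition from the ACPI spec); flagged_gpes is a hypothetical name:

```python
from typing import List


def flagged_gpes(status: bytes, enable: bytes) -> List[int]:
    """Return the numbers of all GPEs that are both flagged and enabled."""
    flagged = []
    for byte_index, (sts, en) in enumerate(zip(status, enable)):
        active = sts & en
        for bit in range(8):
            if active & (1 << bit):
                flagged.append(byte_index * 8 + bit)
    return flagged


# Bit 5 of the fourth byte is GPE 3 * 8 + 5 = 29, or 0x1D.
print([hex(gpe) for gpe in flagged_gpes(bytes([0, 0, 0, 0x20]), bytes([0xFF] * 4))])
```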

The majority of GPEs are implemented in the ACPI tables via methods with names like _Lxx or _Exx. The xx is the number of the GPE in hex, while the leading _L or _E indicates whether the GPE is level- or edge-triggered. If an ACPI interrupt is fired and GPE 0x1D is flagged as being the source of the interrupt, the ACPI interpreter will then look for an _L1D or _E1D method. Upon finding one, it'll execute it. What this method does is entirely up to the firmware - on most HP laptops, GPE 0x1D is hooked up to the lid switch[1] and so executing it will send a notification to the OS that the lid switch has changed state. The OS will then evaluate the state of the lid switch (generally by making another ACPI query) and send the event up to userspace.
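The name lookup is simple enough to sketch. Here the ACPI namespace is stood in for by a plain dict, which is obviously nothing like a real interpreter, and find_gpe_method is a name I've invented.

```python
def find_gpe_method(gpe: int, namespace: dict):
    """Return the name of the _Lxx or _Exx method for a GPE, if one exists."""
    for prefix in ("_L", "_E"):
        name = f"{prefix}{gpe:02X}"
        if name in namespace:
            return name
    return None


# A firmware that hooks the lid switch up to a level-triggered GPE 0x1D.
namespace = {"_L1D": "notify lid device"}
print(find_gpe_method(0x1D, namespace))
```

If the lookup comes back empty there's nothing for the interpreter to run, and the event goes unhandled.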

How does the lid end up triggering GPE 0x1D? Things get pretty hardware specific at this point. Intel motherboard chipsets have a set of general purpose io (GPIO) lines that can, for the most part[2], be used by the system vendor for anything they want. For a lid switch, one of these lines is hooked to the switch and the BIOS configures the GPIO as an input. Pressing the switch will cause the GPIO line to become active. The GPIO lines are mapped to GPEs in a 1:1 manner, though with an offset of 16 - ie, GPIO 0xd will map to GPE 0x1d. If GPIO 0xd becomes active, GPE 0x1d will be flagged and an ACPI interrupt sent. The ACPI code will then do something to quash the interrupts, such as inverting the polarity of the GPIO[3], as well as send the notification to the OS.

Why are the GPIOs offset by 16 relative to the GPEs? The lower 16 GPEs (again, talking about Intel hardware) have pre-defined purposes[4]. These range from things like "Critically low battery" to "PCIe hotplug event" down to "This device triggered a wakeup". And the latter is what I'm most interested in here.
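The mapping itself is a fixed offset. A tiny sketch for the Intel case described here (the helper names are mine):

```python
from typing import Optional

GPIO_GPE_OFFSET = 16  # the lower 16 GPEs have pre-defined purposes


def gpio_to_gpe(gpio: int) -> int:
    """GPE number raised by a given GPIO line."""
    return gpio + GPIO_GPE_OFFSET


def gpe_to_gpio(gpe: int) -> Optional[int]:
    """GPIO line behind a GPE, or None for the pre-defined lower 16."""
    if gpe < GPIO_GPE_OFFSET:
        return None
    return gpe - GPIO_GPE_OFFSET


print(hex(gpio_to_gpe(0xD)), gpe_to_gpio(0x0B))
```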

Various pieces of modern hardware can be placed into power saving states when not in use. The problem with this is that the user experience of having to turn on hardware before you can use it is not a good one, so in order to make this the default behaviour we need the hardware to tell us that something happened that requires us to wake the hardware up.

There's something of a chicken and egg problem here, but thankfully most of the relevant modern hardware has out of band mechanisms to tell us about things going on. The PCI spec defines something called Power Management Events (PME), which are driven by an additional current that's supplied to the hardware even when it's otherwise turned off. On plug-in PCI Express cards, firing a PME generates an interrupt on the root bridge and a native driver can interpret that, but for legacy PCI devices and integrated chipset devices the notification has to come via ACPI.

The example I've been working on is USB. It's a good choice for various reasons - firstly, there's already support for detecting when the USB controller is idle. Secondly, modern USB host controllers have support for generating PMEs on device insertion, removal or (and this is important) remote wakeup. In other words, as long as the USB bus is idle we can power down the entire USB controller. If the OS tries to access a USB device, we'll power it back up. If the user unplugs or plugs a device, we'll power it back up. If a previously idle device suddenly responds to some external input, we'll power it back up. And it's all nicely invisible to the user.

How does this work? The controller retains a small amount of power even when nominally powered down. This is used to keep the detection circuitry alive. When it receives a wakeup event, it asserts the PME line. The chipset detects this and fires a GPE. The OS runs the method associated with this GPE and receives a device notification on the ACPI representation of the USB controller, telling us to power it back up. We do so and process whatever woke us - if the bus then goes idle again, we can power down once more.

The astonishing thing is that this all works. The only problem we have is that it relies on the machine vendor to have provided the ACPI methods that are associated with the GPEs. If they haven't, we can't enable this functionality - even though the hardware is capable of generating the GPEs, we have no method to execute to let us know which device has to be woken up. The GPE is never answered, we never acknowledge the PME and the hardware keeps on screaming for attention without getting any. And, more to the point, it never gets powered up and your mouse doesn't work.

There's a pretty gross hack to deal with this. In general, we know what the GPE to device mappings are - they're pretty static across Intel chipsets, and while AMD ones can be programmed differently by the BIOS we can read that information back and set up a mapping ourselves. This trick also comes in handy when some vendors (like, say, Dell) manage to implement one of the GPE events wrongly. Everything looks like it should work, but the method never sends a notification because it's buggy. In that case we can unregister the existing method and implement our own instead.

This code isn't upstream yet, but patches have been posted to the linux-acpi mailing list and with luck it'll be there in the 2.6.33 timeframe. My tests suggest about 0.2W saving per machine, which isn't going to save all that many polar bears but seems worth it anyway.

[1] _L1D = lid. Sigh.

[2] There's a few that are reserved for specific purposes

[3] So where before it had to be high to be active, it now has to be low to be active - this means that it'll now trigger on the switch being opened rather than closed, so you'll get another event when you open the lid again.

[4] You can find a list in the documentation for the appropriate ICH chip - the relevant section is "GPE0_STS" under the LPC interface chapter.

Syndicated 2009-11-10 03:08:36 from Matthew Garrett

Looking to the past

It’s an oft-voiced suggestion that rather than looking at the bad things that happen in our communities, we should focus on the good things. There’s a number of highly successful geek women already – should we not be concentrating on encouraging more of them, rather than scaring people away with tales of thoughtlessness, discrimination and outright abuse?

Let’s draw an analogy. One day, a $20 charge appears on your credit card. You didn’t make it. You report it to your credit card company, who assure you that they take fraud seriously and then do nothing. A few days later, another $20 charge. Your credit card company tells you that such events are rare and unrepresentative of the general credit card experience, and continues to do nothing. A week afterwards, another charge. This time your credit card company describes how they’re planning on implementing a brand new anti-fraud system, but says that this is unrelated to any events that may currently be occurring and will give no details as to when it’s going to be rolled out. And then proceeds to ignore any further reports you make about fraudulent transactions.

Would you stay with this company? Or would you take your business somewhere else?

The problem with the “Let’s look to the future rather than spending too much time getting stuck in the present” argument is that it assures people that things will get better without providing a roadmap for getting there. It does nothing to validate their concerns or make them feel wanted within a community. It assumes either that people will stick with a community that doesn’t respond to their complaints, or that it’s possible to construct a community that’s welcoming to an assortment of genders, ethnicities and lifestyles without any of those people being represented in the first place.

Ignoring people’s concerns is an excellent way to drive them away from your community. Doing so because of a potential future that’s probably conditional on you having those people in your community is short sighted and self defeating. Ignoring the present doesn’t benefit the future. It benefits the status quo.

(Originally posted here)

Syndicated 2009-11-09 20:56:21 from Matthew Garrett

More GMA500

But is Intel really the party at fault, here?

For shipping a gpu without open drivers? Given that the alternatives involve someone else designing, fabbing and releasing a piece of hardware under Intel's name without being sued in the process, I'm going to have to say "Yes".

(Note that while Moblinzone.com is a website owned by Intel, the writers don't appear to be Intel employees)

Syndicated 2009-10-28 18:05:16 from Matthew Garrett

Asymmetries in offence

I wasn't going to write about this since I thought that Chris's post covered pretty much everything I would have said, but after reading Scott's entry on how people would have interpreted Mark's remarks differently if he'd said "We'll have less trouble explaining to boys what we actually do" instead I realised that people are still confused about the fundamental issue here.

The assumption that Scott's making is that "girls" and "boys" are semantically equivalent in this case. They're not. There's various ways in which the symmetry is broken, but the most basic one is that Mark's a straight man. When the overwhelming stereotype is that "we" as a community are heterosexual males, using "we" as a shorthand for "People who are straight men" is unfortunate because it supports that stereotype. Using "we" as a shorthand for "People who are attracted to men" doesn't. Unsurprisingly, this results in a fairly significant change in who's going to be offended.

Whatever his intentions (and I could easily believe that it was a slip of the tongue), Mark managed to imply that the Linux community is entirely made up of straight men. This is possible because straight men do make up the majority of the Linux community. In contrast, Scott's version doesn't succeed in implying that everyone in the Linux community is attracted to men because it's blatantly obviously not the case, so we know that Scott is using "we" in a different manner. Context is important, and unless you can invert everything else about the situation as well then simply replacing the word "girls" with "boys" doesn't give you any meaningful insight into whether or not people are justifiably offended.

In a more general sense, I'm saddened by this case because I think it's a clear case where the Ubuntu code of conduct could have been used to good effect. "Be excellent to each other"[1] ought to include accepting that you've offended other people without meaning to and making appropriate restitution. If the offence was unintended, an apology should be cheap. Whatever the reality of the situation, failing to provide that apology gives people the impression that either the offence was intended or that Mark doesn't care about those who were offended. That's not a good way to build an inclusive community.

[1] Mako's original summary of the code of conduct

Syndicated 2009-10-12 18:30:07 from Matthew Garrett

Intel IGD opregion and GMA500

A while back, Intel defined a specification for binding ACPI-defined methods for controlling hardware to the OS-specific driver, ensuring that the two don't get out of synchronisation. I added support for this to the in-kernel i915 driver last year, and after a couple of awkwardnesses it works well now. One consequence of this that showed up slightly later is that it's necessary to do some of the setup from the i915 driver rather than the ACPI driver, which meant that we had to defer the ACPI driver from binding until the drm driver had done that setup.

The problem with GMA500 is that it also implements the IGD Opregion spec, and the ACPI video driver detects this and refuses to bind. But the GMA500 kernel driver doesn't implement support for the spec and so doesn't call the function that triggers the ACPI video registration. Working around this is simple - just add acpi_video_register() to the init function of the GMA500 drm. But note that this means that you're failing to implement the spec properly, and there's potential for stuff to be broken. A full implementation of the spec for GMA500 wouldn't be especially difficult, but there's no docs and I have no hardware so I'm not going to do it myself.

The reason I bring this up is that various people have been approaching this problem in a different way. It's easy to assume that the check in the acpi driver was naively assuming that all Intel hardware was driven by i915 and that this patch was broken. It's actually entirely correct and the (out of tree) GMA500 driver was broken. If Intel had made the effort to get their code properly upstream, it'd have been fixed there when the original change was made and nobody would ever have had a problem. Just say no to out of tree drivers.

Syndicated 2009-09-24 18:24:23 from Matthew Garrett


I'm off to Boston in under 16 hours, and I'll be getting into Portland around lunchtime on Monday. I'll be talking at Linuxcon about how we're broadening power management on Linux to be applicable from phones through netbooks up to supercomputers - that's 10:15 on Tuesday. At 10AM on Friday I'll be presenting at the Linux Plumbers conference on how userspace can express its requirements to the kernel more clearly, thereby allowing the kernel to be smarter about powering down hardware. And after a short hop down to SF for the weekend, I'll be back in Portland at the X developers conference talking about the role of X in providing relevant information to the kernel and using that to facilitate more aggressive power management.

Three talks in under 10 days. I'll even do my best to ensure that there's new jokes for each of them.

Syndicated 2009-09-17 00:04:36 from Matthew Garrett


I'm moving to the US on Thursday, so I will be here on Wednesday evening from about 7. If your presence is unlikely to make me stupefyingly angry, feel free to join me.

Syndicated 2009-09-15 11:16:36 from Matthew Garrett
