10:45: Restate my assumptions.
- all software contains bugs.
- all bugs can be represented and understood through files.
- if you graph files against bugs, a pattern emerges. Therefore: We must reduce the number of files.
Using qemu to instrument Windows
Part of the problem that we face in providing Linux hardware support is that we're lucky if there's a spec, and even if there's a spec there almost certainly isn't a test suite. Linux still isn't high on the list of things that vendors test with, so as a result hardware and firmware tend to be written to work with Windows rather than some more ideal notion of what a spec actually says.
This can lead to all kinds of problems. If the spec says we can do something and Windows never does that, we'll often find that there's at least one piece of hardware out there somewhere that breaks in response. But sometimes there'll be several different workarounds, and picking the wrong one may break some other system instead. It's helpful to know what Windows is actually doing, if only because that's the only configuration most systems have been tested with.
The problem is that doing this at the hardware level is next to impossible. I'm sure there are people out there who salivate at the possibility of working out i8042 initialisation sequences by hooking up oscilloscopes to their mouse, but I'm not one of them. There's a much easier alternative involving qemu.
The qemu source base is reasonably large and complex, but that's ok - we don't need to care about most of it. For our purposes we really only want to trace accesses to given bits of hardware. There are three main types of hardware access we're likely to care about: io ports, memory mapped io and pci configuration space. io ports are easy. Each piece of qemu that performs hardware emulation will call register_ioport_read or register_ioport_write. Just grep for those, find the ports that correspond to your hardware and then edit the functions to dump the information you want. PCI configuration space will be handled via pci_default_read_config or pci_default_write_config unless the driver overrides them. Finally, for PCI devices mmio will be handled via pci_register_bar - the last argument is the function called when the memory region is accessed.
All of which makes it pretty easy to watch register reads and writes performed by Windows when it's starting up. Suspend/resume is also an interesting problem, but sadly one that's harder to work with. First of all, you need at least version 0.6c of vgabios in order to indicate to Windows that it's possible to suspend at all. Secondly, for Vista or later you'll also need a WDDM driver for the graphics. Sadly there isn't one for the Cirrus that qemu emulates, which is unsurprising given how ancient it is. So I've had to perform my testing under XP, which was enough to give me an indication as to what Windows does with the SCI_EN bit on resume (answer: it ignores both the bit of the spec that says it should already be enabled and the bit of the spec that says it should never be written by hand). Nice, but if someone would like to write a WDDM driver for Cirrus it'd make my life easier.
The other thing I've been testing is keyboard controller probing. Some Macs deal badly with you banging on the keyboard controller, which is dumb but on the other hand they don't claim to have one. Linux will look at your ACPI tables and use any keyboard controllers it finds there, but if there isn't one it'll go on to try probing the legacy locations. Adding some debug code to the read and write functions of the keyboard driver in qemu and then editing the DSDT in Seabios to remove the declarations for the keyboard showed that Windows will only probe if there's a device with a recognised keyboard PNP ID. No keyboard for you if that's missing. So we probably need to emulate that behaviour as well.
The main reason to do this work is to try to reduce the number of DMI tables in the kernel. These are cases where we alter the kernel's behaviour based on the identity string of the hardware, allowing us to work around functional differences. The problem with this approach is that Windows generally works fine on these machines without knowing anything special about them, and chances are that the tables aren't exhaustive - there may well be some other piece of hardware that's also broken, but the user just gave up and went back to Windows instead of ever letting us know. Using qemu to work out how Windows actually behaves gives us the opportunity to fix things up in a way that will (with luck) work on all machines.
It turns out that even failing a Google interview isn't enough to keep Google recruiters away.
PCI power management problems
I've previously written about some of the implementation details of runtime PCI power management. One of the key aspects for PCI devices is that the PME line on the device be able to generate a wakeup event. Unfortunately, it turns out that since Windows makes no use of this functionality at present, some vendors have started failing to wire this up. This is problematic because the device itself still announces PME support and will even raise a PME signal when it gets woken up - but it's pretty much screaming as hard as it can and nobody's listening because somebody's replaced the air with cheese.
Bother, etc.
The obvious question is what to do next. We could poll all the PCI devices every second or so to see if there's a PME status register that's been set. This would be reasonably cheap (we're going to wake up at least once a second on x86 anyway, so it makes no real difference to power consumption) but would introduce obvious latency in the wakeup path. This would be fine for certain types of situation (you're probably not going to be too sad if your SD reader takes a second longer to notice a card insertion) but not others (if an sdio wireless card generates a wakeup interrupt then we really ought to do something about it in a more sensible timeframe). A second option is to force users to pass a boot-time argument to enable this at all. That kind of sucks.
The third possibility is limited to a subset of hardware, but does allow the introduction of some kind of elegance into what would otherwise be one of those nightmarish scenarios that makes me consider taking up farming instead. If we can trigger a PME ourselves then we can test whether the line is connected. This doesn't appear to be possible on SDHCI but the spec for firewire makes it look like we can do it there - one of the interrupt sources is the port enable/disable register, and we can toggle that ourselves. I haven't actually tested this yet, but if it works that would let us make this determination.
It's depressing that doing anything interesting with power management is still heavily determined by what Microsoft have bothered to implement - to the extent that on some machines (hello, Thinkpads) there are no GPE methods at all for PME signals and you don't get runtime power management at all. The only thing that saves us is that (a) it's pretty hard to screw up stuff that's already all glued into one chip package, so integrated stuff like USB should be fine anyway, and (b) PCIe does this as part of the normal traffic stream so there's no way to get that wrong. Unless you're still missing the GPE methods for the signals (hello, Thinkpads) in which case you get nothing unless the native PCIe signalling works. I suspect that we can always force that, so there's some hope yet.
In summary, then: dispassionate.
Radeon reclocking update
The code I mentioned here is now all in the drm-radeon-testing branch of drm-2.6.git.
Radeon reclocking
Alex released another set of Radeon power management patches over the weekend, and I've been adding my own code on top of that (Alex's patches go on top of drm-next, mine on top of his). I've left it stress-testing for a couple of hours without it falling over, which tells me that it's stable enough that I can feel smug. This is a pleasing counterpoint to the previous experiences I've been having, which have been rife with a heady mixture of chip lockups and kernel deadlocks. It turns out that gpus are hard.
There are a few things you need to know about gpus. The first is that if they're discrete devices they typically have their own memory controller and video memory. The second is that there's an impressive number of ways that you can end up touching that memory. The third is that they tend to get upset if something tries to touch that memory and the memory controller is in the middle of reclocking at the time.
The first and most obvious use of video memory is by the gpu itself. Accelerated operations on radeon are carried out by sending a command packet to the command processor. This is achieved by sharing a ring buffer between the cpu and the gpu, with the gpu reading packets out of that ring buffer and performing the operations contained within them. Many of these operations will touch video memory (that being the nature of most things you want a gpu to do), and if that happens bad things occur. Like the card locking up and potentially taking your PCI bus with it.
So, obviously, we don't want that to happen. The first thing we do is take a mutex that blocks any further accelerated operations from being submitted by userspace. Then we wait until we get an interrupt from the gpu telling us that the graphics engine has gone idle. The problem here is that we don't have a terribly good idea of how many more operations there are to complete and we don't know how long each of those operations is going to take, but this is less bad than some of the alternatives[1]. Jerome Glisse has some ideas on how to improve this to require less waiting, but the effects should still be pretty much invisible to the average user.
So we've stopped the command processor touching ram. Everything's good, right?
Well, not really. The obvious problem is that users typically want to display something, so there's a separate chunk of chip that's repeatedly copying video memory over to your monitor. That's got to go too. Thankfully, there's a convenient bit in the crtc registers that lets us turn that off, but the pretty unsurprising downside is that your screen goes blank while that's happening. So we don't want to do that. Instead, we try to perform the reclock while there's nothing being displayed on the screen - that is, while we're in the screen region where a crt's electron gun would be scanning back from the bottom of the screen to the top. It turns out that rather a lot of display assumptions depend on this happening even if there's no crt, no electron gun and no thick sheet of glass with a decent approximation of vacuum behind it, so we get to do this even if we're displaying to an LVDS. And we have about 400-500 microseconds to do it - an almost generous amount of time.
So we ask the hardware to generate an interrupt when we enter vblank and then we reclock. Except the hardware has an irritating habit of lying - sometimes we get the interrupt a line or two before vblank, sometimes we get it after we've already gone out the other side. Vexing, and not entirely solved yet - so sometimes you'll still get a single blank frame during reclock. But there are plans, and they'll probably even work.
At this point the acceleration hardware isn't touching the memory and the scanout hardware isn't touching the memory. Except it still crashes under some workloads. This one took me longer to track down, but the answer turned out to be pretty straightforward. Not all operations are accelerated. When they're not accelerated they have to be done in software. That means that the CPU has to write to the video memory itself. I'm sure you can see where this is going. This was fixed without too much trouble once I'd finished picking through the driver to work out every location where objects might be mapped into the CPU's address space, at which point it's a simple matter of unmapping them and blocking the fault handler from remapping them until the reclock is finished. Linux, thankfully, has lots of synchronisation primitives. And now everything works.
Except when it doesn't. This took a final period of head scratching, followed by the discovery that ttm (the memory allocator used by radeon) has a background thread that would occasionally fire to clean up objects. And by clean up objects, I mean change their mapping - which means updating their status in the gart, which means touching video memory. So, let's block that. And that tipped me off to the fact that even if it couldn't submit new commands, the CPU could still create or destroy objects - with the same consequences.
So, once all of these are blocked, video memory is quiescent and we can do what we want. And we do, at least once I'd sorted out the bits where I was taking locks in the wrong order and deadlocking. Depending on the powerplay tables available on your card we'll choose different rates and so your power savings will vary heavily depending on the values that your vendor provided, but the card I'm testing on sees a handy 30W drop at idle. Right now we're only changing clocks and not dropping voltage so there's potentially a little more to come.
While getting this stable was pretty miserable, the documented entry points for clock changing made a lot of this easier than it would otherwise have been. It's also probably worth noting that Intel's clock configuration registers are entirely missing from any public documentation and the driver Intel submitted to make them work in their latest chips appeared to have been deliberately obfuscated, so thanks to AMD for making all of this possible.
[1] It's possible to insert various command packets that either indicate when they've passed or stall until a register value gets updated, but these either cause awkward problems with the cp locking or mean that the gui idle functionality never goes idle, so they're not ideal either.
Looks like I picked the wrong week to give up crystal meth
One of the features of Windows 7 is that hitting windows+p will pop up a little dialog that allows you to configure your active display outputs. This is an improvement over previous versions of Windows, which would generally instead have a variety of random vendor-specific tools that would function differently, look ugly and make you cry. So, hurrah to Windows for moving into the 21st century.
Most laptops have a display switch key. This is sent in a variety of ways, generally either via the keyboard controller, via WMI or via ACPI. In Linux we take all of these events and turn them into KEY_SWITCHVIDEOMODE, making it easy to implement standardised behaviour.
This is, obviously, far too straightforward.
Microsoft, rather than introducing an input mechanism that allows all of these events to hook into the windows+p infrastructure, instead recommend that vendors simply have the display switch key generate the Windows and P keystrokes itself.
As documented in this Launchpad bug, vendors are starting to do this. It's been seen in HP and Dell machines, at least, and it's presumably going to become more widespread.
So, if your display switch button now just makes the letter "P" appear, say thanks to Microsoft. There's a range of ways we can fix this, although none of them are straightforward and those of you who currently use left-Windows-p as a keyboard shortcut are going to be sad. I would say that I feel your pain, but my current plan is to spend the immediate future getting drunk enough that I stop caring.
(The good news is that the same set of recommendations says that you can no longer put a Windows sticker on a monitor unless it has a valid and accurate EDID. The bad news is that that implies that you've previously been able to put a Windows sticker on a monitor without it having a valid and accurate EDID)
Nook update (again)
Barnes and Noble released the nook source code last week. This includes the code to busybox, uboot and their kernel. Unfortunately, the uboot and kernel code both appear to be missing swathes of code found statically linked in the binaries that they're distributing. License compliance is hard, let's flail wildly.
You know it's a bad day when:
ld gives you "Can not allocate memory".
(turned out to be a corrupt object file)
Pittsburgh
As I mentioned, I headed to Pittsburgh last week to give some talks at CMU and find out something about what they're doing there. Despite the dire weather that had closed the airport the day before, I had no trouble getting into town and was soon safely in a hotel room with a heater that seemed oddly enthusiastic about blasting cold air at me for ten seconds every fifteen minutes. Unfortunately, it seems that life wasn't as easy for everyone - ten minutes after I arrived, I got a phone call telling me that the city had asked CMU to cancel classes the next day.