Older blog entries for mjg59 (starting at number 175)

2 Dec 2008 »

Thintel

ICH8 docs, section 5.13.5.1:

To eliminate the audible noise caused by aggressive voltage ramps when exiting C4 states at a regular, periodic frequency, the ICH8 supports a method to slow down the voltage ramp at the processor VR for certain break events.

Sounds good. Section 9.8.1.6:

C4_TIMING_CNT: Bit 6 Slow-C4 Exit Enable —When 1, this bit enables the Slow-C4 Exit functionality.

Well, hey, we can quirk this on and make life better for everyone oh wait section 9.8.1.4:

Bit 0 C-STATE_CONFIG_LOCK: When set to 1, this bit locks down the C-State configuration parameters. The following configuration bits become read-only when this bit is set:
...
The entire C4 Timing Control Register (C4_TIMING_CNT)

Thanks, Intel. Thintel.
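
For anyone who wants the punchline spelled out, here is a minimal sketch of the check a quirk would have to make. The bit positions come from the datasheet excerpts above; the function name, the parameter names and the idea of passing raw register values in as integers are assumptions made for illustration, and nothing here touches real hardware.

```python
from typing import Tuple

# Bit positions as quoted from the ICH8 documentation above.
C_STATE_CONFIG_LOCK = 1 << 0   # lock bit described in section 9.8.1.4
SLOW_C4_EXIT_ENABLE = 1 << 6   # bit 6 of C4_TIMING_CNT, section 9.8.1.6


def try_slow_c4_quirk(cstate_config: int, c4_timing_cnt: int) -> Tuple[bool, int]:
    """Return (changed, new_c4_timing_cnt) for an attempted quirk.

    cstate_config is a hypothetical name for the register holding the lock bit.
    """
    if c4_timing_cnt & SLOW_C4_EXIT_ENABLE:
        return False, c4_timing_cnt      # firmware already enabled it
    if cstate_config & C_STATE_CONFIG_LOCK:
        return False, c4_timing_cnt      # locked: C4_TIMING_CNT is read-only
    return True, c4_timing_cnt | SLOW_C4_EXIT_ENABLE
```

With the lock bit clear, `try_slow_c4_quirk(0x00, 0x00)` reports a change and returns `0x40`. With the lock bit set by the firmware, the same call with `0x01` changes nothing, which is the whole complaint.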

Syndicated 2008-12-02 03:28:00 from Matthew Garrett

1 Dec 2008 »

Aigars:

1703 called. They wanted to let you know that you haven't been paying attention to modern linguistic trends.

Syndicated 2008-12-01 21:49:07 from Matthew Garrett

25 Nov 2008 »

Unexpected results

On my test setup, disabling the blinking cursor in Gnome saves me 2 Watts.

Syndicated 2008-11-25 16:06:33 from Matthew Garrett

24 Nov 2008 »

Good power management practices

Based on feedback from my last couple of posts, I've written a guide to some good power management practices. It's aimed at people implementing desktop environments and power management interfaces for the most part, rather than end users or driver developers. Any suggestions for additions or corrections happily received.

I'm not aware of any OS that currently gets all of these correct.

Syndicated 2008-11-24 15:48:40 from Matthew Garrett

23 Nov 2008 »

Nngh.

Sebastian Kügler dissects my dissection, except in fact he appears to be responding to a pile of stuff I didn't write.

First of all, Matthew assumes that the setting "powersave" will actually use the powersave cpufreq governor - citation needed.

the "conservative" governor, which indeed is an "ondemand" governor, meaning it gives you the extra needed CPU power when you ask for it - more slowly than ondemand does, resulting in extra power draw. Never run conservative unless your P-state latency is so high that ondemand won't work.

Powerdevil doesn't offer such option, so Matthew will be happy about this item - I didn't say it did. However, it is a common myth that thermal management is part of the job a power management policy should play. CPU frequency changing is a mechanism that can be driven by two policies - power management and thermal management. The fact that they share a mechanism does not mean that these policies should be conflated. Really. I wasn't just talking about Powerdevil here.

"3d takes more energy than 2d" is something I didn't actually say. I brought this up because various people have advocated disabling compiz to save power and because Powerdevil appears to have a "Disable Kwin compositing" button.

Lesson learned: Don't have an immediate emotional response to perceived criticism and fail to actually read what you're responding to. I would have commented on Sebastian's entry directly, but he forbids comments, so fuck that.

Syndicated 2008-11-23 16:49:43 from Matthew Garrett

23 Nov 2008 »

Making sure we do power management the right way

I saw a posting about PowerDevil, the new KDE power management interface. It's somewhat disheartening - for the most part, it falls into the same traditional errors made in power management (ie, letting you change cpufreq settings, using "presentation" as a power management setting rather than getting applications to actually do the right thing). But my point isn't to bitch about PowerDevil. My concern is more about why we're still failing to get the message across about certain power management myths. Implementing power management incorrectly leads to wasted power, dead polar bears and wet carpets. It's important that we get this right.

To a first approximation, the Powersave governor will only save you power if you're playing 3D games. The performance governor will basically never give you extra performance. Don't use them. Use ondemand instead. Do not make it easy for your users to choose them. They will get it wrong, because it is difficult to explain why this result is true.
Thermal management is not the job of a power manager. Using power management to implement thermal management will result in your computer taking up more power overall, reducing battery life. Implement things in the right places.
When people say "Presentation mode", what they mean is "Disable screensaver and automatic suspend to RAM". This is not a power management policy. It is an application behaviour policy. The sensible behaviour is for applications to request that these things be disabled when they switch to full-screen presentation mode. The foolish behaviour is to request that your users select "Presentation mode" in their desktop and then start their presentation in their application and then finish their application and then remember to disable "Presentation mode".
It takes no more energy to scan out a framebuffer drawn with the 3D engine than with the 2D engine. If you're spending a significant proportion of the time on the GPU when in your desktop environment, you're already doing something wrong.
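
The governor point is the one the post calls difficult to explain, and it doesn't give the reasoning. One common explanation is "race to idle": for a fixed amount of work, finishing quickly and then idling can cost less energy than running slowly for longer. The sketch below works that arithmetic through. Every number in it is hypothetical and chosen only for illustration, none of it is a measurement, and real hardware may behave differently.

```python
def energy_joules(work_units, units_per_sec, active_watts, idle_watts, window_sec):
    """Energy used over a fixed window: busy until the work is done, idle afterwards."""
    busy_sec = work_units / units_per_sec
    if busy_sec > window_sec:
        raise ValueError("work does not fit in the window")
    return active_watts * busy_sec + idle_watts * (window_sec - busy_sec)


# Hypothetical figures: the slow setting halves throughput but only
# cuts active power from 30 W to 20 W. Idle power is the same in both.
fast = energy_joules(work_units=100, units_per_sec=100,
                     active_watts=30, idle_watts=5, window_sec=10)
slow = energy_joules(work_units=100, units_per_sec=50,
                     active_watts=20, idle_watts=5, window_sec=10)

print(f"fast: {fast} J, slow: {slow} J")
```

With these made-up numbers the fast setting uses 75 J and the slow one 80 J over the same ten seconds. The picture changes for a workload that expands to fill whatever CPU time it is given, which is presumably why 3D games are named as the exception.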

One of the problems is probably that most desktop programmers don't know how hardware behaves. How can we change that?

Syndicated 2008-11-23 14:45:30 from Matthew Garrett

18 Nov 2008 »

Aggressive graphics power management

My current desktop PC has an RS790-based Radeon on-board graphics controller. It also has a Radeon X1900 plugged in. Playing with my Watts Up, I found that the system was (at idle!) drawing around 35W more power with the X1900 than with the on-board graphics.

This is clearly less than ideal.

Recent Radeons all support dynamic clock gating, a technology where the clocks to various bits of the chip are turned off when not in use. Unfortunately it seems that this is generally already enabled by the BIOS on most hardware, so playing with that didn't give me any power savings. Next I looked at Powerplay, the AMD technology for reducing clocks and voltages. It turns out that my desktop hardware doesn't provide any Powerplay tables, so no joy there either. What next?

Radeons all carry a ROM containing a bunch of tables and scripts written in a straightforward bytecode language called Atom. The idea is that OS-specific drivers can call the Atom tables to perform tasks that are hardware dependent, even without knowledge of the specific low-level nature of the hardware they're driving. You can use Atom to do several things, from card initialisation through mode setting to (crucially) setting the clock frequencies. Jerome Glisse wrote a small utility called Atomtools that lets you execute Atom scripts and set the core and RAM frequencies. Playing with this showed that it was possible to save the best part of 5W by underclocking the graphics core, and about the same again by reducing the memory clock. A total saving of 9-10W was pretty significant.

The main problem with reducing the memory clock was that doing it while the screen is being scanned out results in memory corruption, showing up as big ugly graphical artifacts on the screen. I'm a fan of doing power management as aggressively as possible, which means reclocking the memory whenever the system is idle. Turning the screen off to reclock the memory would avoid the graphical corruption but introduce irritating flicker, so that wasn't really an option. The next plan was to synchronise the memory reclocking to the vertical refresh interval, the period of time between the bottom of a frame and the top of the next frame being drawn. Unfortunately setting the memory frequency took somewhere between 2 and 20 milliseconds, far too long to finish inside that time period.
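
To put a number on "far too long", here is a rough sketch of how much time the vertical blanking interval offers. The display mode isn't stated in the post, so the 60 Hz refresh rate and the 1080 visible lines out of 1125 total (a common 1080p timing) are assumptions. Only the 2 ms lower bound comes from the text above.

```python
def vblank_ms(refresh_hz: float, active_lines: int, total_lines: int) -> float:
    """Approximate duration of vertical blanking in milliseconds."""
    frame_ms = 1000.0 / refresh_hz
    blank_lines = total_lines - active_lines
    return frame_ms * blank_lines / total_lines


# Assumed mode: 60 Hz, 1080 visible lines, 1125 total lines per frame.
budget_ms = vblank_ms(60.0, 1080, 1125)

# The post reports that the Atom call took between 2 and 20 ms.
fastest_atom_call_ms = 2.0
fits_in_vblank = fastest_atom_call_ms <= budget_ms

print(f"vblank budget: {budget_ms:.3f} ms, fits: {fits_in_vblank}")
```

Under those assumptions the budget is about two thirds of a millisecond, so even the fastest observed Atom call overruns it several times over.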

So. Just using Atom was clearly not going to be possible. The next step was to try writing the registers directly. Looking at the R500 register documentation showed that the MPLL_FUNC_CNTL register contained the PLL dividers for the memory clock. Simply smacking a new value in here would allow changing the frequency of the memory clock with a single register write. It even worked. Almost. I could change the frequency within small ranges, but going any further resulted in increasingly severe graphical corruption. Unlike the sort I got with the Atom approach to changing the frequency, this corruption manifested itself as a range of effects from shimmering on the screen down to blocks of image gradually disappearing in an impressively trippy (though somewhat disturbing) way.
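
For readers unfamiliar with PLL dividers, the sketch below shows the textbook relationship between a reference clock, its dividers and the output frequency. It is a generic model, not the MPLL_FUNC_CNTL field layout: the post doesn't give the bit fields, so none are reproduced here, and the reference frequency and divider values are hypothetical.

```python
def pll_output_mhz(ref_mhz: float, feedback_div: int, ref_div: int, post_div: int) -> float:
    """Generic PLL: out = ref * feedback / (reference divider * post divider)."""
    if ref_div <= 0 or post_div <= 0:
        raise ValueError("dividers must be positive")
    return ref_mhz * feedback_div / (ref_div * post_div)


# Hypothetical values chosen so the arithmetic is easy to follow.
full_clock = pll_output_mhz(ref_mhz=27.0, feedback_div=100, ref_div=6, post_div=1)
half_clock = pll_output_mhz(ref_mhz=27.0, feedback_div=100, ref_div=6, post_div=2)

print(full_clock, half_clock)
```

In this model, doubling the post divider halves the output clock, which is the sort of change a single register write could make if the dividers share one register.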

Next step was to perform a register dump before and after changing the frequencies via Atom, and compare them to the registers I was programming. MC_ARB_RATIO_CLK_SEQ was consistently different, which is where things got interesting. The AMD docs helpfully describe this register as "Magic field, please use the excel programming guide. Sets the hclk/sclk ratio in the arbiter", about as helpful as being told that the register contents are defined by careful examination of a series of butterflies kept somewhere in Taiwan. Now what?
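
The before-and-after comparison described above is easy to sketch. This version assumes each dump has already been parsed into a dictionary of register name to value. The dump format, the helper name and the values in the example are all made up for illustration.

```python
def diff_dumps(before: dict, after: dict) -> dict:
    """Return {register: (old, new)} for registers present in both dumps that changed."""
    common = sorted(before.keys() & after.keys())
    return {reg: (before[reg], after[reg]) for reg in common if before[reg] != after[reg]}


# Hypothetical register values, not taken from real hardware.
dump_before = {"MPLL_FUNC_CNTL": 0x00001234, "MC_ARB_RATIO_CLK_SEQ": 0x000000AA, "OTHER": 0x1}
dump_after = {"MPLL_FUNC_CNTL": 0x00005678, "MC_ARB_RATIO_CLK_SEQ": 0x000000BB, "OTHER": 0x1}

changed = diff_dumps(dump_before, dump_after)
for reg, (old, new) in changed.items():
    print(f"{reg}: {old:#010x} -> {new:#010x}")
```

Diffing the Atom result against the dump taken after a hand-written register write, rather than against the starting state, would isolate the registers that Atom touches and the manual approach misses.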

Back to Atomtools. Enabling debugging let me watch a dump of the Atom script as it ran. The relevant part of the dump is here. The most significant point was:

MOVE_REG @ 0xBC09
src: ID[0x0000+B39E].[31:0] -> 0xFF7FFF7F
dst: REG[0xFE16].[31:0] <- 0xFF7FFF7F

This shows that the value in question was being read out of a table in the video BIOS (ID[0x0000+B39E] indicating the base of the ROM plus 0xB39E). Looking further back showed that WS[0x40] contained a number that was used as an index into the table. Grepping the header files gave 0x40 as ATOM_WS_QUOTIENT, containing the quotient of a division operation immediately beforehand. Working back from there showed that the value was derived from a formula involving the divider frequencies of the memory PLL and the source PLL. Reimplementing that was trivial, and now I could program the same register values. Hurrah!
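
The lookup half of that reimplementation might look roughly like the sketch below. It makes several assumptions that the post doesn't confirm: that table entries are 32 bits wide and little-endian, and that the quotient indexes the table directly. The formula producing the quotient isn't given above, so it is left out entirely. The ROM contents are fake and the function name is hypothetical.

```python
import struct


def read_table_entry(rom: bytes, table_offset: int, index: int) -> int:
    """Read the index-th little-endian 32-bit entry of a table inside a ROM image.

    Assumes 4-byte entries; the real table layout may differ.
    """
    return struct.unpack_from("<I", rom, table_offset + 4 * index)[0]


# A fake 16-byte header followed by a two-entry table, for illustration only.
fake_rom = bytes(0x10) + struct.pack("<II", 0x11111111, 0xFF7FFF7F)

value = read_table_entry(fake_rom, table_offset=0x10, index=1)
print(f"{value:#010x}")
```

`struct.unpack_from` raises `struct.error` if the computed offset runs past the end of the image, which is a useful guard when the index comes from a calculation that might be wrong.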

It didn't work, of course. These things never do. It looked like modifying this value didn't actually do anything unless the memory controller was reinitialised. Looking through the Atom dump showed that this was achieved by calling the MemoryDeviceInit script. Reimplementing this from scratch was one option, but it had a bunch of branches and frankly I'm lazy and that's why I work on this Linux stuff rather than getting a proper job. This particular script was fast, so there was no real reason to do it by hand instead of just using the interpreter. Timing showed that doing so could easily be done within the vblank interval. This time, it even worked.

I've done a proof of concept that involved wedging this into the Radeon DRM code with extreme prejudice, but it needs some rework. However, it demonstrates that it's possible to downclock the memory whenever the screen is idle without there being any observable screen flicker. Combine that with GPU downclocking and we can save about 10W without any noticeable degradation in performance or output. Victory!

I gave the code to someone with an X1300 and it promptly corrupted their screen and locked their machine up. Oh well. Turns out that they have a different memory controller or some such madness.

So, obviously, there's more work to be done on this. I've put some test code here. It's a small program that should be run as root. It should reprogram an Atom-based discrete graphics card[1] to half its memory clock. Running it again will halve it again. I don't recommend doing that. You'll need to reboot to get the full clock back. This isn't vblank synced, so it may introduce some graphical corruption. If the corruption is static (ie, isn't moving or flickering) then that's fine. If it's moving then I (and/or the docs) suck and there's still work to be done. If your machine hangs then I'm interested in knowing what hardware you have and may have some further debugging code to be run. Unless you have an X1300, in which case it's known to break and what were you thinking running this code you crazy mad fool.

Once this is stable it shouldn't take long to integrate it into the DRM and X layers. I'm also trying to get hold of some mobile AMD hardware to test what impact we can have on laptops.

[1] Shockingly enough, it's somewhat harder to underclock graphics memory on a shared memory system

Syndicated 2008-11-18 20:20:25 from Matthew Garrett

15 Nov 2008 »

And another thing

I swear I'm going out in a minute, but:

Running strings on the firmware for a Dlink wireless bridge I have gives output that includes the following:

From isolation / Deliver me o Xbox - / Through the ethernet
Copyright (c) Microsoft Corporation. All Rights Reserved.
Device is Xbox Compatible. Copyright (c) Microsoft Corporation. All Rights Reserved.

This confused me for a while until I plugged it into an Xbox 360 and discovered that despite it being plugged into the ethernet port, I could control the wifi options including network selection and encryption method. Does anyone have the faintest idea how this is implemented? A tcpdump of the Xbox booting reveals some ICMPv6 packets, a bunch of DHCP and some UPnP service discovery. UPnP seems like a plausible option, but I've got no idea how to probe a device for UPnP services using Linux. Has anyone played with reverse engineering this stuff? Googling didn't seem to show anything up.
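
On the question of probing for UPnP services: discovery is done with SSDP, which amounts to sending an HTTP-like M-SEARCH request over UDP to the multicast address 239.255.255.250 on port 1900 and collecting the replies. The sketch below is a minimal version of that. Whether this particular bridge actually answers SSDP, or uses UPnP for the Xbox integration at all, is not known. The function names and timeout values are illustrative.

```python
import socket

SSDP_ADDR = ("239.255.255.250", 1900)


def build_msearch(search_target: str = "ssdp:all", mx: int = 2) -> bytes:
    """Build an SSDP M-SEARCH request asking matching devices to announce themselves."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        "HOST: 239.255.255.250:1900",
        'MAN: "ssdp:discover"',
        f"MX: {mx}",              # devices delay their reply by up to MX seconds
        f"ST: {search_target}",   # search target; ssdp:all asks for everything
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def discover(timeout: float = 3.0):
    """Send one M-SEARCH and return a list of (sender_ip, raw_response) tuples."""
    responses = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_msearch(), SSDP_ADDR)
        try:
            while True:
                data, addr = sock.recvfrom(65507)
                responses.append((addr[0], data.decode("utf-8", "replace")))
        except socket.timeout:
            pass  # no more replies within the timeout
    return responses
```

Calling `discover()` on the same network segment as the device and looking at the `LOCATION:` header in each reply gives the URL of the device description, which lists the services it offers.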

Syndicated 2008-11-15 20:36:02 from Matthew Garrett

15 Nov 2008 »

Hybrid suspend

One often requested feature for suspend support on Linux is hybrid suspend, or "suspend to both". This is where the system suspends to disk, but then puts the machine in S3 rather than powering down. If the user resumes without power having been removed they get the benefit of the fast S3 resume. If not, the system resumes from disk and no data is lost.

This is, clearly, the way suspend should work. We're not planning on adding it by default in Fedora, though, for a couple of reasons. The main reason right now is that the in-kernel suspend to disk is still slow. Triggering a suspend to disk on a machine with gigabytes of RAM (which is a basic laptop configuration these days) will leave you sitting there for an extended period of time when all you actually want to do is pick your machine up and leave. Fixing this properly is less than trivial. TuxOnIce improves the speed somewhat, at the expense of being a >500k patch against upstream that touches all sorts of interesting bits of the kernel such as the vm system. We're not supporting that for fairly obvious reasons. But even then, the suspend to disk process involves discarding some pages. Those need to be pulled in from disk again on resume. With the current implementation, suspend to both is fundamentally slower than suspend to RAM for both the suspend and resume paths.

So, what other approaches are there? One is to resume from RAM some period of time after suspending and then if the battery is low suspend to disk. Many recent machines will automatically resume when the battery level becomes critical. If the hardware doesn't support that, we can wake up after a set period and measure the battery consumption, set a new alarm and go to sleep again. The downside of this approach is that your system wakes up and does stuff without you being aware of it, which may be bad if it's inside a neoprene cover at the time. Cooking laptops is generally considered unhelpful.
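
The wake-measure-sleep loop can be sketched as a small decision function. This is one possible policy, not anything Fedora ships: the reserve threshold, the choice to wake halfway to the reserve, the one-hour cap and the sixty-second floor are all assumptions made for illustration.

```python
def next_action(remaining_mwh, drain_mw, reserve_mwh, max_sleep_sec=3600):
    """Decide what to do on a timed wakeup from suspend to RAM.

    Returns ("hibernate", 0) or ("sleep", seconds_until_next_alarm).
    """
    if remaining_mwh <= reserve_mwh:
        return ("hibernate", 0)            # too low: suspend to disk now
    if drain_mw <= 0:
        return ("sleep", max_sleep_sec)    # no measurable drain: use the cap
    # Seconds until the battery would reach the reserve at the measured drain.
    secs_to_reserve = (remaining_mwh - reserve_mwh) / drain_mw * 3600
    # Wake halfway there, but never more often than once a minute.
    return ("sleep", int(min(max_sleep_sec, max(60, secs_to_reserve / 2))))


print(next_action(remaining_mwh=20000, drain_mw=500, reserve_mwh=2000))
print(next_action(remaining_mwh=2500, drain_mw=500, reserve_mwh=2000))
print(next_action(remaining_mwh=1500, drain_mw=500, reserve_mwh=2000))
```

Re-measuring the drain on every wakeup keeps the estimate honest without having to trust a drain figure taken before suspend. The cost is the one described above: every wakeup is a moment when the machine is running unattended.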

Using the kexec approach to hibernation provides a more straightforward way of handling the problem. The fundamental problem with the existing approach is that it ties suspend into the vm system and involves making atomic copies of RAM into other bits of RAM. kexec would allow us to pre-allocate enough space on disk to save RAM as-is, and then simply kexec into a new kernel and dump RAM to disk without any of the tedious shrinking required first. Resuming from S3 would kexec back into the old kernel, whereas losing power would just fall back to reading off disk. The extra time taken on the S3 path would be minimal.

In an ideal world we'd adopt the Vista approach where "off" is synonymous with suspend. There's still more work to be done on enhancing reliability before that can be achieved, though.

Syndicated 2008-11-15 18:35:11 from Matthew Garrett

15 Nov 2008 »

Adventures in PCI hotplug

I played with an Eee for a bit last time I was in Boston, culminating in a patch to make the eeepc-laptop driver use standard interfaces rather than just having random files in /sys that people need to write custom scripts to use. The world became a better place.

However. Asus implemented the rfkill control on the Eee in a slightly odd way. Disabling the wifi actually causes the entire card to drop off the bus, similar to how Bluetooth is normally handled. The difference is that the Bluetooth dongles are almost exclusively USB, while the Eee's wifi is PCI. Linux supports hotplugging of PCI devices, but nothing seemed to work out of the box on the Eee. Another case of this was the SD reader in the Acer Aspire One. Unless a card was present in the slot during boot, it simply wouldn't appear on the PCI bus. It turned out that Acer have implemented things in such a way that removing the card results in the entire chip being unplugged. This was when I started looking more closely into how this functionality is implemented.

The two common cases of PCI hotplug are native PCIe hotplug and ACPI mediated hotplug. In the former case, the chipset generates an interrupt when a hotplug event occurs and the OS then rescans the bus. This is a mildly complicated operation, requiring enabling the slot, checking whether there's a card there, powering the card and all its functions up, waiting for the PCIe link to settle and then announcing the new PCI device to the rest of the OS. ACPI-mediated hotplugging puts more of the load on the firmware rather than the OS - the hotplug event generates a notify message that is caught by the ACPI interpreter in the OS, allowing the OS to check for device presence by calling another ACPI method. If the device is present it's then a simple matter of telling the PCI layer about it.

Native PCIe hotplug has the advantage that there's much less vendor code involved. ACPI is still involved to an extent - an _OSC method on the PCIe bridge is called to allow the OS to tell the firmware that it supports handling hotplug events. This allows the firmware to stop sending any ACPI notifications. ACPI hotplugging requires more support in the firmware, but can work for PCI as well as PCIe.

The general approach taken to getting the Eee's wifi hotplugging to work has been to load the pciehp driver with the pciehp_force=1 argument. This tells the driver to listen for hotplugging events even when there's no _OSC method to tell the firmware that the OS is handling things now. Since the hardware will generate the event anyway, things work. However, this is non-ideal. Some hardware exists where ACPI hotplugging will work, but due to quirks in the hardware design native PCIe hotplugging control will fail. This has been handled in their firmware by having the _OSC method fail, signalling to the pciehp driver that it shouldn't bind to the port. Using pciehp_force overrides that, leading to a situation where hardware could potentially be removed from a port that's powered up. Unfortunate.

My first approach was to add a new argument to pciehp called pciehp_passive. This would indicate to the pciehp driver that it should only listen for notifications from the hardware. User-triggered events would not be supported, avoiding the situation where anyone could remove the card by accident. This worked on my test machine (an Eee 901 somewhere in Ottawa, since I don't actually have one myself...) but was reported to work less well on a 700. Since the 700 didn't claim to have any support for power control, the code was forced to wait a second on every operation to see whether the link powered up or not. This resulted in long pauses during boot and suspend/resume operations.

The final issue that convinced me that this was the wrong approach was reading a document on Microsoft's site on how PCIe hotplugging is implemented in Windows. It turns out that XP doesn't support native PCIe hotplugging at all - that feature was added in Vista. Both the Eee and the Aspire One are available with XP, but things work there. So PCIe native hotplugging was clearly not the right answer. Time to look further.

Armed with a disassembly of the Aspire One's DSDT, I figured out why the ACPI hotplug driver didn't work on it. The first thing the driver does is walk the list of ACPI devices, looking for any that are removable. That was being implemented by looking for an _EJ0 method. _EJ0 indicates that the device can be ejected under the control of the OS. The Aspire One doesn't have an _EJ0 method on its SD readers. However, it did have an _RMV method. This can be used to indicate that a device is removable but not ejectable - that is, the device can be removed (by physically pulling it out or by the hardware taking it away itself), but there's no standard way to ask the OS to logically disconnect it. A quick patch to acpiphp later and the Aspire One now worked without any forcing or spec contravention. This also has the nice side effect of making ExpressCard hotplug work on a bunch of machines where it otherwise wouldn't.
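
The change in the removability check can be modelled in a few lines. This is a sketch of the logic as described above, not the acpiphp patch itself. It represents a device as a dictionary of ACPI method names to the values they would return, and it assumes that an _RMV result of nonzero means removable. The function and parameter names are hypothetical.

```python
def is_removable(methods: dict, honour_rmv: bool = True) -> bool:
    """Decide whether an ACPI device should be treated as hotpluggable.

    methods maps ACPI method names to their (already evaluated) return values.
    """
    if "_EJ0" in methods:
        return True                      # ejectable under OS control
    if honour_rmv and methods.get("_RMV", 0) != 0:
        return True                      # removable, but not OS-ejectable
    return False


# Illustrative device resembling the case described: _RMV present, no _EJ0.
sd_reader = {"_ADR": 0x0, "_RMV": 1}

print(is_removable(sd_reader, honour_rmv=False))  # old behaviour
print(is_removable(sd_reader))                    # behaviour with _RMV honoured
```

Passing `honour_rmv=False` reproduces the old behaviour of checking only for _EJ0, under which the example device is skipped.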

But back to the Eee. acpiphp still wasn't binding, and a closer examination revealed why. There's nothing to indicate that the Eee's ports are hotpluggable, and there's no topological data in the ACPI tables that ties the wifi function to the PCIe root bridges. However, the Eee firmware was sending an ACPI notification on wifi hotplug. But it was only sending this to the PCIe root bridges, and there's no way to then tell which device had potentially appeared or vanished.

In the end, I gave up on trying to solve this generically. Instead I've got a patch that implements the hotplugging entirely in eeepc-laptop. In an ideal world nobody else will have implemented this in the same way as Asus and we can all be happy.

Syndicated 2008-11-15 18:01:04 from Matthew Garrett
