This week I'm in Mountain View for the X Developers' Conference, where I gave a presentation on things graphics drivers have to care about in order to make suspend/resume work reliably. Various people suggested that I write up an explanation of the current situation, along with some best practices for manufacturers.
One of the major differences between the ACPI specification and the older APM specification is that saving and restoring hardware state is generally left up to the operating system, with the firmware playing a much more passive role. Linux is now able to handle this state restoration for the majority of hardware, but we still have issues with graphics. Modern graphics hardware can be hugely complicated, which to a large extent we've been able to ignore because the platform will initialise it on boot. Many X drivers know how to switch a card from text mode to graphics mode (and vice-versa), but that's not adequate on resume, when there's no guarantee that anything has initialised the card for them.
What can make this situation even more awkward is that the ACPI specification makes no guarantees about what state the hardware comes up in. It may still be in PCI D3. It may be powered up, but the registers may all be zeroed. It's even valid for the hardware to come up in text mode. A driver has to be able to deal with all of these situations.
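To make that concrete, here's a rough sketch of a resume path that assumes nothing about the state it finds the card in. It's a toy model in Python rather than real driver code: `Card`, `PowerState`, `DisplayMode`, `suspend` and `resume` are all names made up for illustration, and the "registers" are just a dictionary.

```python
from dataclasses import dataclass, field
from enum import Enum


class PowerState(Enum):
    D0 = 0  # fully powered
    D3 = 3  # powered down


class DisplayMode(Enum):
    UNKNOWN = "unknown"
    TEXT = "text"
    GRAPHICS = "graphics"


@dataclass
class Card:
    """Toy model of a graphics card; every field here is hypothetical."""
    power: PowerState = PowerState.D0
    registers: dict = field(default_factory=dict)
    mode: DisplayMode = DisplayMode.UNKNOWN


def suspend(card):
    """Save everything needed to rebuild the card's state from scratch."""
    return {"registers": dict(card.registers), "mode": card.mode}


def resume(card, saved):
    """Restore the card without trusting whatever state the firmware left."""
    steps = []
    # The card may still be in PCI D3.
    if card.power is not PowerState.D0:
        card.power = PowerState.D0
        steps.append("power-up")
    # It may be powered up with every register zeroed.
    if card.registers != saved["registers"]:
        card.registers = dict(saved["registers"])
        steps.append("restore-registers")
    # It may have come up in text mode, or in no sensible mode at all.
    if card.mode is not saved["mode"]:
        card.mode = saved["mode"]
        steps.append("set-mode")
    return steps
```

The point of the design is that each check is independent. The same `resume` copes with a card left in D3, a card that's powered but blank, and a card the firmware has helpfully put into text mode, and it does nothing at all if the state happens to have survived.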
So far, we've mostly tried to deal with this situation by running chunks of the video BIOS with applications like vbetool. This is hacky and often works badly - in some cases, it won't work at all (Nvidia rewrite the entry point to their video BIOS after boot, in order to prevent situations where someone attempts to POST the card and it tries to jump to bits of the system BIOS that no longer exist). The only long-term option is for the kernel to handle the graphics suspend/resume. There are several levels at which this can be supported:
- Able to restore text mode. This is basically adequate right now, since the kernel enforces a virtual terminal switch around suspend/resume.
- Able to restore an arbitrary video mode. In the kernel modesetting world, we want to be able to go straight into the mode that we're running X in, in order to reduce screen flicker. This would probably still be surrounded with a VT switch out of and into X.
- Able to jump straight into X. Right now, on x86 the process freezer will prevent X from running until after all of the devices have been resumed. As we move to a model where we remove the freezer and expect drivers to be able to block userspace tasks, we hit a pretty obvious problem with X - right now it hits the graphics hardware directly, rather than going via the kernel. With nothing to block it, it'll hit the hardware while it's still in an undefined state and then become very unhappy. This can't really be fixed until X does all its hardware access through the kernel. For now, we can inhibit X by leaving the VT switching in place.
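The last of these depends on the driver being able to hold userspace off until the hardware is back. Here's a minimal sketch of that idea, assuming all hardware access goes through a single driver object. It's a Python model using threads, not kernel code, and `GraphicsDevice` and its methods are hypothetical names.

```python
import threading


class GraphicsDevice:
    """Toy driver that makes callers wait while the device is suspended."""

    def __init__(self):
        self._ready = threading.Event()
        self._ready.set()  # the device starts out usable
        self.log = []

    def suspend(self):
        # From here on, submit() blocks instead of touching the hardware.
        self._ready.clear()

    def resume(self):
        # Restore the hardware first, and only then let callers through.
        self.log.append("hardware restored")
        self._ready.set()

    def submit(self, command, timeout=None):
        """Stand-in for any userspace request that reaches the hardware."""
        if not self._ready.wait(timeout):
            raise TimeoutError("device did not resume in time")
        self.log.append(command)
```

A thread that calls `submit` between `suspend` and `resume` just sleeps until the device is ready, which is the behaviour the process freezer currently gets us by brute force. It only works because every access goes through `submit`. X poking the hardware directly bypasses the gate entirely, which is why the VT switch has to stay for now.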
It's not all plain sailing, though. Vendors can wire up hardware in different ways. AMD and Nvidia hardware provides scripts in the ROM that, in theory, abstract this away. The downside to this is that we'll have to put the interpreters for these scripts in kernel space, though this is probably needed for kernel modesetting work anyway. Even then, it's possible that we'll need workarounds for specific pieces of hardware that don't quite behave as we expect them to.
As far as making graphics work over suspend/resume, we have a pretty good idea where we are now (failing on anything other than modern Intel hardware), where we need to be in the future (triumph, huge success, cake) and how to get there (?). However, hardware manufacturers can still help.