Back in the APM days, everything was easy. You called an ioctl on
/dev/apm, and the kernel made a BIOS call. After that, it was all up to
the hardware. Sure, it never really worked properly, and it was
basically impossible to debug what the hardware actually did. And then
ACPI came along, and nothing worked at all. Several years later, we're
almost back to where we were with APM. But what's actually happening
when you hit that sleep key?
Without the ability to suspend and resume, laptop users are doomed to
spend several hours of their lives waiting for machines to boot and
shut down. This is, clearly, suboptimal. APM made it fairly easy to
implement this, because almost everything was handled by the BIOS. And
that, in a nutshell, is one of the primary reasons why ACPI ended up in
charge.
The biggest problem with APM is that it left policy in hardware. Don't
want to suspend on lid closure? The OS doesn't get any say in the
matter, though if you're lucky there might be a BIOS option to control
it. Would prefer it if the BIOS didn't scribble all over the contents of
your video registers while it tries to reprogram them (probably back to
the defaults of the Windows drivers...)? Sucks to be you. Want the sleep
button to trigger suspend to disk, not suspend to RAM? A-ha ha ha.
ACPI deals with that problem, by moving almost all the useful
functionality out of hardware. The downside of this is that the
functionality needs to be reimplemented in the OS. Which, given that the
ACPI spec is around 600 pages long, has taken a little time.
(Of course, it turns out that most of the ACPI spec is entirely
uninteresting for suspend and resume purposes, but that's not really the
point right now)
So, firstly, let's have some ACPI jargon. ACPI itself stands for
"Advanced Configuration and Power Interface". It's not just a power
management spec - it provides the OS with a description of all the
built-in hardware in your system, along with a certain degree of
abstraction. It gives you information about interrupt routing, tells you
if someone's just removed a hot-pluggable DVD drive from a laptop and
may even let you control which video output is being used.
This information is provided in a table called the DSDT (Differentiated
System Description Table). The DSDT is in a bytecode called AML (ACPI Machine
Language), compiled from a simple language called ASL (ACPI Source
Language, shockingly enough). At boot time, the system reads the DSDT,
parses it and executes various methods. These can do pretty much
anything, but on the bright side they're being executed in kernel
context and (in principle) you can filter out anything that you really
don't want to do (such as scribbling all over CMOS or something).
The final relevant piece of ACPI information is something called the
FADT, or Fixed ACPI Description Table. This gives the OS information
about various register addresses. It's a static structure, and doesn't
contain any executable code.
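To make the table talk a little more concrete, here is a minimal
sketch of decoding the 36-byte header that ACPI system description
tables such as the DSDT and FADT begin with (the FADT's on-disk
signature is "FACP"). The helper make_fake_table and its "DEMO" values
are hypothetical, there only so the example runs without real firmware
data. On Linux the raw tables can usually be read by root from
/sys/firmware/acpi/tables/, but treat that as an assumption about your
system rather than a guarantee.

```python
import struct
from typing import NamedTuple

# Common header layout (little-endian, no padding): signature, length,
# revision, checksum, OEM ID, OEM table ID, OEM revision, creator ID,
# creator revision. 36 bytes in total.
HEADER = struct.Struct("<4sIBB6s8sI4sI")


class TableHeader(NamedTuple):
    signature: str
    length: int
    revision: int
    checksum_ok: bool


def parse_table(raw: bytes) -> TableHeader:
    """Decode the common header and verify the whole-table checksum."""
    if len(raw) < HEADER.size:
        raise ValueError("table shorter than the 36-byte header")
    sig, length, revision, *_rest = HEADER.unpack_from(raw)
    # In a valid table, all bytes over the declared length sum to 0 mod 256.
    checksum_ok = length <= len(raw) and sum(raw[:length]) % 256 == 0
    return TableHeader(sig.decode("ascii", "replace"), length, revision, checksum_ok)


def make_fake_table(signature: bytes, body: bytes = b"") -> bytes:
    """Build a synthetic table (hypothetical helper, demonstration only)."""
    length = HEADER.size + len(body)
    raw = bytearray(
        HEADER.pack(signature, length, 1, 0, b"DEMO  ", b"DEMOTBL ", 1, b"DEMO", 1)
        + body
    )
    raw[9] = (-sum(raw)) % 256  # byte 9 is the checksum field
    return bytes(raw)


if __name__ == "__main__":
    print(parse_table(make_fake_table(b"FACP", b"\x01\x02\x03")))
```

The checksum test is the useful part in practice: a table that fails it
has been truncated or corrupted, and nothing else in it should be
trusted.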
So, how does all of this stuff actually work?
First of all, the user hits the sleep key. This triggers a hardware
interrupt, which is caught by the embedded controller. That pokes a
register in the southbridge, which flags that a general purpose event
has just occurred. The OS notices this, and checks the DSDT for what's
supposed to happen next. Generally, this just calls a notification
event. This is bounced back out to userspace via /proc/acpi/event
(currently, though it's going to be moved to the input layer in future)
and userspace gets to choose what happens next.
Let's concentrate on the common scenario, which is that someone hitting
the sleep button wants to suspend to RAM. Via some abstraction (either
acpid, gnome-power-manager or kpowersave or something), userspace makes
that decision and initiates the suspend to RAM process by either calling
a suspend script directly or bouncing via HAL.
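The point of all this bouncing is that policy now lives in userspace.
A rough sketch of what a listener might do with one event line follows.
The four-field line format (device class, bus ID, type, data) and the
sample "button/sleep SBTN ..." line are assumptions based on what
acpid-style tools commonly show, and the POLICY table and action names
are entirely hypothetical.

```python
from typing import NamedTuple


class AcpiEvent(NamedTuple):
    device_class: str
    bus_id: str
    event_type: int
    data: int


def parse_event(line: str) -> AcpiEvent:
    """Split one event line into its four whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"unexpected event line: {line!r}")
    device_class, bus_id, event_type, data = fields
    # The last two fields are assumed to be hexadecimal numbers.
    return AcpiEvent(device_class, bus_id, int(event_type, 16), int(data, 16))


# Hypothetical policy table: the user, not the BIOS, decides what a key does.
POLICY = {
    "button/sleep": "suspend-to-ram",
    "button/lid": "ignore",
}


def choose_action(event: AcpiEvent) -> str:
    """Look up the configured action, doing nothing for unknown events."""
    return POLICY.get(event.device_class, "ignore")


if __name__ == "__main__":
    ev = parse_event("button/sleep SBTN 00000080 00000001")
    print(ev, "->", choose_action(ev))
```

Changing "button/sleep" to map to a suspend-to-disk action is a one-line
edit here, which is exactly the kind of choice APM never offered.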
Depending on distribution, this ends up running a shell script or binary
which attempts to prepare the system for suspend. Right now, this tends
to involve a bunch of bandaids around various broken drivers - unloading
modules and reloading them is one of the easiest workarounds for
breakage. Finally, the string "mem" is written to /sys/power/state.
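That last step is small enough to sketch. Reading /sys/power/state
lists the sleep states the kernel offers, and writing one of them back
starts the transition. The function names below are made up for
illustration, the write needs root, and on a real machine it does not
return until the system has resumed, so the demonstration at the bottom
runs against a temporary file instead.

```python
import tempfile
from pathlib import Path
from typing import List


def supported_states(state_file: Path) -> List[str]:
    """Return the space-separated state names the file advertises."""
    return state_file.read_text().split()


def request_suspend(state_file: Path = Path("/sys/power/state"),
                    state: str = "mem") -> None:
    """Ask for a sleep state, refusing ones that are not advertised."""
    if state not in supported_states(state_file):
        raise RuntimeError(f"sleep state {state!r} is not supported")
    state_file.write_text(state)


if __name__ == "__main__":
    # Stand-in for the sysfs file so the sketch is safe to run anywhere.
    fake = Path(tempfile.mkdtemp()) / "state"
    fake.write_text("standby mem disk\n")
    request_suspend(fake)
    print(fake.read_text())
```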
This jumps back into the kernel. First, userspace is stopped. This stops
it getting horribly confused when a load of hardware mysteriously stops
working. Then the kernel goes through the device tree and calls suspend
methods on each bound driver. Individual drivers have responsibility for
storing enough state in order to be able to reprogram the device on
resume - ACPI doesn't make guarantees about what the hardware state is
going to be when we come back. Once the kernel-side suspend code has
been run, we execute a couple of ACPI methods - _PTS (Prepare To Sleep)
and _GTS (Going To Sleep). These tend to poke various things that the
kernel knows nothing about, and so a certain amount of magic may be
involved.
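A toy model may help show why the burden falls on drivers. Nothing here
is kernel code: the Device class, the zeroed registers standing in for
"undefined hardware state", and the choice to suspend devices in the
reverse of their listing order (so that children go down before their
parents) are all assumptions made for the sketch.

```python
class Device:
    """Toy device whose register contents do not survive suspend."""

    def __init__(self, name, registers):
        self.name = name
        self.registers = dict(registers)
        self._saved = None

    def suspend(self):
        # The driver, not ACPI, has to remember what it programmed.
        self._saved = dict(self.registers)

    def power_loss(self):
        # Model the hardware coming back in an unprogrammed state.
        self.registers = {reg: 0 for reg in self.registers}

    def resume(self):
        # Reprogram the device from the copy taken at suspend time.
        self.registers = dict(self._saved)


def suspend_system(devices):
    """Return the order of events for the kernel-side suspend path."""
    log = ["freeze userspace"]
    for dev in reversed(devices):  # assumed: parents are listed first
        dev.suspend()
        log.append(f"suspend {dev.name}")
    # ACPI methods for Prepare To Sleep and Going To Sleep run last.
    log += ["_PTS", "_GTS"]
    return log


if __name__ == "__main__":
    devs = [Device("pci-bridge", {"cmd": 7}), Device("nic", {"ctrl": 3})]
    print(suspend_system(devs))
    for d in devs:
        d.power_loss()
        d.resume()
    print([d.registers for d in devs])
```

A driver that forgets even one register in suspend() gets it back as
whatever the hardware felt like, which is where most resume bugs come
from.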
At this point, the system should be fairly quiescent. Only two things to
do now. Firstly, the address of the kernel wakeup code is written to an
address contained in the FADT. Secondly, two magic values from the DSDT
are written to registers described in the FADT. This usually causes some
sort of system management trap, which makes sure that the memory is put
in self-refresh mode and actually sequences the machine into suspend.
For the S3 power state, this basically involves shutting the machine
(other than the RAM) down completely.
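For the curious, the "magic values" are less magic than they sound. As
far as I understand the ACPI spec, they are the sleep-type numbers from
the \_S3 package in the DSDT, and they go into the SLP_TYP field (bits
10-12) of the PM1 control registers located via the FADT, with the
SLP_EN bit (bit 13) set to trigger the transition. The sketch below only
computes the value that would be written. The sleep type 5 in the
example is invented, since the real number is platform-specific, and
the function name is hypothetical.

```python
SLP_TYP_SHIFT = 10                    # SLP_TYP occupies bits 10-12
SLP_TYP_MASK = 0x7 << SLP_TYP_SHIFT
SLP_EN = 1 << 13                      # setting this starts the transition


def pm1_sleep_value(current: int, slp_typ: int) -> int:
    """Combine the current 16-bit PM1 control value with a sleep request."""
    if not 0 <= slp_typ <= 7:
        raise ValueError("SLP_TYP is a 3-bit field")
    preserved = current & 0xFFFF & ~SLP_TYP_MASK  # keep unrelated bits
    return preserved | (slp_typ << SLP_TYP_SHIFT) | SLP_EN


if __name__ == "__main__":
    # 0x0001: only the low bit set; 5: an invented sleep type for S3.
    print(hex(pm1_sleep_value(0x0001, 5)))
```

Everything that happens after that write is up to the platform, which
is why this is the last point at which the kernel can log anything
useful.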
Time passes.
The user presses the power button. The system switches on, jumps to the
BIOS start address, does a certain amount of setup (programming the
memory controller and so on) and then looks at the ACPI status register.
This tells it that the machine was previously suspended to RAM, so it
then jumps to the wakeup address programmed earlier. This leads it to a
bunch of real-mode x86 code provided by the kernel, which programs the
CPU back into protected mode and restores register state. Suddenly we're
running kernel code again.
From this point onwards, it's much the reverse of the suspend process.
We call the ACPI _WAK method, resume all the drivers and restart
userspace. The shell script suddenly starts running again and cleans up
after itself, reloading any drivers that were unloaded before suspend.
As far as userspace is concerned, the only thing that's happened is that
the clock has jumped forward.
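That clock jump is about the only way a userspace program can tell it
was asleep. One rough way to spot it, sketched here, is to compare the
wall clock against a monotonic clock that (on Linux, as far as I know)
does not advance while the machine is suspended. The function names are
made up, and a gap can also come from someone simply resetting the
clock, so treat the result as a hint rather than proof.

```python
import time


def sample():
    """Take a (wall_clock, monotonic) pair, both in seconds."""
    return (time.time(), time.monotonic())


def suspended_seconds(before, after):
    """Estimate time spent asleep between two samples.

    Wall-clock time keeps counting across suspend while the monotonic
    clock is assumed to stand still, so the difference between the two
    deltas approximates the length of the nap.
    """
    wall_delta = after[0] - before[0]
    mono_delta = after[1] - before[1]
    return max(0.0, wall_delta - mono_delta)


if __name__ == "__main__":
    # Invented samples: an hour of wall time, half a second of run time.
    print(suspended_seconds((1000.0, 50.0), (4600.0, 50.5)))
```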
So why is this difficult?
In a lot of cases, it's just down to bugs in the drivers. Restoring
hardware state can be hard, especially if you don't actually have all
the documentation for the hardware to start with - traditionally, many
Linux drivers have ended up depending on the BIOS to have programmed the
hardware into a semi-sane state, and there's no guarantee that that will
happen with ACPI. Other cases can just be oversights - for instance, the
bug in the APIC (not to be confused with ACPI) code that meant a single
register wasn't restored, resulting in some machines resuming without
any interrupts being delivered.
The single biggest problem is video hardware. The spec doesn't require
the BIOS to reprogram the video hardware at all, and so often it'll come
back in an entirely unprogrammed state. This is an issue, since we (in
general) have absolutely no idea how to bring a video card up from
scratch. One of the easiest workarounds is to execute code from the
video BIOS in the same way that the system BIOS does on machine startup.
vbetool lets you do this from userspace, and it works a surprisingly
large amount of the time. However, there's no guarantee that it'll be
successful. Vendors often unmap that section of BIOS after the system
has been brought up, since they've got far more BIOS code than will fit
in the BIOS region of the legacy address space. In the long run, the
only solution is drivers that know how to program an entirely
uninitialised chip. The new modesetting branch of the Intel driver aims
to do this, as do the developers of nouveau.
Despite all this misery, ACPI support is generally improving. Most
machines can now suspend and resume once more. The next big challenge is
improving run-time power management in order to get battery life to at
least the level it is under Windows, and ideally beyond that.
we have to deal with this, a lot - nothing to do with ACPI - on the reverse-engineering project to get linux running on HTC (high tech computer corporation) handhelds (smartphones and pdas). it's the one major irritating factor that can stop a device from being useful.
i managed to get linux running entirely on the ipaq hw6915 in just six weeks (because of the common hardware between the htc universal, ipaq hx4700 and a couple of others).
however, i spent a very frustrating further two weeks trying to work out what it was that was stopping the device from being able to resume: exactly as you say, interrupts weren't being enabled [in the end i had to give up as i was running out of time]
now the really annoying thing is that it is terribly difficult to track down why this is.
firstly, you're resuming: you _can't_ do any significant debugging: it either works, or it doesn't. if you get it right, it resumes. if you don't get it right, you've no way of communicating anything to find out why.
secondly, the booting is coming not from startup but from gnu-haret.exe which is the ARM / wince equivalent of LOADLIN.EXE for x86 / win32. so, you're booting into linux with most of the hardware preinitialised. on some devices, it is an absolute _bitch_ to work out the GPIO pin requirements, some of which require switching to alternate states and back with special timings in between to let buggy or glitchy hardware recover, because there's not enough current or something - you just don't know.
and on one device, 272 GPIO pins (192 on the CPU, 16 on a chip we've named EGPIO for 'extended gpio', and 64 on a separate custom I/O chip which we've named ASIC3) ... actually weren't enough, so the designers had to _borrow_ some of the I/O pins on the GSM radio rom (another 64 or so gpio pins - it's another ARM processor) and so you have to, unbelievably, communicate proprietary commands over a _serial_ line to specify which set of speakers are to be used! (yes, the device i'm describing has, i think, 5 speakers, 2 of which are stereo speakers, plus 3 microphones and a headphone socket _and_ bluetooth audio _and_ a car-kit)