Older blog entries for mjg59 (starting at number 153)

Testing 2.6.27-rc2 with the current released (not development) BIOS on the Foxconn G33M reveals the following:

  • There are no ACPI errors on boot, other than the (irrelevant) OEMB table. Previous kernels did show errors, so something has clearly been fixed in .26 or so. I can't really be bothered digging through to find out what.
  • The system fails to reboot if it has been suspended and resumed. The fix is three lines long, one of which is a comment and one of which is blank.
  • The system is otherwise perfectly stable.
Summary: Almost all problems caused by bugs in Linux, one problem caused by BIOS vendors interpreting the ACPI specification differently to the Linux implementation and trivially worked around. No sabotage.

Thanks very much to Carl at Foxconn for being able to get me information about what was causing the reboot issue - I spent significantly longer putting the system together than I did fixing it.

Syndicated 2008-08-06 18:13:03 from Matthew Garrett

Coincidentally, I had the opportunity to poke at a machine that actually does deliberately treat Linux differently in its ACPI tables today. Jeremy was poking at an Acer Aspire One before installing Fedora on it, and Dave noticed that it was printing a bootup message indicating that the firmware was testing for _OSI("Linux"). A bit of poking later, and we have the following:

            If (_OSI ("Linux"))
            {
                Store (0x03E8, OSYS)
                Store (0x0A, \_SB.PCI0.LPC.S4TM)
                Store (0x43, \_SB.PCI0.EXP2.PXS2.LSMP)
                Store (One, \_SB.PCI0.EXP2.LL0S)
                Store (One, \_SB.PCI0.EXP2.LLL1)
            }
            Else
            {
                If (_OSI ("Windows 2006"))
                {
                    Store (0x07D6, OSYS)
                    Store (0x06, \_SB.PCI0.LPC.S4TM)
                    Store (Zero, \_SB.PCI0.EXP2.PXS2.LSMP)
                    Store (Zero, \_SB.PCI0.EXP2.LL0S)
                    Store (Zero, \_SB.PCI0.EXP2.LLL1)
                }
Other OSes get the same values as Linux, other than the OSYS field. Now, what do these writes do? They're all to PCI config space, so since the machine in question is a 945/ICH7 machine we have publicly available docs. A bit of digging later and it shows that the firmware is disabling PCIE active state link control and programming more conservative timings for entry into the C4 processor idle power saving state. In other words, certain bits of power management functionality are compromised if it detects that it's running anything other than Vista. Weirdly, it also flags the HPET as present but invisible on Linux, but I suspect that's an oversight rather than anything deliberate.
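For anyone who'd rather not squint at hex, here's a rough Python sketch that just tabulates the two branches quoted above and lists which fields differ. The values are copied from the ASL. The dictionary layout, the shortened field names and the differing_fields helper are mine and purely illustrative, and nothing here says what each register actually does.

```python
# Values copied from the quoted ASL. Field names are shortened from the
# full ACPI paths, and this dict layout is only an illustration.
ACER_FIELDS = {
    "Linux": {
        "OSYS": 0x03E8, "S4TM": 0x0A, "LSMP": 0x43, "LL0S": 1, "LLL1": 1,
    },
    "Windows 2006": {
        "OSYS": 0x07D6, "S4TM": 0x06, "LSMP": 0x00, "LL0S": 0, "LLL1": 0,
    },
}


def differing_fields(first, second):
    """Return the sorted names of fields whose values differ between two branches."""
    a, b = ACER_FIELDS[first], ACER_FIELDS[second]
    return sorted(name for name in a if a[name] != b[name])


if __name__ == "__main__":
    for os_name, fields in ACER_FIELDS.items():
        # Print each value in decimal so OSYS is easier to read.
        print(os_name, fields)
    print(differing_fields("Linux", "Windows 2006"))
```

Every field differs between the two branches, and the OSYS values come out as 1000 and 2006 in decimal.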

Why would they do this? I've no idea. I suspect it's something to do with the degree of platform validation performed rather than a subtle attempt to degrade Linux's battery life on the hardware (frankly, we do a good enough job of that ourselves right now), but this is exactly the kind of reason we removed _OSI("Linux") support from the kernel. Vendors will do stupid things with it.

Syndicated 2008-08-01 18:47:15 from Matthew Garrett

Various people have asked me why there'd ever be a justification for ACPI tables to base various types of behaviour on the operating system they're running on, and why Linux claims to be Windows. There are pretty straightforward explanations that don't involve conspiracies, but they're not necessarily obvious. Let's start from the beginning.

ACPI provides two mechanisms for determining the OS, _OS and _OSI. _OS is an ACPI object containing a string. This string is supposed to represent the operating system. Windows 98 contained "Microsoft Windows", NT4 "Microsoft Windows NT" and ME "Microsoft WindowsME: Millennium Edition". All later versions of Windows contain "Microsoft Windows NT". Linux was "Linux" up until 2.6.9, and has been "Microsoft Windows NT" since then.

The obvious drawback to _OS is that it's a single string. _OSI was introduced with later versions of ACPI, providing an interface for ACPI tables to request which interfaces an OS supported. Interfaces may be something like "3.0 Thermal Model", indicating support for a specific aspect of ACPI, but may also be used to indicate the interfaces supported by the OS. The _OSI method is passed a string and returns either true or false depending on whether the OS claims to support that interface. The OS can therefore claim to support many different interfaces. Linux implements pretty much every aspect of ACPI that Windows does, and so claims to support all the interfaces that Windows implements. As a result, Linux will return true when asked if it implements support for any of the Windows interfaces (up to and including Vista).
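As a minimal sketch of the idea (not the kernel's implementation), _OSI can be thought of as a membership test against a set of strings the OS is willing to claim. The set below uses Windows interface strings of the kind that turn up in DSDTs. It's an assumption for illustration, not an authoritative copy of what any kernel version ships, and the function name osi is made up.

```python
# Illustrative set of interface strings; an assumption, not the kernel's list.
CLAIMED_INTERFACES = {
    "Windows 2000",
    "Windows 2001",
    "Windows 2001 SP1",
    "Windows 2001 SP2",
    "Windows 2001.1",
    "Windows 2001.1 SP1",
    "Windows 2006",
}


def osi(interface):
    """Model of _OSI: True if the OS claims to support the named interface."""
    return interface in CLAIMED_INTERFACES


if __name__ == "__main__":
    for query in ("Windows 2000", "Windows 2006", "Linux"):
        print(query, osi(query))
```

The point of the set is that, unlike _OS, the answer isn't a single string: the firmware can ask about as many interfaces as it likes and get an independent yes or no for each.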

That's all straightforward enough, but leaves two questions. The first is why Linux now claims "Microsoft Windows NT" for _OS. That's actually pretty simple - some DSDTs only check for various _OSI strings if _OS is "Microsoft Windows NT". This is stupid of them, but not a violation of the ACPI spec. The second is why Linux returns false for _OSI("Linux"). This is a little more subtle, but it basically boils down to "There is no Linux interface". The behaviour of Linux changes over time. We make no guarantees that its behaviour will be consistent over time as we find and fix bugs. Microsoft take a different approach. Their ACPI behaviour has few changes over time. Something claiming to be Windows 2000 will always behave the same way. We can't even bump the interface string per release - doing so would require us to maintain every broken behaviour we've ever implemented and switch between them depending on what the BIOS asked us for. Linux does not provide a stable ACPI API to platform firmware, and we make no guarantees that it ever will. The only behaviour you can depend on is that Linux will conform to either the ACPI spec or (where it differs) the behaviour of Windows. If you find behaviour that does not fall into either of those categories, then it's likely that the behaviour will change when we notice. The reason we removed Linux from the supported interfaces list in 2.6.24 was that we were beginning to see BIOSes that changed behaviour when they detected that they were running on Linux, and changes we wanted to make could have potentially broken these BIOSes.
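The version cutoffs above can be written down as a small sketch. It assumes that "up until 2.6.9" means kernels before 2.6.9 answered "Linux", and it ignores the question of when _OSI support first appeared. The function names are hypothetical and it only handles plain dotted version numbers (no -rc suffixes).

```python
def parse_version(version):
    """Turn '2.6.8.1' into (2, 6, 8, 1) so versions compare as tuples."""
    return tuple(int(part) for part in version.split("."))


def os_string(kernel):
    """Model of _OS: 'Linux' before 2.6.9 (assumed cutoff), Windows NT after."""
    if parse_version(kernel) < (2, 6, 9):
        return "Linux"
    return "Microsoft Windows NT"


def osi_linux(kernel):
    """Model of _OSI("Linux"): claimed before 2.6.24, refused from then on."""
    return parse_version(kernel) < (2, 6, 24)


if __name__ == "__main__":
    for kernel in ("2.6.8.1", "2.6.9", "2.6.23", "2.6.24"):
        print(kernel, os_string(kernel), osi_linux(kernel))
```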

Anyway. That's why we claim to be Windows. Now, why would a DSDT want to do anything with that information? It should be noted that making decisions based on this is not a contravention of the ACPI spec - section 5.7.2 even describes a case of this. There are a few situations under which this can be helpful. The first is due to varying interpretations of the spec. Early versions of Windows require that hotswap bays signal their removal by sending a bus check notification to the parent IDE bus. Later versions want a device check notification on the device itself. Both are valid, but you ideally want to use the right one on the corresponding OS. Checking the OS version lets you do so.


Red Stripe makes a lousy spec-compliant ACPI implementation. That is because it is not a spec-compliant ACPI implementation, it is beer! Hooray beer!


Another is user experience. Not all hardware is supported by all operating systems. For instance, older versions of Windows don't support high-precision event timers (HPET). The HPET is defined as an ACPI device, and so will show up as an unknown device in the Windows device manager unless the firmware disables it. This is achieved by altering a flag in the _STA response depending on the Windows version - earlier versions are told that the device should be invisible, and later versions are told that it should be exposed. Finally, there's bug workarounds. An example of this is Windows 98 crashing if a thermal zone reports a temperature of less than 15.8 degrees Celsius. Working around that seems like a perfectly reasonable thing for a piece of hardware to do, since Microsoft don't make it easy for vendors to provide hotpatched versions of Windows.
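To make the _STA trick concrete, here's a small sketch of how the status bits combine. The bit meanings follow the ACPI definition of _STA (present, enabled, shown in UI, functioning). The hpet_sta function and its os_knows_hpet parameter are made up for illustration and don't come from any real firmware.

```python
STA_PRESENT = 0x1      # bit 0: device is present
STA_ENABLED = 0x2      # bit 1: device is enabled and decoding its resources
STA_SHOW_IN_UI = 0x4   # bit 2: device should be shown in the UI
STA_FUNCTIONING = 0x8  # bit 3: device is functioning properly


def hpet_sta(os_knows_hpet):
    """Return a _STA value that hides the HPET from an OS that can't drive it."""
    base = STA_PRESENT | STA_ENABLED | STA_FUNCTIONING
    if os_knows_hpet:
        return base | STA_SHOW_IN_UI
    return base


if __name__ == "__main__":
    print(hex(hpet_sta(True)))   # all four bits set
    print(hex(hpet_sta(False)))  # same, minus the "show in UI" bit
```

The device is reported as present and working either way. Only the "show in UI" bit changes, which is what keeps it out of the device manager on older versions of Windows.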

The final part of this mystery is why various BIOSes attempt to check whether they're running on Linux. Almost all the BIOSes I've examined do nothing with this information, which is consistent with someone writing an OS lookup table many years ago and adding Linux just in case someone found it useful. People don't generally write ACPI tables from scratch (doing so would be very dull), so they'll base it on the one from a previous version. Code won't be removed unless it's breaking things, so you'll end up with various odd evolutionary dead ends that have persisted anyway. If your ACPI firmware checks for Linux, this is not inherently a bug in your BIOS. It's more likely that nobody cared enough to remove two lines of code that might turn out to be useful one day.

Syndicated 2008-07-31 20:01:23 from Matthew Garrett

The Linux Plumbers Call for Papers is open for another day or so. It's looking like we'll have an excellent spread of interesting power management topics, but if you've got anything interesting to say then throw it forward. We're looking for kernel issues that have an impact on userspace, userspace issues that need kernel support and anything else that involves the interesting interactions between the two.


Hats for everyone and cab it to the Gold Club


San Francisco has been somewhat hectic, but yesterday included a quick trip to the office to discuss community involvement in improving power management. Look forward to developments there. Independently, I now join Brian the guy who had been clean too long to get onto the drugs rehab program (hurrah voicemail) in being an ex-owner of my US phone number. Probably best to delete it if you have it.

In other news, Foxconn are sending me a board with the "controversial" AMI BIOS in order to figure out what's going on. Progress by this time next week, with luck.

Syndicated 2008-07-30 22:51:29 from Matthew Garrett

FOR THE LOVE OF CHRIST T-MOBILE WHY ARE YOU TRIGGERING VOICEMAIL NOTIFICATIONS AT THE RATE OF ONE EVERY FIVE MINUTES WHEN I AM UNABLE TO EVEN ACCESS MY VOICEMAIL I AM GOING TO BURN YOU TO THE GROUND

Syndicated 2008-07-27 21:29:47 from Matthew Garrett

The Farm Cafe - baked brie with fruit sauce followed by goat cheese ravioli, accompanied by an excellent range of beers. The assault on dessert was abandoned after the realisation that emergency dessert capacity had been pressed into service as backup cheese repository.

I'd forgotten that I liked Portland.

Syndicated 2008-07-27 07:09:39 from Matthew Garrett

Further Foxconn fun

Ryan kindly sent me a copy of the ACPI tables for his motherboard, so I've had the opportunity to look at them in a little more detail. There's nothing especially surprising. The first method of interest is OSFL, which I've annotated below:
   Method (OSFL, 0, NotSerialized)
    {
        If (LNotEqual (OSVR, Ones))
        {
            Return (OSVR)
        }
This block simply skips the checks if they've already been evaluated and returns the cached value
        If (LEqual (PICM, Zero))
        {
            Store (0xAC, DBG8)
        }
If the programmable interrupt controller has been set up in PIC mode rather than APIC mode, 0xAC is written to i/o port 0x80. This would then show up on a plug-in card if one were attached. Simply debug code
        Store (One, OSVR)
Set OSVR to 1, which in this case clearly means "Unknown OS"
        If (CondRefOf (_OSI, Local1))
This checks whether the OS supports the _OSI method. If it does, the following block is executed. If not, the older _OS method is used to detect the OS
        {
            If (_OSI ("Windows 2000"))
            {
                Store (0x04, OSVR)
            }
Newer versions of Windows will also claim to support the interfaces defined in older versions, so this set of checks is done in release order
            If (_OSI ("Windows 2001"))
            {
                Store (Zero, OSVR)
            }

            If (_OSI ("Windows 2001 SP1"))
            {
                Store (Zero, OSVR)
            }

            If (_OSI ("Windows 2001 SP2"))
            {
                Store (Zero, OSVR)
            }

            If (_OSI ("Windows 2001.1"))
            {
                Store (Zero, OSVR)
            }

            If (_OSI ("Windows 2001.1 SP1"))
            {
                Store (Zero, OSVR)
            }

            If (_OSI ("Windows 2006"))
            {
                Store (Zero, OSVR)
            }
If we've got this far, OSVR is now set to 0. Linux will claim to support all of these interfaces, and so OSVR should be 0 on Linux systems. Note that there is no _OSI check for Linux - the 2.6.24 change to remove Linux from the set of claimed interfaces is therefore irrelevant
        }
        Else
        {
Linux supports _OSI, so we should never be here. But if we somehow are...
            If (MCTH (_OS, "Microsoft Windows NT"))
            {
                Store (0x04, OSVR)
            }
Linux has responded to _OS with "Microsoft Windows NT" since 2.6.9. MCTH is simply a string matching routine defined elsewhere in the DSDT. So, worst case here is that OSVR is 4
            Else
            {
                If (MCTH (_OS, "Microsoft WindowsME: Millennium Edition"))
                {
                    Store (0x02, OSVR)
                }

                If (MCTH (_OS, "Linux"))
                {
                    Store (0x03, OSVR)
                }
..because this could never be true unless you're running 2.6.8.1 or earlier. But even so, getting here would still indicate failure - we've supported _OSI since before then, and so should never come anywhere near this code block.
            }
        }

        Return (OSVR)
    }
In summary, we end up with the following values:
Value  OS
0      Windows XP, 2003 or Vista. Linux (assuming absence of bugs)
1      Unknown OS
2      Windows ME
3      A version of Linux that doesn't implement _OSI and is from before 2.6.9
4      Windows NT 4 and 2000. A version of Linux that doesn't implement _OSI and is 2.6.9 or later (I don't believe any such version exists)
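If you'd rather poke at the logic than read ASL, the following is a rough Python model of OSFL that reproduces the table above. It makes a few assumptions: MCTH is treated as plain string equality, the OSVR caching and the PIC-mode debug write are left out, and the function signature (an _OS string plus an optional set of claimed _OSI interfaces, with None meaning "no _OSI at all") is my own invention.

```python
# The _OSI strings that set OSVR to 0 in the quoted method.
NEWER_WINDOWS = (
    "Windows 2001",
    "Windows 2001 SP1",
    "Windows 2001 SP2",
    "Windows 2001.1",
    "Windows 2001.1 SP1",
    "Windows 2006",
)


def osfl(os_string, claimed_interfaces=None):
    """Return the OSVR value the quoted OSFL method would compute.

    claimed_interfaces is the set of strings the OS answers true for via
    _OSI, or None if the OS does not provide _OSI at all.
    """
    osvr = 1  # default: unknown OS
    if claimed_interfaces is not None:
        # _OSI path: checks run in release order, later matches win.
        if "Windows 2000" in claimed_interfaces:
            osvr = 4
        if any(name in claimed_interfaces for name in NEWER_WINDOWS):
            osvr = 0
    elif os_string == "Microsoft Windows NT":
        # _OS fallback path (MCTH modelled as equality, an assumption).
        osvr = 4
    else:
        if os_string == "Microsoft WindowsME: Millennium Edition":
            osvr = 2
        if os_string == "Linux":
            osvr = 3
    return osvr


if __name__ == "__main__":
    everything = {"Windows 2000", *NEWER_WINDOWS}
    print(osfl("Microsoft Windows NT", everything))  # modern Linux or Vista
    print(osfl("Microsoft Windows NT"))              # no _OSI, claims NT
    print(osfl("Linux"))                             # no _OSI, claims Linux
```

Feeding it the combination a current kernel presents (an _OS of "Microsoft Windows NT" plus all of the Windows interface strings) gives 0, the same value as Vista.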

Now, where is this used? The majority of the OSFL checks only check whether the return value is 1 or 2, which will only be true for an OS that (a) doesn't claim to be Windows or (b) is Windows ME. Linux doesn't fall into either of these categories, so we can ignore them. The first interesting hit we have is in the HPET code, where _STA will return 0xf (device present and working) if OSFL is 0 and 0xb (device present and working, but should not be shown in the UI) otherwise. This is just to keep the HPET from showing up in versions of Windows that don't know what it is. The only other interesting hit is the following code from the PCI bus initialisation pathway:
 
                               If (LEqual (OSFL (), Zero))
                                {
                                    Store (0x59, SMIC)
                                }
                                Else
                                {
                                    If (LEqual (OSFL (), 0x04))
                                    {
                                        Store (0x5A, SMIC)
                                    }
                                    Else
                                    {
                                        Store (0x58, SMIC)
                                    }
                                }
This writes different values to SMIC (which turns out to be i/o port 0xb2) depending on the OS. 0xb2 is the standard(ish) way to trigger a system management interrupt, which causes the CPU to execute some code from a memory region that can't be accessed by the OS. This isn't that unusual, but it's a little weird. In any case, note that there's no check for whether OSFL is 3 here (which would be true if the _OS call returned Linux), and so Linux is being treated identically to Windows ME and any unknown OS. In reality, Linux will be treated identically to either Vista or 2000. This block provides no evidence of conspiracy. Finally, the OS version flag is written to a region of memory before suspend and read back afterwards. Nothing appears to be done with this information - it's conceivable that the low-level resume code in the BIOS has conditionals based on this, but I suspect that it's just boilerplate code that's ignored.
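The SMIC selection boils down to a three-way choice, sketched here in Python. The function name is hypothetical. The values are the ones in the quoted ASL, and nothing here models what the SMI handler does with them.

```python
def smic_value(osfl_result):
    """Return the byte the quoted code would write to SMIC for an OSFL result."""
    if osfl_result == 0:
        return 0x59  # Windows XP, 2003, Vista, and Linux in practice
    if osfl_result == 4:
        return 0x5A  # Windows NT 4 and 2000
    return 0x58      # everything else: unknown OS, Windows ME, or OSFL == 3


if __name__ == "__main__":
    for result in range(5):
        print(result, hex(smic_value(result)))
```

Note that 1, 2 and 3 all land in the same bucket, which is the point made above: an OS identifying itself as Linux via _OS gets no special handling here.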

To summarise:
  • There is no code in this DSDT that could determine that the system is running any Linux kernel of 2.6.9 or later. This may even be true of earlier versions - I'm not sure when _OSI support was added
  • Even if the code did manage to determine that the system was running Linux, there are no codepaths that are Linux specific. Every piece of code is run on at least one version of Windows
What's the problem, then? I've no idea. The only "significant" issue is that the OEMB table provided by the BIOS has an incorrect checksum. Given that the OEMB table is never used by Linux (it's a vendor extension of some kind, with the best hint I've been able to find being that it can be used to pass information from the BIOS to the OS - kind of like the rest of ACPI, then...), this is pretty unimportant. And given that the OEMB table isn't part of the ACPI spec, it's certainly entirely irrelevant when it comes to determining whether the system is ACPI compliant or not.

Are there ACPI issues with Ryan's system? It sounds like it. The "Error attaching device data" complaints indicate some kind of failure on the part of the kernel to work out how the devices correspond to the ACPI namespace, but I strongly suspect that this is a Linux bug. Failure to reboot after suspend? Could be anything (I'd need direct access to the hardware to figure it out properly), but again it's almost certainly a Linux bug. The standard way Linux reboots systems is to bang the keyboard controller, and it's conceivable that something we're doing on resume is leaving the keyboard controller in a slightly confused state. We're clearly doing something wrong there, given that my Dell comes up without a keyboard about one resume in twenty - I just haven't had time to look into it yet.

The only remaining thing is the mutex handwaving. I've got no clue what's going on there. Ryan's suggested change (from Acquire (MUTE, 0x03E8) to Acquire (MUTE, 0xFFFF)) simply means that the OS will wait forever until it acquires the mutex - in the past it would only wait a second. The reason the compiler generates a warning here is that the firmware never checks whether it acquired the mutex or not! Bumping the timeout to infinity obviously fixes this warning (there's no need to check the return code if you're happy to wait forever rather than failing), but the original code is merely stupid as opposed to a spec violation.
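A loose analogy in Python, for anyone unfamiliar with the semantics: the second argument to Acquire is a timeout in milliseconds, 0xFFFF means "wait forever", and the result is true if the wait timed out. The sketch below mimics that with threading.Lock. The acquire wrapper and the ACPI_WAIT_FOREVER name are mine, and this is an illustration of the semantics rather than how any ACPI interpreter is written.

```python
import threading

ACPI_WAIT_FOREVER = 0xFFFF  # timeout value meaning "block until acquired"


def acquire(mutex, timeout_ms):
    """Mimic ASL Acquire: return True if the wait timed out, False on success."""
    if timeout_ms == ACPI_WAIT_FOREVER:
        mutex.acquire()  # blocks indefinitely, so it can never time out
        return False
    # threading.Lock.acquire returns True on success, so invert it.
    return not mutex.acquire(timeout=timeout_ms / 1000.0)


if __name__ == "__main__":
    mute = threading.Lock()
    # Uncontended: succeeds immediately even with a one second timeout.
    print(acquire(mute, 0x03E8))
    # Already held: a short timeout expires, and the caller ought to check.
    print(acquire(mute, 10))
```

The firmware's mistake is the equivalent of calling acquire(mute, 0x03E8) and carrying on without looking at the result. With 0xFFFF there's no result worth checking, which is why the compiler warning goes away.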

Take home messages? There's no evidence whatsoever that the BIOS is deliberately targeting Linux. There's also no obvious spec violations, but some further investigation would be required to determine for sure whether the runtime errors are due to a Linux bug or a firmware bug. Ryan's modifications should result in precisely no reasonable functional change to the firmware (if it's ever hitting the mutex timeout, something has already gone horribly wrong), and if they do then it's because Linux isn't working as it's intended to. I can't find any way in which the code Foxconn are shipping is worse than any other typical vendor. This entire controversy is entirely unjustified.

Syndicated 2008-07-27 02:47:39 from Matthew Garrett

As an update to this, it turns out that I can't read and talking about _OSI isn't very helpful when the DSDT in question is calling _OS, but it does leave me somewhat more confused - Linux has claimed to be Windows NT since approximately forever. Without the original DSDT I've got absolutely no clue what's going on, but comments like:

Find and replace all seven occurences of Acquire (MUTE, 0x03E8) and replace with Acquire (MUTE, 0xFFFF), it appears they're trying to crash the kernel by locking a region of memory that shouldn't be locked, but without access to their source code comments, I can only speculate, this tells it to lock a memory address that is always reserved instead. ;)

don't give me a lot of confidence in any of this being a correct diagnosis given that the second argument to Acquire is a timeout and not an address (0x03e8 gives a timeout of a second, while 0xffff is "Block on acquiring this mutex forever").

In any case, it's highly unlikely that this is any attempt by Foxconn to prevent Linux from working. The majority of checks for Linux in ACPI tables are copy and pasted from reference tables that Intel (and other manufacturers) have provided at various points - even the Intel Macs attempt to check for Linux! Most vendors will never attempt to boot Linux on their boards or validate them appropriately, so it's entirely conceivable that they'll end up screwing things up in such a way that the only tested paths are the ones that are run by Windows. This is why we now attempt to ensure that Linux reports itself as Windows. If we're running Linux-specific code in the DSDT, then that's a bug in Linux.

Anyway. Accusing companies of conspiring against us when the most likely explanation is simply that they don't care is a fucking ridiculous thing to do and does nothing to get rid of the impression that Linux users are a bunch of whining childish hatemongers. Next time, try talking to someone who actually understands this stuff first?

Syndicated 2008-07-26 18:06:32 from Matthew Garrett

Linux hasn't claimed to be Linux in response to _OSI queries since 2.6.24, so this is an interesting sidenote but basically irrelevant.

Syndicated 2008-07-25 17:26:35 from Matthew Garrett

The fallacy of the completely inclusive community

Emma Jane Hogbin gave a presentation on the gender gap in free software at Lugradio Live this weekend. One of the central messages was that a great deal of how to avoid putting women off computing can be distilled down to "Don't be a dick". This ties in well with Mako's restatement of the Ubuntu code of conduct as "Be excellent to each other" (a wonderful phrasing which demonstrates that Bill and Ted's Excellent Adventure is the most philosophically worthwhile film that Keanu Reeves has ever appeared in, and certainly not The fucking Matrix) and led to my 5 minute rant on why I hate the Linux community slightly later in the day, but does leave a certain problem. What standards are used to define whether given behaviour is dickish or not? The comments here show that there's disagreement even within a single sub-community of the larger free software world. Ben suggests that the reaction to perceived inappropriate behaviour is perhaps even more discouraging than the original behaviour, suggesting that bitching about things that offend you is dickish behaviour in and of itself. How do we decide whether someone is being a dick or helping the community? Is lack of tolerance a form of exclusionary behaviour?

This topic is actually one of the issues discussed in the Geek Social Fallacies, but here's a nice easy example. Would tolerating planet posts encouraging the eradication of the Jewish population be inclusive or exclusive? I suspect that most people would agree that it wouldn't be acceptable behaviour, which leads us to the next question. Why? There's two obvious arguments here. The first is that at a community level we have some form of rough moral consensus that advocating genocide is Just Wrong, and so criticising Nazis is obviously the right thing to do. The second argument is more pragmatic than philosophical - alienating millions of people in order to avoid alienating hate groups could be considered to reduce our potential contributor base in an unfortunate way.

I'm a fan of the second argument. The example I gave is emotive and relatively recent history has resulted in people tending to be pretty uniform in considering genocide to be a bad thing, but many other cases aren't clear cut. What I consider to be objectification of women is seen by others as appreciation of natural beauty. What I think of as sexist jokes are perceived by others as acceptable humour. When I advocate intolerance of certain behaviour, people are going to see me not having enough tolerance. So we end up in a situation where people make the "We should all just get along and be tolerant of each other" argument, which sounds fine but is fundamentally flawed in one significant way.

Advocating tolerance excludes the intolerant.

The reason people fail to see this is that it doesn't sound like a flaw. If you ask people whether we need to support intolerance, the immediate answer is probably no. But by advocating exclusion of intolerance, you're excluding all those who have good reasons to be intolerant. You're excluding the women who don't want to feel that the community sees them as a pair of breasts attached to some legs. You're excluding the ethnic groups who would prefer to avoid racist slurs or ethnic stereotyping. Telling anti-fascism protestors that they're being intolerant isn't likely to endear you to them. Advocating tolerance is telling the intolerant that they're wrong and should just deal with whatever it is that makes them unhappy.

So, ironically, tolerating certain types of intolerance is probably required in order to avoid alienating many potential contributors. That means some way of deciding what kinds of behaviour are acceptable and which are unacceptable. In the absence of either pre-existing community consensus or some philosophical breakthrough that allows unambiguous determination of the "rightness" of a given action, I'm going to suggest that we look at it from a pragmatic viewpoint. How many potential contributors do we discourage by criticising a certain type of behaviour? How many do we discourage by tolerating it?

The fallacy of the completely inclusive community is the idea that it includes everyone. The reality is that a certain level of social exclusion is required in order to include a wider range of people. So don't criticise people purely for criticising someone else's behaviour - make an argument for why that behaviour benefits the community. And when you see behaviour that you think discourages others, call people on it. Even if nobody's behaviour changes as a result, you're sending a signal that not everyone in the community agrees. Sometimes all people want is to know that there'll be some people on their side.

But, above all, try not to be a dick.

Syndicated 2008-07-21 12:35:30 from Matthew Garrett
