Older blog entries for richdawe (starting at number 88)

Life

Not done much free software / open source hacking recently. Work's been manic. Fortunately I'm on holiday right now recovering. ;)

Edinburgh Festival

In RL (*) I went up to Edinburgh for five days for the fringe and literature festivals, which were both excellent. Highlights for me at the fringe were "Shakespeare For Breakfast" and "Lick and Chew". At the literature festival I really enjoyed a talk on the recent history of Turkey, culminating in me buying and reading the book "The New Turkey" by Chris Morris. Very interesting book. I hope to make it over to Turkey sometime.

The whole trip was a bit rushed this year, since I was only up there for 5 days, not a whole week. I think I need the whole week to get the most out of it. I would have liked to go to some of the film festival, for instance. Plus touristy stuff (Calton Hill, Arthur's Seat), which I've failed to do three times now! But there's always next year...

Handy Edinburgh street pronunciation tip: "Cockburn Street" is pronounced "Co-burn". I tell you this to save you embarrassment!

(*) RL = real-life, a term taken from Pure Pwnage AKA "Pure Ownage", a show about gamerz which is hilarious, but also quite scary. I haven't made up my mind whether the lead character is acting. Even if he's acting, say, 50% of the time, that's still quite scary. Episode 2 is especially hilarious.

Sony Ericsson K750i

I got a K750i from Vodafone at the start of July, mainly because it has a 2 megapixel camera with autofocus. I'm really pleased with it. I got some great pictures of a vivid sunset in Edinburgh with it. The Bluetooth works pretty well, although I have managed to crash the phone once transferring ~20 images. I'm using a 20 GBP MSI USB Bluetooth dongle with Fedora Core 3.

Music

"Deliverance" by K90

Dem Headers

metaur wrote:

One solution to this problem might be to package the broken HTTP requests or e-mail messages. If a server receives an e-mail message with two Content-Type headers, it should construct a new head with the Content-Type text/plain and put the entire broken message (head+body) in the body, so it is shown as text. That way, the users will get to read any potentially valuable textual information in the broken messages, but they won't be exposed to any vulnerabilities or bugs resulting from different programs parsing broken messages differently.

Yes, wrapping an e-mail message like that would work. I'm not sure what users would think of it, though.

I don't think the same solution would work for HTTP, because a smuggled request is an HTTP request containing a second HTTP request, using the Content-Length header to "hide" a third HTTP request. I think the only sane thing you can do is reject the request. Apache apparently does this, if you use it as a proxy. Other software doesn't.

I wonder if it's worth someone writing a BCP (Best Current Practice) RFC on how to parse e-mails. RFC 2822 says that mail should only have one Subject header. What should you do with a message with multiple Subjects? Reject it? Consider it to be more spammy than a normal mail? Consider it likely to be malware?
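
To illustrate the kind of check I mean, here's a quick Perl sketch (not from any real MTA, just an illustration) that counts top-level occurrences of a header in a raw message, treating folded continuation lines as part of the previous header:

#!/usr/bin/perl -w
use strict;

# Count top-level occurrences of a header in a raw RFC 2822 message.
# Lines starting with whitespace are continuations of the previous header.
sub count_header {
    my ($raw, $name) = @_;
    my ($head) = split /\r?\n\r?\n/, $raw, 2;  # headers end at the first blank line
    my $count = 0;
    for my $line (split /\r?\n/, $head) {
        next if $line =~ /^[ \t]/;             # folded continuation line
        $count++ if $line =~ /^\Q$name\E[ \t]*:/i;
    }
    return $count;
}

my $message = do { local $/; <STDIN> };
print "suspect: multiple Subject headers\n"
    if count_header($message, 'Subject') > 1;

A spam filter could bump the message's score on this instead of rejecting it outright.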

Music

"Successful Enter" by Headquarter (Axel Konrad Remix)

HTTP Request Smuggling, e-mail

I read an interesting paper on HTTP Request Smuggling the other day. This is a technique for exploiting the different ways web servers / proxies interpret an HTTP request with multiple Content-Length headers. They don't interpret the requests the same way, which means you can do HTTP cache poisoning and cross-site scripting attacks. Some server bugs also help.
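
Roughly, the trick looks something like this (a simplified sketch of the kind of request in the paper; the paths and lengths are made up):

POST /some/page HTTP/1.1
Host: www.example.com
Content-Length: 0
Content-Length: 44

GET /poison.html HTTP/1.1
Host: www.example.com

A proxy that honours the first Content-Length sees an empty body and treats the GET as a second request; a server that honours the second swallows the GET as the POST's body. The two ends now disagree about how many requests they've seen, so responses can get paired up with the wrong requests - which is what makes the cache poisoning possible.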

While reading this I was struck by how similar these problems are to the ones with e-mail parsing. Both HTTP and Internet e-mail use the same header continuation format (a line starting with whitespace continues the previous header) and the same end-of-headers marker (a blank line).
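
E.g. this is one logical Subject header, folded across two lines:

Subject: a long subject line
 that continues on the second line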

Various HTTP and e-mail programs seem to take a blank line containing only a space as the end of headers. Outlook Express is especially magical at finding something in completely broken mails (broken according to the standards). That kind of vagueness leads to multiple interpretations of the same mail, which means that various things can be snuck past, say, anti-virus programs, unless the AV programs are aware of these techniques.

One example I've found is with the Subject, Message-ID and Content-ID headers. The standard says there should only be one of each. But what happens if there are multiple occurrences of any of these headers in mails? Answer: It depends on the program. Some programs take the first occurrence, some take the last. Which is right?

Music

"Global Underground 025 - Toronto" mix by Deep Dish

28 Jun 2005 (updated 28 Jun 2005 at 19:41 UTC)

auto2rpm

Just pushed out a new release of auto2rpm, my tool for converting autoconfiscated packages to installable rpms. Nothing dramatic, just bugfixes:

New in 1.4 (2005-06-28):

* Bugfixes:

- Find well-known documentation, when the source tarball is not in the current directory.

- Build correctly, when the source tarball is not in the current directory.

- Include gzip'd info files and man pages in %files list, so the build does not break for packages including those.

Perl

I gave a talk at Birmingham Perl Mongers last Wednesday on debugging with strace & ltrace. I was quite surprised and heartened by how many of the audience had used strace before (4 or 5 out of about 10 or 11 people). A large part of it was a live demo of the tools in action. It seemed to go down quite well.

Sadly I discovered the evening before that ltrace isn't much use for debugging XS, because that dlopen()s shared libraries. This seems to circumvent ltrace's hooking of the library calls. I wonder if it's possible to trace even dlopen'd libraries. Perhaps ltrace would have to hook dlsym() (eek).
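
You can at least watch the dlopen()ing happen with strace, e.g. something like this (Digest::MD5 just being a handy XS module) shows which shared objects a Perl one-liner pulls in:

strace -f -e trace=open perl -MDigest::MD5 -e 1 2>&1 | grep '\.so'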

At some point I'm going to give some training on strace at work for our Operations people. I did a two hour training session last year and it went pretty well.

Music

"Priceless" by Incubus

ACCU

At the end of April I went to the ACCU 2005 Conference in Oxford. It was a useful conference. It was good to get away from the daily grind of implementation/bugfixing, take a step back and think about things. I haven't had much chance to do that recently.

Mostly I went to the C++ track, which had various talks on C++ by people such as Bjarne Stroustrup, Herb Sutter (of the Exceptional C++ series) and David Abrahams (from Boost). The metaprogramming talks went over my head, so I need to spend some time digesting those. It sounds like C++ 200x will feature some usability improvements. Personally I think the biggest thing they can do is to make C++ error messages more intelligible - admittedly that's a compiler thing. (I do know about various utilities to help simplify the messages, but why not have the compiler do it by default?)

Another talk I enjoyed was Ross Anderson's talk on finding bugs in software. The basic question was how long you have to test a piece of software to ensure it has, say, a 1 million hour MTBF (mean time between failures). He used a Boltzmann distribution to model how long it takes to find bugs and assumed that the bugs were independent (i.e.: uncoupled). In the Boltzmann model the "high energy" bugs are the ones that are found quickly, the "low energy" ones found slowly. The answer to the question was that 1 million hours of testing is required to achieve that MTBF - in this model the MTBF you can demonstrate grows roughly linearly with the time spent testing. There was also a comparison between the bugginess of open source vs. closed source software, but I have to admit I didn't really follow his assumptions. Some magical constant appeared which he used to suggest that closed source was harder to test and therefore more buggy. I guess that depends whether you expect your testers to look at your code or not. He seemed to be assuming that open source software would be whitebox tested, whereas closed source software would be blackbox tested. I don't know if that's a valid assumption. Where I've worked, the testers could have examined the sources as part of their testing, but haven't.

Herb Sutter's talk comparing Java generics, .Net generics and C++ templates was interesting. Java generics are enforced at the source level and are problematic if you mix old code with new code that uses generics - you get exceptions, due to some weirdness in the way the type information is passed around. I found .Net generics impressive, because they work in all languages. My interest in .Net has been piqued - I need to look at Mono seriously sometime. C++ templates are the most flexible, but also the most daunting.

I also enjoyed a very practical talk on XSLT. I've done a very small bit of XSLT, but I wanted to learn more. It was certainly less intimidating than I thought.

Music

"Digital Reason" by Ashtrax

xmlfs

I did some more hacking on xmlfs and the Perl bindings for FUSE. I've got extended attributes working nicely now. XML attributes appear as extended attributes.

E.g.:

<?xml version="1.0"?>
<example>
  <node1>Some content</node1>
  <node2 flarble="true" flurble="false"/>
  <node2/>
  <node3/>
  <node4>Blah</node4>
</example>

gives:

[rich@meelo mnt]$ find
.
./example
./example/node4@@0
./example/node3@@0
./example/node2@@1
./example/node2@@0
./example/node1@@0
[rich@meelo mnt]$ getfattr -d -R .
# file: example/node2@@0
user.flarble="true"
user.flurble="false"
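
For the curious, with the patched bindings a getxattr callback boils down to something like this sketch. find_node() is a made-up helper that resolves a filesystem path like /example/node2@@0 to its XML node, and I'm assuming an XML::LibXML-style node object - the real xmlfs code may differ:

use POSIX qw(ENOENT);

# Map XML attributes onto extended attributes.
sub xmlfs_getxattr {
    my ($path, $name) = @_;
    my $node = find_node($path) or return -ENOENT();
    $name =~ s/^user\.//;               # strip the "user." xattr namespace
    my $value = $node->getAttribute($name);
    return defined $value ? $value : 0; # 0 means "no such attribute"
}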

The patch to the Perl bindings is done. I need to write some test cases. I added support for the flush, release and fsync operations - I need to check those work too.

prozilla

Debian bug 295268 describes a bug in prozilla 1.3.7.x (a download accelerator) where sometimes it downloads the file with the right size, other times not. This seems to be caused by a web cache. When there are cache misses for all portions of the file, the file is downloaded correctly. Otherwise, it fails. The cache hits seem to ignore prozilla's Range request, which is legitimate according to the HTTP 1.1 standard (a server may ignore Range and return the whole entity). So prozilla is broken. This looks a bit tricky to solve.
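
Checking whether a given server or cache honours Range is easy enough from Perl: a 206 response means the byte range was honoured, a 200 means it was ignored and you got the whole entity. (The URL is just a placeholder.)

use LWP::UserAgent;

my $ua  = LWP::UserAgent->new;
my $res = $ua->get('http://www.example.com/some/file',
                   Range => 'bytes=0-499');
print $res->code, ": got ", length($res->content), " bytes\n";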

Music

"It's Personal" by The Ganja Kru

xmlfs

I did a talk at Birmingham Perl Mongers last night about "Writing Filesystems in Perl". I talked about how you can write filesystems for Linux using FUSE (Filesystems in User-space), which is a Linux kernel module and user-space library, plus various language bindings. The particular example I used was xmlfs. xmlfs is a filesystem that I've been working on recently. It allows you to mount an XML file as a filesystem and then manipulate the nodes as normal files, e.g.:

xmlfs example.xml mnt
cd mnt/example
mkdir node@@0
echo foo > node@@0/leaf@@0
cat node@@0/leaf@@0

The talk went pretty well, although I cocked some bits of it up - I'm never quite sure if I'm explaining things clearly. The best bit was definitely the live demo.

I haven't thought of any real practical use for xmlfs. I'm interested in the semantics and how an XML filesystem would work. It's a research project.

Music

"Nil By Mouth" by the Wiseguys

prozilla

(prozilla is a console download accelerator.)

prozilla 1.3.7.4 is out - please upgrade ASAP. This release has some important fixes:

  • Support for downloading files > 2GB.

  • Fix a remotely exploitable format string security bug.

There are other minor fixes: command-line option error handling, typos.

FUSE

I've been hacking on a user-mode filesystem on Linux using FUSE, File Systems in Userspace. I'm using the Perl bindings. I had a working read-only filesystem in a couple of hours. Write support is taking a bit longer. I'm writing a filesystem for a talk called "Writing Filesystems in Perl" that I'm going to give at Birmingham Perl Mongers at the end of the month. FUSE rocks!
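
To give a flavour of the API, a minimal read-only filesystem with the Perl bindings looks roughly like this - a single file called hello, along the lines of the example that ships with the Fuse module:

#!/usr/bin/perl -w
use strict;
use POSIX qw(ENOENT);
use Fuse;

my $content = "Hello, world!\n";

# Return a stat-style 13-element list, or -errno on failure.
sub fs_getattr {
    my ($path) = @_;
    return (0, 0, 0040755, 1, 0, 0, 0, 0, 0, 0, 0, 1024, 0) if $path eq '/';
    return (0, 0, 0100444, 1, 0, 0, 0, length($content), 0, 0, 0, 1024, 1)
        if $path eq '/hello';
    return -ENOENT();
}

# Return the directory entries, followed by an errno (0 = success).
sub fs_getdir { return ('.', '..', 'hello', 0); }

sub fs_open { return 0; }

sub fs_read {
    my ($path, $size, $offset) = @_;
    return -ENOENT() unless $path eq '/hello';
    return substr($content, $offset, $size);
}

# Usage: perl hellofs.pl /some/mount/point
Fuse::main(
    mountpoint => $ARGV[0],
    getattr    => \&fs_getattr,
    getdir     => \&fs_getdir,
    open       => \&fs_open,
    read       => \&fs_read,
);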

Music

"The Rescue Blues" by Ryan Adams

Apache 2 and gzip'd mailing list archives

For some time I've been using mhonarc to archive various mailing lists that I read into a bunch of HTML pages. I've set up mhonarc to compress these archives. They're placed in a subdirectory on my webserver.

The idea is that at some point I can index the mail archive and search it. That's the theory, but until today I couldn't actually view the web pages in a web browser. That's because the HTML pages came back with a Content-Type of application/x-gzip, which my web browser (rightly) refuses to display.

Now I've solved it. There were two steps:

  • Enable MultiViews for the directory containing the mail archives, so that asking for the file index.html would actually fetch index.html.gz transparently.

  • Force the MIME type of the files in the directory to be text/html. This is slightly evil, but seems to be the only way to do it. Without this step, .html.gz will always be returned as application/x-gzip. I'm not the only person who's had this problem - see the Apache Compression HOWTO.

So for your enjoyment, here's the Directory directive for my mailing list archives:

<Directory "/var/www/html/ml">
    Options Indexes MultiViews
    AllowOverride None
    Order allow,deny
    Allow from all

    ForceType text/html
</Directory>

Music

"Words" by The Doves (excellent in concert, BTW)

Stuff

I seem to have spent most of my time recently working. I probably would have enjoyed it a bit more if it wasn't so stressful. Anyhow, I've finally got round to some things that I've been meaning to do for a while, which felt like a holiday for some reason.

I'm planning to set up a CPAN mirror on my boxes at home and a box at work. This will use the CPAN::Mini module to mirror just the latest modules. search.cpan.org seems to be down a lot recently and I use it a reasonable amount for browsing for modules and reading docs. I find its interface a lot nicer than reading perldoc docs in a terminal or Emacs. Another nice way of reading the Perl docs is via perldoc.perldrunks.org, which is done by one of the Birmingham Perl Mongers.
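
CPAN::Mini ships with a minicpan command, so the mirror itself should just be a cron job along these lines (the path and mirror URL are examples):

minicpan -l /var/www/html/minicpan -r http://www.cpan.org/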

Got wireless? Check. I've got a Linksys WRT54GS which I've got plugged into the back of my Draytek Vigor ADSL router. Seems to work pretty well. I got about 2MB/sec out of it yesterday. I'll have to give NetworkManager a go.

So my patch for the r8169 driver in the Linux kernel was accepted. I need to do some testing on a slightly updated patch that the maintainer sent me. Then hopefully it'll go in 2.6.11 or 2.6.12.

I may do some hacking on ethtool, to make it more friendly. It seems that telling it the speed and/or duplex doesn't flip the card out of autonegotiation mode, which seems kind of silly.
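
The workaround, as I understand it, is to turn autonegotiation off explicitly in the same command, something like:

ethtool -s eth0 speed 100 duplex full autoneg off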

Distros seem to be including prozilla again, after the security fixes I did, which is nice. I need to review the 2.0.x code at some point.

Music

"Static & Silence" by the Sundays
