Older blog entries for badger (starting at number 64)

28 Sep 2008 (updated 28 Sep 2008 at 19:58 UTC) »

When people tell me that python3000 will "solve" the unicode problem in python, I always shake my head and say that unicode handling will be better but there are still plenty of places where they'll consider it "broken".

That's because people are really asking whether they'll stop getting UnicodeErrors in their code and stop having to manually convert between byte strings and unicode. This portion will never stop simply because the data stored on computers is stored as bytes and it has to go through a translation before it can be recognized as unicode.

Take filenames on web servers as an example. I recently created two files on my apache web server with filenames of ½ñ.html encoded in utf-8 and ½ñ.html encoded in latin-1. I tested that apache was serving both files by hitting them in firefox using the hex for their encoded names (%c2%bd%c3%b1.html for the utf-8 version; %bd%f1.html for the latin-1).
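To make the two encodings concrete, here is a minimal sketch (using Python 3's urllib.parse, which is my choice for illustration and not something from the original test) of how the same filename turns into two different percent-encoded names. The variable names are made up for the example. Note that quote() emits uppercase hex digits; they are equivalent to the lowercase ones I typed into firefox.

```python
from urllib.parse import quote, unquote_to_bytes

filename = "½ñ.html"

# The same text becomes different bytes depending on the character set.
utf8_bytes = filename.encode("utf-8")      # b'\xc2\xbd\xc3\xb1.html'
latin1_bytes = filename.encode("latin-1")  # b'\xbd\xf1.html'

# Percent-encode the raw bytes, as they would appear in a URL path.
utf8_quoted = quote(utf8_bytes)
latin1_quoted = quote(latin1_bytes)
print(utf8_quoted)    # %C2%BD%C3%B1.html
print(latin1_quoted)  # %BD%F1.html

# Going the other way only gives bytes back; the charset must be known
# from somewhere else before they can be turned into text again.
roundtrip = unquote_to_bytes("%bd%f1.html").decode("latin-1")
print(roundtrip == filename)  # True
```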

In python3.0rc1, fetching these files resulted in total failure: I wasn't able to retrieve either one of these names, as urllib would only take unicode strings that encoded to the ASCII subset (bug).

But what is the "right" solution? A naive guess would be that the programmer should deal with the url as unicode internally. This is because the programmer may have to manipulate the path as a string. In a web app, for instance, a user might submit a form at "http://localhost/mysite/files/vaña.txt?edit" and the web app needs to parse that url and redirect the user to "http://localhost/mysite/files/vaña.txt?view" afterwards.

However, this approach suffers from a major problem. If the URL is converted into the unicode type before the web app gets it, then the web app will not know which character set to use when encoding the ñ to get the proper data back. Is the ñ written in latin-1? Is it written in utf-8? A wrong guess will cause a 404 (or worse, access to the wrong file). So the underlying libraries have to pass a byte string up to the web application, and the web application has to translate that string into a unicode string just long enough to operate on it with the proper string tools before sending it back as an encoded byte string.
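The decode, manipulate, re-encode round trip described above can be sketched roughly like this in Python 3. The function name replace_query and its parameters are hypothetical, and the sketch assumes the web app already knows (from configuration or elsewhere) which encoding the server uses for that path.

```python
from urllib.parse import urlsplit, urlunsplit

def replace_query(raw_url: bytes, new_query: str, encoding: str) -> bytes:
    """Hypothetical helper: swap the query string of a byte-string URL."""
    # Decode just long enough to use the normal string tools...
    text = raw_url.decode(encoding)
    parts = urlsplit(text)
    rewritten = urlunsplit(parts._replace(query=new_query))
    # ...then hand back bytes in the same encoding they arrived in.
    return rewritten.encode(encoding)

url = "http://localhost/mysite/files/vaña.txt?edit"
utf8_url = url.encode("utf-8")
latin1_url = url.encode("latin-1")

utf8_view = replace_query(utf8_url, "view", "utf-8")
latin1_view = replace_query(latin1_url, "view", "latin-1")

# Same text, different bytes: each one names a different file on disk.
print(utf8_view != latin1_view)  # True

# Guessing the wrong charset fails loudly here, but not every wrong
# guess is this obvious.
try:
    replace_query(latin1_url, "view", "utf-8")
    wrong_guess_failed = False
except UnicodeDecodeError:
    wrong_guess_failed = True
print(wrong_guess_failed)  # True
```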

So then, what is the big deal with python3000? How does it make the unicode situation better at all? Consistency.

In python2.x, there are many times where a module will only work with byte strings or only work with unicode strings despite the fact that either one would be valid for the work being done. In one case I've seen, it's possible to get to a point where the Windows filesystem requires a unicode type to do the right thing but the subprocess module requires a byte string so there's no way to operate on non-ASCII filenames.

python3000 will make unicode strings the norm when dealing with strings in code. If you get a byte type returned to you, you'll know that there's a reason you're getting it rather than just assuming that this module didn't convert things to unicode. This will hopefully make errors from non-conversion occur closer to their origin and make potential error conditions easier to find, as seeing a byte type where you expect a string will be a major clue that something has gone wrong.
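As a small illustration of errors showing up close to their origin, here is a sketch of what happens in Python 3 when bytes and text are mixed. The filename and variable names are only examples.

```python
raw = "vaña.txt".encode("utf-8")  # stands in for bytes handed back by a library

# Mixing bytes and text fails right away, even for plain ASCII data,
# instead of waiting until non-ASCII input shows up.
try:
    raw + ".bak"
    mixing_failed = False
except TypeError:
    mixing_failed = True
print(mixing_failed)  # True

# The conversion has to be explicit, so the charset decision is
# visible in the code.
text = raw.decode("utf-8")
backup_name = text + ".bak"
print(backup_name)  # vaña.txt.bak
```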

24 Sep 2008 (updated 24 Sep 2008 at 17:50 UTC) »

Since I moved to my new home I have been receiving phone calls for political polls for the House of Representatives seat that I would vote for. The first time I listened to the questions, answered dutifully, and had a vague disquiet. The questions all seemed leading and meant to portray one of the candidates in a positive light (for my area) and portray the other candidate negatively.

Yesterday, I received another one of these calls and this time I even recognized some of the questions as coming directly from an ad for one of the candidates that I had received in the mail. So after the poll questions, I started asking questions of my own. None of the people I was able to talk to knew who paid for the poll (supposedly so as not to bias the poll givers when they talked to people on the phone. OTOH, with questions like these, it would take a pretty oblivious questioner to not be able to tell who was paying for this.)

I could just be used to normal American political ads where there's always a "Paid for by XXXX" disclaimer in the ad but this feels wrong to me. Does anyone out on the lazyweb know if there are regulations about polls paid for by political candidates and parties?

Update: spot was kind enough to tell me this practice is known as push polling. It's an interesting technique since the poll doesn't even have to ask questions based in fact. I wonder how many of my friends and neighbours have been taken in by this campaign....

Yesterday I deployed a new version of the PackageDB. This is becoming a rather regular event because mapleoin is working very hard on a lot of important UI improvements. For this update we have the answer to one of our most commonly requested features: the ability to filter the package list by letter.

pkgdb-letter-nav2

A few weeks back, at the prompting of Lyos Norezel, I updated the orphaned packages page in the packagedb to display the releases a package was orphaned for. Unfortunately, I managed to break sorting of the packages at the same time which made things a bit unusable.

Now that that's been pointed out, I've fixed things to be alphabetical again: New orphan list

Flatten a list in a python one-liner

I sometimes need to flatten a list in a list comprehension or generator expression (usually because I'm doing work inside of a template where keeping temporary variables is somewhat painful). This has stumped me for a while but today I found itertools.chain() which worked wonderfully. Here's how I found out I could make use of it:


>>> import itertools
>>> data = [(1,2), (3,4,5), (6,7)]
>>> [i for i in itertools.chain(*data)]
[1, 2, 3, 4, 5, 6, 7]
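
Two other spellings that should give the same result, in case they fit a template better: itertools.chain.from_iterable() takes the outer list directly instead of unpacking it with *, and a nested list comprehension needs no import at all. This is just a sketch of alternatives, not what I was using at the time.

```python
import itertools

data = [(1, 2), (3, 4, 5), (6, 7)]

# chain.from_iterable() accepts the outer iterable as a single argument,
# so it also works when data is itself a generator.
flat_chain = list(itertools.chain.from_iterable(data))

# A nested comprehension reads left to right: outer loop first, then inner.
flat_nested = [item for group in data for item in group]

print(flat_chain)   # [1, 2, 3, 4, 5, 6, 7]
print(flat_nested)  # [1, 2, 3, 4, 5, 6, 7]
```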

New Election Software

As Fedora folk know, the Fedora Board Election is currently taking place. What people might not all be aware of is that the election software has been completely rewritten by one of our all-star contributors, Nigel Jones. Since I wrote the former election software, I'm well aware of its shortcomings: a manual process for setting up an election, no confirmation pages, limited styles of voting, only a single election at a time, and it just plain looked ugly. Nigel looked at all the problems and wrote a new app in TurboGears that fixed them. Not only does it solve all the problems we had before but it gives us a much more flexible framework for further enhancements. If we want new styles of voting or a new Fedora theme, it will be much easier to implement that with the new code.

So if you're on one of the Fedora IRC channels, give a warm thank-you to "G" for the excellent work he's done!

8 May 2008 (updated 8 May 2008 at 08:36 UTC) »

Why I Love Open Source

I've had a bunch of items on my TODO queue for the pkgdb that would be major enhancements to its usability but haven't been able to work on them due to higher priority things constantly coming up (FAS, figuring out why common operations are so slow, eliminating unicode crashes, mass branching, etc.). This doesn't mean that I don't know there are usability issues with the pkgdb, though. And every time I see those issues I have to shudder and promise myself that I'll have time to fix them soon.

Well, for at least a few of those most requested features that's no longer necessary! Christopher Aillon (caillon) submitted a patch to add UI to the user package page to filter packages according to whether you own them or not. This is great as it means I'll be able to set the default filter to owner, approveacls, and commit and people will be able to easily change the settings themselves.

An even bigger enhancement is being worked on by Ionuț Arțăriși (mapleoin) who has recently started hacking on the PackageDB. He's adding search capability so that we can find packages by keyword a la yum search. Because of the infrastructure change freeze that we're enforcing until Fedora 9 is out the door, I don't have a test server running his code yet. If you want to try it out, though, it's available by checking out the bzr repository:


bzr branch bzr://bzr.fedorahosted.org/bzr/packagedb/mapleoin-devel

Instructions for configuring a test server are in the README file.

Here are a few screenshots to whet your appetite:

pkgdb-search-front.png
Front Page with Search Box



pkgdb-search-results.png
Results!

Mugshot to the Rescue!

While doing other things after dinner I happened to see this entry go by in my mugshot stacker and thought, hmmm... I should look at that. One click later I was looking at the suspect changeset on fedorahosted's trac instance and confirmed that it would break bugzilla sync. A few minutes of editing and a fix was committed.

Moral of the story #1: When set up to watch the RSS feed for your project's commits, mugshot can help get the right eyes on a problem quickly.

Moral of the story #2: If you have any doubts about a change, make sure to mention them in the first line of your commit message.

Does it say good things about inkscape or bad things about the other programs we have available that it's what I use to model objects and relationships when I program?

Mmm... Matzo ball soup and roasted tomatoes.
