Older blog entries for badger (starting at number 69)

I've been looking for this quote all day and like magic, it appeared in someone's email sig.

Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius -- and a lot of courage -- to move in the opposite direction.

Albert Einstein 1879 - 1955

Now, apply that to git.

Greg Dekoenigsberg pointed this out to a few of us and it's worth repeating to a wider audience:

Except where otherwise noted, third-party content on this site is licensed under a Creative Commons Attribution 3.0 License. Visitors to this website agree to grant a non-exclusive, irrevocable, royalty-free license to the rest of the world for their submissions to Whitehouse.gov under the Creative Commons Attribution 3.0 License.
-- http://www.whitehouse.gov/copyright/

Mass branching was a success! For Fedora 9, mass branching of packages took 2 days to complete. For Fedora 10 it took 2 hours.

The time savings are thanks to Bill Nottingham, who cut some unnecessary steps out of the cvs branching script, and to Jesse Keating and myself, who implemented more efficient code for the Package Database portion.

JavaScript

JavaScript is a mass of failures. But it's also a fun language. Why the contradiction? I get the impression that it just isn't a very complete language in a lot of ways. Here's my short, top-of-the-morning list of JavaScript shortcomings:

  1. No way to specify dependencies between modules.
  2. Lack of a comprehensive standard library.
  3. No standard method of taking well-organized code (for instance, code split across separate files) and turning it into something that is efficient to send over the network.
  4. Using prototype for inheritance has a few corner cases that you have to know about in order to avoid pitfalls (this could just be me, though).

Most of the major JavaScript frameworks out there have some code in them to address these failings. This is both a plus and a minus. On the one hand, it means that it is possible for JavaScript to be extended to work around these problems. On the other hand, it means that instead of learning JavaScript, a serious programmer has to learn jQuery or Dojo or Prototype. And it means that when you find that you really love one feature about one framework and a second feature about another framework you either have to make the user download both frameworks (with all the duplication in functionality between them) or suffer through a suboptimal means of doing some things since you are restricted to just one.

And now that I've got the rant out of my system, let me tell you what I like about Dojo and jQuery and why I wish JavaScript made it easy to use both together:

jQuery

  • jQuery has a very rich querying method for selecting DOM nodes that extends the standard CSS selectors in useful ways.
  • jQuery has an equally rich series of methods that operate on the queried nodes. Even though Dojo has a query() method that can do similar work, the set of methods you can apply to the results is somewhat restricted in comparison.
  • In other words, jQuery has an area of specialty: selecting nodes from the DOM and manipulating them. It's the best framework I've encountered for doing that.

Dojo

  • Asynchronous programming in Dojo is easy with the twisted-style Deferred. Programming network requests and other long running or recurring events asynchronously is natural with Dojo.
  • Specify dependencies from within JavaScript files. You don't have to use html script tags to specify all the JavaScript files to load or where to load them from; the dependencies mechanism does that for you.
  • Build system to take your organized code and make it suitable for deployment. Lots of little files become one JavaScript file so the browser doesn't have to make multiple requests, and there's even an option to allow asynchronous loading.
  • dojo.declare() makes object-oriented JavaScript much easier. It's fun working with the nitty gritty of JavaScript's prototype-based OO because it's different from the class-based inheritance that I'm used to. But I also get extremely tired of writing boilerplate over and over, and of debugging problems with prototype inheritance because there's just one little thing that I don't understand (and seemingly, no one else on the internet does either :-).
  • In other words, where jQuery makes it easy to do all things DOM related, Dojo makes it easier to develop and deploy the rest of your application.

So anyone who thinks that unicode in python is bad, take a look at uri parsing :-)

28 Sep 2008 (updated 28 Sep 2008 at 19:58 UTC) »

When people tell me that python3000 will "solve" the unicode problem in python, I always shake my head and say that unicode handling will be better but there are still plenty of places where they'll consider it "broken".

That's because people are really asking whether they'll stop getting UnicodeErrors in their code and stop having to manually convert between byte strings and unicode. This portion will never stop simply because the data stored on computers is stored as bytes and it has to go through a translation before it can be recognized as unicode.
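A minimal sketch of that translation step, assuming a current Python 3 interpreter. The sample character is only an illustration: the same text becomes different bytes under different encodings, and decoding with the wrong guess either fails loudly or quietly produces the wrong text.

```python
text = "ñ"

# The same character is stored as different bytes depending on the encoding.
utf8_bytes = text.encode("utf-8")      # two bytes
latin1_bytes = text.encode("latin-1")  # one byte

# Decoding with the encoding that was used to write the bytes round-trips.
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("latin-1") == text

# Wrong guess, case 1: the bytes are not valid utf-8, so decoding fails.
try:
    latin1_bytes.decode("utf-8")
    wrong_guess_failed = False
except UnicodeDecodeError:
    wrong_guess_failed = True

# Wrong guess, case 2: latin-1 accepts any byte, so there is no error,
# just text that is not what the author wrote.
mojibake = utf8_bytes.decode("latin-1")
```

Nothing in the bytes themselves says which encoding they are in, which is why the conversion can't be made to disappear.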

Take filenames on web servers as an example. I recently created two files on my apache web server with filenames of ½ñ.html encoded in utf-8 and ½ñ.html encoded in latin-1. I tested that apache was serving both files by hitting them in firefox using the hex for their encoded names (%c2%bd%c3%b1.html for the utf-8 version; %bd%f1.html for the latin-1).
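The two encoded names can be reproduced with the standard library. This is a small sketch for a current Python 3, not the commands I used at the time; note that quote() emits uppercase hex digits, which are equivalent to the lowercase ones in the URLs above.

```python
from urllib.parse import quote, unquote_to_bytes

filename = "½ñ.html"

# Percent-encode the same text under the two character sets.
utf8_name = quote(filename, encoding="utf-8")
latin1_name = quote(filename, encoding="latin-1")

print(utf8_name)    # %C2%BD%C3%B1.html
print(latin1_name)  # %BD%F1.html

# On disk these are two different byte strings, hence two different files.
utf8_raw = unquote_to_bytes(utf8_name)
latin1_raw = unquote_to_bytes(latin1_name)
```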

In python3.0rc1, this resulted in total failure: I wasn't able to retrieve either one of these names, as urllib would only take unicode strings which encoded to the ASCII subset (bug).

But what is the "right" solution? A naive guess would be that the programmer should deal with the url as unicode internally. This is because the programmer may have to manipulate the path as a string. In a web app, for instance, a user might submit a form at "http://localhost/mysite/files/vaña.txt?edit" and the web app needs to parse that url and redirect the user to "http://localhost/mysite/files/vaña.txt?view" afterwards.

However, this approach suffers from a major problem. If the URL is converted into the unicode type before the web app gets it, then the web app will not know which character set to use when encoding the ñ to get proper data back. Is the ñ written in latin-1? Is it written in utf-8? A wrong guess will cause a 404 (or worse, access to the wrong file). So the underlying libraries have to pass a byte string up to the web application, and the web application has to translate that string into a unicode string just long enough to operate on it with the proper string tools before sending it back as an encoded byte string.
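Here is a rough sketch of that decode, operate, re-encode cycle, assuming a current Python 3 and assuming the application somehow knows which encoding its paths use. The switch_query() helper and its path_encoding parameter are made up for illustration; they aren't part of any library.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def switch_query(raw_url: bytes, new_query: str, path_encoding: str) -> bytes:
    """Replace the query string of a percent-encoded URL (hypothetical helper)."""
    # A percent-encoded URL is plain ASCII on the wire.
    parts = urlsplit(raw_url.decode("ascii"))

    # Decode the path into text only for as long as we need string tools.
    path = unquote(parts.path, encoding=path_encoding, errors="strict")

    # ... the application would inspect or manipulate `path` here ...

    # Re-encode with the same character set, so the bytes still name the same file.
    encoded_path = quote(path, encoding=path_encoding)
    new_url = urlunsplit((parts.scheme, parts.netloc, encoded_path, new_query, ""))
    return new_url.encode("ascii")


latin1_result = switch_query(
    b"http://localhost/mysite/files/va%F1a.txt?edit", "view", "latin-1"
)
utf8_result = switch_query(
    b"http://localhost/mysite/files/va%C3%B1a.txt?edit", "view", "utf-8"
)
```

The path text is identical in both calls; only the caller's knowledge of the encoding keeps the two byte strings apart.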

So then, what is the big deal with python3000? How does it make the unicode situation better at all? Consistency.

In python2.x, there are many times where a module will only work with byte strings or only work with unicode strings despite the fact that either one would be valid for the work being done. In one case I've seen, it's possible to get to a point where the Windows filesystem requires a unicode type to do the right thing but the subprocess module requires a byte string, so there's no way to operate on non-ASCII filenames.

python3000 will make unicode strings the norm when dealing with strings in code. If you get a byte type returned to you, you'll know that there's a reason you're getting it rather than just assuming that this module didn't convert things to unicode. This will hopefully make errors from non-conversion occur closer to their origin and make potential error conditions easier to find, as seeing a byte type where you expect a string will be a major clue that something has gone wrong.
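A small illustration of what that consistency looks like, assuming a current Python 3 (details may differ from the release candidates available when this was written). Text and bytes never mix implicitly, and an API such as os.listdir() hands back bytes only when you ask in bytes.

```python
import os

# Mixing text and bytes is an immediate TypeError rather than an implicit
# conversion that blows up later on the first non-ASCII character.
try:
    "name: " + b"va\xc3\xb1a.txt"
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Ask with a str path and you get str names back; ask with bytes and you
# get the raw bytes names, undecoded.
text_names = os.listdir(".")
byte_names = os.listdir(b".")
```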

24 Sep 2008 (updated 24 Sep 2008 at 17:50 UTC) »

Since I moved to my new home I have been receiving phone calls for political polls for the House of Representatives seat that I would vote for. The first time I listened to the questions, answered dutifully, and had a vague disquiet. The questions all seemed leading and meant to portray one of the candidates in a positive light (for my area) and portray the other candidate negatively.

Yesterday, I received another one of these calls and this time I even recognized some of the questions as coming directly from an ad for one of the candidates that I had received in the mail. So after the poll questions, I started asking questions of my own. None of the people I was able to talk to knew who paid for the poll (supposedly so as not to bias the poll givers when they talked to people on the phone. OTOH, with questions like these, it would take a pretty oblivious questioner to not be able to tell who was paying for this.)

I could just be used to normal American political ads where there's always a "Paid for by XXXX" disclaimer in the ad but this feels wrong to me. Does anyone out on the lazyweb know if there are regulations about polls paid for by political candidates and parties?

Update: spot was kind enough to tell me this practice is known as push polling. It's an interesting technique since the poll doesn't even have to ask questions based in fact. I wonder how many of my friends and neighbours have been taken in by this campaign....

Yesterday I deployed a new version of the PackageDB. This is becoming a rather regular event because maploin is working very hard on a lot of important UI improvements. For this update we have the answer to one of our commonly requested features: the ability to filter the package list by letter.


A few weeks back, at the prompting of Lyos Norezel, I updated the orphaned packages page in the packagedb to display the releases a package was orphaned for. Unfortunately, I managed to break sorting of the packages at the same time which made things a bit unusable.

Now that that's been pointed out, I've fixed things to be alphabetical again: New orphan list

Flatten a list in a python one-liner

I sometimes need to flatten a list in a list comprehension or generator expression (usually because I'm doing work inside of a template where keeping temporary variables is somewhat painful). This has stumped me for a while but today I found itertools.chain() which worked wonderfully. Here's how I found out I could make use of it:


>>> import itertools
>>> data = [(1,2), (3,4,5), (6,7)]
>>> [i for i in itertools.chain(*data)]
[1, 2, 3, 4, 5, 6, 7]
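Two other spellings of the same idea, offered as alternatives rather than corrections. itertools.chain.from_iterable() (available since Python 2.6) avoids unpacking the outer list into arguments, and a nested comprehension needs no import at all.

```python
import itertools

data = [(1, 2), (3, 4, 5), (6, 7)]

# chain.from_iterable() takes the outer iterable directly, so it also works
# when `data` is a generator rather than a list.
flat = list(itertools.chain.from_iterable(data))
print(flat)  # [1, 2, 3, 4, 5, 6, 7]

# The same flattening as a nested comprehension: outer loop first, inner second.
flat_again = [item for group in data for item in group]
```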
