Older blog entries for titus (starting at number 148)

The 30-second Guide to Making Eggs

Courtesy of some prodding from some Irish guy.

At the top of your setup.py (you do have one, right?) add:


### ez_setup.py and 'use_setuptools()' will automagically download and
### install setuptools. The downside to this code is that then
### you need to include ez_setup.py in your distribution, too.

try:
    from ez_setup import use_setuptools
    use_setuptools()
except ImportError:
    pass

### this is the critical line
from setuptools import setup    # instead of the 'distutils.core' setup

### also import Extension, etc. -- anything else you need -- from setuptools.

then you can make eggs with the bdist_egg command. Try:

% python2.3 setup.py bdist_egg
% python2.4 setup.py bdist_egg

to build eggs for each version of Python you have installed. The eggs will end up in dist/. If you're distributing precompiled binary code, you'll need to make an egg for each platform/Python version.

Anyway, hope that helps!

--titus

p.s. I should probably write a "5 minute guide to distributing Python software properly". Lessee... (1) make setup.py; (2) add it to PyPI; (3) post to c.l.p.a. ;) Oh, and (1a) run cheesecake!

For future reference: BBQ

Educated Guesswork posted a link to a mail order BBQ store. Yummmm.

The first comment on the post has a few more references.

(Oddly, this diary is turning out to be a great place to keep such links: I can always find them, even weeks later. Much better -- and more persistent -- than my own fairly random collection of HTML pages...)

Good Google idea (?)

Wouldn't it be cool if Google could show you when a page last changed significantly? (I was once involved in a company that was doing things like this; obviously it's hard without the proper infrastructure, but Google clearly has that.)

Hmm, I just realized that Google might already be ordering search results that way... although the amount of outdated Linux stuff I google up is large enough that I'd guess they're not.

New package overload

I've spent much of my programming time over the last two weeks getting a fingerhold on a bunch of library modules and packages. In our tutorial application, Grig and I are using CherryPy, Commentary, Durus, AMK's code implementing JWZ's mail threading algorithm, imaplib, and poplib. (And that's not counting the installation stuff and testing packages, like coverage, Selenium, FitNesse, and soon Buildbot.) We're trying out some of these new packages because I want to play with new technologies (that's why I'm using CherryPy instead of Quixote); for others, like Durus (or some Python embedded db...) and the e-mail stuff, it's a required part of the project.

The last two packages I'm planning to start using are buffet and Cheetah; the time has come to do a proper job of HTML display in our application, and I wanted to try something other than Quixote's PTL.

I have to say that learning this much new stuff all at once is mildly overwhelming, but it's a lot of fun. I'm trying to get it all out of the way by the first release (which was scheduled for last week, but got moved a week because of all of the holidays). The second release will focus on performance, and the third release -- just in time for the tutorial... -- will focus on compleat testing coverage and frilly features. For now, I'm trying to get a sort of tracer bullet implementation of the basic features going. Grig is having a lot of fun playing with the different testing packages, I think, and we're both having a lot of fun assigning tickets to each other in Trac. No fistfights yet; perhaps that's a good feature of remote development ;).

In the process of doing all of this random grokking, I've made some simple sequence-style wrapper interfaces for poplib and imaplib, and fixed up the jwzthreading code a tad. Nothing big, but I guess I'll post them somewhere in the next few weeks...
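To give the flavor: the poplib one is essentially a read-only sequence over a mailbox, along these lines (a rough sketch with made-up names and no error handling, not the actual code I'll be posting):

import poplib

class POPMailbox:
    # sketch of a sequence-style wrapper: mailbox[i] is message i
    def __init__(self, host, user, password):
        self.conn = poplib.POP3(host)
        self.conn.user(user)
        self.conn.pass_(password)

    def __len__(self):
        n_messages, total_size = self.conn.stat()
        return n_messages

    def __getitem__(self, i):
        # POP3 numbers messages from 1; sequences index from 0
        response, lines, octets = self.conn.retr(i + 1)
        return '\n'.join(lines)

    def close(self):
        self.conn.quit()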

Oh, one plea: does anyone know of a high-performance lazy-parsing e-mail parser? I'd like something that will retrieve header fields on demand, rather than pre-parsing them, and will also not parse the entire message all at once. The closest thing I've found is this cookbook recipe which talks a bit about the 2.4 email module.
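To make the request concrete, here's roughly the interface I'm after -- a throwaway sketch, not an existing library, and it ignores complications like folded (continuation) header lines:

class LazyHeaders:
    # sketch: fetch header fields on demand from a file-like object,
    # reading only the header block and never touching the body
    def __init__(self, fp):
        self.fp = fp            # positioned at the start of the message
        self._lines = None      # header lines, read lazily

    def _read_headers(self):
        if self._lines is None:
            self._lines = []
            while 1:
                line = self.fp.readline()
                if not line or line in ('\n', '\r\n'):
                    break       # blank line: the body starts; don't read it
                self._lines.append(line)
        return self._lines

    def __getitem__(self, field):
        prefix = field.lower() + ':'
        for line in self._read_headers():
            if line.lower().startswith(prefix):
                return line.split(':', 1)[1].strip()
        raise KeyError(field)

# e.g.:
# msg = LazyHeaders(open('mailfile'))
# print msg['Subject']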

--titus

CherryPy is horrible!

Well, no, it's actually quite nice -- but too many people are happily burbling about TurboGears, so I felt I had to say something mean ;).

Ivan Krstić's madness: to be revealed soon

Scanning through the PyCon talk schedule, I saw this. So soon we will be able to see what Ivan Krstic was talking about back then.

In other news, enough people have signed up for our PyCon tutorial that I can definitely afford to come. Thanks, all -- now we just have to worry about the presentation...

Jobbing you

Indeed, as haruspex pointed out, Steve Jobs never did graduate from Reed. We still consider him one of ours -- Reed officially defines an alumnus as anyone who has attended one full semester. (...because that way, they can call and ask for $$, I think.)

--titus

Perils of Javaity

Joel Grossberg's comment in Ian Bicking's post on Joel Spolsky's article reminds me: my college, Reed, didn't have a CS department or curriculum. There were two intro programming classes, but that was it.

Despite this lack of formal CS education, Reed graduated Keith Packard (X-anything), Nelson Minar (emacs HTML mode, among other things), and half-a-dozen other excellent programmers. Well, and Steve Jobs, too ;).

Just thought I'd mention it.

The Tom Peters Blog Challenge

Via this blog, John O'Leary asks:

1. What's the most important thing you've done this year?

Attending the Woods Hole Embryology Course was definitely the most important thing I did.

Computationally, I'd have to say that twill is probably the most important thing I did. Sigh, it's the first year in a while that some component of my research programming isn't the most important thing I've done, but I'm avoiding that in an effort to actually graduate.

2. What's the most important thing you'll do in the next year?

Graduate, I hope -- or leave graduate school in some way. I'm stuck in a miserable situation where fairly slow experiments are dictating my future. I'm applying for jobs, but without the experimental evidence I'm going to have a tough time of it.

In addition, our Agile Development & Testing project could turn into something quite nice, but I refuse to depend on that ;).

Also, three posts -- Dave Winer (on RSS), Ian Bicking (with his Python docs stuff), and Jonathan Ellis (on ORM stuff) -- have made me decide that one New Year's resolution (category: "programming") will be to learn to play better with other projects. In particular, I'd like to put together a Trusted Commentary application that uses the Advogato trust metric together with Commentary and a simple central-server-based commenting setup to make it easy to deploy a commenting system on individual sites. I'd also like to see if I can integrate cucumber2 ideas into SQLAlchemy or PyDO. We'll see how well these projects go ;).

--titus

Python quote

Here's an entertainingly odd excerpt from ziggy's post on Joel Spolsky's The Perils of JavaSchools:

...

Python strives to have one (obvious) way to do it, which makes all source code "look" the same. Thus, looking at a piece of random Python code, it's difficult to tell at a glance whether it's good or not.

Perl, on the other hand, with its trademark TMTOWTDI, makes it easier to see when code is good or bad. Because it blends aspects of C, Lisp, and OOP, a good Perl programmer will have to think at multiple levels of abstraction to get a complex job done.

...

Elsewhere, Ned Batchelder weighs in with some commentary on the Perils of Java -- the comments are worth reading.

Also worth reading: 'Illusions of Chaos, Illusions of Calm' (via Sean McGrath).

--titus

The Perils of Java Training

JoelOnSoftware takes aim at Java. Hmm, I wonder: if you advertised for programmers with experience in Haskell, OCaml, or Scheme, would you get better interviewees?

Parenthetically, Joel Spolsky is one of the two or three people I find always worth reading. Paul Graham is another, of course, and so is Martin Fowler. (I used to read Philip Greenspun, too, but I feel that he aims for controversy over content and that started to annoy me.)

Our cat is smokin'

Our cat, Muika, is a smoked Egyptian Mau (see the picture titled "Smoke"). He's a dead ringer for the kitty on this page. What's funny is that his original owner got him from a pound! However, based on his near identity to the cats in these photos, I'd guess that he's a purebred.

Google-Fu

I've been noticing that advogato.org has powerful google-juice. This has amusing consequences: for example, whenever I try to find the home page for nosetest or nosetests I find my own diary entries on nose. (I guess I could be the only person blogging about it, too.)

Subversion, Darcs, and Trac, oh my!

I've been adding unit tests to some of my older packages, and revamping them as I have time; now I want to put them on my new development server.

Problem:

  1. Many different projects. (Just focusing on my moderately used bioinformatics analysis stuff, I have: Cartwheel (Python server & Python/C++/Java client stuff); FamilyRelationsII (C++ GUI written with FLTK); motility (C++ toolkit with Python interface); paircomp (C++ toolkit with Python interface); and sister (XML-ish parsing code in C++ and Python).)

  2. Many interdependencies. (FRII uses the C++ Cartwheel client stuff, as well as all C++ code. All of them interact with data produced by Cartwheel, which in turn uses some of the toolkits to produce the data, but this time from the Python code.)

  3. Other users. Not only do I run a Cartwheel server myself, but other people run them and use FRII to interact with them. There are also several other people using the toolkits.

  4. Poor hosting on SourceForge. All of these projects are spread among the Cartwheel and FamilyJewels projects on SF. No subversion or darcs; no trac; lousy bug tracking interface; etc. At this point I'm just using the centralized CVS & the mailing lists.

Possible Solutions:

I'll definitely be using Trac and something a bit newer than CVS. But what? My two best options are:

  • Switch to using Darcs.

    • Pluses: I like Darcs; no complicated setup/maintenance; users (including me) can customize my deployment settings for those projects I deploy in place; users can "fork" at will.

    • Minuses: No subtrees, so I'd have to have a distinct Trac site for each Darcs repository. Maintaining version synch between them might also be annoying. Using darcs on Windows sounds miserable.

  • Switch to using Subversion.

    • Pluses: support for subtrees, e.g. "bio-tools/cartwheel", "bio-tools/motility", "bio-tools/paircomp" can all be distinct check-out-able projects within a single repository and with a single Trac instance.

    • Minuses: centralized, so users cannot fork at will without moderate investment in technology like SVK or tailor; moderately nasty setup/maintenance.

After writing this all down, I think the winner is going to be a moderately complicated hybrid setup.

Proposed solution:

  1. Dump each interdependent group of projects into Subversion, and attach a centralized Trac server.

  2. Develop in this svn repository, and synchronize releases among all projects.

  3. Export trunk & release branches to darcs via tailor.

This answers most of the minuses above, albeit while making my life more miserable with respect to configuration.

  • users can branch off of the darcs stuff without requiring r/w access to my svn repositories;
  • they can use svn if they just want an easy checkout, without investing effort in new technology, or they can use darcs;
  • all of my interdependent projects are maintained in the same repository, so I can synchronize stuff with tags;
  • I get a single Trac instance for all of my interdependent projects.

Re-reading this, I think I might be nuts -- but it's a good kind of nuts ;). I'll think on it; there's no urgency in implementing this.

The only thing that might make my life easier would be a real-time tailor-style svn-to-darcs converter, so that I don't have to maintain separate tailor directories. But that's a minor issue.

--titus

SCGI links

I'm a big fan of SCGI, a FastCGI-like way of running a persistent server in an external process. There are a number of reasons why SCGI is pretty nice: it's clean, it has a simple protocol specification, and it comes with a nice library. One reason I like to use it is that it integrates well with Apache: I can easily configure virtual hosts that direct requests to Web apps via SCGI, and the actual Web app can run as whatever user you want.
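In fact, the protocol is simple enough that a complete client fits in a screenful. Here's a minimal sketch (the port number and the choice of CGI-style headers are made up; in a real deployment the front-end Web server supplies them):

import socket

def scgi_request(host, port, body='', **env):
    # CONTENT_LENGTH must be the first header, and SCGI=1 must be present
    headers = [('CONTENT_LENGTH', str(len(body))), ('SCGI', '1')]
    headers.extend(env.items())

    # headers are NUL-separated name/value pairs...
    payload = ''.join(['%s\x00%s\x00' % (name, value)
                       for (name, value) in headers])

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((host, port))
    # ...wrapped in a netstring ("<length>:<payload>,"), then the body
    sock.sendall('%d:%s,%s' % (len(payload), payload, body))
    response = sock.makefile().read()
    sock.close()
    return response

# e.g., against an SCGI app listening on a (made-up) port 4000:
# print scgi_request('localhost', 4000,
#                    REQUEST_METHOD='GET', PATH_INFO='/')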

I tend to run a lot of SCGI servers, however, because I deploy a number of Web sites for various people. At the moment, I simply run them in screen, but that's awfully tedious for 15 different Web apps, and it doesn't work well when you have to reboot machines ;).

So, today I googled around a bit for start/stop scripts -- I'm not sure exactly what I'm looking for, but figured a bit of basic research would be a good start -- and noticed two amusing factoids.

First, there's a Wikipedia entry for SCGI! Kinda cool, even if it's just a stub.

Second, some Ruby folk seem to have embraced it: check out Zed Shaw's Ruby On Rails SCGI Runner page. Neat -- I didn't realize SCGI was used outside the Python community. This page also has a long list of reasons why SCGI is particularly neat; it's worth reading.

I also ran across some nice links: flup, and Deploying TurboGears with Lighttpd/SCGI.

--titus

Merry Christmas, Happy Holidays, and an Enthusiastic Kwanzaa!

I probably won't write much over the weekend, so... have a good time over the holidays, folks.

WSGI, Paste, marketing.

I was going to write a long entry on "what is WSGI", partly in response to Ian's post, but I'd rather code than opine. At least for today ;).

Still, here's my personal take on the situation. In general, I'm conservative: I didn't like list comprehensions, iterators, and metaclasses when they were first implemented. (I still hate the decorator syntax.) I didn't think WSGI was worth much at first. And I still have a hard time comprehending the structure of Python Paste, if not the intent.

I've changed my mind about list comprehensions, iterators, metaclasses, and WSGI. I suspect I will eventually end up changing my mind about Paste; I'm already contemplating implementing stuff with Paste hooks.

My hunch, based on my experience with new Python features and WSGI, is that Ian (like GvR and PJE) is busy solving problems that I will not encounter for quite some time. It may take me a while to figure that out, and I may even end up not liking Ian's choices. But I strongly believe that WSGI and Paste are better long-term bets than Yet Another Python Web Framework -- it's like betting on a decoupled, decentralized content delivery system rather than relying on a few large content providers to make the right technical choices.

If there's one problem I'd like to solve, it's the marketing problem we seem to have with WSGI and Paste. It's time to change the effin' names. More on that next time I'm feeling creative.

Parenthetically, I've also been thinking on and off about making a proposal to unify Web handling in the Python 3000 std lib with a 'web' module; 'web.interface' for WSGI, 'web.url' for URL handling, 'web.browser' for a mechanize-style browsing interface, 'web.cgi'... etc. Anyone interested?

--titus

(Sorry for the long post, I'll soon start putting "articles" on an articles page instead of posting them.)

New net_flow package

My adventures in trustiness continue: Raph Levien, the author of the net_flow C code & the admin of the advogato site, gave me the actual configured capacities of his site. Woot. The code seems to reproduce the actual certification levels of advogato with fairly good accuracy; I suspect my scrape was a bit out of date, which could account for the discrepancies. (I nail the robots.net certifications exactly, and that site is a bit less active, I think.)

Grab it here.

New development server & tools setup: part 1 of inf.

I finally took the plunge and ordered another server from JohnCompanies. I went with the lowest-cost Linux VPS, so I share a machine via virtualization, with 256 MB of RAM guaranteed (up to 1 GB burst), 40 GB of bandwidth a month, and 5 GB of disk space. I ordered it with Debian 3.1. This machine adds to my small collection of servers: I already have a FreeBSD server with them, and I also run a local development server for my lab (RH 7!), an e-mail & Web page server for my family and friends (Debian), and a home MythTV server (Debian). I'm hoping to swap up the servers so that the e-mail server is moved onto a JohnCompanies server; that way I won't have to worry about hardware or backups.

I've had the FreeBSD server with JohnCompanies for several years now, and I share the cost with several people. (The total is something like $40/mo., after a discount because of my open-source work.) JohnCompanies is really great; I get the sense that they know UNIX as well as I do, if not better. Unlike me, however, they have the focus to actually do a good job of network adminning ;). They're incredibly responsive and their servers Just Work. It's exactly what I need. (Although if anyone has any suggestions for similarly priced off-shore servers, I'd appreciate a tip. I'd rather our government be able to legally wiretap me without a warrant, you see ;).)

This new server is going to be a svn, darcs, Trac, Web site, domain, and whatnot hosting system. I'm hoping to make it into a bit of a co-op, where friendly like-minded people can host projects; part of the project will be to produce infrastructure to run all this stuff nicely. Grig is already planning to host two domains there. If you're a Python OSS developer who doesn't bite and wants a root shell, Trac+ setup, and a Python-friendly installation, e-mail me to join up. (Note, this is not-for-profit; we're just trying to share out the costs of all our little projects. It may even end up being free, if I can scrape together sufficient Google ads revenue over the next few months.)

After JC set up the machine for me, I had a fun-filled evening of setting up software. Briefly, I:

  • upgraded to Debian 'testing' from stable;
  • installed SCGI for Apache 2.x;
  • installed tailor;
  • set up Trac with the WSGI patches.

I had such a good time doing this that I thought I'd give a little "tutorial-style" introduction to two aspects of the setup for which I found relatively little documentation on the 'net.

Using easy_install to manage your Python packages

This is so easy that it's almost silly to write a HOWTO, but ... I didn't realize how easy it was 'til I'd done it ;).

Motivation:

I'd like to give users on this machine the ability to use multiple versions of a Python package.

Solution:

Use easy_install to install everything. easy_install will build eggs for each version of each package and allow the import of specific, distinct versions through the 'require' function.

To install easy_install, I downloaded ez_setup.py and ran it:

python ez_setup.py

This downloaded and installed setuptools.

To set up easy_install for multiple Python versions, you can do something like this:

# install for 2.3
python2.3 ez_setup.py
mv /usr/bin/easy_install /usr/bin/easy_install2.3

# install for 2.4
python2.4 ez_setup.py
mv /usr/bin/easy_install /usr/bin/easy_install2.4

# default to 2.3
ln -fs /usr/bin/easy_install2.3 /usr/bin/easy_install

You can now use easy_install to install all of the packages that you don't want managed by apt-get (or your package manager of choice).

To install the latest version of a package via PyPI:

easy_install package_name
e.g.

easy_install twill

will install twill 0.8.1.

To install a .tar.gz that may or may not know about setuptools, try:

wget http://darcs.idyll.org/~t/twill-latest.tar.gz
easy_install twill-latest.tar.gz

This will install the latest development version of twill.

To install a specific egg:

wget http://issola.caltech.edu/~t/dist/CherryPy-2.1.0-py2.3.egg
easy_install CherryPy-2.1.0-py2.3.egg

And, finally, you can use it to install files directly from an untarred distribution or a repository:

darcs get --partial http://darcs.arstecnica.it/tailor
easy_install tailor

easy_install will go grab tailor/setup.py, grok it, and install both the library code and the scripts.

easy_install: it's that easy ;). Note that I have yet to run into any problems using it on local files, though occasionally it fails to find PyPI packages or does strange things while scanning random Web pages.

So what drawbacks are there to easy_install? I've only run into two problems: one is that automated tests may not work for packages installed as eggs, e.g. 'tailor test' doesn't work. The other is that 'help' apparently doesn't work, either. Neither are big problems for me, because I don't use 'help' much (emacs can browse zip/egg files just fine, and I prefer reading the source code) and I'm not developing on 'tailor'.

I'll describe using the setuptools Python API to import a specific version of a package in a bit; I haven't found the online docs for that, otherwise I'd just link to them ;).
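In the meantime, the short version looks something like this (the package name and version pin below are just examples):

# put a specific egg version on sys.path *before* importing it;
# the package name and version here are just examples
from pkg_resources import require
require("twill==0.8.1")

import twill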

Subversion to Darcs Mirroring with Tailor

The back story:

Devoted readers may recall that I complained about having to maintain my own set of patches to Trac. At the time I said I didn't want to have to convert the Trac Subversion archive to darcs (my versioning system of choice) with tailor, and I also didn't want to fork the Trac code.

A few days ago, Stan Seibert e-mailed me to point out that SVK does a fine job of making "private" copies of svn archives, although in further conversation we agreed that it might not be the best way to make your changes available to other people. (I referred to this practice as a "patch stream", and it's something darcs does very well.) Another obstacle was that SVK required a certain amount of Perl-fu, and my development machine wasn't package-managed any more.

Long story short, I decided to go with tailor on my new Debian server. It's written in Python, it's maintained in darcs, and (best of all!) the author uses it to maintain his own patchset for Trac.

Actual information:

First, I installed tailor:


darcs get --partial http://darcs.arstecnica.it/tailor
easy_install tailor

I then upgraded subversion and darcs from Debian stable to Debian testing. You can do this in several ways with Debian; I chose to reset my system-wide preferences to testing and then do an 'apt-get dist-upgrade', but the simplest way to do it is this, I believe:

apt-get install -t testing subversion darcs

(Warning: if you don't upgrade subversion, your first tailor attempt will end with the error that --limit is an unknown flag to svn.)

At this point you're pretty much ready to run tailor, believe it or not! The tailor README advises you to specify command-line arguments corresponding to the configuration you want, and then just use the '--verbose' flag to output a configuration file. That's what I did, but you still need to read quite a bit to figure out exactly what options to use. Luckily for you I've already done the reading -- and here's the configuration to pull a remote SVN repository into Darcs.

File 'trac.tailor':

[DEFAULT]
verbose = True
# CTB *4*
encoding-errors-policy = ignore

[project]
target = darcs:target
start-revision = INITIAL
# CTB *1*
root-directory = /tailor/trac
state-file = tailor.state
source = svn:source
subdir = trac

[darcs:target]
# CTB *2*
repository = /tailor/trac/trac

[svn:source]
# CTB *3*
module = /trunk
repository = http://svn.edgewall.com/repos/trac

There are four configuration points in this file that need discussion.

  1. First, at CTB *1*, the root-directory. Tailor puts all of the files, including the repository, in this directory. It must be writeable by the user running tailor.

  2. Second, at CTB *2*, the target repository directory. As far as I can tell, this is entirely ignored.

  3. Third, at CTB *3*, you need to specify the source repository independently of the module that you're importing. The module doesn't need to be a top-level module, either.

  4. Fourth, at CTB *4*, you will get a Unicode exception midway through your import of Trac if you don't tell it to ignore encoding errors. I'm not sure how to fix this.

Once this configuration file is in place, run 'tailor trac.tailor', and watch & wait. (It took tailor about 30 minutes to pull in the entire Trac repository of ~2000 patches -- not extraordinarily fast, but you only need to do it once.)

At this point, you have a fully-functional darcs repository. I don't plan to modify it directly -- after all, I don't have check-in access to the Trac archive! -- but you can pull the darcs repository as usual:

darcs get /tailor/trac/trac

and work off of the downstream archive.

In summary, tailor "just worked". Try it out yourself!

I'll make my trac/darcs repository (with the WSGI/SCGI patches in it) available soon; there's still some machine configuration to do before I give out the hostname.

cheers,

--titus

More trustiness

On Sunday, I hacked together a quick Python wrapper for raph's 'net_flow' implementation. Another hour or two of hacking has produced a Python implementation of the advogato trust metric (which actually consists of three distinct trust flows).

Steven Rainwater of robots.net graciously gave me access to his actual robots.net configuration file, and I verified that my net_flow Python code reproduces the actual robots.net certifications. So I'm fairly sure that the code functions properly -- this isn't too surprising, because it really is a very simple wrapper around raph's code.

In any case, you can download a tar.gz here; or, visit the README. It's fun to play with... note that the distribution contains HTML-scraped advogato and robots.net certifications from this Monday, so you can play with the actual network yourself. (Please don't scrape the sites yourself without asking raph or Steven; yes, I transgressed with advogato, but that doesn't mean you should ;).)

Relative to raph's recent "ranting", I hope this little package inspires people to play with trust metrics. There are a couple of easy hacks people could do with this code (and, for a taste of the core computation, there's a little sketch after the list):

  • Write Ruby, Perl, etc. wrappings (mebbe with SWIG);
  • Liberate the code from the GLib 2.0 dependency;
  • Look at the actual topology of the advogato.org network in a variety of ways;
etc.
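To show how small the heart of this is, here's a generic max-flow routine in plain Python -- not raph's code and not my wrapper's API, just the kind of building block the Advogato metric layers its distance-based capacities and supersink on top of:

def max_flow(capacity, source, sink):
    # capacity: dict of dicts, e.g. {'seed': {'alice': 2}, ...}
    # build the residual graph: forward capacities plus zero reverse edges
    residual = {}
    for u in capacity:
        residual.setdefault(u, {})
        for v, c in capacity[u].items():
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)

    total = 0
    while 1:
        # breadth-first search for an augmenting path
        parent = {source: None}
        queue = [source]
        while queue and sink not in parent:
            u = queue.pop(0)
            for v in residual[u]:
                if v not in parent and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # walk back from the sink to find the bottleneck...
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min([residual[u][v] for (u, v) in path])
        # ...and push that much flow along the path
        for (u, v) in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        total += bottleneck

# toy cert graph: 'seed' certifies alice & bob; alice certifies mallory
certs = {'seed': {'alice': 2, 'bob': 1},
         'alice': {'mallory': 1},
         'bob': {},
         'mallory': {}}
print max_flow(certs, 'seed', 'mallory')   # => 1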

Incidentally, it seems like I really do think best in code. This little exercise has given me a bunch of ideas, most of which only popped up once I got a working Python API and it was clear just how easy it would be to implement them...

--titus
