2 Nov 2011 apenwarr


import sys
from bup import options

optspec = """
bup save [-tc] [-n name] <filenames...>
--
r,remote=  hostname:/path/to/repo of remote repository
t,tree     output a tree id
c,commit   output a commit id
n,name=    name of backup set to update (if any)
d,date=    date for the commit (seconds since the epoch)
v,verbose  increase log output (can be used more than once)
q,quiet    don't show progress meter
smaller=   only back up files smaller than n bytes
bwlimit=   maximum bytes/sec to transmit to server
f,indexfile=  the name of the index file (normally BUP_DIR/bupindex)
strip      strips the path to every filename given
strip-path= path-prefix to be stripped when saving
graft=     a graft point *old_path*=*new_path* (can be used more than once)
"""
o = options.Options(optspec)
(opt, flags, extra) = o.parse(sys.argv[1:])

I'm proud of many of the design decisions in bup, but so far the one with the most widespread reusability has been the standalone command-line argument parsing module, options.py (aka bup.options). The above looks like a typical program's --help usage message, right? Sure. But it's not just that: it's also the code that tells options.py how to parse your command line!
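To make "the help text is the parse spec" concrete, here is a minimal sketch of reading an optspec into an option table. It is not the options.py implementation; it assumes, as in git's --parseopt format, that a line containing only "--" separates the usage lines from the option lines, that the first word of each option line is a comma-separated list of names, and that a trailing "=" means the option takes a parameter. The names OPTSPEC and parse_optspec are hypothetical.

```python
# Hypothetical, cut-down optspec in the same shape as the bup save one:
# usage line(s), a "--" separator, then one option per line.
OPTSPEC = """
prog [-tc] [-n name] <filenames...>
--
t,tree     output a tree id
c,commit   output a commit id
n,name=    name of backup set to update (if any)
"""


def parse_optspec(optspec):
    """Split an optspec into usage lines and a list of option entries."""
    lines = optspec.strip().split("\n")
    sep = lines.index("--")              # usage lines come before the separator
    usage, option_lines = lines[:sep], lines[sep + 1:]
    entries = []
    for line in option_lines:
        # First whitespace-separated word is the name list; the rest is help.
        names, _, help_text = line.strip().partition(" ")
        takes_arg = names.endswith("=")  # trailing '=' means "takes a parameter"
        entries.append({
            "names": names.rstrip("=").split(","),
            "takes_arg": takes_arg,
            "help": help_text.strip(),
        })
    return usage, entries


usage, options = parse_optspec(OPTSPEC)
for entry in options:
    print(entry["names"], entry["takes_arg"], entry["help"])
```

The same table that drives parsing is also everything you need to print the --help output, which is why there is nothing to keep in sync.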

As with most of the best things I've done lately, this was not my idea. I blatantly stole the optspec format from git's little-known "git rev-parse --parseopt" feature. The reimplementation in Python is my own doing and includes some extra bits like [default] values in square brackets and the "--no-" prefix for disabling stuff, plus it word-wraps the help output to fit your screen. And it all fits in 233 lines of code.
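Two of those extras are easy to sketch with the standard library. The code below assumes a default is written as a bracketed value at the very end of an option's help text (for example "[10000]"), and uses textwrap to wrap help to a given width. The helpers default_from_help and format_help are hypothetical, and options.py's own rules may differ in detail.

```python
import re
import textwrap


def default_from_help(help_text):
    """Return the value in a trailing [brackets], or None if there isn't one."""
    m = re.search(r"\[([^\]]*)\]\s*$", help_text)
    if m is None:
        return None
    value = m.group(1)
    try:
        return int(value)   # numeric defaults become ints
    except ValueError:
        return value        # anything else stays a string


def format_help(names, help_text, width=60, col=16):
    """Render one option's help line, word-wrapped to `width` columns."""
    flags = ", ".join(("-" if len(n) == 1 else "--") + n for n in names)
    return textwrap.fill(help_text, width=width,
                         initial_indent=flags.ljust(col),
                         subsequent_indent=" " * col)


print(default_from_help("only back up files smaller than n bytes [10000]"))
print(format_help(["s", "smaller"],
                  "only back up files smaller than n bytes [10000]"))
```

In a real tool you would pass the terminal width (for example from shutil.get_terminal_size()) instead of a fixed 60.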

I really love the idea of an input file that's machine-readable but still looks exactly like what a human expects to see. There's just something elegant about it. And it's *much* more elegant than what you see with most option parsing libraries, where you have to make a separate function call or data structure by hand to represent each and every option. Tons of extra punctuation, tons of boilerplate, every time you want to write a new quick command-line tool. Yuck.

options.py (and the git code it's blatantly stolen from) is designed for people who are tired of boilerplate. It parses your argv and gives you three things: opt, a magic (I'll get to that) dictionary of options; flags, a sequence of (flag, parameter) tuples; and extra, a list of non-flag parameters.

So let's say I used the optspec that started this post, and gave it a command line like "-tcn foo -vv --smaller=10000 hello --bwlimit 10k". flags would contain a list like -t, -c, -n foo, -v, -v, --smaller 10000, --bwlimit 10k. extra would contain just ["hello"]. And opt would be a dictionary that can be accessed like opt.tree (1 because -t was given), opt.commit (1 because -c was given), opt.verbose (2 because -v was given twice), opt.name ('foo' because '-n foo' was given and the 'name' option in optspec ends in an =, which means it takes a parameter), and so on.
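The flags/extra split in that example can be reproduced with Python's standard getopt.gnu_getopt, which accepts options after non-option arguments. That bup's options.py builds on getopt is an assumption worth checking against its source. The short/long tables and the ALIASES and TAKES_ARG names below are hand-written for illustration from the bup save optspec, and the counting loop only approximates how opt gets filled in.

```python
import getopt

# Tables derived by hand from the bup save optspec ("n:" / "name=" mean
# the option takes a parameter).
SHORT = "r:tcn:d:vqf:"
LONG = ["remote=", "tree", "commit", "name=", "date=", "verbose", "quiet",
        "smaller=", "bwlimit=", "indexfile=", "strip", "strip-path=", "graft="]
ALIASES = {"r": "remote", "t": "tree", "c": "commit", "n": "name",
           "d": "date", "v": "verbose", "q": "quiet", "f": "indexfile"}
TAKES_ARG = {name.rstrip("=") for name in LONG if name.endswith("=")}

argv = "-tcn foo -vv --smaller=10000 hello --bwlimit 10k".split()
flags, extra = getopt.gnu_getopt(argv, SHORT, LONG)

opt = {}
for flag, value in flags:
    name = flag.lstrip("-")
    name = ALIASES.get(name, name)         # map short forms to the long name
    if name in TAKES_ARG:
        opt[name] = value                  # parameter options keep their value
    else:
        opt[name] = opt.get(name, 0) + 1   # plain flags count occurrences

print(flags)
print(extra)
print(opt)
```

Note that flags without a parameter show up in flags with an empty string as their second element.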

The "magic" of the opt dictionary relates to synonyms: for example, the same option might have both short and long forms, or even multiple long forms, or a --no-whatever form. opt contains them all. If you say --no-whatever, it sets opt.no_whatever to 1 and opt.whatever to None. If you have an optspec like "w,whatever,thingy" and specify --thingy --whatever, then opt.w, opt.whatever, and opt.thingy are all 2 (because the synonyms occurred twice). Because Python treats any nonzero number as true, 2 works fine as a boolean, so there's no reason *not* to just make all flags counters.
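One way to get that synonym behaviour is to map every name to a single canonical key and store one counter per key. This is a minimal sketch of the idea, not options.py's own OptDict; the class name SynonymOpts and its count method are hypothetical.

```python
class SynonymOpts:
    """Attribute-style option lookup where every synonym shares one value."""

    def __init__(self, synonym_groups):
        # synonym_groups: e.g. [["w", "whatever", "thingy"], ["v", "verbose"]];
        # the first name in each group is used as the canonical key.
        self._canon = {}
        self._values = {}
        for group in synonym_groups:
            for name in group:
                self._canon[name] = group[0]
            self._values[group[0]] = None   # unset options read as None

    def count(self, name):
        """Record one more occurrence of a flag given under any of its names."""
        key = self._canon[name]
        self._values[key] = (self._values[key] or 0) + 1

    def __getattr__(self, name):
        # Only reached for names that aren't real attributes: option lookups.
        return self._values[self._canon[name]]


opt = SynonymOpts([["w", "whatever", "thingy"], ["v", "verbose"]])
for given in ["thingy", "whatever"]:   # as if the user typed --thingy --whatever
    opt.count(given)
print(opt.w, opt.whatever, opt.thingy)
print(opt.verbose)
```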

If you write the optspec to have an option called "no-hacky", then that means the default is opt.hacky==1, and opt.no_hacky==None. If the user specifies --no-hacky, then opt.no_hacky==1 and opt.hacky==None. Seems needlessly confusing? I don't think so: I think it actually reduces confusion, because it helps you write your conditions without double negatives. "hacky" is a positive term; an option --hacky isn't confusing, you would expect it to make your program hacky. But if the default should be hacky - and let's face it, that's often true - then you want to let the user turn it off. You could have an option --perfectly-sane that's turned off by default, but that's unnatural and overstates things a bit. So we write the option as --no-hacky, which is perfectly clear to users, but write the *program* to look at opt.hacky, which keeps your code straightforward and free of double negatives, while letting you use the word that naturally describes what you're doing. And all this is implicit. It's obvious to a human what --no-hacky means, and obvious to a programmer what opt.hacky means, and that's all that matters.
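Here is a minimal sketch of the --no- convention as described above, using the same 1/None values. The function parse_negatable is hypothetical and only looks for the literal "--no-NAME" string; a real parser would get this from its option table instead.

```python
def parse_negatable(option, argv):
    """Sketch of the --no-X convention for an option declared as "no-X".

    Returns both spellings, using 1/None the way the post describes.
    """
    assert option.startswith("no-")
    positive = option[len("no-"):]            # "no-hacky" -> "hacky"
    turned_off = ("--" + option) in argv      # did the user say --no-hacky?
    return {
        positive.replace("-", "_"): None if turned_off else 1,
        option.replace("-", "_"): 1 if turned_off else None,
    }


opt = parse_negatable("no-hacky", ["--verbose"])
# The program only ever asks the positive question: no double negatives.
if opt["hacky"]:
    print("doing it the hacky way")
```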

What about --verbose (-v) versus --quiet (-q)? No problem! "-vvv -qq" means opt.verbose==3 and opt.quiet==2. The net verbosity is always "(opt.verbose or 0) - (opt.quiet or 0)". (If an option isn't specified, it's None rather than 0, so that for options that take arguments you can tell "not given" apart from an explicit 0. That's why we need the "or 0" trick to convert None to 0.)
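The verbosity arithmetic fits in one small function. The log helper below is hypothetical, included only to show how the net value might gate output.

```python
def verbosity(verbose, quiet):
    """Net verbosity; unset options arrive as None, hence the `or 0`."""
    return (verbose or 0) - (quiet or 0)


def log(level, message, verbose=None, quiet=None):
    """Print message only if the net verbosity is at least `level`."""
    if verbosity(verbose, quiet) >= level:
        print(message)


# "-vvv -qq": opt.verbose == 3, opt.quiet == 2, so the net verbosity is 1.
log(1, "shown: net verbosity is 1", verbose=3, quiet=2)
log(2, "hidden: this needs net verbosity 2", verbose=3, quiet=2)
```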

Sometimes you want to provide the same option more than once and not just have it override or count previous instances. For example, if you want to have --include and --exclude options, you might want each --include to extend, rather than overwrite, the previous one. That's what the flags list is for: it contains all the stuff in opt, but in command-line order, so you can do your own tricks. And you can keep using opt for all the options that don't need this special behaviour, resorting to the flags list only where needed. See a flag you don't recognize? Just ignore it; it's in opt anyway.
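A sketch of the --include/--exclude case: walk flags in order, keep every occurrence, and ignore everything else. The helpers build_filters and is_included are hypothetical, and the "last matching rule wins" policy is just one possible design choice; it is exactly the kind of rule that needs the original ordering.

```python
import fnmatch


def build_filters(flags):
    """Walk flags in command-line order, keeping every --include/--exclude."""
    rules = []
    for flag, value in flags:
        if flag == "--include":
            rules.append(("include", value))
        elif flag == "--exclude":
            rules.append(("exclude", value))
        # Any other flag is ignored here; it is still available in opt.
    return rules


def is_included(path, rules, default=True):
    """Hypothetical policy: the last rule whose pattern matches wins."""
    result = default
    for kind, pattern in rules:
        if fnmatch.fnmatch(path, pattern):
            result = (kind == "include")
    return result


rules = build_filters([("--exclude", "*.o"), ("-v", ""), ("--include", "keep.o")])
print(rules)
print(is_included("main.o", rules), is_included("keep.o", rules))
```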

Options that *don't* show up in the optspec will give a KeyError when you try to look them up in opt, whether they're set or not. So given the --no-hacky option above, if you tried to look up opt.hackyy (typo!), it would crash as soon as you check the option, instead of silently returning something false and hiding the mistake.
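The typo-catching behaviour comes from only knowing the names declared up front. This is a minimal sketch of that idea; the class StrictOpts is hypothetical and is not options.py's OptDict.

```python
class StrictOpts:
    """Attribute-style access to options named in an optspec.

    Only names listed up front exist; anything else raises KeyError.
    """

    def __init__(self, known_names):
        self._opts = {name: None for name in known_names}   # unset -> None

    def __setitem__(self, name, value):
        if name not in self._opts:
            raise KeyError(name)
        self._opts[name] = value

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for option names.
        return self._opts[name]


opt = StrictOpts(["hacky", "no_hacky"])
opt["no_hacky"] = 1
print(opt.no_hacky, opt.hacky)
try:
    opt.hackyy                       # typo: not in the optspec
except KeyError as e:
    print("no such option:", e)
```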

Oh yeah, and *of course* options.py handles clusters of short options (-abcd means -a -b -c -d), equals or space (--name=this is the same as --name this), doubledash to end option parsing (-x -- -y doesn't parse the -y as an option), and smooshing of arguments into short options (-xynfoo means -x -y -n foo, if -n takes an argument and -x and -y don't).
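All four conventions are also what Python's standard getopt.gnu_getopt does, which makes it easy to try them out. The option table here (-x and -y as plain flags, -n/--name taking an argument) and the parse wrapper are hypothetical, and whether options.py delegates to getopt is an assumption to check in its source.

```python
import getopt

# Hypothetical option table: -x and -y are plain flags, -n/--name take an argument.
SHORT = "xyn:"
LONG = ["name="]


def parse(argv):
    """Return (flags, extra) exactly as getopt.gnu_getopt produces them."""
    return getopt.gnu_getopt(argv, SHORT, LONG)


print(parse(["-xy"]))                                     # a cluster of short flags
print(parse(["--name=this"]) == parse(["--name", "this"]))  # '=' or a space
print(parse(["-x", "--", "-y"]))                          # '--' ends option parsing
print(parse(["-xynfoo"]))                                 # argument smooshed into -n
```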

Best of all, though, it just makes your programs more beautiful. It's carefully designed to not rely on any other source files. Please steal it for your own programs with the joy of copy-and-paste (leaving the copyright notice please) and make the world a better place!

Update 2011/11/04: The license has been updated from LGPL (like the rest of bup) to 2-clause BSD (options.py only), in order to ease copy-and-pasting into any project you want. Thanks to the people who suggested this.

Syndicated 2011-11-02 03:10:37 (Updated 2011-11-05 01:01:59) from apenwarr - Business is Programming
