Older blog entries for apenwarr (starting at number 611)

2 Nov 2011 (updated 5 Nov 2011 at 01:01 UTC) »

bup.options

optspec = """
bup save [-tc] [-n name] <filenames...>
--
r,remote=  hostname:/path/to/repo of remote repository
t,tree     output a tree id
c,commit   output a commit id
n,name=    name of backup set to update (if any)
d,date=    date for the commit (seconds since the epoch)
v,verbose  increase log output (can be used more than once)
q,quiet    don't show progress meter
smaller=   only back up files smaller than n bytes
bwlimit=   maximum bytes/sec to transmit to server
f,indexfile=  the name of the index file (normally BUP_DIR/bupindex)
strip      strips the path to every filename given
strip-path= path-prefix to be stripped when saving
graft=     a graft point *old_path*=*new_path* (can be used more than once)
"""
o = options.Options(optspec)
(opt, flags, extra) = o.parse(sys.argv[1:])

I'm proud of many of the design decisions in bup, but so far the one with the most widespread reusability has been the standalone command-line argument parsing module, options.py (aka bup.options). The above looks like a typical program --help usage message, right? Sure. But it's not just that: it's also the code that tells options.py how to parse your command line!

As with most of the best things I've done lately, this was not my idea. I blatantly stole the optspec format from git's little-known "git rev-parse --parseopt" feature. The reimplementation in python is my own doing and includes some extra bits, like [default] values in square brackets and the "--no-" prefix for disabling stuff, plus it wordwraps the help output to fit your screen. And it all fits in 233 lines of code.

I really love the idea of an input file that's machine-readable, but really looks like what a human expects to see. There's just something elegant about it. And it's *much* more elegant than what you see with most option parsing libraries, where you have to make a separate function call or data structure by hand to represent each and every option. Tons of extra punctuation, tons of boilerplate, every time you want to write a new quick command-line tool. Yuck.

options.py (and the git code it's blatantly stolen from) is designed for people who are tired of boilerplate. It parses your argv and gives you three things: opt, a magic (I'll get to that) dictionary of options; flags, a sequence of (flag,parameter) tuples; and extra, a list of non-flag parameters.

So let's say I used the optspec that started this post, and gave it a command line like "-tcn foo -vv --smaller=10000 hello --bwlimit 10k". flags would contain a list like -t, -c, -n foo, -v, -v, --smaller 10000, --bwlimit 10k. extra would contain just ["hello"]. And opt would be a dictionary that can be accessed like opt.tree (1 because -t was given), opt.commit (1 because -c was given), opt.verbose (2 because -v was given twice), opt.name ('foo' because '-n foo' was given and the 'name' option in optspec ends in an =, which means it takes a parameter), and so on.
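
Here's that example as a tiny sketch of actual code, reusing the optspec quoted at the top of this post (the printed values are the ones described above; the exact formatting of the flags entries may vary a little between versions):

from bup import options  # or your own copy-and-pasted options.py

# optspec is the same string quoted at the top of this post
o = options.Options(optspec)
(opt, flags, extra) = o.parse(
    '-tcn foo -vv --smaller=10000 hello --bwlimit 10k'.split())

print(extra)        # ['hello']
print(opt.tree)     # 1  (-t was given once)
print(opt.verbose)  # 2  (-v was given twice)
print(opt.name)     # 'foo'
for flag, param in flags:
    print(flag, param)   # every flag, in the order it appeared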

The "magic" of the opt dictionary relates to synonyms: for example, the same option might have both short and long forms, or even multiple long forms, or a --no-whatever form. opt contains them all. If you say --no-whatever, it sets opt.no_whatever to 1 and opt.whatever to None. If you have an optspec like "w,whatever,thingy" and specify --thingy --whatever, then opt.w, opt.whatever, and opt.thingy are all 2 (because the synonyms occurred twice). Because python is great, 2 means true, so there's no reason to *not* just make all flags counters.

If you write the optspec to have an option called "no-hacky", then that means the default is opt.hacky==1 and opt.no_hacky==None. If the user specifies --no-hacky, then opt.no_hacky==1 and opt.hacky==None. Seems needlessly confusing? I don't think so: I think it actually reduces confusion. The reason is that it helps you write your conditions without double negatives. "hacky" is a positive term; an option --hacky isn't confusing, you would expect it to make your program hacky. But if the default should be hacky - and let's face it, that's often true - then you want to let the user turn it off. You could have an option --perfectly-sane that's turned off by default, but that's a bit unnatural and overstates the case. So we write the option as --no-hacky, which is perfectly clear to users, but write the *program* to look at opt.hacky, which keeps your code straightforward and free of double negatives, while letting you use the word that naturally describes what you're doing. And all this is implicit. It's obvious to a human what --no-hacky means, and obvious to a programmer what opt.hacky means, and that's all that matters.
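
Here's a minimal sketch of that pattern in use (not from bup itself: the "frob" program, its one-option optspec, and the messages are made up for illustration):

import sys
from bup import options  # or your own copy of the standalone options.py

optspec = """
frob <filenames...>
--
no-hacky   disable the hacky (but fast) code path
"""
o = options.Options(optspec)
(opt, flags, extra) = o.parse(sys.argv[1:])

# opt.hacky defaults to true; passing --no-hacky flips it to None.
if opt.hacky:
    print('taking the hacky (but fast) code path')
else:
    print('doing it the slow, careful way')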

What about --verbose (-v) versus --quiet (-q)? No problem! "-vvv -qq" means opt.verbose==3 and opt.quiet==2. The total verbosity is just always "(opt.verbose or 0) - (opt.quiet or 0)". (If an option isn't specified at all, it's "None" rather than 0, so with options that take arguments you can tell "not given" apart from "given a value of zero". That's why we need the "or 0" trick to convert None to 0.)

Sometimes you want to provide the same option more than once and not just have it override or count previous instances. For example, if you want to have --include and --exclude options, you might want each --include to extend, rather than overwrite, the previous one. That's where the flags list comes in; it contains all the same stuff as opt, but it stays in sequence, so you can do your own tricks. And you can keep using opt for all the options that don't need this special behaviour, resorting to the flags list only where needed. See a flag you don't recognize? Just ignore it; it's in opt anyway.
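
As a concrete (hypothetical) illustration: --include and --exclude aren't in the optspec at the top of this post, but if you added them, accumulating their values from the flags list might look something like this:

includes = []
excludes = []
for flag, param in flags:              # flags comes from o.parse(), as above
    if flag in ('-I', '--include'):    # hypothetical option names
        includes.append(param)
    elif flag in ('-x', '--exclude'):
        excludes.append(param)
# ...and opt still works as usual for everything that doesn't care about order.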

Options that *don't* show up in the optspec will raise a KeyError when you try to look them up in opt, whether they're set or not. So given the --no-hacky option above, if you tried to look for opt.hackyy (typo!) it would crash as soon as you checked for the option, instead of just silently returning False or something.

Oh yeah, and *of course* options.py handles clusters of short options (-abcd means -a -b -c -d), equals or space (--name=this is the same as --name this), doubledash to end option parsing (-x -- -y doesn't parse the -y as an option), and smooshing of arguments into short options (-xynfoo means -x -y -n foo, if -n takes an argument and -x and -y don't).

Best of all, though, it just makes your programs more beautiful. It's carefully designed to not rely on any other source files. Please steal it for your own programs with the joy of copy-and-paste (leaving the copyright notice please) and make the world a better place!

Update 2011/11/04: The license has been updated from LGPL (like the rest of bup) to 2-clause BSD (options.py only), in order to ease copy-and-pasting into any project you want. Thanks to the people who suggested this.

Syndicated 2011-11-02 03:10:37 (Updated 2011-11-05 01:01:59) from apenwarr - Business is Programming

Vortex Update: 7 months later

Previously I wrote about my upcoming trip through the Vortex. It's no longer upcoming and I'm still on the other side, so I'm severely biased and you can't trust anything I say about it. But I thought I'd give a quick status update on my stated goals from last time:

  • Work on customer-facing real technology products. Success. Not released yet, but you'll see.
  • Help solve some serious internet-wide problems, like traffic shaping, etc. Yes, a bit, on my 20% project for now. You'll see that too. :) Must try harder.
  • Keep coding, maybe manage a small team. Yes, with more of the latter and less of the former, but the ratio is under my control.
  • Keep working on my open source projects. Sort of. Spreading myself so thin with cool projects that these are suffering, but that's nobody's fault but mine.
  • Eat a *lot* of free food. Yes, though in fact I've lost weight, giving the lie to the so-called "Google 20."
  • Avoid the traps of long release cycles and ignoring customer feedback. Total fail. The mechanics of this (totally under my control, but with lots of pressure to do it "wrong") are kind of interesting and I might be able to post about it later. For now let's just admit that I wanted to say my team had a product out by now (at least in public invite-only testing), and we don't, and that's mostly my fault.
  • Avoid switching my entire life to Google products. Partial success, I have an Apple TV instead. But they gave me a free Android phone that (as far as I can tell) has actual garbage collection freezes *while I'm typing on the virtual keyboard*. So... fail.
  • Produce more valuable software, including revenue, inside Google than I would have by starting a(nother) startup. Won't know until we release something.

Conclusion: mixed results. But the good news is that where things aren't as good as I'd like, the root cause can be traced to me. Does that sound like a bad thing? No! It's pretty much the ideal case when it comes to motivating me to learn fast and produce more. I'm working on the right stuff in the right ways, and the environment is well configured for me to do some amazing work. There is effectively no management interference (or input) at all. I just need to correct some of my own methodological flaws, especially trimming and prioritizing what I work on.

More later.

Syndicated 2011-10-16 07:12:00 from apenwarr - Business is Programming

16 Oct 2011 (updated 19 Oct 2011 at 02:02 UTC) »

Slides from my PyCodeConf presentation

Many thanks to Github and friends for hosting PyCodeConf in Miami this year. Normally I don't like conferences, and I imagined I wouldn't like Miami either, but I was proven completely wrong on both counts. Call it low expectations, but hey, they delivered!

I quite like the way my presentation turned out. It has real actual facts (tm), including two benchmarks where Java loses to python in literally every possible way. This isn't that surprising - since I hate java, I like python, and I wrote the benchmarks - but my angst was somewhat increased because I had actually been trying, on purpose, to make a biased benchmark that Java would pass, in order to make myself appear more balanced... and I completely failed. I was not able to make java appear better than python in any way, be it startup time, memory usage, code execution time, library power, or source code readability. This is surely some kind of perverse marketing gimmick: since it's literally impossible to make a benchmark that makes java look good, everyone who publishes java benchmarks automatically looks biased, so java lovers can discount the opinion of the "haters." Insidious.

But enough with the conspiracy theories. Here are Avery's slides from pycodeconf in pdf form including some detailed speaker notes. I thought about giving them to you in Google Docs format, but naturally Docs is totally incapable of showing speaker notes by default, so forget it. Just open the pdf already. I put a lot of work into the notes so you wouldn't have to try to learn from my (rather sparse) slides.

Oh yeah, what's it actually about? My experiences writing fast code (like bup and sshuttle) in python. And how it's possible. And how not to do it.

One audience member said, "I thought about 40% of it was really insightful. The other 60% I had no idea what you were talking about." I consider that a rave review, I think.

Update 2011/10/18: They posted an audio recording of my talk. Also check out the other recorded talks and slides.

Syndicated 2011-10-16 03:28:29 (Updated 2011-10-19 02:02:13) from apenwarr - Business is Programming

18 more tidbits of randomness

Five years after my last post about the Montreal Fringe Festival, a lot of things have changed, but a lot of things are just how they were before. Have I changed in five years? At least in one way: this year, I don't feel any pressure to tell you what this has to do with programming :)

And so, 18 tidbits of randomness, in chronological order this time:

Karaoke gone pro. Hypnotic failure modes (and some success).

A tough time in Texas, and a fine time on Mars. Men re: men; poet re: angry poet. Proof that scriptwriting has gotten better over time, and proof that lack of writing can lead to self-incrimination.

New York style neuroses; Shakespeare style collaboration; Montreal style relationships.

Choose pretty much just the one adventure, or see work experience as an adventure, or grab your bicycle and have an actual adventure, or let someone who should know tell you about cosmic adventure.

Bunnies - exactly as advertised, but so much better than it sounds.

Angsty, but fruitless.

Radio, but visible.

And don't forget: 7 years of afterparties.

I only saw 18 shows this year, but it only took three days.

This year's record-breaking density has been made possible by Bixi.

(Previously.)

Syndicated 2011-06-23 02:01:28 from apenwarr - Business is Programming

9 May 2011 (updated 9 May 2011 at 03:03 UTC) »

Why bitcoin will fail

    Reading about bitcoin. Thought about writing a blog rant, but "OMG they're all totally crazy" wasn't long enough, so here we are. Filler.

        -- Me on twitter

    I now have had my foggy crystal ball for quite a long time. Its predictions are invariably gloomy and usually correct, but I am quite used to that and they won't keep me from giving you a few suggestions, even if it is merely an exercise in futility whose only effect is to make you feel guilty.

        -- E.W. Dijkstra

I'm in the "commerce" group at work, and I've done quite a bit of work in the world of banking, so it seemed vaguely relevant when I ran into the technical paper about bitcoin (Google it) and its associated various web sites. This led to my above, admittedly rather smarmy, twitter post...

...and then someone, yes, inevitably, asked me for clarification.

See, a bitcoin rant is almost too over-the-top for me. Asking why I think bitcoin won't work is like asking why the sky isn't red. I mean, wait, you think it *is* red? You actually took that seriously? Oh boy. Where do I even start?

But just for you, because I know all the valued subscribers to this diary have been deprived of my ranting lately, I will expand on it a little.

Just one more side note. Most of the time, I try to give projects the benefit of the doubt. If they don't affect me, then it's really no matter to me if they succeed or fail. I might keep an eye on them to see if my prediction (usually failure) comes true or not, and try to learn from the result. But I don't actively *want* projects to fail. I would much rather they succeed.

In this case as well, I don't really care. I don't own any bitcoins. I don't particularly want to. If one day I have to own some for some reason, I will buy them at the market rate and get screwed, just as I do today with U.S. and Canadian dollars.

Since I don't actually care, I had a bit of trouble motivating myself to write more than 140 characters about it. I wasn't going to bother. So, um, thanks to my followers on twitter for providing the motivation.

So here we go:

FAIL #1: If you like bitcoin, then you must think the gold standard was a good idea.

The gold standard, for those who don't know, was the (now thoroughly discredited) idea that for every dollar you print, you need to have an appropriate amount of gold stored away somewhere that someone, someday, theoretically, could demand to get back in exchange for your worthless piece of paper. If you honestly believe that abandoning the gold standard was a bad idea - and there are indeed people who believe this - then you might as well stop reading now. Wiser men than I have explained in excruciating detail why you're an idiot. This article will not convince you, it will just make you angry.

Still with me?

Okay, just for background, for people who don't already have a pre-formed opinion, the gold standard is a bad idea for several reasons. Here are some of them:

In order to create currency, you have to do a bunch of pointless busywork. Originally, that meant mining for gold, so you could take this gold (obtained at great expense) and hide it in a fortress where nobody would ever see or feel or admire it. In all of history, it is extremely doubtful that anybody has *ever* walked into a U.S. government office and demanded their gold in exchange for dollars. That's because:

Gold is a stupid inconvenient currency that's worse than paper. Go up to the street vendor selling a hot dog, and try to get him to give you a hot dog in exchange for the equivalent value in gold dust. (That's really not very much dust.) See what happens. Gold is the universal currency, is it? The thing that anybody would and will take, any time, throughout history? No. It's heavy, messy, hard to measure, and I can't get my ATM to withdraw or deposit it. If I want 1000x as much gold as one gold nugget, I can't just get a $1000 bill; I have to get a gold nugget 1000x as big and heavy. Who wants this?

Believing in the gold standard is disbelieving in capitalism. The magic of capitalism is entirely contained in the following two words: MAKING MONEY. Have you ever thought about those two words? What's interesting about them is they don't seem to make any sense. When I go into the office and do work, am I literally "making" money? Why do they call it that? Well, as a matter of fact, you *are* literally making money. You are a machine: you eat food and breathe air and magically, you produce outputs that can be sold for much more money than the cost of the food and air. You produced actual value, and that value can be measured, and that measurement is called money. You made money. Out of nothing. *That* is capitalism. (Compare with digging up useless coloured rocks and then hiding them in a fortress so nobody can see them. Those people make the economy go round?)

If the gold standard worked, the 1930s depression wouldn't have happened, and we couldn't have recovered, period, from the recent banking crisis.

Back in the 1930s, the U.S. still had gold-backed currency. Why was there a depression? Because people stopped producing valuable stuff. The amount of money was constant; the gold didn't disappear. But somehow, suddenly people didn't have enough food or housing. Why? Because they refused to produce unless they got paid for it. When they didn't get paid, they couldn't spend that money, and so they couldn't pay for other things, and so other people refused to produce since they wouldn't get paid either, and so on in a giant cycle. The money was there, but it stopped moving.

How did the depression get resolved? In short, people started doing stuff (especially a big war) whether they could afford it or not. It turned out that all those idle people could be productive if they had a good reason. Gold turned out not to be a good enough reason.

Relatedly, the U.S. survived the 2008 banking crisis - which had a legitimate opportunity to convert itself into another depression - by spending its way out. As it happens, the U.S. was able to spend money it didn't actually have. Why? Because they don't care about the gold standard. If they had had a constant amount of gold, then they would not have been able to spend more than they had, and so people wouldn't have been paid, and those people would have refused to produce, and they wouldn't be able to buy things, so more people would refuse to produce... and we're back to square one.

Motivation is everything. Gold is nothing.

Which leads us to the last, most important reason to abandon the gold standard:

The ability of governments to print (and destroy) money is a key tool in economic management.

The Federal Reserve (and other related institutions in each country, like the Bank of Canada) has the right to print money. It largely does this through a pretty blunt mechanism, the interest rate. I won't go into a lot of detail - look up "federal funds rate" if you want to learn more. But in short, when they lower the rate, banks are willing to "borrow more money" from the federal reserve (which they then lend to you, and so on). When the federal funds rate is high, banks need to give back this money, so they don't give out as many loans, and so on.

What is this money that the federal reserve "lends" to banks? It's fictional. Bits in a computer database. No fancy encryption. They just manufacture it on the spot, as needed. And when it's returned, they make it disappear.

    Update 2011/05/08: Some gold standard supporters will tell you that in fact, this ability to print money is what causes hyperinflation, which causes economic collapse. But no, the causality doesn't work like that. Hyperinflation occurs when government nutbars try to stop an economic collapse by wildly printing money. No economic system can protect you when nutbars are in charge. But yes, the early symptoms of failure will look somewhat different.

The Federal Reserve uses this control to speed up or slow down the economy and try to reduce fluctuations. The results aren't always perfect (humans aren't very good at acting like math equations) but it's actually not too bad overall.

If governments can't control the money supply, then they can't set interest rates. If they can't set interest rates, they can't control the economy, and if nobody is controlling the economy, then the economy will act like any uncontrolled complex system: it'll go crazy.

(Incidentally, this is also why it's important that the Federal Reserve not be controlled directly by politicians. Find me a politician who will say anything other than, "OH YEAH! MAKE THAT ECONOMY GO FASTER!" at election time.)

...

Okay, so, back to bitcoin. Bitcoin is exactly like the gold standard, only digital:

  • "Mining" (they even borrowed the word!) bitcoins is pointless busywork that produces nothing of real value.
  • Bitcoins are less convenient than paper currency.
  • Bitcoin denies the truth of capitalism, that it's about *value*, not about money, by preventing the money supply from expanding when the economy does.
  • Bitcoin allows for random unrecoverable effects like the 1930s depression.
  • Bitcoin removes government control over the economy, which means there is *no* control over the economy.

By comparison, look at our current currencies:

  • Generating money is essentially free (most money isn't even paper, but printing the paper is pretty cheap too).
  • Current currency is very convenient and has many convenient forms.
  • When more valuable stuff is created, more money appears without having to mine for unrelated crap first.
  • Current currency allowed us to spend out of a depression caused by the banking crisis.
  • The current system allows the government to reduce economic fluctuations.

FAIL #2: Even if it was a good idea, governments would squash it.

In the previous section, it might have sounded like I think governments are altruistic peace-loving tea-drinking hippie commies.

I don't actually think that. (I think the government of British Columbia might be like that, which explains why they don't get any work done, but that's another story.)

The truth is that governments are power structures. Governments control ("govern") things. And while the economy - like any complex engineering construction - needs to have controls on it, some of the controls end up going too far, and all of them end up being manipulated by people in power.

One of the lures of bitcoin is the idea of taking power away from the people in power. Admit it. That's one of the reasons why you like it.

Well, word to the wise: if there's one thing the people in power already know, it's that money is power. It's not like you're going to catch them by surprise here. They don't have to be the smartest cookies in the jar to figure that part out.

Digital money is *not* like pirating digital music and movies. The government sort of cares about those, but let's be serious: pirating a few movies will not topple the U.S. government. Losing control of money will.

Governments have big weapons and propaganda machines and actual secret agents and citizens who believe that keeping the economy under control is a good idea. If you threaten the currency, you are threatening the entire power structure of the civilized world. You are, quite literally, an enemy of the state. You are attempting to build nuclear weapons in your bedroom. Or at least they'll see it that way.

Do you think you'll get away with it because your monopoly money is made of bits instead of paper? I don't.

The only reason you'd get away with it is if you're too small to matter. Which is certainly the current situation.

FAIL #3: The whole technological basis is flawed.

Bitcoin is, fundamentally, a cryptosystem. Some people argue that it's "as strong as SHA256" and that "if someone could break SHA256, then banks would be in trouble as it is."

Wrong on both counts.

First of all, I admit, I don't totally understand the bitcoin algorithms and systems. I don't really need to. I understand only this: the road to crypto hell is paved with the bones of people who thought that a good cryptosystem can be designed by combining proven algorithms in unproven ways. SHA256 may be the strongest part of bitcoin, but a cryptosystem is only as strong as its weakest link.

You want to replace the world economy with a hard-to-guess math formula? Where's your peer review? Where are the hordes of cryptographers who have spent 30+ years trying to break your algorithm and failed? Come talk to me in 30 years. Meanwhile, it's safe to assume that bitcoin has serious flaws that will allow people to manufacture money, duplicate coins, or otherwise make fake transactions. In that way, it's just like real dollars.

But what's *not* like real dollars is the cost of failure. With real dollars, when people figure out how to make counterfeit bills, we find those people and throw them in jail, and eventually we replace our bills with newer-style ones that are more resistant to failure. And the counterfeiters are limited by how many fake bills their printing press can produce.

With bitcoin, a single failure of the cryptosystem could result in an utter collapse of the entire financial network. Unlimited inflation. Fake transactions. People not getting paid when they thought they were getting paid. And the perpetrators of the attack would make so much money, so fast, that they could apply their fraud at Internet Scale on Internet Time.

(Ha, and don't even talk to me about how your world-changing financial system would of course also be protected by anti-fraud laws so we could still punish people for faking it. If we still need the government, what is the point of your currency again?)

The current financial system is slow, and tedious, and old, and in many ways actually broken or flawed. But one thing we know is that it's *resilient*. One single mathematical error will not send the whole thing into a tailspin. With bitcoin, it will.

And no, a break in SHA256 would not break the current financial system or ruin any banks. How could it? What would even be the mechanism for such an attack? How would it make the paper bills in my pocket stop working for buying hot dogs? Can't we just hunt down and arrest the people who forged the fake transactions?

FAIL #4: It doesn't work offline.

Stupid, crappy, printed paper money is old-fashioned and flawed, but you know what? It actually works offline, because that easily-forged piece of paper is still just barely hard enough to forge that normal people won't try. It's the original peer-to-peer financial network, although there's a "central coordinator" somewhere issuing tokens.

As soon as you go electronic, forgery becomes trivial to do on a massive scale, so offline just isn't an option. Yes, there are "offline" mechanical paper-based credit card readers, but they aren't anonymous: they have your name and card number. If you bounce too many transactions from one of those, someone will be sent to hunt you down. The risk is contained.

There is no way to make bitcoin even remotely safe offline. There is no fallback mechanism except exchanging your bitcoins for cash. But if you're going to rely on a paper currency anyway, what is bitcoin buying you? It's just yet another way to spend money. As a person currently suffering through managing U.S. versus Canadian dollars, I can tell you, exchange rates are just not worth the hassle.

...

Summary

1. Like the gold standard, a successful bitcoin would send our economy back into the dark ages.

2. Even if it became popular, governments would squash it because of #1 and because they like being in power.

3. A single mathematical or other error in the cryptosystem would cause instant, unresolvable, worldwide hyperinflation. After hundreds of years of analysis, there are no known flaws in the current financial system that could lead to that. (Other than the known causes of hyperinflation, of course, i.e. total gross mismanagement of the entire country.)

4. It's not even useful except as an online-only addition to normal currency, and my normal currency already works fine online.

The sky is JUST NOT RED, dammit.

Tell me again why you think it is?

...

Update 2011/05/08: Counterpoint!

An anonymous (really, they anonymized their return address) reader replies with the following. I'll just reprint it in full because it's awesome. There's nothing quite like just letting an unelected representative of a movement embarrass himself. Oh Internet, how I love you.

    Was that for real? I'm not sure if your stupid or just trolling.

    The US dollar has lost 97% of it's value since leaving the gold standard.

    Germans in the Weimer republic had to buy their sausage with wheelbarrows full of paper currency. Too long ago for you? Mid-90's Yugoslavia, samething.

    "Believing in the gold standard is disbelieving in capitalism" and how do you think capitalism came about?

    "If the gold standard worked, the 1930s depression wouldn't have happened, and we couldn't have recovered, period, from the recent banking crisis."

    You really don't know history.

    "The ability of governments to print (and destroy) money is a key tool in economic management."

    REALLY? Then why did the Soviet Union fail? They should have been the richest country in the world if your statement was true.

    "What is this money that the federal reserve "lends" to banks? It's fictional. Bits in a computer database. No fancy encryption. They just manufacture it on the spot, as needed. And when it's returned, they make it disappear."

    Loaned with interest. How does the interest disappear when more money is owed then exists?

    I wouldn't be surprised if you said somewhere else that current debts are managable.

    "If governments can't control the money supply, then they can't set interest rates. If they can't set interest rates, they can't control the economy, and if nobody is controlling the economy, then the economy will act like any uncontrolled complex system: it'll go crazy."

    From, "disbelieving in capitalism"" to that? See also; Soviet Union.

    I don't use BitCoin (yet) but your reasoning there is even more pathetic.

    No reply because your the stupidest person I've seen this week and that's saying something.

For the record, I'm stupid *and* trolling. That's why it was hard to tell.

Syndicated 2011-05-08 23:48:33 (Updated 2011-05-09 03:03:32) from apenwarr - Business is Programming

26 Apr 2011 (updated 2 May 2011 at 22:06 UTC) »

Avery, sshuttle, and bup at LinuxFest Northwest (Bellingham, WA) April 30

Where: LinuxFest Northwest conference (Bellingham, WA)
When: 1:30-3:30 on Saturday, April 30 (conf runs all weekend)
Price: FREE!

You might think that now that I live in New York, I would stop doing presentations on the West coast. But no. Ironically, right after moving to New York, I'll have done three separate presentations (four, if you count this one as two) on the West coast in a single calendar month.

In this particular case, it's because I proposed my talks back when I lived in BC, when Bellingham was a convenient one-hour drive from Vancouver's ferry terminal. Now it's a day-long trip across the continent (and twice across the US/Canada border). But oh well, it should be fun.

Also, I foolishly took someone's advice from a Perl conference one time (was it Damian Conway?) and proposed *two* talks, under the theory that if you propose two talks, you double your chances that the conference admins might find one of them interesting, but of course nobody would be crazy enough to give you *two* time slots. Clearly this theory is crap, because this is the second time I've tried it out, and in both cases *both* of my talks have been selected. Thanks a lot.

The good news is that at least they're in consecutive time slots. So while I'll be hoarse by the end, I only have to psych myself up once.

Bellingham is convenient to reach from Vancouver, Seattle, and Portland, among other places, and the conference is free, so take your chance to come see it! If you like open source, it promises to be... filled with open source.

Um, and I promise to start writing something other than my presentation schedule in this space again soon. I realize how annoying it is when a blog diary turns into a glorified presentation schedule. I'm working on it.

Update 2011/05/02: By popular request, my slides from the conference:

Enjoy.

Syndicated 2011-04-26 03:02:29 (Updated 2011-05-02 22:06:06) from apenwarr - Business is Programming

Avery is doing a presentation in Mountain View (maybe about bup)

Where: Hacker Dojo in Mountain View, California
When: 7:30pm on Tuesday, April 12
Why: Because (as Erin tells me) I have trouble saying no.

I have heard unconfirmed rumours that there are programmers of some sort somewhere in the region of Silicon Valley, despite how silly this sounds in concept. (Silicon? That stuff you make beaches out of? Why would any nerd go anywhere near a beach?) Nevertheless, after my thrilling and/or mind boggling presentation and/or rant about bup in San Francisco on Saturday, there was some interest in having me do something similar out in the middle of nowhere, so I accepted.

You're invited! I'm not expecting a very big crowd, given the short notice, which means it will probably be more Q&A and less presentation. But I'll bring my presentation slides just in case. There will be demos. There will be oohing and aahing, I guarantee it, even if I have to do it myself.

I might also talk about sshuttle or redo, or maybe Linux arcnet poetry, if there are any poetry lovers in the audience. (I doubt there will be any arcnet users on the beach, so a talk on arcnet is unlikely.)

Syndicated 2011-04-11 17:11:33 from apenwarr - Business is Programming

Avery's doing a bup presentation in San Francisco

When: Saturday, April 9, 2:30 PM
Where: San... Francisco... somewhere

The venue is not quite certain yet since we don't know how many people are actually interested.

If you want to see me talk about how we took the git repository format and made a massively more scalable implementation of it, as well as my evil (disclaimer: I do not speak for my employer) schemes for the future, please email [tony at godshall.org] with subject "bup SF". Thanks to Tony for organizing this little adventure.

Tell your friends!

Or skip the boring meatspace stuff and jump straight to bup on github.

Syndicated 2011-04-05 22:41:46 from apenwarr - Business is Programming

3 Apr 2011 (updated 3 Apr 2011 at 18:04 UTC) »

What you can't say

Normally I don't write a new post just about updates to a previous post, but I have added quite a lot of clarifications and notes to I hope IPv6 never catches on, in response to copious feedback I've received through multiple channels. Perhaps you will enjoy them.

One very interesting trend is that the comments about the article on news.ycombinator were almost uniformly negative - especially the highly upvoted comments - while I simultaneously saw tons and tons of positive comments on Twitter (it was my most popular article ever) and numerous people wrote me polite messages with agreement and/or minor suggestions and clarifications. Only one person emailed me personally to say I was an idiot (although it was my first, at least from people I don't know).

The trend is curious, because normally news.yc has more balanced debate. Either I'm utterly completely wrong and missed every point somehow (as a highly upvoted point-by-point rebuttal seems to claim)... or I seriously pinched a nerve among a certain type of people.

All this reminds me of Paul Graham's old article, What You Can't Say. Perhaps some people's world views are threatened by the idea of IPv6 being pointless and undesirable.

And all *that*, in turn, reminded me of my old article series about XML, sadomasochism, and Postel's Law. I was shocked at the time that some people actually think Postel's Law is *wrong*, but now I understand. Some people believe the world must be *purified*; hacky workarounds are bad; they must be removed. XML parsers must not accept errors. Internet Protocol must not accept anything less than one address per device. Lisp is the one truly pure language. And so on.

Who knows, maybe those people will turn out to be right in the end. But I'm with Postel on this one. Parsers that refuse to parse, Internet Protocol versions that don't work with 95% of servers on the Internet, and programming languages that are still niche 50+ years later... sometimes you just have to relax and let it flow.

Update 2011/04/02: Another big example of a "less good" technology failing to catch on for similar reasons as IPv6: x86 vs. non-x86 architectures. Everyone knows x86 is a hack... but we're stuck with it. (ARM is making an impact in power-constrained devices, though, just like IPv6 is making an impact in severely IPv4-constrained subnets. Who will win? I don't know, I just know that IPv4 and x86 are less work for me, the programmer, right now.)

Thinking about the problem in that way - why "worse" (hackier) technologies tend to stick around while the purified replacements don't - reminded me of Worse is Better by Richard Gabriel. In retrospect, when I divided people above into "purists" and "Postel's Law believers", I guess I was just restating Gabriel's much better-written point about "worse-is-better" vs. "the right thing." If you haven't read the article, read it; it's great. And see if you're firmly on one side or the other, and if you think the other side is clearly just crazy.

If you think that, then *that*, in turn, brings us back to "What You Can't Say." The truth is, you *can* say it, but people will jump down your throat for saying it. And not everybody will; only the very large group of people in the camp you're not in.

That's why we have religious wars. Figurative ones, anyway. I suspect real religious wars are actually about something else.

Syndicated 2011-04-03 01:30:12 (Updated 2011-04-03 18:04:57) from apenwarr - Business is Programming

28 Mar 2011 (updated 29 Mar 2011 at 02:13 UTC) »

I hope IPv6 *never* catches on

(Temporal note: this article was written a few days ago and then time-released.)

This year, like every year, will be the year we finally run out of IPv4 addresses. And like every year before it, you won't be affected, and you won't switch to IPv6.

I was first inspired to write about IPv6 after I read an article by Daniel J. Bernstein (the qmail/djbdns/redo guy) called The IPv6 Mess. Now, that article appears to be from 2002 or 2003, if you can trust its HTTP Last-Modified date, so I don't know if he still agrees with it or not. (If you like trolls, check out the recent reddit commentary about djb's article.) But 8 years later, his article still strikes me as exactly right.

Now, djb's commentary, if I may poorly paraphrase, is really about why it's impossible (or perhaps more precisely, uneconomical, in the sense that there's a chicken-and-egg problem preventing adoption) for IPv6 to catch on without someone inventing something fundamentally new. His point boils down to this: if I run an IPv6-only server, people with IPv4 can't connect to it, and at least one valuable customer is *surely* on IPv4. So if I adopt IPv6 for my server, I do it in addition to IPv4, not instead of it. Conversely, if I have an IPv6-only client, I can't talk to IPv4-only servers. So for my IPv6 client to be useful, either *all* servers have to support IPv6 (not likely), or I *must* get an IPv4 address, perhaps one behind a NAT.

In short, any IPv6 transition plan involves *everyone* having an IPv4 address, right up until *everyone* has an IPv6 address, at which point we can start dropping IPv4, which means IPv6 will *start* being useful. This is a classic chicken-and-egg problem, and it's unsolvable by brute force; it needs some kind of non-obvious insight. djb apparently hadn't seen any such insight by 2002, and I haven't seen much new since then.

(I'd like to meet djb someday. He would probably yell at me. It would be awesome. </groupie>)

Still, djb's article is a bit limiting, because it's all about why IPv6 physically can't become popular any time soon. That kind of argument isn't very convincing on today's modern internet, where people solve impossible problems all day long using the unstoppable power of "not SQL", Ruby on Rails, and Ajax to-do list applications (ones used by breakfast cereal companies!).

No, allow me to expand on djb's argument using modern Internet discussion techniques:

Top 10 reasons I hope IPv6 never catches on

Just kidding. No, we're going to do this apenwarr-style:

What I hate about IPv6

Really, there's only one thing that makes IPv6 undesirable, but it's a doozy: the addresses are just too annoyingly long. 128 bits: that's 16 bytes, four times as long as an IPv4 address. Or put another way, IPv4 contains almost enough addresses to give one to each human on earth; IPv6 has enough addresses to give 39614081257132168796771975168 (that's 2**95) to every human on earth, plus a few extra if you really must.
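
(If you don't believe the arithmetic, Python's big integers make it a one-liner to check:)

>>> 2**128                # total IPv6 addresses
340282366920938463463374607431768211456
>>> 2**95                 # the per-human allowance quoted above
39614081257132168796771975168
>>> 2**128 // 2**95       # how many humans that plan covers
8589934592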

Of course, you wouldn't really do that; you would waste addresses to make subnetting and routing easier. But here's the horribly ironic part of it: all that stuff about making routing easier... that's from 20 years ago!

Way back in the IETF dark ages when they were inventing IPv6 (you know it was the dark ages, because they invented the awful will-never-be-popular IPsec at the same time), people were worried about the complicated hardware required to decode IPv4 headers and route packets. They wanted to build the fastest routers possible, as cheaply as possible, and IPv4 routing tables are annoyingly complex. It's pretty safe to assume that someday, as the Internet gets more and more crowded, nearly every single /24 subnet in IPv4 will be routed to a different place. That means - hold your breath - an astonishing 2**24 routes in every backbone router's routing table! And those routes might have 4 or 8 or even 16 bytes of information each! Egads! That's... that's... 256 megs of RAM in every single backbone router!

Oh. Well, back in 1995, putting 256 megs of RAM in a router sounded like a big deal. Nowadays, you can get a $99 Sheevaplug with twice that. And let me tell you, the routers used on Internet backbones cost a lot more than $99.

It gets worse. IPv6 is much more than just a change to the address length; they totally rearranged the IPv4 header format (which means you have to rewrite all your NAT and firewall software, mostly from scratch). Why? Again, to try to reduce the cost of making a router. Back then, people were seriously concerned about making IPv6 packets "switchable" in the same way ethernet packets are: that is, using pure hardware to read the first few bytes of the packet, look it up in a minimal routing table, and forward it on. IPv4's variable-length headers and slightly strange option fields made this harder. Some would say impossible. Or rather, they would, if it were still 1995.

Since then, FPGAs and ASICs and DSPs and microcontrollers have gotten a whole lot cheaper and faster. If Moore's Law calls for a doubling of transistor performance every 18 months, then between 1995 and 2011 (16 years), that's 10.7 doublings, or 1663 times more performance for the price. So if your $10,000 router could route 1 gigabit/sec of IPv4 in 1995 - which was probably pretty good for 1995 - then nowadays it should be able to route 1663 gigabits/sec. It probably can't, for various reasons, but you know what? I sincerely doubt that's IPv4's fault.

If it were still 1995 and you had to route, say, 10 gigabits/sec for the same price as your old 1 gigabit/sec router using the same hardware technology, then yeah, making a more hardware-friendly packet format might be your only option. But the router people somehow forgot about Moore's Law, or else they thought (indications are that they did) that IPv6 would catch on much faster than it has.

Well, it's too late now. The hardware-optimized packet format of IPv6 is worth basically zero to us on modern technology. And neither is the simplified routing table. But if we switch to IPv6, we still have to pay the software cost of those things, which is extremely high. (For example, Linux IPv6 iptables rules are totally different from IPv4 iptables rules. So every Linux user would need to totally change their firewall configuration.)

So okay, the longer addresses don't fix anything technologically, but we're still running out of addresses, right? I mean, you can't argue with the fact that 2**32 is less than the number of people on earth. And everybody needs an IP address, right?

Well, no, they don't:

The rise of NAT

NAT is Network Address Translation, sometimes called IP Masquerading. Basically, it means that as a packet goes through your router/firewall, the router transparently changes your IP address from a private one - one reused by many private subnets all over the world and not usable on the "open internet" - to a public one. Because of the way TCP and UDP work, you can safely NAT many, many private addresses onto a single public address.

So no. Not everybody in the world needs a public IP address. In fact, *most* people don't need one, because most people make only outgoing connections, and you don't need your own public IP address to make an outgoing connection.

By the way, the existence of NAT (and DHCP) has largely eliminated another big motivation behind IPv6: automatic network renumbering. Network renumbering used to be a big annoying pain in the neck; you'd have to go through every computer on your network, change its IP address, router, DNS server, etc, rewrite your DNS settings, and so on, every time you changed ISPs. When was the last time you heard about that being a problem? A long, long time ago, because once you switch to private IP subnets, you virtually never have to renumber again. And if you use DHCP, even the rare mandatory renumbering (like when you merge with another company and you're both using 192.168.1.0/24) is rolled out automatically from a central server.

Okay, fine, so you don't need more addresses for client-only machines. But every server needs its own public address, right? And with the rise of peer-to-peer networking, everyone will be a server, right?

Well, again, no, not really. Consider this for a moment:

Every HTTP Server on Earth Could Be Sharing a Single IP Address and You Wouldn't Know The Difference

That's because HTTP/1.1 (which is what *everyone* uses now... speaking of avoiding chicken/egg problems) supports "virtual hosts." You can connect to an IP address on port 80, and you provide a Host: header at the beginning of the connection, telling it which server name you're looking for. The IP you connect to can then decide to route that request anywhere it wants.

In short, HTTP is IP-agnostic. You could run HTTP over IPv4 or IPv6 or IPX or SMS, if you wanted, and you wouldn't need to care which IP address your server had. In a severely constrained world, Linode or Slicehost or Comcast or whoever could simply proxy all the incoming HTTP requests to their network, and forward the requests to the right server.
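
If you've never seen it spelled out, here's roughly all there is to name-based vhosting from the client's side - a Python sketch using only the standard library, with a placeholder address and hostnames:

import socket

def fetch_front_page(ip, hostname, port=80):
    # Connect to the shared IP, then say which *site* we want in the
    # Host: header -- that one header is the whole trick.
    request = ('GET / HTTP/1.1\r\n'
               'Host: %s\r\n'
               'Connection: close\r\n\r\n' % hostname)
    with socket.create_connection((ip, port)) as s:
        s.sendall(request.encode())
        return s.recv(65536)   # first chunk of the response; fine for a demo

# Two "different servers" that could live behind one address:
#   fetch_front_page('192.0.2.1', 'www.example.com')
#   fetch_front_page('192.0.2.1', 'wiki.example.com')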

(See the very end of this article for discussion of how this applies to HTTPS.)

Would it be a pain? Inefficient? A bit expensive? Sure it would. So was setting up NAT on client networks, when it first arrived. But we got used to it. Nowadays we consider it no big deal. The same could happen to servers.

What I'd expect to happen is that as the IPv4 address space gets more crowded, the cost of a static IP address will go up. Thus, fewer and fewer public IP addresses will be dedicated to client machines, since clients won't want to pay extra for something they don't need. That will free up more and more addresses for servers, who will have to pay extra.

It'll be a *long* time before we reach 4 billion (2**32) server IPs, particularly given the long-term trend toward more and more (infinitely proxyable) HTTP. In fact, you might say that HTTP/1.1 has successfully positioned itself as the winning alternative to IPv6.

So no, we are obviously not going to run out of IPv4 addresses. Obviously. The world will change, as it did when NAT changed from a clever idea to a worldwide necessity (and earlier, when we had to move from static IPs to dynamic IPs) - but it certainly won't grind to a halt.

Next:

It is possible to do peer-to-peer when both peers are behind a NAT.

Another argument against widespread NATting is that you can't run peer-to-peer protocols if both ends are behind a NAT. After all, how would they figure out how to connect to each other? (Let's assume peer-to-peer is a good idea, for purposes of this article. Don't just think about movie piracy; think about generally improved distributed database protocols, peer-to-peer filesystem backups, and such.)

I won't go into this too much, other than to say that there are already various NAT traversal protocols out there, and as NAT gets more and more annoyingly mandatory, those protocols and implementations are going to get much better.

Note too that NAT traversal protocols don't have a chicken-and-egg problem like IPv6 does, for the same reason that dynamic IP addresses don't, and NAT itself doesn't. The reason is: if one side of the equation uses it, but the other doesn't, you might never know. That, right there, is the one-line description of how you avoid chicken-and-egg adoption problems. And how IPv6 didn't.

IPv6 addresses are as bad as GUIDs

So here's what I really hate about IPv6: 16-byte (32 hex digit) addresses are impossible to memorize. Worse, auto-renumbering of networks, facilitated by IPv6, means that anything I memorize today might be totally different tomorrow.

IPv6 addresses are like GUIDs (which also got really popular in the 1990s dark ages, notably, although luckily most of us have learned our lessons since then). The problem with GUIDs is now well-known: that is, although they're globally unique, they're also totally unrecognizable.

If GUIDs were a good idea, we would use them instead of URLs. Are URLs perfect? Does anyone love Network Solutions? No, of course not. But it's 1000x better than looking at http://b05d25c8-ad5c-4580-9402-106335d558fe and trying to guess if that's *really* my bank's web site or not.

The counterargument, of course, is that DNS is supposed to solve this problem. Give each host a GUID IPv6 address, and then just map a name to that address, and you can have the best of both worlds.

Sounds good, but isn't actually. First of all, go look around in the Windows registry sometime, specifically the HKEY_CLASSES_ROOT section. See how super clean and user-friendly it isn't? Barf. But furthermore, DNS on the Internet is still a steaming pile of hopeless garbage. When I bring my laptop to my friend's house and join his WLAN, why can't he ping it by name? Because DNS sucks. Why doesn't it show up by name in his router control panel so he knows which box is using his bandwidth? Because DNS sucks. Why can the Windows server browse list see it by name (sometimes, after a random delay, if you're lucky), even though DNS can't? Because they got sick of DNS and wrote something that works. Why do we still send co-workers hyperlinks with IP addresses in them instead of hostnames? Because the fascist sysadmin won't add a DNS entry for the server Bob set up on his desktop PC.

DNS is, at best, okay. It will get better over time, as necessity dictates. All the problems I listed above are mostly solved already, in one form or another, in different DNS, DHCP, and routing products. It's certainly not the DNS *protocol* that's to blame, it's just how people use it.

But still, if you had to switch to IPv6, you'd discover that those DNS problems that were a nuisance yesterday are suddenly a giant fork stabbing you in the face today. I'd rather they fixed DNS *before* making me switch to something where I can't possibly remember my IP addresses anymore, thanks.

Server-side NAT could actually make the world a better place

So that's my IPv6 rant. I want to leave you with some good news, however: I think the increasing density of IPv4 addresses will actually make the Internet a better place, not a worse one.

Client-side NAT had an unexpected huge benefit: security. NAT is like "newspeak" in Orwell's 1984: we remove nouns and verbs to make certain things inexpressible. For example, it is not possible for a hacker in Outer Gronkstown to even express to his computer the concept of connecting to the Windows File Sharing port on your laptop, because from where he's standing, there is no name that unambiguously identifies that port. There is no packet, IPv4 or IPv6 or otherwise, that he can send that will arrive at that port.

A NAT can be unbelievably simple-minded, and just because of that one limitation, it will vastly, insanely, unreasonably increase your security. As a society of sysadmins, we now understand this. You could give us all the IPv6 addresses in the world, and we'd still put our corporate networks behind a NAT. No contest.

Server-side NAT is another thing that could actually make life better, not worse. First of all, it gives servers the same security benefits as clients - if I accidentally leave a daemon running on my server, it's not automatically a security hole. (I actually get pretty scared about the vhosts I run, just because of those accidental holes.)

But there's something else, which I would be totally thrilled to see fixed. You see, IPv4 addresses aren't really 32 bits. They're actually 48 bits: a 32-bit IP address plus a 16-bit port number. People treat them as separate things, but what NAT teaches us is that they're really two parts of the same whole - the flow identifier - and you can break them up any way you want.

The address of my personal apenwarr.ca server isn't 74.207.252.179; it's 74.207.252.179:80. As a user of my site, you didn't have to type the IP (which was provided by DNS) or the port number (which is a hardcoded default in your web browser), but if I started another server, say on port 8042, then you *would* have to enter the port. Worse, the port number would be a weird, meaningless, magic number, akin to memorizing an IP address (though mercifully, only half as long).

So here's my proposal to save the Internet from IPv6: let's extend DNS to give out not only addresses, but port numbers. So if I go to www2.apenwarr.ca, it could send me straight to 74.207.252.179:8042. Or if I ask for ssh.apenwarr.ca, I get 74.207.252.179:22.

Someday, when IPv4 addresses get too congested, I might have to share that IP address with five other people, but that'll be fine, because each of us can run our own web server on something other than port 80, and DNS would transparently give out the right port number.
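
For what it's worth, DNS SRV records already carry pretty much this kind of name-to-(host, port) mapping. Here's a sketch of such a lookup; it assumes the third-party dnspython package (the 2.x resolve() API), and the service name below is hypothetical:

import dns.resolver  # pip install dnspython

def lookup_service(name):
    # SRV answers carry priority, weight, a target host and -- crucially -- a port.
    srv = sorted(dns.resolver.resolve(name, 'SRV'),
                 key=lambda r: r.priority)[0]
    host = str(srv.target).rstrip('.')
    addr = dns.resolver.resolve(host, 'A')[0].to_text()
    return addr, srv.port

# e.g. lookup_service('_ssh._tcp.apenwarr.ca') might give ('74.207.252.179', 22)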

This also solves the problem with HTTPS. Alert readers will have noticed, in my comments above, that HTTPS can't support virtual hosts the same way HTTP does, because of a terrible flaw in its certificate handling. Someday, someone might make a new version of the HTTPS standard without this terrible flaw, but in the meantime, transparently supporting multiple HTTPS servers via port numbers on the same machine eliminates the problem; each port can have its own certificate.

(Update 2011/03/28: zillions of people wrote to remind me about SNI, an HTTPS extension that allows it to work with vhosts. Thanks! Now, some of those people seemed to think this refutes my article somehow, which is not true. In fact, the existence of an HTTPS vhosting standard makes IPv6 even *less* necessary. Then again, the standard doesn't work with IE6.)
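
(For the curious, client-side SNI with Python's standard ssl module looks roughly like this; the server name rides inside the TLS handshake, so one IP:443 can hand back a different certificate per name. The hostname below is just a placeholder.)

import socket, ssl

ctx = ssl.create_default_context()
with socket.create_connection(('example.com', 443)) as raw:
    with ctx.wrap_socket(raw, server_hostname='example.com') as tls:
        # Which certificate we get back depends on the server_hostname we sent.
        print(tls.getpeercert()['subject'])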

This proposal has very minor chicken-and-egg problems. Yes, you'll have to update every operating system and every web browser before you can safely use it for *everything*. But for private use - for example, my personal ssh or VPN or testing web server - at least it'll save me remembering stupid port numbers. Like the original DNS, it can be adopted incrementally, and everyone who adopts it sees a benefit. Moreover, it's layered on top of existing standards, and routable over the existing Internet, so enabling it has basically zero admin cost.

Of course, I can't really take credit for this idea. It's already been invented and is being used in a few places.

Embrace IPv4. Love it. Appreciate the astonishing long-lasting simplicity and resilience of a protocol that dates back to the 1970s. Don't let people pressure you into stupid, awful, pain-inducing, benefit-free IPv6. Just relax and have fun.

You're going to be enjoying IPv4 for a long, long time.

Syndicated 2011-03-27 02:00:38 (Updated 2011-03-29 02:13:30) from apenwarr - Business is Programming
