Older blog entries for kr (starting at number 15)

How to Handle Job Failures

There’s a discussion on the beanstalkd mailing list right now about queue introspection and handling failures. My response got a little long, and it could be interesting to users of other queueing systems as well, so here’s a blog post instead.

When we first started using beanstalkd at Causes, some things in our worker development and deployment process took a while to iron out, but our strategy for handling job failures worked quite well right from the start. In hindsight, I’m happy about it. This is what we did.

The Basic Rule

Never clean up jobs by hand. If a failure happens once, it can happen again. Always write code to handle newly-discovered failure types automatically, then run the new code to do the cleanup.


Before you begin, note that your workers will be numerous, possibly even more so than your web front ends. I assume you have good logging infrastructure and analysis tools for your web front ends. Use the same infrastructure for the workers, too. Seeing all failures and performance data in one place will make your life easier.

  1. Start by having your workers bury any failed jobs.

  2. See what sorts of failures happen in production (by using the high-quality logging that you have to do anyway).

  3. You will see some failures where the job can simply be deleted, others where it’s better to retry the job, and possibly some rare cases where you want to save the job to be inspected by a human (though this sort of hand-holding does not scale and should be avoided). It might also make sense to retry some jobs only a limited number of times before deleting them.

  4. Add unit tests and update the code to deal with these known failure types appropriately (i.e. delete or retry the job), but continue to bury unanticipated failures. For retries, don’t bother with changing the priority, but do add a time delay with exponential backoff. Of course, you must also fix the business logic to recover from these failures or avoid them entirely whenever possible.

  5. Redeploy your application.

  6. When the new code is in production, kick all buried jobs. They will be handled correctly, and you won’t lose any jobs.

  7. Now look at your worker logs again. This process will have removed a lot of noise from your production logs, and new failure types will float to the surface (though the total volume will of course be much smaller). So repeat.

After a couple of iterations, true failures will be very rare indeed. Your system will be running smoothly and it won’t need much attention.
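As a rough illustration of steps 1 through 4, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Job` class is an in-memory stand-in for a reserved job (it is not a real beanstalkd client), and `PermanentError`, `TransientError`, `backoff_delay`, and `handle` are hypothetical names. The point is only the shape: known failures are deleted or retried with exponential backoff, and anything unanticipated is buried.

```python
class PermanentError(Exception):
    """Known failure type: retrying cannot help, so the job is deleted."""


class TransientError(Exception):
    """Known failure type: worth retrying after a delay."""


class Job:
    """Hypothetical in-memory stand-in for a reserved queue job."""

    def __init__(self, body, releases=0):
        self.body = body
        self.releases = releases  # how many times the job was retried so far
        self.state = "reserved"
        self.delay = None

    def delete(self):
        self.state = "deleted"

    def release(self, delay):
        self.state = "released"
        self.delay = delay

    def bury(self):
        self.state = "buried"


def backoff_delay(releases, base=1, cap=3600):
    """Exponential backoff in seconds: base, 2*base, 4*base, ... up to cap."""
    return min(cap, base * 2 ** releases)


def handle(job, process, max_retries=5):
    """Run process(job.body), then dispose of the job by failure type."""
    try:
        process(job.body)
    except PermanentError:
        job.delete()
    except TransientError:
        if job.releases >= max_retries:
            job.delete()  # retried enough times; give up
        else:
            job.release(delay=backoff_delay(job.releases))
    except Exception:
        job.bury()  # unanticipated failure: keep it until new code ships
    else:
        job.delete()  # success
    return job.state
```

Once a buried failure type is understood, it gets its own `except` branch (and a unit test), and kicking the buried jobs sends them through the new branch.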

Syndicated 2010-05-02 07:00:00 from Keith Rarick

The Closed iPad is a Moral Problem

At issue here is control. Apple wants to control what you can and can’t do with your computer. (To my knowledge no one has claimed this is false. Speculate all you like on Apple’s motivation for wanting this control; that’s beside the point.) I happen to find this morally objectionable.

Cory Doctorow and others have astutely noticed that people don’t respond much to arguments based on morality, so they framed their complaints differently, emphasizing practical effects. That was a smart strategy, because it let them be more persuasive, but make no mistake, this is a moral issue.

Unfortunately, some have failed to see past the surface of these arguments, causing them to write a bunch of increasingly irrelevant rebuttals.

Ultimately, I think both sides of this “debate” are falling victim to a massive confirmation bias. If you read a statement like this:

“What makes products great is their innovation, their creativity, other ineffable qualities. Not the applicability of the first-sale doctrine.”

You may just nod in agreement, or you may say, “hold on there, bucko, that’s a hefty assertion, but an assertion is not an argument (or even evidence).” Same goes for something like this:

“Buying an iPad for your kids isn’t a means of jump-starting the realization that the world is yours to take apart and reassemble; it’s a way of telling your offspring that even changing the batteries is something you have to leave to the professionals.”

A hundred little implicit (dis)agreements get strung together when you read one of these essays, and determine whether you find it convincing or repulsive.

The confirmation bias is especially strong here because everyone dances around the real issue without saying it outright: the closed nature of the iPad is morally wrong. As with any moral issue, it isn’t something you can argue for or against effectively without a groundwork of shared values. Either you recognize this issue or not. Either you consider it important or not.

Folks, of course the iPad will sell lots of units, because, in spite of its moral bankruptcy, it appeals to mass-market consumerism, and because it is backed by Apple’s powerful marketing machine. This may or may not qualify as “success”, depending on your point of view.

Untouchable Design

Why are Apple fans so worked up about this device, really? Because of its revolutionary design?

The bet is roughly that the future of computing:

  • has a UI model based on direct manipulation of data objects
  • completely hides the filesystem from the user
  • favors ease of use and reduction of complexity over absolute flexibility
  • favors benefit to the end-user rather than the developer or other vendors
  • lives atop built-to-specific-purpose native applications and universally available web apps

Thing is, that describes the litl spot-on. I think excitement about the iPad is much less about its design, and much more about the simple fact of Apple’s market position. If these radical design principles were really so important, folks would have been just as excited about the litl’s launch way way back in November.

This especially undermines all those put-up-or-shut-up arguments about how nobody else competes with Apple’s design and that’s why the iPad is great despite its closed nature. I have yet to see a single thoughtful comment claiming that the iPad is good while the litl simultaneously is not. If someone manages to do this, not through speculation, but having actually used the litl (and even if we may disagree on the details or the conclusion), then great. Until then, you can’t credibly claim that no one but Apple produces good design.

Syndicated 2010-04-06 07:00:00 from Keith Rarick

Don’t Copy the Call Stack

Some runtimes claim to provide first-class continuations, but implement this by copying the entire call stack. This implementation strategy makes continuations totally unusable in production code, and it should be outlawed. Or maybe such runtimes should be required to call them “shitty continuations” instead of just “continuations”.

Syndicated 2010-02-06 08:00:00 from Keith Rarick

What to Look For in a Programming Language

Occasionally, I get asked what I look for in a programming language, what makes a good language, or what I would do to improve an existing language. Mainstream programming languages (which are almost always applicative with call-by-value evaluation) vary surprisingly little in their abilities, but there are a few significant differences. Aside from all the usual, obvious aspects that don’t need to be repeated, I look for two big things:

  • Powerful flow-control semantics including tail recursion and continuations. For example, Scheme.
  • Asynchronous API, especially with a built-in event loop. For example, node.js.

These features are even more useful when combined, yet I’ve never seen a language with both. (So, to my knowledge, sodium will be the first!)

Syndicated 2010-01-14 08:00:00 from Keith Rarick

Asynchronous Programming in Python

Twisted is pretty good. It sits as one of the top networking libraries in Python, and with good reason. It is properly asynchronous, flexible, and mature. But it also has some pretty serious flaws that make it harder than necessary for programmers to use.

This hinders adoption of Twisted, and (worse) it hinders adoption of asynchronous programming in general. Unfortunately, fixing most of these flaws in the context of Twisted would cause massive compatibility problems. This makes me think the world could use a new, Pythonic, asynchronous programming library. Perhaps it should be a fork of Twisted; perhaps it should be a brand-new project. Either way, it would make life much nicer for programmers like you and me in the future.

Toward A Better Event Library

Here is what Twisted gets right:

  • Pervasive asynchrony.
  • The “reactor” pattern.
  • Each asynchronous function does not take callbacks as parameters; it just returns a promise (which Twisted calls a “deferred”).
  • Able to use modern select()-replacements like kqueue, epoll, etc.
  • Integrates with the GTK+ event loop. Same for other GUI toolkits.

These things are all absolutely crucial and Twisted nails them. This is what makes Twisted so great.

Here is what I would do differently:

  • Limited scope – This project has no need to include dozens of incomplete and neglected protocols. Instead, such things can easily (and should) be maintained as separate projects. For example, we don’t need two-and-a-half half-assed HTTP modules. What if, instead, we had one excellent asynchronous HTTP library, which incidentally achieves asynchrony by means of this event library? It might start off as a port of httplib2, which is well designed even if it lacks asynchrony. This would leave the maintainers of the core event system more time to focus on making a really coherent, “as-simple-as-possible-but-no-simpler”, useful tool.
  • User-focused design – A good library, like a good language, should make simple things simple and hard things possible. Twisted makes simple things possible and hard things possible. Most users don’t particularly want to extend base classes to implement derived Factories and Protocols and Clients; they just want to fetch the document at a URL (asynchronously!). Achieving this ease of use is not terribly hard, but it requires conscious effort. You must start by designing the ideal top-level interface, then work downward and make it operate correctly. Twisted’s web modules (I don’t mean to pick only on HTTP here; these are just examples) look to me as if they started from the basic building blocks and added on pieces until HTTP was achieved. We are left with whatever interface happened to come out at the end. Further, they look like they were written by former C++ and Java programmers who haven’t yet fully realized that Python code doesn’t have to be so complicated.
  • Arbitrary events – Promises should let you send more than just “success” and “failure”. You should be able to emit arbitrarily-named events. For example, suppose you make an HTTP request. In the simple case, you just want the complete document when it is fully received. But what if you also want to update a progress bar for the transfer? You shouldn’t have to start digging through the HTTP library’s low-level unbuffered innards. Instead, the promise that eventually delivers you the HTTP response should also emit progress signals that you may observe if you wish.
  • Simple promises – Do not implicitly “chain” callbacks.
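
To make the “arbitrary events” idea a bit more concrete, here is a small Python sketch. It is not Twisted code: `EventPromise`, its `on` and `emit` methods, the event names, and `fake_fetch` are all hypothetical, made up for this illustration. `fake_fetch` stands in for an HTTP transfer and emits a `progress` event per chunk before the final `success` event.

```python
from collections import defaultdict


class EventPromise:
    """Hypothetical promise that can emit arbitrarily named events."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def on(self, name, listener):
        """Register listener for the event called name."""
        self._listeners[name].append(listener)
        return self

    def emit(self, name, *args):
        """Call every listener registered for name with args."""
        for listener in list(self._listeners[name]):
            listener(*args)


def fake_fetch(promise, chunks):
    """Stand-in for an HTTP transfer: emits progress, then success."""
    total = sum(len(chunk) for chunk in chunks)
    received = []
    for chunk in chunks:
        received.append(chunk)
        done = sum(len(part) for part in received)
        promise.emit("progress", done, total)
    promise.emit("success", "".join(received))


progress, documents = [], []
promise = EventPromise()
promise.on("progress", lambda done, total: progress.append((done, total)))
promise.on("success", documents.append)
fake_fetch(promise, ["hel", "lo"])
```

A caller who only wants the document registers for `success` and never has to know that `progress` exists.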

Simple Promises

This last problem deserves special attention. The rest are mere annoyances and could be suffered through, if not for implicit chaining. It is a fundamental design flaw, and I wouldn’t be surprised to learn that it’s responsible for more bugs in Twisted-using programs than any other single factor.

Let me first spell out exactly what I mean here by “implicitly chained” callbacks and “simple” promises. In Twisted, you can write:

  deferred = background_job()
  deferred.addCallback(callback1)
  deferred.addCallback(callback2)

Each callback here will be given the return value of the previous callback. I’ll refer to that as implicit chaining.

Instead I advocate having the promise give each callback simply the same value – the original result of the background job. So I’ll call this a simple promise. (In these examples, I’ll use deferred for objects with implicit chaining and promise for simple promises.)

  promise = background_job()
  promise.addCallback(callback1)
  promise.addCallback(callback2)

In this example, each callback will get the exact same value. Nothing that any one of them does can affect the others.

Simple promises are more general. The key is to have addCallback and its friends return a new promise for the result of the callback. With this feature, you can still chain callbacks, but you must do it explicitly. That is a good thing. Consider a deferred with implicit chaining:

  def add4(n):
      return n + 4

  deferred = background_job()
  deferred.addCallback(add4)
  deferred.addCallback(log)

Supposing background_job supplies a value of 3, this example will log 7. We can just as easily do that without implicit chaining:

  def add4(n):
      return n + 4

  promise = background_job()
  promise2 = promise.addCallback(add4)
  promise2.addCallback(log)

This also logs 7. Now let’s look at an example starting with a simple promise:

  promise = background_job()
  promise.addCallback(log)
  promise.addCallback(log)

This logs 3, twice. But try doing this with implicit chaining. It can’t be expressed. (Yes, you could achieve the same output in many different ways, but here I’m concerned with the structure of control flow.)
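
For anyone who wants to poke at the difference, here is a self-contained toy model in Python. Neither class is real library code: `ChainedDeferred` only imitates implicit chaining (it is not Twisted’s actual Deferred), `SimplePromise` is my guess at the smallest possible simple promise, and the `fire` method and the `logged` list are assumptions of the sketch. Error handling and callbacks added after firing are left out.

```python
class ChainedDeferred:
    """Toy model of implicit chaining (not Twisted's actual Deferred)."""

    def __init__(self):
        self._callbacks = []

    def addCallback(self, fn):
        self._callbacks.append(fn)
        return self

    def fire(self, value):
        for fn in self._callbacks:
            value = fn(value)  # each callback sees the previous return value


class SimplePromise:
    """Toy simple promise: every callback sees the original result."""

    def __init__(self):
        self._callbacks = []

    def addCallback(self, fn):
        derived = SimplePromise()  # promise for fn's own return value
        self._callbacks.append((fn, derived))
        return derived

    def fire(self, value):
        for fn, derived in self._callbacks:
            derived.fire(fn(value))


def add4(n):
    return n + 4


logged = []
log = logged.append  # returns None, like a typical logging function

# Implicit chaining: log receives add4's return value.
deferred = ChainedDeferred()
deferred.addCallback(add4)
deferred.addCallback(log)
deferred.fire(3)

# Simple promise: both direct callbacks receive the original 3.
# Chaining happens only through the promise that addCallback returns.
promise = SimplePromise()
promise.addCallback(log)
promise.addCallback(log)
promise.addCallback(add4).addCallback(log)
promise.fire(3)

print(logged)  # [7, 3, 3, 7]
```

The last `addCallback(add4).addCallback(log)` line is the explicit chain: it is visible at the call site, and it cannot disturb the two callbacks that want the original value.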

More importantly, implicitly chained callbacks are confusing. You must pay careful attention to the order in which you add callbacks. They require complicated diagrams to explain how they behave. If you want to insert a new callback somewhere, you have to be extra careful when you do it, to ensure it goes in the right place in the chain. By contrast, if you want to insert a new callback somewhere with simple promises, you have only to stick it on the correct promise.

Further, implicit chaining makes you do extra work to propagate return values and errors, even when your callback properly doesn’t care about such things. Let’s say you have the following snippet (which is the same for either a promise or a deferred):

  either = background_job()
  either.addCallbacks(my_result, my_error_handler)

You just want some basic logging to check what’s going on. With a simple promise, that’s easy:

  promise = background_job()
  promise.addBoth(log)
  promise.addCallbacks(my_result, my_error_handler)

With implicit chaining, it’s more work:

  def safe_identity_log(x):

      # If log raises an exception, we still
      # want our real callback to fire, so we
      # have to catch everything here, even
      # though that has nothing to do with the
      # function of this callback.
      try:
          log(x)
      except Exception:
          pass

      # Likewise, we must take care to return
      # the original value, or else the
      # callback will just get None.
      return x

  deferred = background_job()
  deferred.addBoth(safe_identity_log)
  deferred.addCallbacks(my_result, my_error_handler)

This Post is Too Long

Anyway. That’s all I got. I really want to see this exist. So badly, I might actually do it myself. But it will have to wait a bit.

Addendum: Coroutines and Continuations

Writing good code in most asynchronous systems (including Twisted, node.js, and even E) feels inside-out. Your results are passed in as parameters; they don’t come out as return values like they normally would. Same for exceptions. This results in more verbosity, and it just feels weird.

My earlier post The Wormhole describes a transformation that turns things right-side out again. (It’s built out of continuations, but it could just as well be done with coroutines, say in Python.) It makes writing correct asynchronous code almost as easy as writing correct synchronous code. However, it can only be done correctly if your promises are of the simple variety. I’ve since learned that Twisted has attempted this trick. That implementation is useful, but it has several sharp corners. For example, this will not do what you would hope:

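  from twisted.internet.defer import inlineCallbacks, returnValue

  background_deferred = None
  can_background_job_complete = False

  @inlineCallbacks
  def f():
      global background_deferred
      background_deferred = background_job()
      value = yield background_deferred
      returnValue(value + 4)

  final_deferred = f()
  background_deferred.addCallback(log)
  final_deferred.addCallback(log)
  can_background_job_complete = True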

Supposing background_job supplies 3, what will this log? In real life: None, then 7. If these were simple promises, it would log 3, then 7.

Syndicated 2009-12-10 08:00:00 from Keith Rarick

Projects I Want

Some software I’d love to see get made, but that I don’t have time to write myself.

CouchDB URL Wrapper

CouchDB’s interface is almost good enough to expose directly to the public, but its URLs are ugly and crufty. See “URL as UI” and “Cool URIs don’t change” for some philosophy.

I’d like to see a thin wrapper that lets you design arbitrary clean URLs for CouchDB. It needn’t do anything else.
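
To show how thin I mean, here is a sketch of just the URL-mapping part in Python. The route table, the `blog` database, the `site` design document, and the `by_tag` view are all hypothetical names invented for the example; only the general shapes of the CouchDB paths (`/db/docid` for a document and `/db/_design/ddoc/_view/view` for a view) are real. A real wrapper would also have to proxy the request, which is left out here.

```python
import re

# Hypothetical routes: clean public URL pattern -> CouchDB path template.
ROUTES = [
    (re.compile(r"^/posts/(?P<slug>[\w-]+)$"),
     "/blog/{slug}"),
    (re.compile(r"^/tags/(?P<tag>[\w-]+)$"),
     "/blog/_design/site/_view/by_tag?key=%22{tag}%22"),
]


def to_couchdb_path(clean_path):
    """Translate a clean URL path to a CouchDB path, or None if unknown."""
    for pattern, template in ROUTES:
        match = pattern.match(clean_path)
        if match:
            return template.format(**match.groupdict())
    return None
```

With this table, `to_couchdb_path("/posts/hello-world")` gives `/blog/hello-world`, and an unknown path gives `None` so the wrapper can answer with a 404.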

Face Detection in Javascript

When I upload photos to Facebook, why do I have to locate the faces by hand? Why doesn’t Facebook do it for me? Javascript (in modern browsers) is now fast enough to do real face detection.

Many other web sites could benefit from this, too. There should be a free, high-quality Javascript library for face detection. At vasc.ri.cmu.edu/NNFaceDetector/ you can find papers describing a pretty good algorithm.

Face Recognition in Javascript

As an extension of the above, try to identify the people whose faces have been detected. Recognition is less generally-applicable than detection, but certainly Facebook-like sites can benefit from this. They could completely automate the process of tagging photos. Leave humans to identify the missing faces and false matches.

Doctest-Style Tool for Network Protocols

I think doctest is pretty nice. I want a tool much like doctest, but for network conversations rather than Python interpreter conversations.

So you could write something like:

  >>> GET / HTTP/1.1
  >>> Host: localhost
  HTTP/1.1 200 OK
  Content-type: text/plain
  Content-Length: 6


and this tool will check that your new web server is working.
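
Here is a very rough Python sketch of how the core of such a tool might look. It is only a guess at a design: `parse_transcript` and `check_transcript` are hypothetical names, the `exchange` callable stands in for whatever actually talks to the server (for instance a socket wrapper), and the PING/PONG protocol in the example is made up. It handles a single request followed by a single response, nothing more.

```python
from itertools import zip_longest


def parse_transcript(text, prompt=">>> "):
    """Split a transcript into (lines_to_send, lines_expected)."""
    sent, expected = [], []
    for line in text.splitlines():
        if line.startswith(prompt):
            sent.append(line[len(prompt):])
        else:
            expected.append(line)
    return sent, expected


def check_transcript(text, exchange):
    """Send the prompted lines via exchange() and compare the reply.

    exchange takes the list of request lines and returns the list of
    response lines. Returns a list of (expected, actual) mismatches,
    which is empty when the conversation went as written.
    """
    sent, expected = parse_transcript(text)
    actual = exchange(sent)
    return [
        (want, got)
        for want, got in zip_longest(expected, actual)
        if want != got
    ]


TRANSCRIPT = """\
>>> PING
>>> QUIT
PONG
BYE"""


def fake_server(lines):
    """Made-up line protocol used only to exercise the checker."""
    replies = {"PING": "PONG", "QUIT": "BYE"}
    return [replies.get(line, "ERROR") for line in lines]


mismatches = check_transcript(TRANSCRIPT, fake_server)
```

An empty `mismatches` list means the test passed; otherwise each pair shows what the transcript expected next to what the server actually said, much like a failing doctest.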

Syndicated 2009-11-18 08:00:00 from Keith Rarick

File Management Ideas

In theory, Gnome Zeitgeist makes me happy; kudos to the people working on it for daring to make something new. Its concept excites me, but its design specifics leave me wanting. I think we can do better, and here are some concrete design ideas to back up my claim.

First I want to recognize that the iTunes-style (or Rhythmbox-style or Banshee-style) interface is successful at mitigating the complexity of a huge collection of files. There are just a few salient aspects of such an interface: a list of “data sources” on the left (including “playlists” and “smart playlists”) and a data display on the right. The data display is usually, but not always, a scrolling table with sortable columns. It is also searchable and the results update instantly (sub-250ms or, ideally, before the next frame).

This interface should be applied to all user files, not just media. I don’t advocate an auto-generated interface. Rather each data source (each item in the blue side bar) must be thoughtfully designed. Here are a few examples.


For the Music, Movies, and TV Shows, start with an exact copy of iTunes. It’s pretty good. Then go ahead and make changes if you wish, but your changes must actually be better, not merely different.


The Documents, Spreadsheets, and Presentations items would simply give appropriately-filtered lists of files.


I’m staring at three windows. The first one is drawn by Firefox; it shows a list of files I am currently downloading. The second is drawn by Nautilus; it shows a list of files I have recently downloaded. The third is drawn by Transmission; it shows a list of files I am currently downloading with BitTorrent.

Most of the time I do not care about these distinctions. I simply want to get the file I just downloaded. If it hasn’t finished yet, tell me so. I do not know, a priori, if it has finished downloading – that’s part of why I want to see it.

The Downloads item should present a list of files that have been or are being downloaded (in reverse chronological order by default). It should handle HTTP, FTP, BitTorrent, and all other things (hello, plugins…) that conceptually allow you to “download” a file.


Firefox lets me make bookmarks of web pages. Gnome lets me make bookmarks of things on my computer. Again, I mostly do not care about this distinction. Any open window that represents a file should let me make a bookmark. The window could be showing a spreadsheet, video, web page, mail message, or something else. I don’t care. Just give me a little star button like in Gmail. The star should probably be in the window title bar. (I think this would require an EWMH extension.)

The Bookmarks item should then present a list of things I have starred.

It should also understand Delicious, Weave, and similar things (hello, plugins…).


Gnome keeps a list of recently opened files. Firefox keeps a list of recently visited web pages. You guessed it; I don’t often care about that distinction. The actual data display should be at least as useful as what you get in a modern web browser.


Trash is relatively prosaic, but I’ll go out on a limb and suggest that items in the Trash should be deleted automatically after 30 days, or when space is needed. I should not have to empty the Trash myself.


This one is interesting because it probably makes most sense to show thumbnail images rather than a table of text. Sort of like F-Spot.

Preferences & Applications

These aren’t in my mockup, but might make sense to include.

Syndicated 2009-09-23 07:00:00 from Keith Rarick

The Wormhole

I’m working on an experimental programming language called Sodium. Although I haven’t introduced the language yet, I must share a wonderful idea that recently occurred to me. These examples are written in Sodium, but I’ve avoided some unusual idioms; hopefully they will make sense to anyone with some programming background.

If I do a CPS transform, I can easily do exception-passing with continuations, exactly like return values. Every expression will have two continuations: success and failure (corresponding to return and raise). This implies call-with-current-continuations, plural. And that means I can write the following mind-blowing function:

  def (wormhole promise):
    call/ccs (fn (success failure)
                (promise.on-success success)
                (promise.on-failure failure)
                (escape))

(Here, escape is just another continuation, saved at the beginning of the process, that pops back out to the main event loop.)

What does wormhole do? Consider the following snippet of typical asynchronous I/O code:

  site = "example.net"
  path = "/robots.txt"
  promise = (http.get site path)
  promise.on-success (fn (robots-txt)
                        (spider site robots-txt))
  promise.on-failure (fn (error)
                        (spider-all site))

This sort of thing can get pretty deeply nested and ugly, especially if you need to use lots of local variables.

Using wormhole, you can rewrite it:

  site = "example.net"
  path = "/robots.txt"
  try:
    robots-txt = (wormhole (http.get site path))
    spider site robots-txt
  catch error:
    spider-all site

This new example is fully asynchronous, just like the first one, but it is much more readable (as long as you know what’s really going on under the hood).
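
For readers without Sodium at hand (that is, everyone), roughly the same shape can be sketched in Python using generators in place of continuations. This is only an approximation under assumptions of my own: `Promise` is a toy simple promise, `run` is a hypothetical driver, and `fake_get` and the `calls` list stand in for a real HTTP client and a real spider. A `yield` plays the role of `wormhole`: the generator is suspended until the promise fires, then resumed with the value or with the error raised at that point.

```python
class Promise:
    """Toy simple promise with separate success and failure listeners."""

    def __init__(self):
        self._on_success, self._on_failure = [], []

    def on_success(self, fn):
        self._on_success.append(fn)

    def on_failure(self, fn):
        self._on_failure.append(fn)

    def succeed(self, value):
        for fn in self._on_success:
            fn(value)

    def fail(self, error):
        for fn in self._on_failure:
            fn(error)


def run(gen):
    """Drive a generator that yields promises, resuming it with results."""
    def step(advance, arg):
        try:
            promise = advance(arg)  # run until the next yield
        except StopIteration:
            return  # the generator finished
        promise.on_success(lambda value: step(gen.send, value))
        promise.on_failure(lambda error: step(gen.throw, error))
    step(gen.send, None)


calls = []    # records what the spider would have done
pending = {}  # promises handed out by the fake HTTP client, by site


def fake_get(site, path):
    pending[site] = Promise()
    return pending[site]


def crawl(site):
    try:
        robots_txt = yield fake_get(site, "/robots.txt")
        calls.append(("spider", site, robots_txt))
    except IOError:
        calls.append(("spider-all", site))


run(crawl("example.net"))
run(crawl("example.org"))
# Nothing has been spidered yet; both generators are suspended.
pending["example.net"].succeed("User-agent: *")
pending["example.org"].fail(IOError("timeout"))
```

Both crawls are in flight at once, yet each reads top to bottom with an ordinary try block, which is the whole appeal.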

The wormhole operator has pretty serious implications for any callers of your code who expect things to happen in the usual synchronous order. It’s not clear if the resulting code is really any better, on balance.

Certainly, anyone tempted to use this device should think hard first. At least until we understand it better and develop rules of thumb. I think this function should be discouraged for library code, but, in the right circumstances, it might do for top-level application code. The end result can be simple and pretty while still fully asynchronous.

Syndicated 2009-09-05 07:00:00 from Keith Rarick

Upset at Apple?

Steven Frank has a post about Apple’s iPhone app store policies (or lack thereof).

“I’m furious with Apple and AT&T right now, with regard to the iPhone.”

Really? Dude, what did you expect? It’s Apple.

“The boat may turn slowly, but nothing before has ever suggested to me that Apple are actively malicious.”

No, actually, Apple has a long history of active maliciousness.

But there is hope for the future. The Palm Pre (and to some extent, Android) is set to bring serious competition. When that happens, Apple will be forced to adopt rational policies or lose their developers, followed by their users.

Why expect this? Compared to the iPhone, the Pre currently has essentially three disadvantages, all of which will probably disappear over time, and one big advantage, which will never go away.

Pre’s Disadvantages:

  1. Inferior design. Debatable, and certainly fixable.
  2. Fewer (good) apps. Of course. iPhone has a two-year head start. But this will reverse, because of the advantage below.
  3. Worse network coverage. Sprint < AT&T; Verizon > AT&T. Just wait six months.

Pre’s Advantage:

  1. Javascript. Don’t underestimate the effect of millions of brilliant, hungry web programmers and designers who will never write a line of Objective-C in their lives.

    The iPhone can never overcome this without becoming a Pre clone, and Apple is too proud to do that. I think. Actually, I’m not so sure about this last statement. Apple will probably just copy the Pre and then pretend like they invented HTML-CSS-Javascript-based phone apps.

Syndicated 2009-07-31 07:00:00 from Keith Rarick

Places to Eat

Presenting a list of food places Tracy and I frequent. These all have four- or five-star food in our book. If you know of a great place that isn’t on here, we probably have never been there! Tell us about it.

Or see Tracy & Keith’s Food List in a larger map.

Syndicated 2009-07-29 07:00:00 from Keith Rarick
