5 Feb 2003 follower   » (Journeyer)

Talkback ID functionality in Python
Interesting task at work today... While working on a wrapper to catch otherwise unhandled exceptions and deal with them "nicely", I started wondering how Full Circle Software's TalkBack (their website is surprisingly difficult to find using Google...) and other similar products calculate their unique "Incident ID".

By my understanding, the key attributes of an incident id are:

  • It is short, say 5 to 7 digits.
  • It is unique to the incident, i.e. no matter what machine the error occurs on, the same cause generates the same ID. (With some degree of certainty anyway...)

I had a look around on the net but after discovering that TalkBack is actually a closed source product, didn't find any implementation details. I eventually found Anet: A Network Game Programming Library which includes crash logging functionality. After a quick glance at the code (note: only the tar file seems to exist now) I couldn't readily identify a routine that calculates an incident id.

After further reading on crash signatures and the like I've hypothesized that the TalkBack ID and other similar incident ids are probably calculated by running some sort of hash algorithm over items (e.g. relative function call addresses) from the stack trace. But, I don't know for sure. Does anyone around here have any details on how TalkBack or BugToaster crash signatures/incident ids are created?

Anyway, I wanted to come up with some way to calculate a similar incident ID for otherwise uncaught Python exceptions.
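
For context, here is a minimal sketch of where such an id could be computed, assuming the wrapper is built on sys.excepthook (that is my assumption for illustration; a try/except around the main entry point would do just as well). The names describe_uncaught, nice_hook and install are hypothetical.

```python
import sys
import traceback


def describe_uncaught(exc_type, exc_value, exc_tb):
    """Hypothetical helper: summarise an exception as one line of text."""
    # extract_tb() lists frames from the outermost caller to the point of failure.
    names = [frame.name for frame in traceback.extract_tb(exc_tb)]
    return "Unhandled %s via %s" % (exc_type.__name__, " -> ".join(names))


def nice_hook(exc_type, exc_value, exc_tb):
    """Write the summary, then fall back to Python's default traceback."""
    sys.stderr.write(describe_uncaught(exc_type, exc_value, exc_tb) + "\n")
    sys.__excepthook__(exc_type, exc_value, exc_tb)


def install():
    """Route otherwise unhandled exceptions through nice_hook()."""
    sys.excepthook = nice_hook
```

Calling install() once at start-up would be enough; the incident id calculation would then slot into nice_hook().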

The two approaches (with different trade-offs) I ended up with were to generate an incident id from either of the following sets of information found in the trace-back:

  • Line numbers
  • Function names

Using line numbers has the advantage that each exception in a particular function has a unique incident id, but the disadvantage that changes in the code (even the addition or deletion of comments) affect the id dramatically. This method is most suitable for stable final-release type code.
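
A rough sketch of the line number variant, under a few assumptions of mine: the line numbers of every frame in the trace-back are joined with commas, and zlib.crc32 stands in for the hashing step because it gives the same value on every run and machine. The name line_number_id is hypothetical.

```python
import traceback
import zlib


def line_number_id(exc, modulus=100000):
    """Hypothetical helper: derive an id from the trace-back's line numbers."""
    frames = traceback.extract_tb(exc.__traceback__)
    # The comma separator (my choice) keeps (1, 23) distinct from (12, 3).
    key = ",".join(str(frame.lineno) for frame in frames)
    # crc32 is a stand-in hash, reduced to at most five digits.
    return zlib.crc32(key.encode("ascii")) % modulus
```

Adding even one comment line above the failing statement shifts its line number, so the key and therefore the id change, which is exactly the fragility described above.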

I think we've decided to use function names to generate the ids. Given a series of calls to the following functions (with an uncaught exception raised in the last function):

f1(), f2(), f3()

we assemble a string of the form "f1f2f3" (i.e. the function names concatenated together) and then get a hash from the string with Python's hash() function. Then, in order to get a "nicer" number we do a mod 100000 operation on the hash to get our incident id.
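
The function name scheme could be sketched as follows. One deliberate swap: current Python 3 releases salt hash() for strings per process by default, so this sketch uses zlib.crc32 instead to keep the id the same across runs and machines. The helper names call_path, function_name_id and demo are hypothetical.

```python
import traceback
import zlib


def call_path(exc):
    """Function names from the trace-back, outermost call first."""
    return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]


def function_name_id(names, modulus=100000):
    """Concatenate the names (e.g. "f1f2f3"), hash, and take the modulus."""
    key = "".join(names)
    # crc32 is a stand-in for hash(); see the note above.
    return zlib.crc32(key.encode("utf-8")) % modulus


def f3():
    raise ValueError("boom")


def f2():
    f3()


def f1():
    f2()


def demo():
    try:
        f1()
    except ValueError as exc:
        # Drop the first frame (demo itself) so only f1, f2, f3 remain.
        return function_name_id(call_path(exc)[1:])
```

Here demo() returns the same value as function_name_id(["f1", "f2", "f3"]), regardless of where in f3() the exception is raised.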

(Actually I ended up adding a line number (mod 100) to the end of the incident id also. If you had some standard way of enumerating exception types you could probably throw the exception type into the mix too.)
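
A sketch of that extended form, with my assumptions stated up front: the line number used is the one where the exception was raised (the innermost frame), the exception type is mixed in by its class name rather than by any standard enumeration, and zlib.crc32 replaces hash() so the result is stable between runs. The name extended_incident_id is hypothetical.

```python
import traceback
import zlib


def extended_incident_id(exc):
    """Hypothetical helper: five hash digits plus two line number digits."""
    frames = traceback.extract_tb(exc.__traceback__)
    names = "".join(frame.name for frame in frames)
    # Assumption: the class name stands in for an enumerated exception type.
    key = type(exc).__name__ + ":" + names
    base = zlib.crc32(key.encode("utf-8")) % 100000
    # Assumption: use the line of the innermost frame, reduced mod 100.
    last_line = frames[-1].lineno % 100 if frames else 0
    return "%05d%02d" % (base, last_line)
```

The id is returned as a zero-padded seven-character string so that leading zeros survive when someone reads it out over the phone.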

I think this will serve our purposes for a start. I figured the hashing algorithm can afford to be relatively simple (so using Python's builtin hash() is probably overkill...), as one would hope that the number of items in the hash space would be small (being the total number of paths to uncaught exceptions, after all!).

Would be interested in comments on this from anybody who's had more experience with this type of thing than me.

P.S. Being paid to work in Python is nice... :-)
