Older blog entries for sab39 (starting at number 32)

Privacy

I often find myself working on my laptop on the train, or in another public place. Sometimes I'd like to spend that time writing things that I'd prefer not to be read over my shoulder by random fellow travelers. It occurred to me that this goal could probably be attained with a utility that would rot13 all text on a tty (my laptop is text-mode-only due to being a 486 with 4Mb RAM :) ).

However, my searches for software to do this have come up empty so far. I'd even settle for a VIM mode that would accept keystrokes normally, but display everything rot13'd - everything I'm concerned about right now would be in VIM. Is it really true that nobody's come up with a "conceal what you're doing" rot13 program?

Chicago: You can do much better than the three-fold increase in filesize that your solution implies. Off the top of my head, I can see an immediate way to increase the filesize by just twofold-plus-a-small-constant with almost the same benefit as your solution. Since this is entirely thought up in a couple of minutes by someone who's not well-versed in error correction mathematics, I guarantee it's possible to do much better than my solution, too. So don't go ahead and implement my solution - rather take the existence of my solution as a heads-up that there are a lot of ways to improve, and read some literature on the subject to find out what the state of the art is. I'd be interested to know myself how close I was able to get to that!

First, represent a as 10 and b as 01. By using this technique, you can identify any single-bit error, but not correct it - if you get 11, you know that's wrong, but you can't tell whether it's supposed to be a or b. (Choosing 01 and 10 rather than 00 and 11 protects against the kind of error where the channel is dead and always reports zeros, or ones). After reading this full stream, the recipient will have a result that contains three states: definitely a, definitely b, or unknown (we deliberately ignore the possibility of getting errors in two consecutive bits; we can catch that case later).

Then, after transmitting your entire stream, send a checksum of the correct data. Use a well-known checksumming algorithm like md5 that is known to have good properties, rather than rolling your own, which almost certainly won't work as well. Transmit the checksum using the same 10 / 01 encoding as the rest of the content.

Here's how the recipient of the data can figure out the corrected data from the original stream in conjunction with the checksum: Assume each "definitely known" bit is correct, but try both values for each unknown bit (in both the data and the checksum). Calculate the checksum of the data and see if it matches the checksum that was sent. This algorithm is O(2^n) in the number of "unknowns" found, but that number is expected to be low. If the number is too high to calculate in a reasonable amount of time, the transmission can be flagged as corrupt and the user can try again. If all has gone well, though, there should be exactly one combination of values for the unknown bits that leads to the checksum matching. Voila, you've now got your correct sequence of 'a's and 'b's.

There are a couple of ways that this can fail to give a corrected stream, but even those cases can be identified. One is mentioned above - too many "unknowns". Another is if there were errors in two consecutive bits, turning a b into a "definite a". That situation will be flagged because no combination of values for the "unknowns" will result in a matching checksum. The other possibility is the astronomical possibility that an incorrect sequence of 'a's and 'b's would generate a correct checksum. With a good checksumming algorithm, that should be absolutely impossible without also producing a too-large number of errors (and hence "unknowns"), so the transfer would be flagged as corrupt anyway.

As an Englishman living in the US, I'm double-gutted today... I felt that if an England-US final was still possible by today, then it would be the most likely outcome. So much for that idea.

Sing along with me:

We're going home, we're going -- England's going home
We're going home, we're going home, we're going -- England's going home
We're going home, we're going home, we're going -- England's going home
We're going home, we're going home, we're going -- England's going home

Everyone seems to know the line
They say it every ti-i-ime
They just know
They're so sure
That this time we could really make it
We could be a big hit
But I know that we're shit
Cause I remember

Three lions on the shirt
Awful refereeing
Let the Hand of God
Send us home all seething

So many hopes so many dreams
But all those England tea-ea-eams
Couldn't break
With the theme:
Cause I still see those penalties missed
And the refs that we hissed
And how we were all pissed
At England losing

Three lions on the shirt
Another fricking shootout
Beckam lying hurt
Had to get the boot out

Yeah I know you believe
But don't be naive...

Three lions on the shirt
Two Brazilians scoring
When the final comes
I will still be snoring

We're going home, we're going home, we're going -- England's going home
[repeat ad infinitum]

25 Apr 2002 (updated 25 Apr 2002 at 18:16 UTC) »
bjf: See my past diary entry for an idea for a conceptually simple but not-yet-written application. At least, I've never been able to find an existing implementation...

This might also be interesting to habes because it sounds like something that Audacity might be able to implement extremely easily. Or be done using a plugin to Audacity, if such a mechanism exists. It's basically a problem in user-interface design with respect to sounds, and it seems like Audacity has the bulk of that problem solved.

[updated to split paragraphs and to make a link to the relevant past diary entry]

D'oh!

Forgot to grab my laptop on the way out of the door. That's two train-traveling hours of potential japitools hacking that turned into sitting around doing nothing.

Life

dyork: Congratulations! My first son or daughter (uncooperative little thing wouldn't tell us which) is due June 4th. Been busily painting the room and assembling cribs, etc...

From the sad-that-this-is-so-rare department

Gateway supports my right to enjoy digital music legally. Isn't that nice of them? Anyone know if their computers are any good? I'd like to recommend an ethical company to friends who ask about buying computers, but I don't want to reccommend a crappy computer...

I guess I'll just keep doing my part by listening to the dude and the cow singing their song...

Umm...

That seems to be everything I can think of to say right now. Huh.

Nothing

Nothing much of note has happened with regard to free software, which is why I haven't posted any diary entries despite having advodiary to make it painless. Of course, "nothing to do with free software" doesn't mean nothing at all. I've been busy painting and assembling furniture for the baby's room, and with my "real" job. I'm not sure how appropriate that kind of discussion is for this forum; I know that I don't mind seeing it in other people's diaries, but I do tend to just skim over it. But it's more important to my life than japitools is, so I guess posting a little bit about it even though it's technically offtopic is okay.

japitools

I've been working mostly on revamping the first pass of japitools, which is the Japize java program. I've revamped the algorithms for iterating over the classes to ensure that the output is sorted in strict alphabetical order with an exception for java.lang.Object. I also rethought the commandline options and added the capability for Japize to dump it's output directly into a .gz file by using Java's GZIPOutputStream class. Finally, I wrote a perl wrapper called japize to (a) allow running the program as japize rather than as "java net.wuffies.japi.Japize", and (b) check the program's arguments for invalid syntax early, so that you don't have to wait for a JVM to start up just to get told that you misspelled "packages".

In the process I discovered a few issues in the jode.bytecode library that Japize relies on for loading classes from zipfiles. I mailed the Jode author about them, and it turns out that the latest Jode in CVS already solves all my problems. Although this code hasn't been publically released yet, I think I'll definitely be using it once it is: it will allow me to vastly simplify several places where I'm currently fighting against the Jode API rather than working with it.

japitools - mind-numbing details

I just need to make a few more tweaks to Japize, and then it will be time to move on to pass 2: the japicompat perl script that tests APIs for compatibility with each other. Aside from making this use a constant and bounded amount of memory (instead of O(size-of-the-JDK-measured-in-methods), which is pretty huge and growing exponentially every release, it seems) I want to add some more options, like comparing bidirectionally instead of unidirectionally, more generalized filtering of the output, and machine-readable error output instead of human-readable. This last change is in order to support the new planned third pass - japifilter.

japifilter will perform the part of the japicompat process that couldn't be done as part of the second pass without using O(N) memory, optionally translate the errors to humanly-readable, and add new options like the ability to exclude errors that appear in another file. That will enable me to say "Show all errors between kaffe and jdk1.1 except for those that also appear between jdk1.2 and jdk1.1" - so I can evaluate kaffe's progress to full 1.1 compatibility without false positives for parts of kaffe that are already jdk1.2 compatible instead.

4 Apr 2002 (updated 4 Apr 2002 at 18:27 UTC) »

advodiary

cmiller: A nice enhancement to advodiary would be to use a temporary filename that ends in .html, to trigger the correct vim syntax hilighting mode. :set syntax=html works well in the meantime, though.

japitools

A recent post on the GNU Classpath mailing list set me thinking about japitools again. The problem with it has been that when faced with an API as mind-bogglingly huge as JDK1.2 and above, it starts swapping to hell even on a well-endowed system. And of course dies entirely on my trusty 486DX-4MB laptop, which is the only place I can take the time to actually develop it. This problem had led me to get discouraged and demotivated with it, so I responded to the Classpath post advocating its use but also angling for a possible new maintainer.

It turns out that after 2 years of demotivation this got me thinking about the problem again, and I figured out some algorithms that would make it possible to operate on gargantuan APIs without any use of O(N) memory. It came down to a few key observations:

  • Comparing two lists can be done in O(1) space if you can guarantee that both lists are sorted identically. By imposing a strict ordering requirement on .japi files, I can make a minimal japicompat that uses O(1) space and straight O(N) time.
  • You can iterate through a hierarchical structure in order in O(Depth*NumItemsInAnyGivenDirectory) space, by loading the list of subdirectories, sorting it, and then recursively processing each one in the same way. This means that Japize, the tool that produces .japi files, can satisfy the strict ordering requirement without requiring O(N) space itself. Also, this nice feature applies even when combining multiple sources (directories and zip files, in my case) of hierarchy information. It does cause multiple redundant scans through the same zipfile index, but that doesn't take any memory. And on systems that do have memory to spare, the disk cache will ensure that the physical disk only gets read once anyway.
  • There's no need to use the same ordering through every stage of the processing. The japicompat first pass requires an alphabetical ordering because hierarchical information might be inconsistent between the two files. However, a filtering pass to eliminate duplicate errors requires a hierarchical sort to keep memory usage in O(NonDuplicateErrors) as opposed to O(ErrorsIncludingDuplicates) which is a much bigger number and can be O(N) in some common situations (exactly what we're trying to avoid). These conflicting needs can both be met by simply applying a sorting pass in between, rather than (as was previously done) keeping all errors in memory and filtering from that (possibly O(N) sized) data structure at the end.
  • Conventional wisdom about sorting doesn't apply to a partial order. Conventional wisdom says that sorting a list of length N requires at least O(N) space anyway. But the only requirement for the filtering pass is that all superclasses appear prior to their subclasses. This sort can be accomplished by simply counting the number of superclasses, and performing multiple passes through the data writing out the classes with 0 superclasses, then 1, then 2, etc. This can be optimized even further through creating a temporary file for each superclass-count, and then concatenating them all together. As above, I can rely on the disk cache to help out systems where all N items really can fit into memory. This optimization uses O(N) space again, but on disk rather than in memory. And in nicely ordered writes, rather than random-access page swapping.
  • Domain-specific knowledge can help in optimization. I happen to know that the only class with zero superclasses is java.lang.Object. Knowing this, if I can arrange that java.lang.Object is already first, I can save myself a pass. Since the requirement for the first pass was only "I must know for sure that both lists are sorted the same", I can alter the original sort to enforce this rule.

Armed with these tricks, it seems that I can completely avoid O(N) memory usage in every stage of japitools. I still have to actually implement it, but it seems that my call for a new maintainer was premature: I've found myself some motivation again. Always nice when that happens.

3 Apr 2002 (updated 3 Apr 2002 at 15:45 UTC) »

Advodiary

Maybe having advodiary to use will help me update more frequently... I've wanted to make more timely entries for a while...

"Killer App"

tk: I had read your last diary entry out of context, in that it mentioned wanting a "killer app" but didn't mention Delitalk explicitly. So the suggestion was just a "killer app" in general, not a killer app for Delitalk explicitly. Sorry.

Just for posterity's sake, I'll post the idea here also...

I have a number of cassette tapes that I don't have CD equivalents to, and most of these I simply cannot get CD equivalents to for any amount of money - either because the tapes pre-dated CDs being widely available, the band has split up and CDs are no longer available, or the tape was made by a "friend of a friend of a friend"'s band who I am no longer able to get in touch with (and who probably don't have the resources to make a CD anyway).

But I'd still like to be able to rip and encode these cassettes into digital form (Ogg, of course). It's relatively easy to use sox or equivalent to record the whole of one side of a cassette into a single wav file (using a walkman and a headphone-out-to-line-in cable). But the difficult part is then going through the cassette and identifying where one track starts and the next ends, and encoding those parts into ogg individually (and, of course, ID3 tagging them and auto-creating a playlist in the same way that abcde does for CDs).

My theory is that it should be possible to hook into xmms or something and let the user use xmms's position control to jump to the end of each track, hit a button indicating "track 1 ends and track 2 starts here" and, after repeating that process for all the tracks and typing in the track (and artist for multi-artist tapes) name, generate an ogg file for each track and a playlist.

Bonus points for making it easy to process two sides of a cassette into a single playlist, applying some kind of filter to the sound to eliminate tape hiss, and for coming up with an inventive UI for noting the exact point in the sound where one track ends and another begins. You could also support cases where tracks overlap slightly (eg the band starts playing the next song before the last chord of the previous one has died down) or where there's some dead space between tracks that you want to skip entirely.

Personal

Life is good...

mjg59: Interesting the way cam.ac.uk people are instantly recognizable by the format of their preferred nicks :)

Shame the RH/AOL merger rumours are apparently false, because I just thought of this AOLinux one-liner:

You've got mail(1)!

23 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!