Older blog entries for sab39 (starting at number 28)

D'oh!

Forgot to grab my laptop on the way out of the door. That's two train-traveling hours of potential japitools hacking that turned into sitting around doing nothing.

Life

dyork: Congratulations! My first son or daughter (uncooperative little thing wouldn't tell us which) is due June 4th. Been busily painting the room and assembling cribs, etc...

From the sad-that-this-is-so-rare department

Gateway supports my right to enjoy digital music legally. Isn't that nice of them? Anyone know if their computers are any good? I'd like to recommend an ethical company to friends who ask about buying computers, but I don't want to reccommend a crappy computer...

I guess I'll just keep doing my part by listening to the dude and the cow singing their song...

Umm...

That seems to be everything I can think of to say right now. Huh.

Nothing

Nothing much of note has happened with regard to free software, which is why I haven't posted any diary entries despite having advodiary to make it painless. Of course, "nothing to do with free software" doesn't mean nothing at all. I've been busy painting and assembling furniture for the baby's room, and with my "real" job. I'm not sure how appropriate that kind of discussion is for this forum; I know that I don't mind seeing it in other people's diaries, but I do tend to just skim over it. But it's more important to my life than japitools is, so I guess posting a little bit about it even though it's technically offtopic is okay.

japitools

I've been working mostly on revamping the first pass of japitools, which is the Japize java program. I've revamped the algorithms for iterating over the classes to ensure that the output is sorted in strict alphabetical order with an exception for java.lang.Object. I also rethought the commandline options and added the capability for Japize to dump it's output directly into a .gz file by using Java's GZIPOutputStream class. Finally, I wrote a perl wrapper called japize to (a) allow running the program as japize rather than as "java net.wuffies.japi.Japize", and (b) check the program's arguments for invalid syntax early, so that you don't have to wait for a JVM to start up just to get told that you misspelled "packages".

In the process I discovered a few issues in the jode.bytecode library that Japize relies on for loading classes from zipfiles. I mailed the Jode author about them, and it turns out that the latest Jode in CVS already solves all my problems. Although this code hasn't been publically released yet, I think I'll definitely be using it once it is: it will allow me to vastly simplify several places where I'm currently fighting against the Jode API rather than working with it.

japitools - mind-numbing details

I just need to make a few more tweaks to Japize, and then it will be time to move on to pass 2: the japicompat perl script that tests APIs for compatibility with each other. Aside from making this use a constant and bounded amount of memory (instead of O(size-of-the-JDK-measured-in-methods), which is pretty huge and growing exponentially every release, it seems) I want to add some more options, like comparing bidirectionally instead of unidirectionally, more generalized filtering of the output, and machine-readable error output instead of human-readable. This last change is in order to support the new planned third pass - japifilter.

japifilter will perform the part of the japicompat process that couldn't be done as part of the second pass without using O(N) memory, optionally translate the errors to humanly-readable, and add new options like the ability to exclude errors that appear in another file. That will enable me to say "Show all errors between kaffe and jdk1.1 except for those that also appear between jdk1.2 and jdk1.1" - so I can evaluate kaffe's progress to full 1.1 compatibility without false positives for parts of kaffe that are already jdk1.2 compatible instead.

4 Apr 2002 (updated 4 Apr 2002 at 18:27 UTC) »

advodiary

cmiller: A nice enhancement to advodiary would be to use a temporary filename that ends in .html, to trigger the correct vim syntax hilighting mode. :set syntax=html works well in the meantime, though.

japitools

A recent post on the GNU Classpath mailing list set me thinking about japitools again. The problem with it has been that when faced with an API as mind-bogglingly huge as JDK1.2 and above, it starts swapping to hell even on a well-endowed system. And of course dies entirely on my trusty 486DX-4MB laptop, which is the only place I can take the time to actually develop it. This problem had led me to get discouraged and demotivated with it, so I responded to the Classpath post advocating its use but also angling for a possible new maintainer.

It turns out that after 2 years of demotivation this got me thinking about the problem again, and I figured out some algorithms that would make it possible to operate on gargantuan APIs without any use of O(N) memory. It came down to a few key observations:

  • Comparing two lists can be done in O(1) space if you can guarantee that both lists are sorted identically. By imposing a strict ordering requirement on .japi files, I can make a minimal japicompat that uses O(1) space and straight O(N) time.
  • You can iterate through a hierarchical structure in order in O(Depth*NumItemsInAnyGivenDirectory) space, by loading the list of subdirectories, sorting it, and then recursively processing each one in the same way. This means that Japize, the tool that produces .japi files, can satisfy the strict ordering requirement without requiring O(N) space itself. Also, this nice feature applies even when combining multiple sources (directories and zip files, in my case) of hierarchy information. It does cause multiple redundant scans through the same zipfile index, but that doesn't take any memory. And on systems that do have memory to spare, the disk cache will ensure that the physical disk only gets read once anyway.
  • There's no need to use the same ordering through every stage of the processing. The japicompat first pass requires an alphabetical ordering because hierarchical information might be inconsistent between the two files. However, a filtering pass to eliminate duplicate errors requires a hierarchical sort to keep memory usage in O(NonDuplicateErrors) as opposed to O(ErrorsIncludingDuplicates) which is a much bigger number and can be O(N) in some common situations (exactly what we're trying to avoid). These conflicting needs can both be met by simply applying a sorting pass in between, rather than (as was previously done) keeping all errors in memory and filtering from that (possibly O(N) sized) data structure at the end.
  • Conventional wisdom about sorting doesn't apply to a partial order. Conventional wisdom says that sorting a list of length N requires at least O(N) space anyway. But the only requirement for the filtering pass is that all superclasses appear prior to their subclasses. This sort can be accomplished by simply counting the number of superclasses, and performing multiple passes through the data writing out the classes with 0 superclasses, then 1, then 2, etc. This can be optimized even further through creating a temporary file for each superclass-count, and then concatenating them all together. As above, I can rely on the disk cache to help out systems where all N items really can fit into memory. This optimization uses O(N) space again, but on disk rather than in memory. And in nicely ordered writes, rather than random-access page swapping.
  • Domain-specific knowledge can help in optimization. I happen to know that the only class with zero superclasses is java.lang.Object. Knowing this, if I can arrange that java.lang.Object is already first, I can save myself a pass. Since the requirement for the first pass was only "I must know for sure that both lists are sorted the same", I can alter the original sort to enforce this rule.

Armed with these tricks, it seems that I can completely avoid O(N) memory usage in every stage of japitools. I still have to actually implement it, but it seems that my call for a new maintainer was premature: I've found myself some motivation again. Always nice when that happens.

3 Apr 2002 (updated 3 Apr 2002 at 15:45 UTC) »

Advodiary

Maybe having advodiary to use will help me update more frequently... I've wanted to make more timely entries for a while...

"Killer App"

tk: I had read your last diary entry out of context, in that it mentioned wanting a "killer app" but didn't mention Delitalk explicitly. So the suggestion was just a "killer app" in general, not a killer app for Delitalk explicitly. Sorry.

Just for posterity's sake, I'll post the idea here also...

I have a number of cassette tapes that I don't have CD equivalents to, and most of these I simply cannot get CD equivalents to for any amount of money - either because the tapes pre-dated CDs being widely available, the band has split up and CDs are no longer available, or the tape was made by a "friend of a friend of a friend"'s band who I am no longer able to get in touch with (and who probably don't have the resources to make a CD anyway).

But I'd still like to be able to rip and encode these cassettes into digital form (Ogg, of course). It's relatively easy to use sox or equivalent to record the whole of one side of a cassette into a single wav file (using a walkman and a headphone-out-to-line-in cable). But the difficult part is then going through the cassette and identifying where one track starts and the next ends, and encoding those parts into ogg individually (and, of course, ID3 tagging them and auto-creating a playlist in the same way that abcde does for CDs).

My theory is that it should be possible to hook into xmms or something and let the user use xmms's position control to jump to the end of each track, hit a button indicating "track 1 ends and track 2 starts here" and, after repeating that process for all the tracks and typing in the track (and artist for multi-artist tapes) name, generate an ogg file for each track and a playlist.

Bonus points for making it easy to process two sides of a cassette into a single playlist, applying some kind of filter to the sound to eliminate tape hiss, and for coming up with an inventive UI for noting the exact point in the sound where one track ends and another begins. You could also support cases where tracks overlap slightly (eg the band starts playing the next song before the last chord of the previous one has died down) or where there's some dead space between tracks that you want to skip entirely.

Personal

Life is good...

mjg59: Interesting the way cam.ac.uk people are instantly recognizable by the format of their preferred nicks :)

Shame the RH/AOL merger rumours are apparently false, because I just thought of this AOLinux one-liner:

You've got mail(1)!

Well, I'm pleased with myself. I just wrote an actual useful feature for mozilla - or at least, an actual useful enhancement to someone else's feature.

The new link toolbar in bug 87428 is absolutely awesome (and not in any way written by me, all credit goes to what appears to be a pretty large group of people hacking on this patch). But I found it frustrating that a toolbar that's only useful a tiny fraction of the time (because essentially 0% of sites currently support it) should take up screen real-estate all the time.

I'm not the first person to have suggested that the toolbar auto-hide, but I am apparently the first person to do something about it. So the latest version of the link toolbar comes with a three-way visibility option: Show, Hide, or Show As Needed. And for the first time there will actually be some lines of code in mozilla written by me, which makes me extremely happy (I feel like a little kid running around to show everyone what I did - this feeling will probably pass, but I'm enjoying the thrill of it right now :) )

Oh, and in case anyone here wants to try it, the latest XPI in the bug works on windows (although you have to save the file to disk with an .xpi extension and *then* load it in mozilla) and there are linux instructions about 4/5ths of the way down the bug (from me dated 2001-07-09 10:28). Sites I've found that support the feature are w3.org/TR (if you actually go inside any of the recommendations) and surprisingly, the blackdown.org java-linux FAQ. Having a "Next" button is great for navigating these things.

Now if only lwn.net would reply to my letter asking them to support it... :)

I can't reply to the moneyflow article because I'm no longer an apprentice, so I'll comment here.

I recently read "The Case Against Micropayments" and found myself persuaded by it - mostly because I find myself that I'd much rather pay a flat rate for something than pay a few pennies repeatedly. I know how much "a few pennies at a time" can add up. And whenever any site (such as wsj.com) I encounter has a subscription model, I think instinctively that "they don't get it".

So what might happen instead? I can think of one scenario, combining a couple of ideas I've read elsewhere. I've heard many people say they'd gladly pay to get rid of ads on the pages they view. I can certainly imagine being prepared to pay X dollars a month for the privilege of ad-free surfing.

So how would such a thing be implemented? Well, DoubleClick could probably do such a thing unilaterally, or it could be implemented by a cooperative group of sites that use advertising. The principle would be that every time you visit a page on one of their sites, it first checks a cookie to decide whether you have paid or not. If you have, it doesn't serve you any ads.

The payment would have to be made to an external organization, and should ideally be someone who can be trusted not to track personal information. This entity can probably tell every site within their "network" that you visit, so you would need to trust them to not use that information for anything other than distributing the money.

Since each page access (or at least each browsing session) would have to involve a round-trip to the external organization's server, they can check how often you visit the site and give that site a corresponding proportion of the proceeds (ideally at the same rate as a single ad view would - or an appropriately weighted average of the rates for a clicked-through vs non-clicked-through ad).

Just a thought...

Ooh, I'm back down to observer. Fair enough, I haven't posted a diary entry here since January...

Anyway, here's my rant for the day: Linus' and Alan's changelogs. Suck. When they merge changes with each other, they just say "Merged with Alan" or "Merged with Linus" - they don't say *which* of Linus' or Alan's changes they took. With Alan's you can at least safely assume he took almost everything, but I have *no* idea whether 2.4.2 contains the Matrox G450 changes from 2.4.1ac1 or not. And I need them, because (guess what?) I have a Matrox G450. And Linus' changelog only says "sync up more with Alan", which is no bloody help at all.

Challenge to all perl gurus out there... can you make the following script any shorter?

Original (80 char) version:

$|=++$x;for(;;){$i+=$x;$x=-$x if$i>77|$i<1;print" "x$i."* \r";select$u,$u,$u,.1}

Current (58 char) version:

select$u,$u,$u,.1while$|=print" "x(abs 78-++$i%156)."* \r"

I'll permit changing the phase (starting at a different point in the cycle) because I already had to do that to get to 58, and changing the speed provided that the sleep is neither removed altogether or increased to 1s or greater. Beyond that, the output should be visually indistinguishable.

I was convinced that I couldn't get it shorter when I was at 80! Still, it really looks like it's close to optimum now - the 80 character version had lots of obvious redundancies, I just couldn't figure out how to remove them. This one doesn't.

19 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!