advodiary
cmiller: A nice enhancement to advodiary would be to use a temporary filename that ends in .html, to trigger the correct vim syntax hilighting mode. :set syntax=html works well in the meantime, though.
japitools
A recent post on the GNU Classpath mailing list set me thinking about japitools again. The problem with it has been that when faced with an API as mind-bogglingly huge as JDK1.2 and above, it starts swapping to hell even on a well-endowed system. And of course dies entirely on my trusty 486DX-4MB laptop, which is the only place I can take the time to actually develop it. This problem had led me to get discouraged and demotivated with it, so I responded to the Classpath post advocating its use but also angling for a possible new maintainer.
It turns out that after 2 years of demotivation this got me thinking about the problem again, and I figured out some algorithms that would make it possible to operate on gargantuan APIs without any use of O(N) memory. It came down to a few key observations:
- Comparing two lists can be done in O(1) space if you can guarantee that both lists are sorted identically. By imposing a strict ordering requirement on .japi files, I can make a minimal japicompat that uses O(1) space and straight O(N) time.
- You can iterate through a hierarchical structure in order in O(Depth*NumItemsInAnyGivenDirectory) space, by loading the list of subdirectories, sorting it, and then recursively processing each one in the same way. This means that Japize, the tool that produces .japi files, can satisfy the strict ordering requirement without requiring O(N) space itself. Also, this nice feature applies even when combining multiple sources (directories and zip files, in my case) of hierarchy information. It does cause multiple redundant scans through the same zipfile index, but that doesn't take any memory. And on systems that do have memory to spare, the disk cache will ensure that the physical disk only gets read once anyway.
- There's no need to use the same ordering through every stage of the processing. The japicompat first pass requires an alphabetical ordering because hierarchical information might be inconsistent between the two files. However, a filtering pass to eliminate duplicate errors requires a hierarchical sort to keep memory usage in O(NonDuplicateErrors) as opposed to O(ErrorsIncludingDuplicates) which is a much bigger number and can be O(N) in some common situations (exactly what we're trying to avoid). These conflicting needs can both be met by simply applying a sorting pass in between, rather than (as was previously done) keeping all errors in memory and filtering from that (possibly O(N) sized) data structure at the end.
- Conventional wisdom about sorting doesn't apply to a partial order. Conventional wisdom says that sorting a list of length N requires at least O(N) space anyway. But the only requirement for the filtering pass is that all superclasses appear prior to their subclasses. This sort can be accomplished by simply counting the number of superclasses, and performing multiple passes through the data writing out the classes with 0 superclasses, then 1, then 2, etc. This can be optimized even further through creating a temporary file for each superclass-count, and then concatenating them all together. As above, I can rely on the disk cache to help out systems where all N items really can fit into memory. This optimization uses O(N) space again, but on disk rather than in memory. And in nicely ordered writes, rather than random-access page swapping.
- Domain-specific knowledge can help in optimization. I happen to know that the only class with zero superclasses is java.lang.Object. Knowing this, if I can arrange that java.lang.Object is already first, I can save myself a pass. Since the requirement for the first pass was only "I must know for sure that both lists are sorted the same", I can alter the original sort to enforce this rule.
Armed with these tricks, it seems that I can completely avoid O(N) memory usage in every stage of japitools. I still have to actually implement it, but it seems that my call for a new maintainer was premature: I've found myself some motivation again. Always nice when that happens.
