jwb is currently certified at Journeyer level.

Name: Jeffrey Baker
Member since: 2000-03-26 18:18:02
Last Login: N/A


Homepage: atari.saturn5.com/~jwb


GPG key fingerprint: E2C7 92D3 6C50 44D3 AE2F CE90 6370 F8B0 1D75 9D18


Recent blog entries by jwb


There are two projects upon which I have embarked, both driven by my increasing discomfort with the modern computer experience.

Project the First: A desktop environment implemented entirely in Java

People give me strange looks when I talk about this project, but I am very excited about the prospects. For the last year or so I have been using Mac OS X, switching between its native interface and a full-screen X11/GNOME session hosted on my Linux box. (Switching between the two is a matter of a single keystroke.) Prior to that I used Linux exclusively.

What strikes me about these two systems is not their differences, but their similarities. Mac users and GNOME users (and Windows users) behave in ways which are grossly unsafe with regard to their files and data. We tend to just download and run programs from random places on the Internet, although at least with Linux the distributors exercise some editorial influence. These programs that we download and run are usually native programs. They have access to all the computer's facilities including the screen and keyboard and mouse, the audio output and input, the printer, the network, and the modem. Any of these programs can enumerate, access, and change any of the user's files. They can be trojans or viruses. And the worst feature of these systems is that the user is given woefully insufficient tools for finding out what the applications do.

My new system seeks to address this. The Java language was chosen for its structured method of limiting the actions of a program. Untrusted code can be forbidden from accessing files or the network, or from executing native code. It is perfectly safe to download and run any Java code, so long as the SecurityManager is correctly configured to limit that code's actions. Java's security policy framework is the basis of my new system.

This new system (which needs badly to be named) presents a new way of interacting with programs or applications or whatever you care to call them. Suppose you download a new music player, similar to iTunes. When the application is loaded it will lack any kind of file system or network capability. A file inside the archive will describe the kinds of capabilities the program's author feels are needed. For an example music player, access to the audio output will be necessary, access to a CD reader or writer may be optional, and the program might be expected to make network connections to musicbrainz.org, freedb.org, and last.fm. Before the application is even loaded, the system, using a full-screen input-capturing user interface, will ask the user to grant these capabilities. If everything seems to be in order, the user can confirm and the application will be loaded with the appropriate capabilities.
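The load-time decision described above can be sketched roughly as follows. This is only an illustration: the class names, the capability names such as "audio.output", and the idea of marking each request as required or optional are all assumptions made for the example, not a format the system has settled on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one entry in the capability file shipped inside the archive.
final class CapabilityRequest {
    final String name;       // e.g. "audio.output" (made-up naming scheme)
    final boolean required;  // false means the application can run without it

    CapabilityRequest(String name, boolean required) {
        this.name = name;
        this.required = required;
    }
}

final class LoadDecision {
    // The application may load only if the user granted every required capability.
    static boolean mayLoad(List<CapabilityRequest> requested, List<String> grantedByUser) {
        for (CapabilityRequest r : requested) {
            if (r.required && !grantedByUser.contains(r.name)) {
                return false;
            }
        }
        return true;
    }

    // Capabilities actually handed to the application: requested AND approved.
    // Anything the user approved that the author never asked for is ignored.
    static List<String> effective(List<CapabilityRequest> requested, List<String> grantedByUser) {
        List<String> out = new ArrayList<>();
        for (CapabilityRequest r : requested) {
            if (grantedByUser.contains(r.name)) {
                out.add(r.name);
            }
        }
        return out;
    }
}
```

For the music player, audio output would be a required request and the CD writer an optional one, so declining the CD writer still lets the player load.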

The nice thing about Java is that the security framework ensures that the application doesn't perform any hanky panky. You might have granted the ability to connect to musicbrainz.org on port 80, but if the application starts making connections to nsa.gov, or listening for connections on port 31337, or sending mail, then 1) these things will not work, and 2) the user can be alerted. The application can be killed if necessary and safely removed.
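As a rough sketch of that check, assuming a hypothetical NetworkGrants class that the system consults before any outbound connection: in real Java the hook would be SecurityManager.checkConnect backed by the policy framework, whereas this stand-in just compares host and port strings so the idea is visible in a few lines.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical per-application table of network grants approved by the user.
final class NetworkGrants {
    private final Set<String> allowed = new HashSet<>();

    // Record that the user approved connections to host:port.
    void allowConnect(String host, int port) {
        allowed.add(key(host, port));
    }

    // Called before an outbound connection; throws if the user never granted it.
    // The caller can catch the exception to alert the user or kill the application.
    void checkConnect(String host, int port) {
        if (!allowed.contains(key(host, port))) {
            throw new SecurityException("connect to " + host + ":" + port + " was not granted");
        }
    }

    // Host names are case-insensitive, so normalize before comparing.
    private static String key(String host, int port) {
        return host.toLowerCase(Locale.ROOT) + ":" + port;
    }
}
```

With only musicbrainz.org port 80 granted, a call to checkConnect("nsa.gov", 80) throws, which covers both outcomes above: the connection does not happen, and the system has a place to raise an alert.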

This system requires a file manager, because by default the applications are loaded only with the ability to read and write files in a private subdirectory which supports only that application. For an application to gain access to any of the user's files, the user must actually drag and drop that file to the application. In the example of the music player, the user might drag an entire folder, ~/Music, giving the application the ability to enumerate and read the files therein contained. But if the user chooses to not do this, then the application has no ability to spy on, steal, or alter the user's files and private data. Even if the user gives the music player the full ability to read, write, create, and delete files and folders under ~/Music, they can still be sure that the program won't intentionally or accidentally delete their email.
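Java's own permission classes can express a folder grant like this one. The sketch below uses java.io.FilePermission and java.security.Permissions; the FileGrantDemo class, the grantFolder helper, and the /home/user paths are made up for illustration, and Unix-style paths are assumed.

```java
import java.io.FilePermission;
import java.security.Permissions;

final class FileGrantDemo {
    // Build the permission set for an application after the user drops a folder on it.
    static Permissions grantFolder(String folder, String actions) {
        Permissions perms = new Permissions();
        // A trailing "/-" covers everything under the directory, recursively.
        perms.add(new FilePermission(folder + "/-", actions));
        return perms;
    }

    public static void main(String[] args) {
        Permissions perms = grantFolder("/home/user/Music", "read,write,delete");
        // A file inside the granted folder is covered by the grant.
        System.out.println(perms.implies(
                new FilePermission("/home/user/Music/album/track.ogg", "read")));
        // The user's mail is outside the granted folder, so this is not covered.
        System.out.println(perms.implies(
                new FilePermission("/home/user/Mail/inbox", "delete")));
    }
}
```

The implies check is the same question the security framework asks on every file access, so a grant on ~/Music says nothing at all about ~/Mail.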

I only started this project one month ago, but already some of the issues to be tackled have become clear. The first and probably worst is my own ignorance of Java internals. A benefit of this system is supposedly that bugs in the native JPEG library won't cause buffer overflows like they can and commonly do on Windows, Mac OS, and Linux. But it's not entirely clear to me when and how Java might be resorting to native code behind my back. Is the entirety of the JDK pure Java? Can I be sure that Java is never using JNI to access something like libjpeg or libfreetype? If I restrict native code, what parts of Java will I break?

Another problem is that the Java look-and-feel libraries are universally horrible. The Windows L&F looks almost like Windows, the Mac L&F looks sort of like Mac OS, and the GTK+ L&F resembles GTK+. They are all flamboyantly buggy in ways that the user will notice, and ugly as well. I need to find a L&F which I can use which is beautiful and well-implemented and comes with the right license. Any pointers in this area would be appreciated.

Project the Second: a web forum which is not horrible

The majority of forums and blogs out there are hideously bad. They tend to connect to databases dozens of times for every page load. They run a lot of code on the host CPUs even when generating exactly the same content for the millionth time. They have a knack for being slashdotted at just the wrong moment.

Normally I would not care about this junk, but because of a website which my wife has started building, I have been sucked into this project. The goal of the project is to build a forum and blog-with-comments system which maintains the bulk of its data as static files on the regular Unix filesystem. As we all know, Linux can spray out static web content at impressive rates, so I believe this system will be much faster than current systems like Scoop and Slash and Movable Type, at least for a read/write ratio above a certain but as yet undetermined level.

This kind of thing seems pretty easy, and I am thinking that a site of the scale of Daily Kos could easily be supported on modest hardware. Daily Kos and Kuro5hin and other Scoop sites love to visit the database, but the need to do so is severely overestimated by the implementors. After an author writes a diary or blog entry, that diary is either not going to change or will be updated so seldom that the ratio of writes to reads will be indistinguishable from zero. The same is true of comments. A busy thread on Daily Kos will be read by hundreds of thousands of people, but will attract only a few hundred comments. There is, therefore, no reason why MySQL should be consulted on every page load.
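A minimal sketch of the write-side idea, assuming a hypothetical StaticThreadPublisher that re-renders a thread to a plain HTML file each time a comment is posted, so the web server can answer every read from the filesystem. The class, its method names, and the bare-bones HTML are illustrative only; the forum itself may well be written in another language.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

final class StaticThreadPublisher {
    // Render the whole thread to HTML. A real system would use proper templates.
    static String render(String title, List<String> comments) {
        StringBuilder html = new StringBuilder("<html><body><h1>")
                .append(escape(title)).append("</h1>\n");
        for (String c : comments) {
            html.append("<p>").append(escape(c)).append("</p>\n");
        }
        return html.append("</body></html>\n").toString();
    }

    // Minimal escaping so comment text cannot inject markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Called once per new comment (a write), never per page view (a read).
    static void publish(Path target, String title, List<String> comments) throws IOException {
        // Write to a temporary file in the same directory as the target.
        Path tmp = Files.createTempFile(target.toAbsolutePath().getParent(), "thread", ".tmp");
        Files.write(tmp, render(title, comments).getBytes(StandardCharsets.UTF_8));
        // Rename over the old page; on POSIX filesystems the rename is atomic,
        // so readers see either the old page or the new one, never a partial file.
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

With a few hundred comments against hundreds of thousands of reads, publish runs a few hundred times per thread while everything else is static file serving. Temporary files are created readable only by their owner, so a real deployment would also set permissions the web server can read.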

I tried to do this same project many years ago, working from Slashcode, but for a number of reasons this is now much easier. We now have many wonderful high-quality implementations of the DOM in every practical language: Perl, Python, C, Java, and many more. We also have something on Linux that we certainly did not have in 1999: a filesystem capable of holding billions of files, with very large numbers of files in single directories. We now have Lustre, which in addition to scaling to huge sizes is also cache-coherent across nodes, allowing quite advanced multi-reader/multi-writer behavior even when you have a rack full of machines handling the HTTP requests.

In other words, I'm quite excited about this project and optimistic about the prospect for success.


If you think mysql didn't install itself in the right place, just pass some more parameters to the configure script. You can also change the path to the database files at runtime with the -h,--datadir option. Agree completely that mysql can seem like a bad joke after using pg, but then again mysql doesn't have the operational headaches that come with postgresql: vacuum, reindex, and so forth. I tend to use both systems.


My colleagues at work had the genius idea to install a mail filtering system called MailMarshal in front of their Exchange server. I would not recommend this product. It blocks mails containing GIF files, but the same image converted to PNG is not blocked. Also if the GIF is buried in one extra level of MIME hierarchy, it will be allowed. Finally it seems more interested in the memo From: header than the SMTP envelope sender.

The punch line is that I cannot send mail even to people within my own company without resorting to a comical rfc822-in-rfc822 tunneling system. It is the mail analog to ipip tunneling through firewalls. And it proves once again that control freaks don't understand networks.

Unfortunately I was unable to control my frustration after a week of trying to diagnose this mail problem, and flamed the administrators (gently). This never helps the situation but I always do it. I am broken in that regard.

Incremental apt-get update

I'd like to meddle with the way Debian distributes software. It is very silly that apt-get update downloads several megabytes of data, just to get the incremental changes since the last apt-get update. I believe it should be easy to implement an incremental update that sends only the changes since time t. This would require that each upload have a serial number, but either that already exists somewhere in the apt system, or it could be hacked in without too much trouble.
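The serial-number scheme could look something like the sketch below. Nothing here reflects how apt actually stores its indexes; PackageChange, IncrementalIndex, and their methods are hypothetical names for an append-only log of uploads that a server could filter by the client's last seen serial.

```java
import java.util.ArrayList;
import java.util.List;

// One upload to the archive, tagged with a monotonically increasing serial.
final class PackageChange {
    final long serial;
    final String pkg;
    final String version;

    PackageChange(long serial, String pkg, String version) {
        this.serial = serial;
        this.pkg = pkg;
        this.version = version;
    }
}

// Hypothetical server-side log of uploads, kept in serial order.
final class IncrementalIndex {
    private final List<PackageChange> log = new ArrayList<>();
    private long nextSerial = 1;

    // Record an upload and return the serial assigned to it.
    long recordUpload(String pkg, String version) {
        long serial = nextSerial++;
        log.add(new PackageChange(serial, pkg, version));
        return serial;
    }

    // Everything uploaded after the serial the client saw on its last update.
    List<PackageChange> changesSince(long lastSeenSerial) {
        List<PackageChange> out = new ArrayList<>();
        for (PackageChange c : log) {
            if (c.serial > lastSeenSerial) {
                out.add(c);
            }
        }
        return out;
    }
}
```

A client would remember the highest serial it received and send it with the next update, so the response contains only that day's uploads rather than the full package list.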

By my calculations a daily apt-get update could be reduced from several megabytes to tens of kilobytes.


After reading numerous enthusiastic endorsements of BitTorrent, I downloaded the client and tried to get Mandrake 9.1 images from the network. I must say I was not impressed. The estimated time to complete the download was 42 hours, at about 10KB/sec. Normally I can download at around 135KB/sec. Worse, my client was uploading at around 20KB/sec! On my asymmetrical connection, uploads degrade downloads. Is this something BitTorrent does not account for? The client should limit uploads to at most, say, 10% of downloads. Otherwise there is little reason for the user to participate. Certainly if uploads were going to be 200% of downloads I would never use it.

Also their FAQ desperately needs a "How do I run this farking program?" section. /usr/bin/btdownloadprefetched.py is not obvious.

25 Sep 2002 (updated 25 Sep 2002 at 18:58 UTC) »

Someone mailed me to request my spam filter system, so I packaged it up slightly with some command line arguments and documentation. You may download it here:


I added a --gram-length=n option, so you can play with that dimension of the system.


I recommend you try Workrave, to help keep your hands useful in later life.



jwb certified others as follows:

  • jwb certified jwb as Apprentice
  • jwb certified pavlov as Journeyer
  • jwb certified ask as Master

Others have certified jwb as follows:

  • jwb certified jwb as Apprentice
  • mdorman certified jwb as Journeyer
  • cwinters certified jwb as Journeyer
  • ask certified jwb as Journeyer
  • image certified jwb as Journeyer
  • nixnut certified jwb as Apprentice
  • highgeek certified jwb as Journeyer
  • fxn certified jwb as Master
  • sdodji certified jwb as Journeyer
  • ncm certified jwb as Apprentice
  • wardv certified jwb as Journeyer
  • mascot certified jwb as Master
