Older blog entries for pipeman (starting at number 25)

PieSpy hacking

So I hacked PieSpy to feed it with data from the Lunarstorm community instead of IRC. I wrote a brief description in Swedish about it all. The main story is that I made a big-ass PieSpy social network graph out of guestbook data from Lunarstorm. The current result can be viewed here, and a friend used Zoomify to throw together a Flash zoomification of the same image, which can be viewed here. Pretty cool. In hacking terms, most of my work went into changing PieSpy to draw a labeled grid onto the image and to append an index to the image, containing an alphabetical list of all the nodes with pointers to each node's coordinates on the image. A smaller example can be viewed here. A package with source and required libraries is available here (it's the PieSpy distribution changed to use my own main class instead of the standard IRC bot thing).

welcome, 2005

I hope this year will prove more satisfying than the last.

Not much has been happening lately, as I came back home yesterday after spending five days in Sälen, skiing, drinking glühwein and in general having a darn good time. I learned the beauty of off-piste skiing - getting away from the public pistes and ski lifts and into the untouched, deep snow was really cool.

For my Swedish audience: yesterday, I reluctantly decided to publish all my diary postings from a certain web community here. Reluctantly, because some of these are really personal and may contain information that I might not want to publish publicly on the web - the diaries on the community are only available to registered members, and you can always see who has read your diary, which gives a comforting sense of control when writing about personal things. But I wanted to have somewhere to post the things that aren't of interest to non-Swedes and the techie audience of Advogato, and didn't want to end up with three diaries/blogthingies. To export all my diaries from the community, I made a small command-line version of my program Nular that uses basic screen-scraping techniques to retrieve the data, and hacked in a function that allows a user to save all diary entries to a directory. Then I created a cron job that does this once a day, so I can continue posting on the community as usual and all posts will show up on my web page as well (complemented by a script that I can easily trigger manually when I want to force an update). The program and cron job run on my coLinux Gentoo installation on my Win2K box at home, and use rsync to transfer the data to my real web server, which in turn contains a simple JSP page that reads the file structure to generate indexes, display the data and so on.

A few days ago, I was experimenting with compiling static binaries with gcj to see if I could easily distribute programs such as md5i in native form to people who don't have or don't want Java. It turned out that the executable size was somewhere around 4 MB - for a Java program whose bytecode is a few kilobytes. It would be nice to be able to tell GCJ to link only those classes that are referenced in the code - and perhaps be able to supplement that list if you use dynamic class references in the code. For example, I use the Sun MD5 implementation by invoking MessageDigest.getInstance("MD5"), so the compiler/linker would need a hint as to which classes I actually need, as they are not explicitly referenced syntactically. But one should be able to easily produce a list of needed classes by running the program and passing -verbose:class to the JVM. For all I know, gcj may already do this, and my small program really depends on over four megabytes of runtime class code. Anyway, another thing I'd like is to be able to suppress the WARNING: could not properly read security provider files when the program is run on machines without a GCJ installation, as in most cases, the standard GNU security provider will work fine.
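To make the linking problem concrete, here is a minimal sketch of the kind of call in question. The class name Md5Sketch and the method md5Hex are made up for illustration and are not taken from md5i; only MessageDigest.getInstance("MD5") comes from the text above.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not the md5i source.
final class Md5Sketch {
    private Md5Sketch() {}

    /** Returns the MD5 digest of the given bytes as a lowercase hex string. */
    static String md5Hex(byte[] data) {
        MessageDigest digest;
        try {
            // The implementation is selected by this string at run time, so
            // nothing in the source names the provider class explicitly.
            digest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No MD5 provider available", e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest(data)) {
            // Mask to 0..255 so negative bytes print as two hex digits.
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```

Since the only link between this code and the digest implementation is the string "MD5", a linker that follows static references alone would presumably have no way of knowing which class to keep.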

Merry Christmas & a Happy New Year

Better late than never.

If you are in Sweden, please consider sending an SMS with the text "ASIEN" to 72105. This is a very easy way of contributing 20 SEK to the Swedish Red Cross's aid work in the areas affected by the earthquake and tsunami. Or give to the Swedish Médecins Sans Frontières, even though their donation pages require Internet Explorer. Much help is needed.

A hack

md5i - yet another MD5 file integrity checker.

"This Java program creates an MD5 database of all files in a directory structure, and then allows you to easily recheck the contents of the directory and be notified if any files have changed. It does not (currently) detect removed files. It only computes the MD5 sum if the "last modified" timestamp of the file has changed since the last run, unless the -nomtime option is used."
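The timestamp shortcut described above could look roughly like the following. This is a sketch under assumptions, not the md5i source: the class MtimeCheckSketch, its Entry record, the Status values and the in-memory map are all invented here, and the checksum is passed in as a Supplier so that it is only computed when needed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration of the "skip if mtime is unchanged" idea.
final class MtimeCheckSketch {
    /** What was recorded for a file on the previous run. */
    static final class Entry {
        final long lastModified;
        final String md5;

        Entry(long lastModified, String md5) {
            this.lastModified = lastModified;
            this.md5 = md5;
        }
    }

    enum Status { NEW, UNCHANGED, MODIFIED }

    private final Map<String, Entry> database = new HashMap<>();

    /**
     * Checks one file. The checksum supplier is only invoked when the file is
     * new, when its timestamp differs, or when ignoreMtime is true (the
     * assumed behaviour of an option like -nomtime).
     */
    Status check(String path, long lastModified, Supplier<String> md5, boolean ignoreMtime) {
        Entry known = database.get(path);
        if (known == null) {
            database.put(path, new Entry(lastModified, md5.get()));
            return Status.NEW;
        }
        if (!ignoreMtime && known.lastModified == lastModified) {
            return Status.UNCHANGED; // cheap path: no hashing at all
        }
        String current = md5.get();
        database.put(path, new Entry(lastModified, current));
        return current.equals(known.md5) ? Status.UNCHANGED : Status.MODIFIED;
    }
}
```

The trade-off is the usual one: trusting the timestamp makes a recheck fast, but a file whose content changes while its timestamp stays the same goes unnoticed unless the timestamp is ignored.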

More blog hacking

I just finished a personal hackfest in which I integrated all posts from the Advogato diary, my first blog-hack and my now off-line SnipSnap blog, which means that the archive on my homepage now covers all my posts since February 2003. I won't give out all the gory details (they are way too gory for public inspection this time), but I ended up reading the raw file structure created by SnipSnap as well as my own blog page, and creating a generic BlogEntry interface that old and new code was adapted to, so I could treat the different blog sources somewhat equally. The SnipSnap entries contain lots of SnipSnap-specific tags, some of which still aren't handled at all since I grew tired of trying to construct regexps for each and every one. On the bright side, the SnipSnap file format was very straightforward, and they even use standard Java Properties files for the metadata, which I could simply pass to Properties.load() to extract the data I needed. The actual postings were plain-text UTF-8 files.
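For readers who haven't used it, reading such a metadata file takes only a few lines. The sketch below parses from a string to stay self-contained; the class name EntryMetadataSketch and the keys "title" and "created" are placeholders I made up, not SnipSnap's actual field names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration; the real SnipSnap keys are not shown here.
final class EntryMetadataSketch {
    private EntryMetadataSketch() {}

    /** Parses text in java.util.Properties format into a Properties object. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try (Reader reader = new StringReader(text)) {
            props.load(reader); // handles key=value lines, comments and escapes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

With a made-up file such as "title=Hello world" followed by "created=2004-12-17", calling parse(...).getProperty("title") gives back "Hello world"; for a file on disk, the same load() call accepts a FileReader or an InputStream instead.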

Re: How can I trust Firefox? or: Let the flamefest begin

No, I won't really take part in this flameparty re code signing (since most of it misses the target anyway), but I just thought I'd mention my own limited experience with trojans and spyware: the only times I've encountered trojans or spyware in the real world (that is, outside my geek universe), they have all been delivered as signed executables. Generally, up pops a window telling me that the code is signed, and who has signed it, followed by an arbitrary string provided by the signer that usually says something like install this cool software now to utilize all the really hot functions on this web site.

So, away went the credibility of code signing?

I do recognize the need for code authenticity, of course. It's just that when told so, grandma will always click "Yes", "Install" or whatever seems the most productive option at the time. "No" and "Cancel" are actually scarier - even if they are the default. And by the way - what kind of notion of a "default choice" does the average home user have anyway? Do they really distinguish between the default and non-default button in a dialog like this? I recall the good old days when I was young and had enough time to skip studies and instead spend it on compiling my Linux kernel, and when running make config, the last line of every configuration description was something like If unsure, select 'N' or 'N' should be a safe bet. I actually followed that advice when I really was unsure. Internet Explorer, in this example, never gives such a straightforward direction - instead, it tries to explain the domains of code signing and "trusted publishers". Ol' grandma will think "of course I trust this publisher! my grandson told me to click on this link!" even if the link her grandson told her to click on was twenty clicks away.

Also, if IE had actually made it clear to its users that they really shouldn't run unsigned code, far fewer people would have tried Firefox (and the poster is right that there is a point in signing the Firefox installer anyway). And that would have been a bad thing. I don't have a good solution for this whole thing, in general. Many people believe in Trusted Computing, and it really gives me the shivers. But something has to be done, I guess.

17 Dec 2004 (updated 17 Dec 2004 at 13:41 UTC) »
Climate changes

Pontus points me to an article (Swedish) in the Land magazine, a members' publication of the Federation of Swedish Farmers, which briefly mentions the rapid movement of the climate zones due to climate change. This presents a very real and immediate problem for the forest industry - a quite important part of Sweden's economy. The article roughly translates to:

Climate change demands new tree species (Mon 13 Dec 2004)

The climate zones are moving north by 0.5 to 1 meter per hour, according to the Swedish Environmental Protection Agency, which will have major consequences for forestry.

It is too late to stop climate change, writes Lars-Erik Liljelund, the EPA's director general, in a debate article in the newspaper Göteborgs-Posten. Measures should have been put in place 50 years ago. In Scandinavia, the climate zones are moving north by four to nine kilometers each year. Society must now adjust to the ongoing climate change. For forestry, this amounts to a major change, warns Lars-Erik Liljelund.

More heat-resistant species must gradually be put into use to ensure regrowth after felling. Likewise, the director general believes that forestry should consider using new tree species to spread the risks. He also calls for research into forest management models that are adjusted for a warmer climate, with winters that are warmer and have more precipitation.

International negotiations on the global work to reduce greenhouse gas emissions have been under way in Buenos Aires for a while and conclude on December 17th.

coLinux

A comment on Slashdot pointed me to coLinux; sort of like User-mode Linux, but running on Windows instead. Very interesting. I just installed it and downloaded a Gentoo root filesystem image supposedly customised for coLinux. Since there are still many (more or less valid) reasons that I want to keep running Windows on my desktop machine, this looks very appealing. This way I won't have to wait for Windows ports of all the server software I want to run (Apache, Subversion...). I'll post again when I have something up and running.

More playing around with Advogato XML-RPC

I wanted a more bloggy way of accessing my Advogato posts, so once again I turned to the XML-RPC services of Advogato to integrate my homepage with the Advogato diary. Of course, after a bit of hacking it eventually went entirely out of hand. As it turned out, I hacked a simple Java class called ALog which provides convenient access to the Advogato diary services, but also caches the data locally so that I don't have to query the Advogato server each time a user views my home page. I did this before, but now I actually cache all diary entries and their metadata. If more than an hour has passed, ALog queries the "diary.len" function on Advogato to see if there are any new entries, and if so, downloads them. An index of all diary entries is stored on disk, so that it can easily be fetched to, for example, populate a calendar or a post archive. I found this simple calendar that fulfilled my needs pretty well, and after a few rounds of DateFormatting, sorting and nicotine intoxication, I actually had JSP code that produced something resembling a blog, all based on the data on the Advogato server.
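The refresh rule is simple enough to sketch without any XML-RPC plumbing. Everything below is a stand-in for illustration, not the ALog code: DiarySource is a hypothetical interface in place of the remote calls (its length() plays the role of "diary.len"), the cache lives in memory rather than on disk, and the current time is passed in so the behaviour is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of an hourly refresh check.
final class DiaryCacheSketch {
    /** Made-up stand-in for the remote diary service. */
    interface DiarySource {
        int length();            // number of entries on the server
        String entry(int index); // text of the entry with the given index
    }

    private static final long MAX_AGE_MILLIS = 60L * 60L * 1000L; // one hour

    private final DiarySource source;
    private final List<String> entries = new ArrayList<>();
    private long lastCheckMillis;
    private boolean checkedOnce;

    DiaryCacheSketch(DiarySource source) {
        this.source = source;
    }

    /** Returns the cached entries, asking the server first if the cache is stale. */
    List<String> entries(long nowMillis) {
        if (!checkedOnce || nowMillis - lastCheckMillis > MAX_AGE_MILLIS) {
            int remoteLength = source.length();
            // Only entries we have not seen yet are fetched.
            for (int i = entries.size(); i < remoteLength; i++) {
                entries.add(source.entry(i));
            }
            lastCheckMillis = nowMillis;
            checkedOnce = true;
        }
        return new ArrayList<>(entries);
    }
}
```

Note that this sketch never re-downloads an entry it already has, so an edited post would not be picked up; whether ALog handles that case is not something the text above says.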

I once again got bitten (I think!) by Windows and Linux differences in Java. I wanted writes to the (file-based) cache to be atomic, so everywhere I wrote to disk, I first wrote all data to a temporary file in the same directory and, when done, used renameTo() to atomically overwrite the old file with the new one. This seems to work well on Linux, but on Windows, which I use on my workstation, it didn't work at all. So I resorted to deleting the target file just before the rename, which creates a tiny race condition - but I doubt I'll have to worry about those few milliseconds when it comes to my home page. :-)
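A minimal sketch of that write-then-rename pattern, including the delete-before-rename fallback, might look like this. The class and method names are made up, and this is a reconstruction from the description above rather than the actual cache code.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration of writing via a temporary file and renaming.
final class CacheWriterSketch {
    private CacheWriterSketch() {}

    /** Replaces target with the given bytes, going through a temp file. */
    static void writeReplacing(File target, byte[] data) {
        File dir = target.getAbsoluteFile().getParentFile();
        try {
            // Same directory as the target, so the rename stays on one file system.
            File temp = File.createTempFile("cache", ".tmp", dir);
            try (OutputStream out = new FileOutputStream(temp)) {
                out.write(data);
            }
            if (!temp.renameTo(target)) {
                // Fallback for platforms where renameTo() will not overwrite:
                // delete first, then rename. Readers may briefly see no file.
                if (target.exists() && !target.delete()) {
                    temp.delete();
                    throw new IOException("Could not delete " + target);
                }
                if (!temp.renameTo(target)) {
                    temp.delete();
                    throw new IOException("Could not rename " + temp + " to " + target);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On newer Java versions, java.nio.file.Files.move() with StandardCopyOption.ATOMIC_MOVE is an alternative worth looking at; how it behaves when the target already exists is documented as implementation specific, so it would need testing on each platform too.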

I think the ugliest hack in this adventure, however, was the code to sort and count the monthly entries. I just wanted to get something done, so I created a TreeMap, wherein I put a diary count as value and, as key, a year-month string in the form "yyyy-mm". Then I created a Comparator for the TreeMap that splits the key string, parses the two halves and compares first the year and then the month. This way, I can use entrySet().iterator() and retrieve an Iterator of Map.Entry objects containing each month and its post count in a comfortable order. Of course, I had to parse the "yyyy-mm" into a time stamp that could be properly printed, too.
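In today's Java, the hack described above could be sketched roughly as follows. The names MonthlyCountSketch, BY_YEAR_THEN_MONTH and countByMonth are invented for this example, and the lambda and merge() call are modern conveniences, so this is a reconstruction of the idea rather than the original code.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of counting posts per "yyyy-mm" key.
final class MonthlyCountSketch {
    private MonthlyCountSketch() {}

    /** Orders "yyyy-mm" strings by numeric year, then by numeric month. */
    static final Comparator<String> BY_YEAR_THEN_MONTH = (a, b) -> {
        String[] left = a.split("-");
        String[] right = b.split("-");
        int byYear = Integer.compare(Integer.parseInt(left[0]), Integer.parseInt(right[0]));
        if (byYear != 0) {
            return byYear;
        }
        return Integer.compare(Integer.parseInt(left[1]), Integer.parseInt(right[1]));
    };

    /** Counts how many times each year-month key occurs, in chronological order. */
    static Map<String, Integer> countByMonth(List<String> yearMonths) {
        Map<String, Integer> counts = new TreeMap<>(BY_YEAR_THEN_MONTH);
        for (String key : yearMonths) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```

Iterating over countByMonth(...).entrySet() then yields the months oldest first. If the month is always zero-padded, plain string ordering gives the same result and the Comparator is not strictly needed; it only matters for keys like "2004-9".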

There are many other stupid things with the code (such as that the entire diary metadata index is read at each request), but I won't have to worry about them until the number of diaries is huge and/or I have thousands of visitors to my homepage. After all, the index file is only 272 bytes at the moment (with 16 bytes per post).

SMHI followup
(again, only interesting for my Swedish audience - sorry, I should find another place to post non-English stuff in the future, I guess)

I just received the following reply from SMHI's in-house legal counsel regarding the email I sent and mentioned here, as well as in a comment on Gnuheter, a little over a week ago. Interesting reading.
