Older blog entries for mstarch (starting at number 7)

Big improvement in the search engine department :-) I added support in our search engine at work for pre-expanding the index list words, so that word expansions can be avoided at runtime. This makes it necessary to load only one index list per user query word, just like "normal" search engines. The drawback is that these types of indexes can't be used for sentence searches, since the weighted expanded document list doesn't contain word positions. Another drawback is that it makes it impossible to do dynamic updates of the word relations/weights. However, these disadvantages are still insignificant compared to the huge speed improvements achieved on large-scale index files. I also added support for a more compact compression scheme for these pre-expanded index lists, which should prove more effective speed-wise than the Huffman-encoding approach.
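To make the pre-expansion idea a bit more concrete, here is a minimal sketch in Python. It is not the engine's actual code: the dictionary layout, the names `pre_expand` and `search`, and the simple "relation weight times score" merge rule are all assumptions made for illustration.

```python
from collections import defaultdict


def pre_expand(index, relations):
    """Fold each word's related words into one weighted posting list.

    index:     word -> {doc_id: score}        (assumed layout)
    relations: word -> {related_word: weight} (assumed layout)
    """
    expanded = {}
    for word in set(index) | set(relations):
        merged = defaultdict(float)
        # The word itself contributes its postings at full weight.
        for doc, score in index.get(word, {}).items():
            merged[doc] += score
        # Related words contribute their postings scaled by the relation weight.
        for related, weight in relations.get(word, {}).items():
            for doc, score in index.get(related, {}).items():
                merged[doc] += weight * score
        expanded[word] = dict(merged)
    return expanded


def search(expanded, query_words):
    """Load exactly one pre-expanded list per query word and sum the scores."""
    totals = defaultdict(float)
    for word in query_words:
        for doc, score in expanded.get(word, {}).items():
            totals[doc] += score
    # Best score first; ties broken by document id.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))
```

Note that the merged lists only carry document ids and weights, which is exactly why word positions (and with them sentence searches) are lost, and why changing a relation weight means rebuilding the lists.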

It's my birthday soon. Why does it have to be so difficult to think of birthday wishes? Thankfully I've just discovered "Dead Can Dance", so I've added some of their CDs to my wish list. I would also like one of the new iPAQs, but I'm afraid the price tag would scare most gift-givers away ;-) (they look pretty cool though)

It's wonderful how computers provide an easy avenue for fooling yourself into believing that you are doing something to help others in need. Today I installed the new distributed cancer-cure program, so now my computer can work during idle time, giving me a better conscience for free :-)

Went to Oslo on Thursday for a meeting with some guys from Fast Search & Transfer. I guess I could have stayed home though, because the meeting wasn't of a technical nature as anticipated, but it was still interesting to see their office building and to get an impression of "who Fast is".

I read about the Microsoft Passport terms-of-use controversy this week. I must agree that the terms of use were totally unacceptable. Nice to see that they were changed immediately.

Thought it was time to write something here, but I don't know what to write. The last month I've been busy at work as usual, but I've also managed to get myself stressed out by committing to a university project that requires me to spend approx. 270 hours on it until the middle of June. I don't see how I can make the time, but I guess I'll just have to somehow.

I have been playing a new game called 'Europa Universalis' quite a lot recently (aha, that's where I must find the time!). It's a strategy game that simulates the age of discovery up until early modern times. It's a lot of fun, and is definitely the most comprehensive game of its type, but it lacks a little in the "creating-a-mood" department.

I took a day off to learn some Python. I look forward to using it when I stumble upon the right little project where it makes sense.

7 Mar 2001 (updated 7 Mar 2001 at 20:26 UTC) »

Went to Paris with Unisys, testing the search engine on the ES7000 platform. The guys at the test center were very professional, and we got a lot of stuff benchmarked. The tests showed what I sort of expected, that the engine still needs some performance enhancements. (They also showed a lot of other stuff I won't bore you with, as if anyone is reading this ;-) ) The trip was very work-intensive. Unfortunately we worked 13 hours a day, so we never got to see much of Paris, and the only highlight (except for the pleasant people from Unisys I travelled with) was some good restaurants at night. I really would have liked to visit the Louvre again however :-(

I haven't been doing anything ClanLib-related for months now, but I'm happy to see the number of commits being made to the CVS repository - the project is gaining a lot of momentum, and a lot of people are putting time into improving the SDK. It's really rewarding to see the work already done being continued by "fresh" people.

I have decided to dedicate this semester to reading some books, because I've really been missing having time to read some good classical literature. Lately I've read some Virginia Woolf: "Mrs Dalloway" and "The Waves". I've also read Steinbeck's "The Winter of Our Discontent", and am going to read "The Name of the Rose" by Umberto Eco - the film was a real classic, and I'm really looking forward to reading the book. When I'm done reading that I'll read "The God of Small Things" by Arundhati Roy - an impulse loan from the local library. Maybe a book review will be coming up shortly :-O

I'm looking forward to a new album by Kristin Hersh that supposedly should be in stores in about a week. The first 3 albums she's made, "Hips and Makers", "Strange Angels" and "Sky Motel", were all great. I think I like them best in the order written (chronologically), so it will be kind of interesting to see if she continues down the road towards "less" good music (IMHO), or if she will "return to the roots" - probably a naive thought, but interesting nonetheless. No matter what she makes I'm sure I'll enjoy it.

I still thoroughly enjoy Advogato as a diary forum. It's nice to feel that you're not alone in writing diary updates, since you can surf around and read complete strangers' stories about what's going on in their daily lives - it really takes away much of the distance, and it's funny to get small insights into other people's lives.

Just came home from an 8-day skiing vacation in Les Arcs, France. Weather was fine with lots of snow, so all in all it was a pretty good trip. Oh, and now I'm officially a CS bachelor :-) This means I only have 2 years left of my studies. Wonder what I will do when that happens?

Have been reading some more stuff for university exams.

Have been looking at Windows 2000's AWE (Address Windowing Extensions). Now it's possible I don't really understand the underlying implementation of AWE, but it seems to me that applications using AWE can totally undermine any efforts the operating system might make towards system stability. With AWE the user process allocates a "window" of virtual memory pages into which allocated physical pages beyond the addressable size of the 32-bit virtual address space can be mapped. For instance you can allocate 64k of virtual address space, allocate 16 4k physical pages, and have Windows hardware-map the physical pages into the given virtual address space. The problem is that the index of which physical pages have been allocated for mapping through AWE seems to be kept only in user space. Not only that, but the API makes it necessary to dynamically allocate space on the heap to keep the physical page numbers, which are then passed on to Windows when different mappings are needed. Now, I haven't tried this in a program, but what if this array of page numbers is modified, say by heap damage - won't the effects potentially be "catastrophic", as the MSDN documentation suggests?

I suppose getting ready for university exams is as good an excuse as any to write a diary entry ,-)

I have been talking to some guys at Unisys about testing the search engine on their new ES7000 platform. That could be pretty neat, since it can easily be configured for multiple hardware configurations. Of course the machine itself has a starting price tag of $100,000, so we won't be buying it anytime soon I guess. The good thing is that the test, if it takes place, will be in Paris, which would be nice to see again.

Akamai is starting to show some real promise. They have risen 4 days in a row, and yesterday they even rose when the general trend was a slight loss. There may be hope yet.

Busy at work doing the groundwork for the beginning of a major new project. This time a specification has been made (whee), so hopefully it will be a project manager's dream, and we will complete it on time. That would be neat. We're making a distributed storage system for the search engine, which will handle document caching and retrieval, checksum calculations used in duplicate-elimination tests and a simple stream-based filesystem with built-in fault-tolerance. It will also have flexible meta-data support and serve as an abstraction for different input formats using a plugin parse-system. I can't wait to get started :-)
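The checksum-based duplicate elimination mentioned above can be illustrated with a small sketch. This is only a guess at the general shape, not the planned design: the `DocumentStore` class, its method names, the string keys and the choice of SHA-1 are all assumptions made for the example.

```python
import hashlib


class DocumentStore:
    """Toy in-memory store that keeps each distinct document body only once."""

    def __init__(self):
        self._by_checksum = {}  # checksum -> document bytes
        self._keys = {}         # document key (e.g. a URL) -> checksum

    def put(self, key, content):
        """Store content under key; return True if the body was not seen before."""
        checksum = hashlib.sha1(content).hexdigest()  # assumed checksum function
        is_new = checksum not in self._by_checksum
        if is_new:
            self._by_checksum[checksum] = content
        self._keys[key] = checksum
        return is_new

    def get(self, key):
        """Return the stored bytes for key, or None if the key is unknown."""
        checksum = self._keys.get(key)
        return None if checksum is None else self._by_checksum[checksum]

    def unique_documents(self):
        """Number of distinct document bodies held."""
        return len(self._by_checksum)
```

Two keys pointing at byte-identical content end up sharing one stored body, so the duplicate test is a single dictionary lookup on the checksum.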

6 Jan 2001 (updated 6 Jan 2001 at 18:27 UTC) »

Time for another diary entry. Christmas and New Year are over, and now it's time to wait 5 months for summer.

I took some time off from work between Christmas and New Year, where I didn't do much, went on some family Christmas visits and stuff like that. All in all it was a fairly good Christmas, although I never got into the Christmas spirit :( I spent New Year's Eve at home with some friends, which was quite ok, and started work again on the 2nd of January. I have now started working 30 hours/week instead of the previous 20, which pretty much just reflects how much I worked previously anyway.

Haven't made anything great coding-wise since last time - guess I'm in need of an inspiring project to work on. At the moment I primarily work on bug fixes and small uninteresting feature extensions.

The next thing of interest I'm going to do is to introduce streaming behavior in the search engine that is my primary responsibility at work. Nowadays a document list in the inverted indexes is read in its entirety whenever it is needed. This can result in unpredictable and very spiky memory usage and can also hurt query turn-around time to some extent. By reading document lists in a paged fashion the memory usage becomes much more constant, and query evaluation can begin as soon as the first pages are read.

Today I recompiled an old project I worked on back in 1997, judas. It was an attempt to write a C++ -> bytecode compiler/virtual machine for a game. We made it work to the point where we made a small game using simple stdin input. The whole thing took us 2-3 months to make. It was a stupid project, but well worth the while to write. Recompiling the project and seeing it run this silly little game really was a nostalgic moment :-)
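Coming back to the paged reading of document lists for a moment, the idea can be sketched with a Python generator. The record layout (a 4-byte document id plus a 4-byte weight, matching an 8-byte entry), the page size and the function name `read_paged` are assumptions for illustration, not the engine's real format.

```python
import io
import struct

# Assumed entry layout: little-endian 4-byte doc id followed by 4-byte weight.
ENTRY = struct.Struct("<II")


def read_paged(stream, entries_per_page=1024):
    """Yield lists of (doc_id, weight) tuples, one page at a time.

    Only one page is held in memory at once, so the caller can start
    evaluating the query as soon as the first page arrives.
    """
    page_bytes = ENTRY.size * entries_per_page
    while True:
        chunk = stream.read(page_bytes)
        if not chunk:
            break
        # Ignore a trailing partial entry, if the stream ends mid-record.
        usable = len(chunk) - len(chunk) % ENTRY.size
        yield [ENTRY.unpack_from(chunk, off) for off in range(0, usable, ENTRY.size)]
```

A caller would typically loop over `read_paged(open(path, "rb"))` and feed each page into the query evaluation, instead of building the complete list first.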

Just bought some more stocks in Akamai Technologies, only to watch them lose value the day after... sigh... I really believe that Akamai could have a bright future some day, but their stock performance has really sucked. Luckily I don't have enough money to actually lose big amounts of money on them, but then again this makes the money I lose seem more significant :-(

Hmph, now I can think of nothing more to write, and nobody cares anyway.

Finally decided to post a diary entry.

I guess I'm mostly doing this to keep a little track of time for myself, since it's doubtful who else is going to read this.

Advogato is potentially a valuable gathering place for the community, and I think it's great that so many busy people take the time to write a few lines about themselves and what they spend their time on. It's nice to have a sanctuary for developers by developers, where the discussions are fairly informed and where hobby programmers can get inspiration for their projects.

-- begin official diary entry --

Yesterday I saw a new game written with ClanLib, Operation Citadel, which looks very promising. It's a turn-based WW2 strategy game, much resembling the old PC game Perfect General. It's wonderful to see some nice graphics and a game that has even got some gameplay (although the game still needs some work, the AI beat me (I know I suck)), so thanks to Byteware Software for doing this :-)

At work, where I'm coding a search engine using fuzzy logic for my company Adaptive Computer Systems (we're looking for a more memorable name now btw), I've just completed some code I've been working on that implements compression of the index lists containing the search data. The results are very nice. Using simple Huffman compression it's possible to compress the index lists to 1/3 of their size on average (they used to be 8 bytes / entry), with little performance penalty. Since index size is a critical matter in all search engines, this is very good news.
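For readers who haven't played with Huffman coding, here is a minimal textbook sketch in Python. It is not the engine's compression code: the function names are made up for the example, and the output is a string of "0"/"1" characters rather than packed bits, purely to keep it readable.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return {symbol: bitstring} built from the symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prefixes their codes with 0 and 1.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(symbols, code):
    """Concatenate the code words of all symbols."""
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Invert encode(); works because Huffman codes are prefix-free."""
    reverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return out
```

Frequent symbols get short codes, so the gain depends entirely on how skewed the data is; the 1/3 figure above is for the real index lists and is not something this toy reproduces.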

I'm also trying to get into the whole Christmas spirit, but a deadline on a project at work that has been carefully placed at 31/12 (is that stupid or what?) is destroying most of my attempts. I hope to get it off my shoulders this week though, so I can take the days off between Christmas and New Year.

Last week I was on a Christmas-lunch (julefrokost) trip to Oslo, Norway (from Copenhagen, Denmark where I live) with Jubii, the biggest Danish portal, which we're a daughter company of. We were away from Friday to Sunday, meaning two days of free bar on the boat :-) (it takes 19 hours to sail from Copenhagen to Oslo). It was much fun, especially since we got to see the famous Danish "folk-band", Shu-bi-dua. Luckily Oslo wasn't as bitterly cold as I had feared, so I survived. I'm already looking forward to next year .-)

At my studies (I'm in my 5th year of Computer Science in Copenhagen), we're working on an uphill project called Copenhagen STL, which is an attempt to make a new version of STL borrowing from previous versions, but where we're trying to optimize the cache efficiency of the STL algorithms. I'm personally skeptical about whether we can make a version of STL that improves on the previous excellent versions, but I guess you never know till you try. I'm responsible for the allocator class, where we're implementing a full heap allocator using an improved buddy-system algorithm (the algorithm most heap allocators use today). Personally I think it's in principle flawed to make a heap allocator in the STL allocator class - it belongs in the C library, but it can be defended by the fact that some compilers ship with sub-optimal heap allocators (MSVC 6.0 springs to mind). At the same time, the SGI/STLport version of STL ships with an allocator that has a fast allocation scheme for small objects. However, it makes no attempt to reduce memory fragmentation over time, which penalizes long-running programs. This must be possible to improve upon.
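As a rough illustration of what a buddy-system allocator does, here is a minimal sketch of the plain textbook binary buddy scheme in Python. It is not the improved algorithm from the project, and it only hands out offsets into an imaginary arena; the class name, the parameters and the `min_block` default are assumptions for the example.

```python
class BuddyAllocator:
    """Textbook binary buddy allocator over an arena of total_size bytes."""

    def __init__(self, total_size, min_block=16):
        # Both sizes must be powers of two for the buddy arithmetic to work.
        assert total_size > 0 and total_size & (total_size - 1) == 0
        assert min_block > 0 and min_block & (min_block - 1) == 0
        self.total_size = total_size
        self.min_block = min_block
        self.free_lists = {total_size: {0}}  # block size -> free offsets
        self.allocated = {}                  # offset -> block size

    def alloc(self, size):
        """Return the offset of a block of at least size bytes, or None."""
        want = self.min_block
        while want < size:
            want *= 2
        # Find the smallest free block that is large enough.
        block = want
        while block <= self.total_size and not self.free_lists.get(block):
            block *= 2
        if block > self.total_size:
            return None
        offset = min(self.free_lists[block])
        self.free_lists[block].remove(offset)
        # Split down to the wanted size, freeing the upper half each time.
        while block > want:
            block //= 2
            self.free_lists.setdefault(block, set()).add(offset + block)
        self.allocated[offset] = want
        return offset

    def free(self, offset):
        """Release a block and merge it with its buddy while the buddy is free."""
        block = self.allocated.pop(offset)
        while block < self.total_size:
            buddy = offset ^ block  # the buddy differs in exactly one bit
            peers = self.free_lists.get(block, set())
            if buddy not in peers:
                break
            peers.remove(buddy)
            offset = min(offset, buddy)
            block *= 2
        self.free_lists.setdefault(block, set()).add(offset)
```

The merge step in `free` is what keeps fragmentation in check over time: two free neighbours of the same size collapse back into one larger block, at the cost of rounding every request up to a power of two.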

Don't be disappointed if the next diary entry follows in 6 months :-)
