Older blog entries for apm (starting at number 20)

Got new computers at work and at home, nothing too special - Dell 1.6GHz P4s. They came with XP - so far it seems okay. I'm afraid I find it a little unnerving when it does things like set up new hardware without telling me. What was that Clarke quote? Something like "any sufficiently advanced technology is indistinguishable from magic". I put 512MB of memory and an 80GB hard drive in my work machine - I'm planning to put VMware on it so I can test Suneido on different versions of Windows (and also run Linux). I still have some issues that only seem to surface on Win98 (which also happens to be the most common version right now, unfortunately). I only put 256MB and a 20GB drive in the home machine. I'd tell you these numbers boggle my mind but that would date me.

I also finally took the plunge and installed a Linksys wireless access point switch to connect my notebook and desktop at home. Only just installed it last night, but it seems to work pretty slick. Finally I can access the internet and print from the couch :-) Of course, one of the reasons I preferred to use my notebook at home was because it was faster than the desktop - that situation is now reversed.

I've been pretty frustrated lately because I don't seem to be spending any time on "real" work. I've been trying to hire a programmer and a customer support person and we got a big pile of resumes this time. Even after weeding out the obvious rejects, we still had about 40 interviews to do. Pretty hard to get anything done in between interviews. And of course, switching computers at work and home is also time consuming. Hopefully things will settle down a bit in the next week or two!

Finally got around to taking a look at Eclipse - the general purpose IDE that IBM released open source. Read a bunch of the articles on the site. There are some pretty interesting ideas. Some aspects I liked, some I didn't.

  • Although the Eclipse IDE is general purpose, it's written in Java, and to tailor it or extend it you use Java. Since I'm primarily a C++ programmer this is a bit of a negative.

  • They chose to write their own (yet another) portable gui framework (SWT). I thought Java already had a portable gui? I find their approach very reasonable though. It makes me want to work on a portable gui for Suneido along the same lines.

  • There's a good discussion of gui "resource" "disposal", i.e. how and when to free fonts, etc. and why finalization isn't a good approach.

  • Eclipse has an interesting plugin architecture. It seems relatively simple yet flexible, with attention to performance. Plugins can define new "views", add to existing menus and toolbars, and even insert material into the documentation.

It makes me feel that Suneido's IDE is somewhat limited. I think it would definitely be worthwhile to incorporate some of Eclipse's ideas.

I've had fun the last few days getting Suneido to automatically change the mouse cursor to the hourglass wait cursor whenever it's busy. (This is on Windows.) You might think that would be easy, but it's actually fairly tricky. You can read about it on CodeProject.

I've wanted to do this for quite a while but been putting it off because I fully expected it to be frustrating. (And it was!) My thoughts after spending almost two days on this:

  • Why isn't this built into Windows?

  • Why hasn't anyone else done this? (Or if they have, why haven't they published it?)

  • Documentation sucks. None of the critical information I needed was in the documentation.

  • Thank goodness for newsgroups. I'm not sure I would have solved this without comp.os.ms-windows.programmer.win32

  • Was it worth spending this much time on such a minor detail?

Despite the frustration (or maybe because of it?) I'm still pretty happy with getting this done. It's a little thing, but in many ways I think a good product is the gestalt of a whole bunch of such "little" things.

It sure feels good when you can make a specific change to some software and get a definite improvement!

Suneido uses a cost based optimizer for its database queries. For each operation (e.g. join, union), it estimates the cost of different strategies. The problem is that if you have a complex query with a lot of operations, then the number of possible combinations of strategies becomes very large - a "combinatorial explosion". So query optimization slows down a lot for large queries.

However, I had a feeling there was a lot of redundant work going on recalculating the same costs over and over in different combinations. If this was the case, then caching the calculated costs for each operation could save a lot of work. So I added a simple cache to each operation class, using a linked list. After calculating a cost for a certain strategy I added it to the cache. When asked for the cost of a strategy I checked in the cache first to see if it had already been calculated. This involved 20 or 30 lines of code in the base class for query operations. All the automated tests still ran - a good sign that I hadn't broken anything.
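The caching idea can be sketched roughly as follows. This is only an illustration, not the actual code in query.h and query.cpp: the names `Operation`, `Strategy`, `compute_cost`, `cost` and `CountingOp` are all made up for the example, and a strategy is reduced to a plain integer.

```cpp
#include <forward_list>
#include <utility>

// Hypothetical stand-in for whatever identifies a strategy in a real optimizer.
using Strategy = int;

class Operation {
public:
    virtual ~Operation() = default;

    // Return the cost of a strategy, calculating it at most once.
    double cost(Strategy s) {
        for (const auto& entry : cache_)   // linear search of the linked list
            if (entry.first == s)
                return entry.second;       // cache hit: no recalculation
        double c = compute_cost(s);        // cache miss: do the real work
        cache_.emplace_front(s, c);
        return c;
    }

protected:
    // Derived operations (join, union, ...) supply the actual estimate.
    virtual double compute_cost(Strategy s) = 0;

private:
    std::forward_list<std::pair<Strategy, double>> cache_;
};

// Toy operation that counts how often the expensive calculation runs.
class CountingOp : public Operation {
public:
    int calculations = 0;

protected:
    double compute_cost(Strategy s) override {
        ++calculations;
        return s * 10.0;
    }
};
```

Asking a `CountingOp` for the cost of the same strategy twice runs `compute_cost` only once. A linked list with a linear search is fine while each operation only ever sees a handful of strategies, and keeping the cache in the base class means the derived operations don't need to change.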

With some simple counters in the code I found that on the automated tests (mostly fairly simple queries) the cache was eliminating about 2/3 of the cost calculations. Not bad, but the real test would be complex queries. One of the more complex queries in our accounting package was taking about 2 seconds to optimize. With the cache it now took about .1 seconds - 20 times faster!

Not bad for about an hour's work! But the best part was that the structure of the code allowed me to make this change easily and locally without disrupting anything else. Yes, I could've included the caching when I originally wrote the code. (If I'd known at that point it was worthwhile.) But I'm a firm believer in doing simple versions first. If you try to include "everything" in the initial version, you'll never get it done, it'll never work, and you'll still have to change it later.

If you're interested, you can see the actual changes to query.h and query.cpp in CVS on SourceForge.

Argh!!! I was in the middle of updating the Suneido website when the server started giving errors and I couldn't finish the update. Now stuff is "broken" and I can't fix it! DataPipe hosts our site, and for the most part they've been pretty good, but it's frustrating in this kind of situation to have to wait for their tech support to get around to looking at the problem. I'm tempted to set up our own server but I'm not sure our internet connection is up to it. We're in a research park so the bandwidth is shared. Can't really afford to get our own dedicated connection. Maybe someday.

I was feeling pretty good about things before this happened. The site has been up for over a year, and I've managed to average a release every month. And so far the releases have all been pretty fair quality, IMO.

Progress is always slower than I'd like, but I don't really feel like working more than the 60 or 70 hours a week I already put into it. Suneido has attracted a few regular small-scale contributors, but so far no one else has really got involved in a major way. Not that I had any naive notion that masses of open source developers would immediately flock to the project. "If you build it, they will come." doesn't really apply! There are a zillion open source projects and so far, obviously, Suneido hasn't convinced anyone it's worth major investments of time. Maybe that'll come, and maybe not. In the meantime, I'll keep plugging away. I just hope our website comes back!

Something that's been on my todo list for Suneido for a long time is cleaning up a few places in the client-server network code where there might be bad interaction with the TCP/IP Nagle algorithm. However, I'd never actually done any testing to see if it was a real problem.

I won't try to explain the Nagle algorithm in detail. I'd recommend "Effective TCP/IP Programming" by Snader if you're interested. But in short, it's a standard technique used to improve the efficiency of TCP/IP. However, it assumes you alternate between sends and receives, e.g. send a request, receive a response. If you don't follow this pattern, for example you do two sends in a row (e.g. a header and then data), then the Nagle algorithm can slow you down to 200ms per send/receive, or only 5 "messages" per second - pretty slow. One "fix" is to simply disable Nagle, but then you lose the improvements it brings.

So, last night I fixed the code to combine multiple sends and this morning I did some quick benchmarks. On the client side, the places affected were output, update, and delete. Sure enough, with the old code I was only getting 5 outputs per second (ouch!). With the new code I'm getting 2500 outputs per second, or 500 times faster! Not bad for a few hours' work! (NOTE: This only affects client-server operation across the network, not standalone use. Standalone I get about 8000 outputs per second.)
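Here is a rough sketch of the difference between the two send patterns. It is not Suneido's network code: `SendFn`, `send_separately` and `send_combined` are hypothetical names, and `SendFn` stands in for a real socket write so the example runs without a network.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for a socket write; called once per outgoing write.
using SendFn = std::function<void(const char* data, std::size_t size)>;

// Old pattern: two small writes in a row. With Nagle enabled, the second
// write can be held back until the first one has been acknowledged.
void send_separately(const SendFn& send, const std::string& header,
                     const std::string& body) {
    send(header.data(), header.size());
    send(body.data(), body.size());
}

// New pattern: build the whole message first, then write it once.
void send_combined(const SendFn& send, const std::string& header,
                   const std::string& body) {
    std::string message;
    message.reserve(header.size() + body.size());
    message += header;
    message += body;
    send(message.data(), message.size());
}
```

The bytes on the wire are identical either way; only the number of writes changes, which is why the fix can stay local to the sending code and leave Nagle switched on.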

It just goes to show that knowledge is a powerful tool. If I hadn't read about this problem it could have been a long time before I figured it out. Of course, if I had any brains, I'd have made this change a long time ago!

Gave a slide show on our spring Shishapangma expedition. It's always a lot of work pulling a slide show together out of literally thousands of slides. Fun to look at them though. Some good pictures. We had a pretty good turn out and it seemed to go over well.

Still plugging away at Suneido. Activity has been a bit down this week, both downloads and the forum and mailing list. We just passed 2500 registered downloads (lots more unregistered). Had an inquiry from a Brazilian company wanting to put Suneido on their CD. We'd previously had a German computer magazine wanting to include Suneido on a CD but I'm not sure if it ever was.

I've been working on trying to finish Suneido's Version Control system. We've been using it in-house for probably 6 months, but I never seem to find time to finish it and document it. Maybe I'll be able to get some more done on it this weekend.

Should prepare a new release before the end of the month as well. And update CVS on SourceForge, although I don't think anyone's really using CVS yet.

One of our users was talking about deploying an application he'd written with Suneido. That would be a milestone. I don't think anyone has deployed any applications outside of our office. Hope it works well for him.

It seems like I'm spending more and more time doing "support" for Suneido. On the positive side, that means people are actually trying to use it. However, it also means I don't get as much done. A lot of the questions (but not all) would be answered by better documentation. I should be able to use some of my responses as a starting point for some documentation. I'm afraid I can't really get excited about writing documentation. I know it's worthwhile, and if I want Suneido to be successful, it's going to have to be done. But ... I'd rather be programming. I can't even really fantasize about someone else doing it for me, since there aren't many people who know it well enough (yet).

Did get a little bit done today on int64 support for the dll interface. It's still not perfect because Suneido numbers don't cover the full range that int64 does. At some point I should probably extend the range of Suneido numbers. Currently they use 4 "digits" where each digit is 4 decimal digits (0 - 9999), for a total of 16 decimal digits. To cover the full range of int64s I'd have to add one more "digit" to handle 20 decimal digits. Shouldn't be too hard, but unfortunately, some of the code assumes 4 "digits". And, obviously, I don't want to break something as basic as numbers. I do have pretty extensive automated unit tests though, so hopefully they'll catch most problems. It's not a big priority, but one of our users is trying to implement an interface to MySQL and it uses 64 bit integers. He also wants float and double support in the dll interface. Shouldn't be too hard to add.
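To make the digit arithmetic concrete, here is a small sketch that splits a value into base-10000 "digits". It only illustrates the counting: `to_base10000` is a made-up helper, and it ignores sign, exponent and everything else a real Suneido number has to carry.

```cpp
#include <cstdint>
#include <vector>

// Split a non-negative value into base-10000 "digits", most significant first.
// Hypothetical helper for illustration only.
std::vector<int> to_base10000(std::uint64_t n) {
    std::vector<int> digits;
    do {
        digits.insert(digits.begin(), static_cast<int>(n % 10000));
        n /= 10000;
    } while (n != 0);
    return digits;
}
```

The largest 16-digit value, 9999999999999999, still fits in 4 of these digits, but the largest int64 (9223372036854775807, which has 19 decimal digits) needs 5.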

The big things I want to get done for the next release are the version control and the unit testing framework. Both are more or less written and we've been using them in-house for quite a while. But they need some cleaning up, polishing, completing AND documenting. There's that damn documentation again!

Time to bike home. Not even 6pm and it's already getting dark. That's the problem with living so far north. Of course, the long days in the summer are nice. At least it was warm today - the snow we got last week is mostly gone.

It's been a while since I've been on Advogato. I noticed Helmut had put a link on Ward's Wiki to the Suneido project on Advogato. So I followed it and found he had updated it a bit - thanks Helmut!

Suneido has been keeping me busy. I set up Suneido on SourceForge - that was an interesting exercise. I'd been on SourceForge before, but never used it much, let alone set up and administered a project. It was pretty straightforward. There's quite a bit of documentation, some of it pretty good. But there are always holes in the documentation. I had a bit of a struggle setting up the necessary CVS tools on Windows, but finally got it working. A while back one of our contributors, Roberto Artigas Jr., had added simple language translation to Suneido and had supplied translation data for Spanish. So I posted a "job" on SourceForge for other translations. I didn't really expect much to come out of it. I'm afraid most postings asking for help don't produce much. But this time I actually got quite a few responses. I had three people offer to do Italian! The end result is that you can now run the Suneido IDE in Spanish, Italian, French, German, and Russian. Pretty neat.

We had a good trip to Shishapangma in the spring. Turned around 150 meters (of altitude) from the top due to time and weather. So close! But with the snow conditions we only averaged 50 meters per hour so the top was likely another 3 hours away. I think the success rate was pretty low this spring. A few people were making it up, but with extremely long hard summit days. Oh well, we had a good trip, no sickness or injury. We left Kathmandu as the leftists were stirring up trouble, and a few days later most of the royal family was murdered. We were glad we got out when we did. After the climb we spent a couple of weeks on beaches in Thailand. It was nice R&R after 5 weeks on the mountain. It was amazingly easy to keep in touch - there are internet cafes all over Nepal and Thailand. And not only that, but there's even a real Starbucks in Phuket!

It's been a year since we released Suneido open source on the web site. A busy, interesting year. I've learnt a lot. (Which is what it's all about if you ask me.) In some ways we've made a lot of progress, in other ways, we haven't accomplished as much as I would have liked. One of these days, if I can find the time, I'd like to write an article about Suneido's first year. C'est la vie.

A long day yesterday - left the house at 6:30am and got home at 8:30pm - 14 hours. But, hey, when your wife phones and says she won't be home till late, it's a perfect opportunity to get in a few more hours.

I was on a roll anyway. I integrated the new bitmapped memory manager (allocator and garbage collector) and it worked! I've had this finished for a while but I've been too chicken to drop it in. It underlies everything and bugs at this level can be both catastrophic and elusive. Just what you don't need. But it seems solid. Perhaps the XP (extreme programming) techniques (simple incremental development and automated tests) really work.

The performance seems at least as good as the old stuff, and probably better. It should be much more efficient space-wise. Minimum allocation is 8 bytes instead of 16. (And there are a surprising number of these tiny objects - e.g. after startup, about 2000 out of 7000 objects are 8 bytes or less!) And there is *no* overhead on the blocks themselves (i.e. no "header" or "trailer"). The only space overhead is the bitmaps. Each bit in the bitmaps controls 8 bytes of heap space, so each bitmap is 1/64 or roughly 1.6% overhead. There are three bitmaps (one to mark block boundaries, one to use for mark/sweep, and one to designate non-pointer blocks) so the overhead is about 4.7%. Most memory managers have an overhead of 4 or 8 bytes per block. On small blocks (the majority in a system like Suneido) this can amount to as much as 25% space overhead.

The old memory manager also used a much smaller set of block sizes, so there was more wastage from having to use larger blocks than strictly necessary. It also kept separate memory pages for each size, so there was wastage per size as well. The new allocator is not page based: it simply allocates consecutive memory locations (very fast) and, by its nature, the bitmapped garbage collection automatically coalesces all the free space, greatly reducing fragmentation. All in all it seems like a great approach. (You might think that these days, with memory plentiful, space efficiency is not important. But because of caching and virtual memory, it can have a direct effect on speed.)
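The overhead arithmetic can be checked with a few lines of code. This is just the calculation, not the memory manager itself; the names (`bitmap_bytes`, `bitmap_overhead`, `header_overhead`) are invented for the sketch, and the header figure is measured as header size over header plus payload, which is an assumption about how to count it.

```cpp
#include <cstddef>

constexpr std::size_t kGranule = 8;      // bytes of heap controlled by one bit
constexpr std::size_t kBitsPerByte = 8;
constexpr std::size_t kBitmaps = 3;      // boundaries, mark/sweep, non-pointer

// Bytes needed for one bitmap covering heap_bytes of heap (rounded up).
constexpr std::size_t bitmap_bytes(std::size_t heap_bytes) {
    return (heap_bytes / kGranule + kBitsPerByte - 1) / kBitsPerByte;
}

// Space taken by all three bitmaps, as a fraction of the heap.
constexpr double bitmap_overhead(std::size_t heap_bytes) {
    return static_cast<double>(kBitmaps * bitmap_bytes(heap_bytes)) / heap_bytes;
}

// Overhead of a conventional per-block header, for comparison.
constexpr double header_overhead(std::size_t header_bytes,
                                 std::size_t payload_bytes) {
    return static_cast<double>(header_bytes) / (header_bytes + payload_bytes);
}
```

For a 1MB heap, `bitmap_overhead(1 << 20)` comes to 3/64, a little under 5%, and it stays at that fraction regardless of block size. A 4 byte header on a 12 byte block, by contrast, is `header_overhead(4, 12)`, or 25%.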

The new memory manager also supports "non-pointer" allocations, i.e. when you allocate something like a string that you know will never contain pointers to other heap memory, you can tell the memory manager and it then doesn't need to scan those blocks during garbage collection. I made a few changes in the code so strings, numbers, dates, and compiled code are non-pointer. This resulted in about half of the heap being non-pointer, thereby cutting the garbage collection scanning in half.

Finalization is also supported. You can register a pointer with the memory manager, and when the memory it points to is garbage collected (i.e. there are no more references to it), instead of being freed it is added to a queue. I added an abstract SuFinalize base class (derived from SuValue) that registers instances when they are created. It has a virtual "finalize" method that derived concrete classes define to release their resources. Then in Suneido's main message loop, if there are no messages waiting, it removes values from the queue and calls finalize on them. I modified SuFile to derive from SuFinalize, and renamed its close method to finalize. Voila, files are now automatically closed if you forget to do it. Of course, there's no guarantee on the timing of finalization. Reference counting can detect unreferenced objects immediately and predictably. But conservative garbage collection is not quite so predictable. Often, spurious pointers to objects will keep them "alive" for some time after they are actually "dead". But at least you have some assurance that resources will not "leak" indefinitely. Now I have to modify Suneido's other "resource holding" values to use this facility - e.g. Transaction, Cursor, Image. Windows "handles" will take a little more work because currently they're just treated as integers. I'll have to define a Handle type and change the dll definitions. But this will be an improvement anyway as it will provide some type safety (i.e. stop you from passing any old number as a handle).

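The finalization queue described above might look something like this in outline. It is a simplified sketch, not Suneido's code: `Finalizable` stands in for SuFinalize (without SuValue), `FinalizeQueue` and `FakeFile` are invented names, and there is no real collector here, so the caller plays the collector's part by calling `enqueue`.

```cpp
#include <deque>

// Simplified stand-in for a base class of values that hold external resources.
class Finalizable {
public:
    virtual ~Finalizable() = default;
    virtual void finalize() = 0;   // release the resource (file, handle, ...)
};

class FinalizeQueue {
public:
    // What a collector would call for a registered object that has become
    // unreachable, instead of freeing it straight away.
    void enqueue(Finalizable* dead) { pending_.push_back(dead); }

    // Meant to be called from the message loop when no messages are waiting.
    // Returns how many objects were finalized.
    int run_when_idle() {
        int count = 0;
        while (!pending_.empty()) {
            Finalizable* obj = pending_.front();
            pending_.pop_front();
            obj->finalize();
            ++count;
        }
        return count;
    }

private:
    std::deque<Finalizable*> pending_;
};

// Toy resource: a "file" that records whether it has been closed.
class FakeFile : public Finalizable {
public:
    bool open = true;
    void finalize() override { open = false; }
};
```

Queueing the objects instead of finalizing them inside the collector keeps resource release out of the garbage collection pass and runs it at a quiet point in the message loop, at the cost of the unpredictable timing already mentioned.
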
Had a great bike ride to work this morning - it was cold, windy, cloudy, and snowing - brisk, you might say. To understand why I would call that "great", you need to know that I leave in three weeks to lead a mountain climbing expedition to an 8000m mountain in Tibet called Shishapangma. So besides the physical training, I'm working on mentally training myself to enjoy "interesting" conditions :-) I've been so wrapped up in Suneido that I'm feeling a little "separation anxiety". But I need the break and I'm looking forward to a simpler, more physical challenge, with a well defined goal and a clear definition of "success".

Anyway, time to get to work - lots to do!
