Older blog entries for slamb (starting at number 11)

4 Dec 2003 (updated 4 Dec 2003 at 00:32 UTC) »
dwragg

I use debuggers when my program does something really nasty without telling me anything. I start up the debugger and check one of three things:

  • Which pointer is NULL when I get a SIGSEGV.
  • Which pointer is out of allowed memory when I get a SIGSEGV. This happens to me with C/ObjC code, not C++. In C++, I use boost::shared_ptr<> and similar patterns that ensure this never happens.
  • Where a weird exception came from (C++).

So in all three cases, I'm talking about a "backtrace" and "print varname", maybe a "catch throw". These are questions high-level languages (Java, Perl) answer for me. So to me, the debugger is an essential tool in C-based languages. I think Java and Perl come with debuggers, but I've never used them.

The article you linked to is talking about single-stepping through programs, which is a whole different deal. I agree that it's probably a waste of time.

Their alternative is to stick in printf()s or similar. I tend to use a more formal debug log with message levels, so I can leave the statements in and enable/disable them quickly on a per-class basis. It saves some time sticking them in and pulling them out. Otherwise, I do the same thing.

atoms

I've made a lot of progress in my unit testing. I still don't have good unit tests for the BufferedStreamFilter class, but I made good progress on testing the rest of the IO system. I found the secret was breaking the tests into much smaller pieces than I had been. I'll probably go back and apply that to my earlier tests. I'm definitely glad I made the effort - some of these tests found pre-existing bugs, and one found a bug that I introduced after writing the test. I can see how they'll easily stop more regressions in the future.

I can't easily test every error condition. For example, ENFILE, ENOBUFS, or ENOMEM would require using up all of the kernel's resources whenever I run my unit tests, which seems like a bad idea. But I was surprised by how many I could trigger. EMFILE was a matter of finding the number of descriptors in use and using setrlimit to limit the maximum to that. setrlimit took care of ENOMEM in the Buffer class also. And setsockopt(SO_LINGER) before a close() took care of the RST I needed to test ECONNRESET in read() and write().
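The EMFILE trick can be sketched with POSIX setrlimit(RLIMIT_NOFILE). This is only an illustration under assumptions, not the atoms test code: the function name provoke_emfile is made up, and it uses /dev/null as a convenient thing to open. It finds the lowest free descriptor number, lowers the soft limit to exactly that, and checks what open() reports.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

// Hypothetical helper: returns the errno from an open() attempted while the
// soft descriptor limit equals the number of descriptors already in use
// (0 if the open unexpectedly succeeded).
int provoke_emfile() {
    // open() returns the lowest free descriptor number, so every number
    // below `probe` is currently in use.
    int probe = open("/dev/null", O_RDONLY);
    if (probe < 0) return errno;
    close(probe);

    struct rlimit saved;
    if (getrlimit(RLIMIT_NOFILE, &saved) != 0) return errno;

    // RLIMIT_NOFILE is one greater than the highest allowed descriptor
    // number, so a soft limit of `probe` leaves no free slot.
    struct rlimit lowered = saved;
    lowered.rlim_cur = static_cast<rlim_t>(probe);
    assert(lowered.rlim_cur <= saved.rlim_cur);
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0) return errno;

    int fd = open("/dev/null", O_RDONLY);
    int err = (fd < 0) ? errno : 0;
    if (fd >= 0) close(fd);

    // Raising the soft limit back up to its old value is allowed because
    // the hard limit was never touched.
    setrlimit(RLIMIT_NOFILE, &saved);
    return err;
}
```

A unit test would call provoke_emfile() (or do the same limit dance around the code under test) and expect EMFILE. Only the soft limit changes, so the rest of the test run is unaffected once it is restored.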

The documentation is coming along, too. I love diagrams. They're fun to make, and I think they really help understanding. Case in point: this one. It shows the logic my code uses to come up with a correct IO class & state when thrown a random descriptor. (Used in open because it's important to distinguish between seekable stuff and not. The socket portion is probably not that useful by itself - how often do you need to do something with a socket descriptor but don't know what it is? - but helps me out in testing.) I think after seeing that there's not a lot of doubt about what this method does or how. And there are tests written for every major outcome, so I know it essentially works on my platform. It will be easy to see if it works or not on new platforms.

Work

Oracle selected me for a customer service survey. I had fun with my response, even though they probably won't really read it:

The little details that greatly improve usability don't seem to be valued very highly at Oracle, and I've found the support websites to be no exception.

A concrete example: this survey itself - with a wide window, it's hard to match up the descriptions with the pulldowns; I imagine people frequently mark the wrong ones. There are a couple ways to completely solve this problem, and both would take very little effort:

- Put the pulldowns immediately to the left of the left-aligned text, so there's no space between.

- Use alternating colors to guide the eye.

In my limited experience, it seems to me that reports of similar problems in the software being supported aren't taken very seriously, though overall they make it significantly harder for our users to take advantage of the software.

One incident really cemented this for me. I'd taken a while to track down what had caused weird behavior my users had observed. It was something they had done much earlier, with no error given at the time. When I told the support staff what it was, they essentially said "don't do that". Relaying that message to my users doesn't make me too happy, especially without a way of at least reinforcing it by an error dialog or similar in the software at the time of the mistake.

In general, I'd like to see poor human/computer interface design treated as a serious bug.

It really seems to me that Oracle's software is among the least usable I've ever seen. In fact, I've become much more interested in developing good user interfaces after my frustration with theirs.

26 Nov 2003 (updated 26 Nov 2003 at 20:48 UTC) »
atoms

I've been playing with doxygen. I've learned - again - that doxygen is really cool. My API docs are really shiny now. The diagrams are my favorite part. Inheritance diagrams, include diagrams, collaboration diagrams, call tree diagrams. They're all really cool. And it has a source browser, which I think is really the best way to get familiar with someone's code. The hyperlinks and cross-references make things much easier.

I made some other documentation for my code, too. Starting a user manual. I even made a diagram of my own for the section that explains Unix I/O concepts. I don't know what's gotten into me.

I've also spent some time working on my buffered I/O classes. I feel like I'm reinventing the wheel here, but I haven't been able to find a wheel that I like. In particular, I don't think the standard C++ iostreams classes are usable with non-blocking IO. I wish I didn't have to, because the buffering for seekable streams is surprisingly hard to get right. Then append mode is another fun complication. It's too bad I haven't even been able to find documentation on this.

I did find something interesting in the Java classes, though. They support setting a mark on an input stream and snapping back to it later. I had no idea why they'd want to do that, but it solves a problem of mine. Imagine you've got code like this:

shared_ptr<BufferedStream> s;
...
try {
    *s >> foo >> bar >> baz;
} catch (const IOBlockException& e) {
    // try again later
    return;
}

How do you know which of these were successfully read and which were not? You can't. My earlier answer was simply not to chain the read operators in this situation. But here's a better solution.

shared_ptr<BufferedStream> s;
...
try {
    BufferedStream::CompletionMon cm(s);
    *s >> foo >> bar >> baz;
} catch (const IOBlockException& e) {
    // try again later
    return;
}

Where BufferedStream::CompletionMon is a class that sets a mark and resets it if there's an IOBlockException. It clears the mark on normal exit of the block. So you know that you can just retry all of the read operations and the information will be in the buffer (still or again).
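Here is a minimal sketch of how such a guard could work, under stated assumptions. ToyStream is a made-up stand-in for BufferedStream (it only reads ints from an in-memory vector), and the mark()/reset()/clear_mark() names are hypothetical. The guard tells "normal exit" from "exception in flight" with C++17's std::uncaught_exceptions(), which is one way to do it, not necessarily how the real class will. It also rolls back on any exception, not only IOBlockException.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <vector>

struct IOBlockException : std::runtime_error {
    IOBlockException() : std::runtime_error("read would block") {}
};

// Toy stand-in for BufferedStream: values already received sit in data_,
// and pos_ is the read cursor.
class ToyStream {
public:
    void feed(int v) { data_.push_back(v); }
    ToyStream& operator>>(int& out) {
        if (pos_ == data_.size()) throw IOBlockException();
        out = data_[pos_++];
        return *this;
    }
    void mark() { mark_ = pos_; marked_ = true; }
    void reset() { assert(marked_); pos_ = mark_; marked_ = false; }
    void clear_mark() { marked_ = false; }
    std::size_t position() const { return pos_; }
private:
    std::vector<int> data_;
    std::size_t pos_ = 0, mark_ = 0;
    bool marked_ = false;
};

// Sets a mark on construction. On destruction, rewinds to the mark if the
// block is being left because of an exception, otherwise drops the mark.
class CompletionMon {
public:
    explicit CompletionMon(ToyStream& s)
        : s_(s), pending_(std::uncaught_exceptions()) { s_.mark(); }
    ~CompletionMon() {
        if (std::uncaught_exceptions() > pending_) s_.reset();
        else s_.clear_mark();
    }
    CompletionMon(const CompletionMon&) = delete;
    CompletionMon& operator=(const CompletionMon&) = delete;
private:
    ToyStream& s_;
    int pending_;  // exceptions already in flight when the guard was built
};

// Returns true if all three values were read. On a would-block the cursor
// is back at the mark, so the caller can simply retry the whole chain.
bool read_three(ToyStream& s, int& a, int& b, int& c) {
    try {
        CompletionMon cm(s);
        s >> a >> b >> c;
        return true;
    } catch (const IOBlockException&) {
        return false;  // try again later
    }
}
```

With two values buffered, read_three() fails and leaves the cursor where it started; after a third value arrives, the same call succeeds and reads all three.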

Of course, this is another thing I have to implement and test. My unit tests for these classes are already really bad. I'm going to have to spend some time learning how to write decent unit tests for this sort of thing.

Update: I just discovered that Java also has the java.nio.Buffer class which supports this mark idea and has a more bulked-out interface. Maybe that's what I need: a buffer class with more encapsulation which I can write separate unit tests for. And per-primitive type views which will be much slicker with C++ templates if I decide to implement them. As well as a mmap()-based subclass which is a good idea I might implement later.

tmorgan

Cool, nice to know other people here are using scons for their build system.

I was curious and looked through your socket code. Too simplistic for what I want to do (asynchronous IO), but seeing your BOOST_STATIC_ASSERTs with traits set off a template-learning spree. I have a few like things in my code now. (One in a very similar place - IO for fundamental types. I excluded floats, though; I don't think those have a chance of being sent across the network portably.)
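For anyone who hasn't seen the trick, here is a rough sketch of the idea. It swaps in C++11's static_assert and std::is_integral for BOOST_STATIC_ASSERT and the Boost traits, and the put() function is a hypothetical example, not code from either project. It serializes integral types in network byte order and refuses floats at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical serializer: appends `value` to `out` in big-endian order.
// Non-integral types (float, double, pointers, ...) are a compile error.
template <typename T>
void put(std::vector<unsigned char>& out, T value) {
    static_assert(std::is_integral<T>::value,
                  "put() only accepts integral types");
    // Converting to an unsigned type is well-defined even for negative
    // values, and gives the two's complement bit pattern.
    const unsigned long long bits = static_cast<unsigned long long>(value);
    for (std::size_t i = sizeof(T); i-- > 0;) {
        out.push_back(static_cast<unsigned char>((bits >> (8 * i)) & 0xFF));
    }
}
```

Calling put(buf, 1.5f) stops the build with the message in the static_assert instead of silently sending a host-specific float representation.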

I gave you a certification. I don't think one from me is worth much, but every bit counts.

atoms

After some frustrating debugging, I decided it's time to look into a better debugging technique. I stumbled on libcwd, which does some nice things like logging channels, demangling C++ symbol names, etc.

But the channels aren't quite what I want - not granular enough. I like Jakarta Commons logging in Java, which lets you change the log level for everything, for org.slamb, for org.slamb.atoms, and for individual classes (org.slamb.atoms.SelectPoller). It tries the broader things if it can't find the narrower things. And it loads it all from a configuration file; no recompiling.
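The "try the broader things if it can't find the narrower things" lookup might look roughly like this in C++. All names here (LevelConfig, Level, set, lookup) are invented for the sketch, and the configuration file is replaced by explicit set() calls.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical names throughout; this is not the atoms DebugLogger.
enum class Level { Trace, Debug, Info, Warn, Error };

class LevelConfig {
public:
    explicit LevelConfig(Level root) : root_(root) {}

    // Record a level for a dotted prefix such as "org.slamb.atoms".
    void set(const std::string& prefix, Level level) {
        assert(!prefix.empty());
        levels_[prefix] = level;
    }

    // Try the full name first, then strip one trailing ".component" at a
    // time; fall back to the root level if nothing matches.
    Level lookup(std::string name) const {
        for (;;) {
            auto it = levels_.find(name);
            if (it != levels_.end()) return it->second;
            std::string::size_type dot = name.rfind('.');
            if (dot == std::string::npos) return root_;
            name.erase(dot);
        }
    }

private:
    Level root_;
    std::map<std::string, Level> levels_;
};
```

With a root level of Warn, "org.slamb" set to Info and "org.slamb.atoms.SelectPoller" set to Trace, a lookup for org.slamb.atoms.Buffer falls back to Info, while a class outside org.slamb gets Warn.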

I also don't like that libcwd talks about pairing commands. I'd rather have the monitor/guard pattern do the work for me. Both in temporarily tweaking the log channel's settings and in actual log output. (Lines like "Entering block XXX" / "Exiting block XXX".)
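A guard for the "Entering block" / "Exiting block" lines could be as small as the sketch below. BlockLog and parse_header are hypothetical names, and the log sink is just a vector of strings so the behavior is easy to check; a real logger would write to its channel instead.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Logs one line on construction and the matching line on destruction, so
// the pair cannot get out of sync on early returns or exceptions.
class BlockLog {
public:
    BlockLog(std::vector<std::string>& sink, const std::string& name)
        : sink_(sink), name_(name) {
        assert(!name_.empty());
        sink_.push_back("Entering block " + name_);
    }
    ~BlockLog() { sink_.push_back("Exiting block " + name_); }
    BlockLog(const BlockLog&) = delete;
    BlockLog& operator=(const BlockLog&) = delete;
private:
    std::vector<std::string>& sink_;
    std::string name_;
};

// Example use: the early return still produces the exit line.
int parse_header(std::vector<std::string>& log, bool fail) {
    BlockLog guard(log, "parse_header");
    if (fail) return -1;
    log.push_back("parsed");
    return 0;
}
```

The same shape works for temporarily tweaking a channel's settings: change them in the constructor, restore them in the destructor.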

My code will eventually defer to libcwd where available for demangling symbol names in stack traces and such. That's code libcwd seems to do well and duplicating it would be a bad move. But that's for later; for now, I just want basic logging.

Unfortunately, my code doesn't work. The phrase of the day is static initialization dependency. There's a map<std::string, DebugLogger::Level> DebugLogger::levels loaded from the configuration file. DebugLogger::DebugLogger() uses this to determine the logger's minimum logging level. Each class that logs has a statically-allocated DebugLogger. They're spread out across translation units. So some of them want to load before the map's initialized, and the program crashes. That link describes this same problem and seems to have a solution. But I can't think right now how to translate it to my specific case. Maybe it'll come to me...
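One common idiom for this kind of problem is "construct on first use": hide the map behind a static function so it gets built the first time anyone asks for it. Whether that is the solution the linked page describes is an assumption on my part. The sketch below uses a cut-down DebugLogger with made-up details (load_levels() stands in for reading the configuration file, and the logger names are invented).

```cpp
#include <cassert>
#include <map>
#include <string>

class DebugLogger {
public:
    enum Level { Debug, Info, Warn };

    explicit DebugLogger(const std::string& name) : min_level_(Warn) {
        assert(!name.empty());
        // levels() builds the map on its first call, so it is ready no
        // matter which translation unit's statics happen to run first.
        const std::map<std::string, Level>& m = levels();
        std::map<std::string, Level>::const_iterator it = m.find(name);
        if (it != m.end()) min_level_ = it->second;
    }

    Level min_level() const { return min_level_; }

    static std::map<std::string, Level>& levels() {
        static std::map<std::string, Level>* m = load_levels();
        return *m;
    }

private:
    // Stand-in for parsing the configuration file (hypothetical contents).
    // The map is deliberately never deleted, so loggers that are destroyed
    // late during exit can still use it.
    static std::map<std::string, Level>* load_levels() {
        std::map<std::string, Level>* m = new std::map<std::string, Level>;
        (*m)["atoms.SelectPoller"] = Debug;
        return m;
    }

    Level min_level_;
};

// Namespace-scope loggers like the per-class ones described above. Their
// constructors run during static initialization and still see the map.
static DebugLogger poller_log("atoms.SelectPoller");
static DebugLogger buffer_log("atoms.Buffer");
```

The cost is one function call per lookup and a map that lives until process exit, which seems acceptable for a debug logger.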

scons is definitely what I was looking for in a build tool. It does a lot of the things I had hoped to see:

  • A clean, terse, well-documented unified syntax (Python's). Before, I had to deal with a lot more - the portable subset of sh after being interpreted by m4 with a bunch of autoconf/automake macros and automake->autoconf->make with the portable subset of sh thrown in. This just bites the bullet and says it's not portable to where Python isn't. In practical terms, this probably makes it a lot easier to port a bunch of scons-based projects - you just need to make Python run and you don't need to worry about minor variations in Python, because it's a lot more standardized (and only has two implementations, AFAIK).
  • Automatic dependency scanning in C/C++ files. This is hard to actually get right (much harder than most people realize). scons seems to do it right.
  • Correct behavior with multiple directories. Also hard to get right, and they have. They let you include SConscript files (like sub-makefiles) in subdirectories. Those are interpreted with paths relative to the subdirectory for convenience (you can refer to the top of the tree with '#' if you so desire). They're incorporated into the whole project's DAG when run for correct dependency analysis.
  • Built-in rules for building executables and libraries (both static and shared), as well as for a lot of other things. And the ability to create new rules for building things they couldn't anticipate.
  • It realizes that compiler flags are a dependency, too.
  • Easy support for keeping build files in another directory. I think I'll use this to have directories like build-ARCH-MODE (where MODE is debug or optimize). So then I can have one tree to build for several platforms (like on SourceForge's compile farm or Compaq's test drive).
  • They're even more paranoid than I am. They take MD5 checksums of all dependencies if the dates are within a (customizable) range. This compensates for clock drift when using network filesystems, which I guess is a big problem with make for some people. And I can turn it off for speed, because it certainly doesn't apply on my home system.

It is immature, though. There are some problems and it's missing some features I would like to see:

  • There's no built-in support for building a config.h based on the results of tests. So for now, I'm appending all the HAVE_XXX defines to the command line, but it's getting fugly.
  • It's not smart about rpath. Tests fail because it links in libraries it doesn't find at run-time. For now, I have a -Wl,--rpath=/usr/local/lib hardcoded in, but that will have to change.
  • There are a lot of tests automake/autoconf have built in that scons doesn't. Like finding the correct flags/libraries to build with pthreads. Or determining the platform's endianness. scons has pretty much the same framework as autoconf for building tests (checking for headers/libraries/symbols, running test programs, etc.) but it doesn't have many canned tests yet.
  • There's no built-in rule to make a Mac OS X framework. Mostly this is just some copying to a set directory structure, but the install_name thing is annoying, so it'd be nice to have it do it for me.
  • The shared library builder is simplistic. In addition to the rpath problems, it doesn't let you specify the version, doesn't let you specify the type (OS X has different types for loading and linking against), etc.
  • There doesn't seem to be much help for figuring out compiler flags. I've got it enabling -Wall if CXX is g++, but I know it can be called something else on other systems. (This is another thing autoconf has a test for; determining if the C compiler is gcc.) That'd be good enough; warnings only need to be enabled on the compiler I use. Basic optimization/debugging flags, on the other hand, I'd like to be set for whatever compiler I run across, just by saying mode=optimize or mode=debug or something. If I want anything more fancy like -funroll-loops, I suppose I can figure that out for each compiler on my own.
  • The documentation isn't all there. The SConf (configure-like) stuff is so new that it's only mentioned in the user manual once and never described. The manual page gave me enough to write the tests I have, though.

I'm pretty confident a lot of these will go away with time, though. And even with these problems, it's easier and more correct than automake/autoconf/libtool/m4/autom4te/make/sh/foo/bar/baz ever was.

You can check out my scons files in progress here - SConstruct (main file; mine has all the configure-like tests in it), the (really simple) library SConscript, and the tests SConscript. My old make-style files are gone (unless you look in the Subversion history), but they were much more complicated and not as correct.

haruspex:

You're absolutely right; it is possible to use make non-recursively. My atoms build system does that now. One part of my frustration is that automake isn't really set up to do that, though. Their only mention of that paper is this node with a little footnote to the effect of "We haven't actually tried it in a real system". So I feel like I'm fitting a square peg into a round hole to extend my non-recursive build system to their stuff.

My other frustration with this is that I need to do so much of the work myself, with so many different tools to master. There's no template for doing things correctly, and even if there were, it would be a really complicated one. I'd like one tool that handles dependencies in the optimal way for me, handles an entire DAG described in separate files without tricks, makes it possible to easily compile/run little tests like autoconf is meant to do, knows how to build shared libraries, still makes it possible to enable compiler-specific flags (-Wall), etc. Something that takes care of all the common stuff but still has some flexibility.

berend:

Even the docs for the newest/shiniest autoconf aren't that great. They'll tell you how to make a system in the autoconf/automake/whatever preferred manner, but it's just not that good. I think those systems just don't work well and require too much blood, sweat, and tears to maintain; people put up with them out of inertia. That "Recursive make considered harmful" paper really illustrates this. Bob Ippolito pointed me at scons, a Python-based build system. I'll be looking into it, I think. Maybe in the time since I last looked, a usable alternative has arrived.

I saw your comment about the human eye's resolution. There is a limit. Essentially the human eye has to detect the momentum (direction; mass and speed are known) and position of a photon to see. The position/momentum uncertainty principle sets a limit on that. I think you can derive from that this law which tells you the minimum resolvable angle given a lens diameter. So if you look up the human eye's lens diameter and make some assumptions about how far people sit from their display, you can come up with a maximum usable resolution. I haven't read raph's paper, but I bet that's what he did.

14 Nov 2003 (updated 14 Nov 2003 at 07:29 UTC) »

It's been a while since I've posted here, and a lot has happened.

Adium

I did end up joining the Adium project, and I have been working on it. I haven't touched the iChat plugin, though - plans changed.

First, I wrote a crude Jabber plugin. I stole the acid library from the people who made the nitro client for OS X. It was really easy to work with - they've got a nice class structure, and they use a cool trick to parse the XML. They've got some factory code that returns different subclasses of XMlNode based on some xpath queries, so it just pumps out JabberMessage objects with very little code. It's really slick. This was fun code to write, and the acid/nitro people were pleasant to work with.

But...that's not what we've been working on lately. Christian Hammond has put a lot of work into separating gaim into libgaim and gtk-gaim, so he can make a qpe-gaim for handheld devices. We stole that code and a few of us have been writing a plugin around it. It's further along. It can handle sending/receiving, buddy icons, away messages (through a horrible kludge), idle times, etc. This is all for Oscar, but most of the code is in a generic class we can subclass for other protocols as well. And gaim supports a lot of protocols.

One little snag was that the Cocoa and glib event loops don't really fit together. Qpe-Gaim and Adium use a 10-ms repeating timer to poll for glib events. That drains my poor laptop's battery life. So I sent a patch to the gaim people that generalizes the event handling through a callback structure. With an adapter, it fits into Cocoa. I discovered NSFileHandle can't indicate write availability, so I wrote this class to have more flexible select() stuff from Cocoa. Anyone else need to test for IO availability? It's general code. I've seen a few requests for this on mailing lists, so maybe I'll see if I can talk the OmniFoundation people into including it or something. Update: Bob Ippolito pointed out that CFSocket can now do write notifications. So I'll likely be converting the Adium code to use that. But at least I got something out of it - it was good practice to write that sort of event loop code, since atoms will need a lot of that sort of thing. (Its asynchronous notification code really blows now. I'm looking into libevent or liboop but will likely end up writing something myself.)
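For reference, the underlying write-availability check is just select() with the descriptor in the write set. The sketch below is plain POSIX, not the Cocoa class mentioned above, and writable_within is a hypothetical helper name.

```cpp
#include <cassert>
#include <sys/select.h>
#include <unistd.h>

// Returns 1 if `fd` can be written without blocking, 0 if it did not
// become writable within `timeout_ms`, and -1 if select() failed.
int writable_within(int fd, int timeout_ms) {
    assert(fd >= 0 && fd < FD_SETSIZE);

    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    // Only the write set is passed; read and exception sets are unused.
    int n = select(fd + 1, nullptr, &wfds, nullptr, &tv);
    if (n < 0) return -1;
    return FD_ISSET(fd, &wfds) ? 1 : 0;
}
```

An event loop would put every interesting descriptor into the read and write sets at once and block in a single select() call, rather than waking up on a 10-ms timer to poll.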

Not sure if I'll keep the separate Jabber plugin around or not. gaim has Jabber support, but Jabber is Not Like All The Others. Even with subclassing, it might end up being less code / more seamless to go with acid for that.

We recently discovered the leaks tool for OS X. So in the last couple days Adium has gone from hemorrhaging memory badly to being pretty tight. There are a few small leaks still reported, but I'm wondering if they're Panther bugs.

Thanks to whoever mentioned headerdoc when I asked about API docs before (sorry, I forgot the screen name). Haven't used it much, but I still intend to.

Strange little incident: my ISP pointed out to me that last month my server used 7GiB more bandwidth than usual. The culprit turned out to be a program a friend had installed on his account that does the Adium commit statistics. Apparently it's really inefficient - he was running it hourly, so it must have pulled about 10MiB per run from the SourceForge servers. So I suspended it and we've been looking into alternatives. I wanted to rsync it and run it on a local copy of the repository, but unfortunately SourceForge just has day-old tarballs of the repositories. So we're either going to uncompress the tarball on their server daily - using about 20 MiB of our 100 MiB allocated - and rsync it, or just reduce it to running once per week, which is a more reasonable ~40MiB/month of bandwidth.

atoms

Recently I resurrected my atoms project and have been playing with it. I used to have some really complex, slow code to handle signals correctly. (Which is a harder problem than most people realize - their code just doesn't work in all cases.) I rethought what I'm trying to do and realized the brand-new C++-safe cancellation support in Fedora Core 1 does most of what I want, and is being standardized. So I ripped out a lot of code, and I'm much happier with it now. This is really limiting my portability - the only other platform I've found where this test passes is a Tru64 machine I access through Compaq Test Drive. But it will improve. Until it does, some features of my library will only work on a select few platforms.

The build system for atoms is getting frustrating. I want to use automake's dependency generation code and libtool, but I don't want to use all of automake. It's just such a mess - make didn't do it, so they added shell scripts, then they added m4 / a whole bunch of macros (autoconf), then they added autom4te (wrapper around m4), then they added automake (some Perl, more autom4te) and libtool. There's so much code there, and it's so convoluted. And it doesn't even work well - their current best practices suck. (They've talked about implementing the ideas in that paper, but so far it's just "in theory, you should be able to write a correct Makefile with automake...if you figure out how, let us know" in the documentation (paraphrased)). Plus, most of it doesn't apply to me - I'm writing a build system for modern C++ code. If a platform doesn't support ANSI C prototypes, it's sure as hell not gonna run my code. I've looked into jam, but I don't think it's quite what I want, either. I'm tempted to write a fresh system in Python, but I know I shouldn't. I have a bad habit of starting one project, finding something it depends on that I don't quite like, working on that for a while, and never really finishing either. So I'm trying to resist. (How do other people resist this urge?)

xmldb

I looked through my search logs and discovered there does seem to be some interest in my xmldb stuff. So maybe next weekend (after this round of midterms is over), I'll finish up my improved projects pages, package my code, submit it to freshmeat, and see what happens.

Work

Our new, shiny redesigned personnel system is in active use now, and has been for a while. It's great - it actually gives people friendly error messages (There's already someone with that SSN) or, if all else fails, pops up a bug report dialog. (Which fills in some fields from the form name, error ID, trigger block/item, etc, so we get more useful information than "My Oracle is broken.") And the schema is much more pleasant to work with. It's been a success, even if there were more initial bugs than we'd hoped for. I'm not even really working on it anymore - my friend/roommate/coworker is doing most of the maintenance on it now.

Well, now that I have my blatant project/rating whoring out of the way, it's time for a more normal diary entry.

Adium

Today I decided that I'd put up with my frustrations with Adium's iChat plugin for too long. Adium's a great IM program, but I'm tracking current CVS and using the experimental plugin to connect to AIM via the iChat framework. It's a buggy proof-of-concept plugin that I put up with because the normal (TOC2) one doesn't support reading away messages (it's just not in the dumbed-down TOC2 protocol at all). I want to finally get rid of some of these annoying bugs. So now I'm going through the usual steps to acquaint myself with a bunch of someone else's code.

I'm always nervous when I look at other people's class structure, because it seems most people aren't as picky as I am, and I tend to get frustrated when things aren't pretty clean conceptually. I've started out before with good intentions of doing some specific work on a project, started redesigning larger chunks, and then quit because it was more work than I was willing/able to put in, even if I could find a diplomatic way to convince the author my changes are an improvement. I've wondered more than once if the author understood OOP at all.

In this case, though, I seem to be in luck. From what I see so far, Adium's structure is solid. I always make sure I understand exactly what a single instance of a given class represents before I use it. In Adium's case, I can make reasonable guesses for most of them just based on the names, which is good. I posted my best guesses to the forum. When I hear back from people, I'll put together a patch adding class-level documentation comments. Hopefully before long this will be out of the way and I can fix the connection bugs I've noticed.

Question for the world: is there a good API documentation tool for Objective-C? I'm talking about something like doxygen or javadoc. doxygen's todo list mentions Objective-C, but it apparently hasn't happened yet. My Google search finds nothing else interesting.

Still no interest in my projects; I'm kind of disappointed. I've decided to go for a rating, too. If nothing else, the additional links bring more people to this page - people hopefully interested in my work. I think I deserve an Apprentice rating. I consider myself a Journeyer as far as coding is concerned, but I've yet to manage a successful open-source project.

Things I have done for the open source world:

  • contributed bug reports to numerous projects
  • written (small) coding patches to Apache and Subversion and gotten them accepted
  • written some (small) patches to other software (Postfix, for one) which were never accepted into the main distribution
  • written a fair chunk of documentation for Subversion. Hopefully the ideas will make it into the book in some form; it's sitting outside the book in the doc directory now.
  • a couple security audits for some web tools I've looked into
  • written my own projects, but not gone through the whole release process due to lack of interest

Feel free to check these; everything in the list should be pretty easy to dig up on Google Groups.
