Older blog entries for johnw (starting at number 55)

Run times for Hello, World in 2009

Someone recently asked what my issue was with the JVM, since at the moment it prevents me from falling too much in love with Clojure – a language with the double benefit of functional programming, and Lisp syntax and macros.

Well, below is my reason. These times may not seem like much in the scheme of things, but psychologically it builds up on me when I have to run a particular script over and over and over again. I’ve already noticed the pain with Groovy.

Language                Running time (seconds)
C                       0.00415675640106
C++                     0.0043337225914
Haskell (compiled)      0.00494946241379
Perl                    0.00773874521255
Ruby (1.8.7)            0.00913717746735
Ruby (1.9.1-p0)         0.0196997523308
Python                  0.0269904136658
ECL (Common Lisp)       0.126332080364
Java (JDK6)             0.146584188938
Haskell (interpreted)   0.20009740591
Groovy (JDK6)           1.07791568041

If you’d like to generate some of these timings for your own system, I have created a Hello, world project on GitHub.

Syndicated 2009-03-15 16:50:14 (Updated 2009-03-15 19:57:23) from Lost in Technopolis

Hello Haskell, Goodbye Lisp

As someone who has enjoyed the Lisp language (in several flavors) for about 15 years now, I wanted to express some of my reactions at recently discovering Haskell, and why it has supplanted Lisp as the apple of my eye. Perhaps it will encourage others to explore this strange, wonderful world, where it looks like some pretty damn cool ideas are starting to peek over the horizon.

First, let me say that unlike many posts on the Lisp subject, I have nothing negative to report here. It’s not that I haven’t had my share of ups and downs with Lisp, but that if you want to know about those, look around. Most of what other bloggers have to say is dead on, so there’s little need to repeat it here.

I’ll just address some of the cooler aspects of Lisp, and how Haskell compares in response.

Elegant syntax

While many dislike Lisp’s abundant parentheses, I fell in love with them. Perhaps it’s because I spend so much of my time working on compilers, and Lisp programs read like their parse trees. This “code/data equivalence” is beautiful. It makes it trivial to write DSLs, for example, since all you need to do is model the syntax tree as a series of Lisp data structures, and then evaluate them directly. It removes the need for an intermediate parse-tree representation.

When I first approached Haskell, I was shocked at the amount of syntax I saw. Operators abounded – more even than in C – like: ->, =>, ::, $, $!, etc, etc. The more I looked, the more operators there seemed to be, until I began to feel as lost as when I read Perl code.

What I didn’t realize is that in Haskell, much of the syntax you see is just special function names. There is very little “true” syntax going on; the rest is built on top of a highly expressive core. Lisp looks clean because nearly all its operators are used like functions. Haskell goes for an “infix optional” style, which allows you to call anything as either prefix or infix, provided you qualify the function name correctly:

  (/= (+ 1 2) 4)        ; Lisp reads very logically
  ((/=) ((+) 1 2) 4)    -- Haskell can look almost identical!
  1 ^ 4                 -- this is the infix form of ((^) 1 4)
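
The same trick works in the other direction for ordinary named functions, which become infix when wrapped in backticks (a small addition of mine to the example above):

  div 10 2              -- prefix call of a named function
  10 `div` 2            -- the very same call, written infix
  => 5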

Nothing can match Lisp’s rigorous purity, but once you see past the sugary veils, Haskell is pretty basic underneath as well. Almost everything, for both languages, boils down to calling functions.

Macros

Another beauty of Lisp is its macro facility. I’ve not seen its like in any other language. Because the forms of code and data are equivalent, Lisp’s macros are not just text substitution: they allow you to modify code structure at compile-time. It’s like having a compiler construction kit as part of the core language, using types and routines identical to what you use in the runtime environment. Compare this to a language like C++, whose template meta-language, despite its power, employs such a radically different set of tools from the core language that even seasoned C++ programmers often have little hope of understanding it.

But why is all this necessary? Why do I need to be able to perform compile-time substitutions with a macro, when I can do the same things at runtime with a function? It comes down to evaluation: Before a function is called in Lisp, each of its arguments must be evaluated to yield a concrete value. In fact, the language requires that they be evaluated in order[1] before the function is ever called.

Say I wanted to write a function called doif, which evaluates its second argument only if the first argument evaluates to true. In Lisp this requires a macro, because an ordinary function call would evaluate that argument in either case:

  (defun doif (x y) (if x y))       ; WRONG: both x and y have been evaluated already
  (defmacro doif (x y) `(if ,x ,y)) ; RIGHT: y is only evaluated if x is true

What about Haskell? Does it have a super-cool macro system too? It turns out it doesn’t need one. In fact, much of the coolness of Haskell is that you get so many things for free as a result of its design. Not needing macros is one of them:

  doif x y = if x then (Just y) else Nothing

Because Haskell never evaluates anything unless you use it, there’s no need to distinguish between macros and functions.
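
To see the laziness at work, here is a quick check one might type at an interpreter prompt (my own illustration, not from the post): the second argument is never touched unless the condition holds.

  doif False undefined      -- undefined would blow up if it were ever evaluated
  => Nothing
  doif True (1 + 2)
  => Just 3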

Closures

The next amazing thing Lisp taught me about was closures. Closures are function objects which retain information from the scope they were constructed in. Here’s a trivial example:

  (defun foo (x) (lambda (y) (+ x y)))

  (let ((bar (foo 10)))
    (funcall bar 20))
  => 30

In calling foo, I’ve created a function object which adds two numbers: the number that was originally passed to foo, plus whatever number gets passed to that closure in turn. Now, I could go on and on about the possibilities of this mechanism, but suffice it to say it can solve some really difficult problems in simple ways. It’s deceptively simple, in fact.

Does Haskell have all this closurey goodness? You bet it does, in spades.

  foo x = (\y -> x + y)        -- here \ means lambda
  bar = foo 10
  bar 20                       -- arguably cleaner syntax, no?
  => 30

In fact, Haskell even one-ups Lisp by making partial application as natural to use as an ordinary function call:

  foo = (+)
  bar = foo 10
  bar 20
  => 30

This code doesn’t just make foo an alias for addition, which I could have done in Lisp as well. It makes foo a function object expecting two arguments. Then bar supplies one of those arguments, yielding a closure which references the 10 and expects one more argument. The final call provides the 20 to this closure and sets up the addition. The fact that I’m evaluating it in the interpreter loop causes Haskell to perform the addition and show me the result.
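
Here is a small sketch of where this shows up in everyday code (the names are mine, purely for illustration):

  addTen = (+ 10)               -- (+) with one argument already supplied
  map addTen [1, 2, 3]
  => [11,12,13]
  map (* 2) [1, 2, 3]           -- or skip naming it entirely
  => [2,4,6]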

This combination of lazy evaluation with partial application leads to expressive capabilities I’ve frankly never experienced before. Sometimes it causes my head to spin a bit.

Parallelism

One thing about Common Lisp is that it harkens back to a day when computers were much simpler – before multi-threading, and before machines with multiple processors were cheap and common. Since it was designed at a time when there was One Processor to Rule Them All, it didn’t go to great lengths to consider how its design might affect the needs of parallelism.

Let’s take function argument evaluation, as a simple example. Because a function call in Lisp must evaluate all arguments, in order, function calls cannot be parallelized. Even if the arguments could have been computed in parallel, there’s no way to know for sure that the evaluation of one argument doesn’t cause a side-effect which might interfere with another argument’s evaluation. It forces Lisp’s hand into doing everything in the exact sequence laid down by the programmer.

This isn’t to say that things couldn’t happen on multiple threads, just that Lisp itself can’t decide when it’s appropriate to do so. Parallelizing code in Lisp requires that the programmer explicitly demarcate boundaries between threads, and that he use global locks to avoid out-of-order side-effects.

With Haskell, the whole game is changed. Functions aren’t allowed to have side-effects, and their value is not computed until needed. These two design decisions lead to situations like the following: Say I’ve just called a function and passed it a bunch of arguments which are expensive to compute. None of these operations need to be done in sequence, because none of them depend on the others for their value. If I then do something in my function which needs some of those values, Haskell can start computing the ones it needs in parallel, waiting on the completion of the whole set before returning the final result. This is a decision the language itself can make, as a by-product of its design.
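
In practice GHC still asks for a small hint before it actually sparks work onto other cores. Below is a minimal sketch using par and pseq from the parallel package (the workloads and names are my own invention, and it needs to be built with -threaded); it is only an illustration, not something the argument above depends on:

  import Control.Parallel (par, pseq)

  -- two independent, artificially expensive computations
  expensiveA, expensiveB :: Integer
  expensiveA = sum [1 .. 10000000]
  expensiveB = sum [1 .. 20000000]

  main :: IO ()
  main =
    -- hint that expensiveA may be evaluated on another core while
    -- expensiveB is being forced, then combine the two results
    expensiveA `par` (expensiveB `pseq` print (expensiveA + expensiveB))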

Community

Lastly, the Haskell community is amazing. Newbies, you are welcome here. Their IRC channel is both a friendly and knowledgeable place, where newcomers are cherished and developed.

Likewise, the web resources and books I’ve read about Haskell so far have all been top-notch. You get the feeling people are fascinated by the language, and eager to share their joy with others. What a refreshing change. Lisp may have a rich history, but I think Haskell is the one with the future.


  1. http://www.lispworks.com/documentation/HyperSpec/Body/03_ababc.htm ↩

Syndicated 2009-03-14 09:54:32 (Updated 2009-03-14 09:54:43) from Lost in Technopolis

The saga of rebase versus merge

Following on my last entry, where I built a better pre-commit hook to ensure every commit in my Git history passed make check with flying colors, I realized something today about my good friend, rebase.

git rebase is a brilliant tool for editing history you haven’t pushed yet. If I have a set of ten commits, and realize the 3rd commit has an oversight I’d like to smooth out, I can make a new commit to fix the problem and then merge it with that old commit, resulting in a nice, clean set of commits for pushing.

However, using rebase at any time invalidates the work of my pre-commit hook. Why? Because any time I use rebase, I throw away whatever I’ve confirmed in the past about those commits. A rewritten commit is really a new commit, which means it hasn’t been tested, may never have existed as a working tree, and certainly isn’t the same as the previous commit, however similar its diff output may be.

What this goes to show is that immutability is a requirement of sane integration. Not only does code go into a commit, along with a date and a description, but also the work that went into verifying that commit. All of these details are bound up in the immutable state of that commit. If the state is changed, and the immutability guarantee broken, all bets are off.

Thus the only way I could use rebase with confidence would be to run the same pre-commit hook again on every changed commit during the rebase operation – which is something Git doesn’t do. It thinks that if you rewrite a series of commits, the final HEAD will have the same contents as the previous HEAD, which is true. But the rebased commits leading up to that head, especially if their order was changed, now represent a fictitious history behind that new HEAD.

It makes me think more and more about the virtues of merging.

Syndicated 2009-02-26 08:30:32 (Updated 2009-02-26 08:30:49) from Lost in Technopolis

Building a better pre-commit hook for Git

Recently a friend turned me onto an interesting article about a problem I had just recently discovered about Git and its pre-commit hook:

Committing in git with only some changes added to the staging area still results in an “atomic” revision that may never have existed as a working copy and may not work.

As an example of this, I often find myself doing a whole flurry of changes all at once. This is no problem with Git, because I have the wonderful tool magit.el to help sift out the many commits implied by those changes. So I turn one big set of changes into many smaller commits, leaving only pending work in my working tree. Then I push.

My users pull those commits, only to find that lo! and behold, they will not build. “What?”, I think to myself, “how can that be? I just ran the unit tests and everything was fine.” However, I never ran the unit tests against that particular commit, because the commits I just pushed never existed as independent working trees on my system. In fact, they never existed at all; they were mere figments within the Git index, which Git happily made into immutable commits for me.

What makes this all worse is that the pre-commit hook was something of a lie. I thought that by adding make check to my pre-commit hook, I’d know for sure that every commit I checked in was safe and sound. However, that make check was running against my working tree, not the proposed commit. I still knew nothing about the correctness of what I just checked in, unless it happened to also represent the state of my working tree – a rare occurrence indeed, given the Git culture’s preference for frequent, smaller commits.

The answer turned out to be a little complex. What I needed was a pre-commit hook that would test the contents of my Git index before each commit, not my working tree. And there happens to be no simple command in Git for “checking out your index”. Even if you do use git checkout-index, it resets the timestamps for every file that it creates, forcing make check to rebuild the entire app each time – not just its most recent changes. Assuming you have a Makefile system that works, such duplication of effort is wholly unnecessary.

I came up with a solution that uses a secondary source tree, to hold the checked out index, and a temporary build tree, which gets updated with any changes since the last time the pre-commit hook was run. The end result is that small changes pass or fail quickly, while large-scale changes sometimes require a full rebuild to confirm.

The script itself can be viewed in my git-script project on GitHub. You will need to tailor it for your own project if you plan to use it, and then copy it to .git/hooks/pre-commit, and enable the executable bit.

Syndicated 2009-02-14 03:23:00 (Updated 2009-02-14 07:36:09) from Lost in Technopolis

Simple class for walking char arrays as istreams

This has probably been written countless times before, but I found myself needing it today and it was quick to write. It lets you read characters from a char array in C++ via the istream interface:

  #include "ptrstream.h"

int main() {
  ptristream in("Hello, world!\n");
  char buf[31];
  in.getline(buf, 32);
  std::cout << buf << std::endl;
}

Handy if you don’t want std::istringstream needlessly copying character strings.

Below is the implementation, which is in the public domain:

  #include <cassert>
  #include <cstring>
  #include <istream>
  #include <streambuf>

  // An istream that reads directly from an existing character buffer,
  // without making a copy of it.
  class ptristream : public std::istream
  {
    class ptrinbuf : public std::streambuf
    {
    protected:
      char *      ptr;
      std::size_t len;

    public:
      ptrinbuf(char * _ptr, std::size_t _len) : ptr(_ptr), len(_len) {
        assert(ptr);
        if (*ptr && len == 0)
          len = std::strlen(ptr);

        setg(ptr,               // beginning of putback area
             ptr,               // read position
             ptr+len);          // end position
      }

    protected:
      virtual int_type underflow() {
        // is read position before end of buffer?
        if (gptr() < egptr())
          return traits_type::to_int_type(*gptr());
        else
          return traits_type::eof();
      }

      virtual pos_type seekoff(off_type off, std::ios_base::seekdir way,
                               std::ios_base::openmode mode =
                               std::ios_base::in | std::ios_base::out)
      {
        switch (way) {
        case std::ios::cur:
          setg(ptr, gptr()+off, ptr+len);
          break;
        case std::ios::beg:
          setg(ptr, ptr+off, ptr+len);
          break;
        case std::ios::end:
          setg(ptr, egptr()+off, ptr+len);
          break;

        default:
          assert(false);
          break;
        }
        return pos_type(gptr() - ptr);
      }
    };

  protected:
    ptrinbuf buf;

  public:
    ptristream(char * ptr, std::size_t len = 0)
      : std::istream(0), buf(ptr, len) {
      rdbuf(&buf);              // attach our streambuf after construction
    }
  };

Syndicated 2009-02-04 05:46:12 (Updated 2009-02-04 07:53:28) from Lost in Technopolis

Ready Lisp version 20090125 now available

There is a new version of Ready Lisp for Mac OS X available. This version is based on SBCL 1.0.24, and requires OS X Leopard 10.5. The only changes in this version are upgrades of many of the dependent packages.

What is Ready Lisp? It’s a binding together of several popular Lisp packages for OS X, including: Aquamacs, SBCL and SLIME. Once downloaded, you’ll have a single application bundle which you can double-click — and find yourself in a fully configured Common Lisp REPL. It’s ideal for OS X users who want to try out Lisp with a minimum of hassle. The download is approximately 124 megabytes.

There is a GnuPG signature for this file in the same directory; append .asc to the above filename to download it. To install my public key onto your keyring, use this command:

  $ gpg --keyserver pgp.mit.edu --recv 0x824715A0

Once installed, you can verify the download using the following command:

  $ gpg --verify ReadyLisp.dmg.asc

For more information, see the Ready Lisp project page.

Syndicated 2009-01-26 03:52:26 (Updated 2009-01-26 04:12:24) from Lost in Technopolis

Unicode support on the cheap

I’d been avoiding adding full Unicode support to Ledger for some time, since both times I tried it, the work ended up in a veritable spaghetti of changes throughout the code, which it seemed would take forever to “prove”. One branch I started used libICU to handle Unicode strings throughout, while an earlier attempt used regular wide-string support in C++. Both were left on the cutting-room floor.

But last night an idea struck me: Ledger doesn’t care if UTF8 encoded data is passed around. If a user has Cyrillic characters in their data file, and Ledger leaves its encoding alone, then when those same bytes are printed out the user will see exactly what they input. In this case, the best approach is “hands off”. Just pass the user’s data through transparently, and they will see in their output exactly what they input.

Where this fails is when Ledger tries to output elided columnar data, such as in the register report. The problem is, there is no way to know the length of a string without determining exactly how many code points exist in that UTF8 string. And without knowing the length, it’s impossible to get columns to line up, or to know exactly where a string can be cut in two without breaking a multibyte UTF8 character apart.

Anyway, I discovered a cheap solution today which did the job: Convert strings from UTF8 to UTF32 only when individual character lengths matter, and convert them back after that work is done. This took about one hour to implement, but now Ledger is able to justify columns correctly, even when other alphabets are used! It still doesn’t work for right-to-left alphabets, though.

Syndicated 2009-01-23 23:57:54 (Updated 2009-01-23 23:58:10) from Lost in Technopolis

The feature I avoided for half a year

The other day I finally implemented a feature in Ledger which I’d avoided doing for a full half-year. The reason? Every time I thought about it, my brain kept shutting down. It seems my brain doesn’t care for math much, or for mathy problems, so it always seemed as if something better needed doing…

The problem turned out to be a fairly straightforward one, it just required sitting down and mapping it out for a couple of hours before the coding began. Here’s the synopsis:

You have a network of N nodes, each of which can be connected to N-1 other nodes. There can be multiple connections between any two nodes, where each connection has a date — but no two connections between the same nodes can have the same date.

Given a start node, a query date, and a set of target nodes (which may be zero, one or many), find the shortest and youngest path that is not older than the query date, from the start node to each of the target nodes.

Ledger uses this algorithm to record price conversions between commodities, and to later render each commodity into a market value relative to another known commodity. Sometimes such renderings are not possible, or sometimes they require multiple conversion steps before a value can be found.

For example, if I bought 10 shares of AAPL for $30.00, and later exchanged $10.00 for 9.83 CAD, and at one point exchanged 80 EUR for 100 CAD, then how many EUR are my shares of AAPL worth?
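
Working that example through by hand (taking the $30.00 as a per-share price, which is just my reading of it), the chain of conversions looks like this:

  10 * 30.00                -- the AAPL position, in USD
  => 300.00
  300.00 * (9.83 / 10.00)   -- USD to CAD
  => 294.90
  294.90 * (80 / 100)       -- CAD to EUR
  => 235.92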

Previously Ledger could only render AAPL in terms of dollars, but now it can finally report any commodity in terms of any other, provided there exists a path of traversal between the two nodes which is older than or equal to the query date.

Syndicated 2009-01-22 00:15:19 (Updated 2009-01-22 00:15:33) from Lost in Technopolis

Moving to Movable Type

The blog has now fully moved over to Movable Type, including all past articles and their comments. It took a bit of Perl, Python and mucking with SQL, but now the transfer is complete.

The reason for the move is that the app I was using, RapidWeaver, was beginning to introduce a bit too much inertia to the blogging process. And one thing I know about myself: if something isn’t dead simple, even after months of being away from it, I’ll avoid it forever.

I write these blog posts using ecto now, which couldn’t be easier. There’s no separate publishing step, it’s like writing and sending an e-mail.

I actually liked the way WordPress looks a bit more, but Movable Type supports PostgreSQL, which is what every other service on this server uses. And for some reason MT’s XML-RPC script doesn’t work with FastCGI and Apache, which is something I guess I can live with.

Syndicated 2009-01-21 00:22:43 (Updated 2009-01-21 00:22:50) from Lost in Technopolis

A day for nostalgia

After tracking it down on a public domain mirror, and installing an emulator on my MacBook Pro, I was able to download and run the first full computer program I ever wrote: “Sector Inspector” for the Apple //e.

I wrote this program in 1989; it took eleven months to write (seven to code, four to debug). At the time, it was one of the more complete disk editing utilities I’d seen.

It was released as Shareware (for $20), and I made a total of $60 over the course of eight years. This is the experience that turned me to freeware, actually, because I realized that coding for possible, yet unrealized, profit was an unlikely aim. It’s better to know ahead of time that little will come of it, which makes it all about the coding.

Sector Inspector was written using the Merlin Prodos Assembler. It took thirteen minutes to assemble on my Apple //e, four using a friend’s hardware accelerator card. In those days I owned a 1Mb expansion card, and would do all of my development there (for the sake of speed), frequently saving to 5.25” floppies.

When finished, Sector Inspector printed out to 255 pages of assembly code, which was registered with the US copyright office. I tried selling it to three different software companies at the time, but only one responded positively — the authors of Merlin, who said they couldn’t publish another title, but offered me a job instead. I didn’t take the job (I don’t know why), and instead released the program as Shareware.

I remember having dreams that Sector Inspector would make around $10k, and with that money I would buy a color Macintosh IIfx, all the rage at the time. Those dreams never materialized, of course, and shortly afterward the //e was cancelled, ProDOS 8 was cancelled, and I started working on UNIX machines. The next year I worked for Network Solutions and used that money to buy a NeXT workstation, and said goodbye to the Apple world for a very long time (until just a few months ago, when I bought a PowerBook G4).

Here is a screenshot of the splash screen on startup:

[Image: screenshot of the startup screen]

And a screenshot of the main window:

[Image: screenshot of the main window]

You can download an Apple //e disk image of Sector Inspector, and play around with it, or read the FEATURES document I wrote as a young 17 year old programmer.

Syndicated 2009-01-20 03:36:57 (Updated 2009-01-20 07:39:30) from Lost in Technopolis
