Older blog entries for lukeg (starting at number 27)

I just saw in the paper that Christiania, the big hippie commune and soft-drugs haven in the middle of Copenhagen, was just raided by a hundred police. They put a lot of people in jail for a long time.

The world is less of a cool place now.

17 Aug 2003 (updated 17 Aug 2003 at 18:21 UTC) »
re: Clear and informative error messages

raph: CMU Common Lisp is a system that takes error-reportage with macro-expansion seriously.

In the case of compile-time errors in the results of macro expansion, CMUCL's error reportage gives some very useful information: the actual source expression that expands to erroneous code, some context to say which part of the expansion is in error, and an easy way to see how particular expressions get expanded (macroexpand-1) so that your eyeball can spot the error.

For errors occuring during macro-expansion, you get full source-level debugging, since the macro is a Lisp program.

Now I'm curious about how well it works for debugging runtime errors in macro-expanded code. I mustn't have done much of that.

Another more upbeat observation is how good programs like strace and ethereal are, and the fact that you can understand a lot about a program by looking at how it interacts with other programs. Is this what they call "stratified design"?

2 Aug 2003 (updated 2 Aug 2003 at 20:19 UTC) »

Today I wrote a small ethernet switch using my Lisp networking code. A ways back I broke it into separate "low-level networking library" and "partial TCP/IP stack" pieces, which makes it easier to write other applications like this switch. It's surprising how many useful throw-away applications have popped up at work recently when testing our latest network appliance.

I also added support for PF_PACKET sockets in addition to TAP interfaces. PF_PACKET sockets are better for switchey things because they directly access existing interfaces.

The switch is a toy, but I like it because it works and it fits on one screen. (Your screen mileage may vary.)

Here's the code:

;;; switch.lisp -- a toy ethernet switch built on `netlib'.
;;; (netlib source is at http://www.sourceforge.net/projects/slitch/)
(defpackage :switch
  (:use :common-lisp :netlib)
  (:export :start))
(in-package :switch)
(defvar *ports* nil
  "Array of switch ports (network devices).")
(defvar *fdb* (make-hash-table :test #'equalp)
  "Forwarding database: maps MAC address onto port number.")
(defun start (&rest devices)
  "Start switching packets between DEVICES."
  (setq *ports* (concatenate 'vector devices))
  (loop for device across *ports*
        for port from 0
        do (init-port device port)))
(defun init-port (device port)
  "Initialize DEVICE as an input port (number PORT)."
  (netdev-enable device
                 ;; This function is called when a frame arrives.
                 ;; FRAME is an ethernet frame as an array of bytes.
                 (lambda (frame) (input frame port))))
(defun input (frame input-port)
  "Process a FRAME arriving on INPUT-PORT."
  (multiple-value-bind (source destination) (header-addresses frame)
    (update-fdb source input-port)
    (let ((output-port (where-is destination)))
      (cond ((null output-port)
             (flood frame input-port))
            ((/= output-port input-port)
             (send frame output-port))))))
(defun header-addresses (frame)
  "Return the source and destination addresses from FRAME's ethernet header."
  (with-input-from-frame (stream frame)
    (let ((header (read-ethh stream)))
      (values (ethh-src header) (ethh-dest header)))))
(defun update-fdb (address port)
  "Update the forwarding database: ADDRESS is on PORT."
  (unless (ethernet-multicast-p address)
    (setf (gethash address *fdb*) port)))
(defun where-is (address)
  "Return the port that ADDRESS is on, or NIL if unknown."
  (gethash address *fdb*))
(defun send (frame output-port)
  (netdev-tx (aref *ports* output-port) frame))
(defun flood (frame input-port)
  "Send FRAME to all ports except INPUT-PORT."
  (dotimes (output-port (length *ports*))
    (unless (= output-port input-port)
      (send frame output-port))))
31 Jul 2003 (updated 31 Jul 2003 at 11:10 UTC) »

I've had performance problems with garbage collection before, but this profiler output gave me a great giggle:

  00000030 6173     2.26089     ipt_do_table_Rsmp_b731682a /lib/modules/2.4.18-26.7.xcustom/kernel/net/ipv4/netfilter/ip_tables.o
  c01dc450 11798    4.32107     rt_intern_hash          /logs/vmlinux
  00000000 25777    9.44095     (no symbol)             /isd/isdwss/erts-5.2.b1.oprofile/bin/beam
  c01ddfa0 40883    14.9736     ip_route_input          /logs/vmlinux
  c01d0a60 109002   39.9225     neigh_forced_gc         /logs/vmlinux

Time for a generational collector in Linux? :-)

And I must say, oprofile is an unbelievably cool tool. Who says systems software research is irrelevant?

(P.S., that GC expense is provoked by a flood of IP packets with random source addresses.)

25 Jun 2003 (updated 25 Jun 2003 at 16:02 UTC) »

I did a fun little configuration-hack today. It makes the CD-player next/prev buttons on my Inspiron laptop cycle me between my regular desktop display and the full-screen displays of some VMware machines I run. The main desktop runs in one Sawfish "workspace" (using a 3x3 grid of Viewports), and each VMware runs in its own workspace (also with 9 viewports, but using only one). (Little details make it ugly to run full-screen VMwares just in viewports.)

If you want to do this fun and pointless hack too, there are two steps: binding the buttons as keys, and some Sawfish tweaking.

The CD-player buttons on the Inspiron are nothing special - just keys that generate unmapped keycodes. To bind them to keys in Sawfish, you just need to associate their Keycodes (found in xev) with Keysyms. I browsed through <X11/keysymdef.h> and found a couple of rather appropriate-looking keysyms to add to my .Xmodmap:

    keycode 132 = Next_Virtual_Screen
    keycode 131 = Prev_Virtual_Screen

    Then these can be bound to "Next Workspace" and "Previous Workspace".

    That's most of it done. The only problem is, when you switch workspaces you keep the viewport (position) of the one you jumped from. So when I jump to a VMware workspace, I won't necessarily land in the viewport where VMware is running. This problem is solved by a little sawfish configuration hack:

      ;; When I leave a workspace and then come back to it, I want to end up
      ;; in the same viewport as when I left. I don't know if Sawfish
      ;; supports this behaviour by pure configuration, so..
      (defun remember-workspace-viewport (workspace)
        "Remember which viewport WORKSPACE is looking at."
        (put (workspace-symbol workspace) 'viewport (screen-viewport)))
      (defun restore-workspace-viewport (workspace)
        "Go back to the viewport that we remember WORKSPACE was looking at."
        (let ((viewport (get (workspace-symbol workspace) 'viewport)))
          (when viewport
            (set-screen-viewport (car viewport) (cdr viewport)))))
      (defun workspace-symbol (workspace)
        "Return a symbol to represent WORKSPACE.
      A different symbol is returned for each workspace."
        (intern (format nil "workspace-%S" workspace)))
      (add-hook 'leave-workspace-hook remember-workspace-viewport)
      (add-hook 'enter-workspace-hook restore-workspace-viewport)
11 Mar 2003 (updated 11 Mar 2003 at 18:11 UTC) »
wainstead: In your Emacs Lisp snippets, you could use let instead of setq to get real local variables. The trouble with setq is that it will set an existing variable to the new value, by searching up the stack to find where the variable exists and defaulting to a global if none is found. So, there's a risk that it will interfere with some other variable. This is riskier than it might seem, since even "top-level" interactive commands could find themselves munging local variables in other functions that are making recursive edits.

For example, you could switch to let like this:

(defun sw-list ()
  (let ((buffer (get-buffer "*Ibuffer*")))
    (if (bufferp buffer)
          (switch-to-buffer buffer)
          (ibuffer-update nil))

More details at http://www.delorie.com/gnu/docs/emacs-lisp-intro/emacs-lisp-intro_49.html


mbp: I reckon Erlang's concurrency is the best thing since sliced bread -- and it's very explicit, a long way from the lazy functional languages. I like it because it gives me a very high level programming interface with very lightweight processes, and neatly avoids both the hairiness of shared-everything threading and the inconvenience of explicit state machines.

The unit of concurrency in Erlang is the process, which is the central idea in the language. An Erlang process is more like a Unix process than a thread, but is very lightweight -- creating a process takes only a few microseconds and a few hundred bytes of memory. Erlang processes share no mutable memory or variables and only communicate explicitly, using send and receive primitives. When a process receives messages, it specifies which types of message are acceptable, causing others to be queued in a mailbox -- this gives a very beautiful synchronization-through-communication programming style from CSP. Erlang processes own files and sockets in the same way as Unix processes, which helps to cleanup after errors, and can detect each others' termination (kinda like SIGCHLD) and so on.

The Erlang runtime system itself is a C program implemented, AFAIK, very much like Squid, except that the main loop is an Erlang interpreter instead of a HTTP proxy. If you ported Squid to Erlang, with one Erlang process per connection, I think it would execute in much the same way wrt. performance and system calls (though I'm no expert on Squid or the Erlang runtime system.) Overall you get the efficiency of select/poll/...()-loop state machines plus a programming interface easier than unix processes, which I say is just great :-).

That's the best late-night summary I can come up with :-). I skipped some other interesting bits about Erlang: that all data structures are immutable, the ways for writing robust programs, nicely integrated distributed programming, and the funky stuff borrowed from functional languages.

Here are some nice links for more about Erlang:

  • Open Source Erlang website (apparently some DNS trouble just now), where you can download the book Concurrent Programming in Erlang as a PDF. The book is highly recommended - light and enjoyable reading.
  • Lightweight Languages Workshop #2 homepage, where you can watch a recent talk of Joe Armstrong introducing Erlang (he's one of the inventors.)
  • The Erlang publications page, which I link so that I can give this thought provoking quote, from the original Erlang paper:

      In programming large systems, many small programming errors will be made - we view this as inevitable. Formal systems and exhaustive test procedures are currently not capable of ensuring fault free software for systems of the size and complexity of modern telecomms applications. Given that errors will be made, we are interested in the problem of detecting and handling those errors in such a manner that the system as a whole exhibits satisfactory behaviour in the presence of errors.
23 Dec 2002 (updated 23 Dec 2002 at 17:02 UTC) »
Threads and state machines

So it looks like threads make an interesting topic! pphaneuf's comments and references make a good case that shared-everything threads are hard to use. It's a pity that MichaelCrawford brought up SMP though ;-)

Since consensus is no fun, let me suggest that both threads and state machines have advantages and disadvantages, and that there are other solutions that take the best of both worlds and more. So, bold-assertion-mode-on, and please forgive me my blunders :-)

The state of play, pros and cons

To put things in context, here's what I think are the main good and bad points of shared-memory threads and state machines.

The upside to state machines is that they can be implemented efficiently in languages like C using structs for states and select()-style scheduling, and they avoid micro-level synchronization problems by performing multiple concurrent tasks with a single thread of execution. The main downside is that they are more cumbersome to code because they have to explicitly represent states that would otherwise be implicit in the stack. I'll wave my arms a bit about this below.

The upside to threads is that they easily support blocking operations (like I/O and IPC), which makes a lot of things simpler, and they are easily implemented to take advantage of SMP. The downside of typical implementations is that they are heavy-weight (e.g. because they pre-allocate large private stacks), and that preemption in shared-memory environments makes every line of code susceptible to race conditions. This area is already very well covered by pphaneuf's last post so I'll leave it alone.

Both models still leave you to find a good way for your threads or state-machines to communicate, if they need to.

The good thing is that it's possible to implement light-weight threads with much the same efficiency (and even implementation strategy) as state machines, while still supporting blocking operations and being free of fine-grained race conditions. There are also some elegant and practical high-level models of concurrency and communication (IPC) which make it much easier to write concurrent programs, and which can even take advantage of SMP without the problems of shared-memory threads. These approaches are used in practice to write very nice programs. To avoid lengthiness, I'll do another post later about some types of concurrency and IPC that I really like (unless someone beats me to it), and say why I reckon Erlang is a masterpiece. For now I just want to make an unoriginal observation.

State machines and Continuations

I think that state machine programming has a lot in common with Continuation Passing Style, and that the concept of continuations would be very interesting for people writing event-based programs. The best description of CPS that I know of is in Essentials of Programming Languages by Friedman, Wand, and Haynes. I have the first edition, which I'm told is better than the second.

Continuations are interesting if we think about how to convert an imaginary C program that runs as a thread and uses blocking I/O into an event-driven state machine that only uses non-blocking I/O. (Let's suppose it doesn't interact with other threads, for the moment.) The basic algorithm is that each blocking call in the original program becomes a state in the state machine. Each time the new program reaches its next state, it makes a non-blocking call for I/O and then saves "where I'm up to" in a state structure to pass on for the next event. What goes into the state structure is some explicit representation of the current execution context (continuation) - i.e. the information that the original program has in the stack and program counter, saying where the program is up to and what it has to do next.

After we've done that for each blocking operation, the new program is an asynchronous state machine. Unfortunately, the new program takes more coding because of the explicit representation our execution state. This, I think, is the fundamental downside to C-style state machine programming. For some programs I think explicit state machines are nice and clear, but most of the time I think it's awkward.

The interesting thing is that the above conversion from blocking to non-blocking is really a case of continuation-passing-style conversion (or "CPS-conversion") - maintaining the execution state explicitly and passing it around instead of using the stack. They've figured out a lot of useful things about continuations, like how compilers can do CPS-conversion automatically, and how to represent "first-class" continuations (which let you capture the execution state automatically) efficiently. This is great stuff for concurrent programming.

Fancy languages that support first class continuations, like Scheme and Ruby, don't need to put their state in a struct, they can just use call-with-current-continuation to capture the whole execution context automatically. In fact, in those languages the read() and write() functions could do this themselves and invoke the scheduler (or select()-loop) to convert the whole program from blocking to non-blocking automagically.

Running away, eh?

I'll put my money where my mouth is with some of the things that I think are really great when the festive season allows :-)

New email address:


22 Dec 2002 (updated 22 Dec 2002 at 03:28 UTC) »
    mbp: I really agree on threads vs. state machines. I'm curious about where the idea that shared-everything threading is a Good Thing came from, and particularly why it keeps getting adopted for high-level applications like user interfaces. In particular, I've been digging into state-machine based concurrent programming in Emacs Lisp in the past year and finding it very pleasant - now my mind boggles at the idea of adding traditional threads to an Emacs-like editor (which is often seen on wishlists.)

    I also think that some of the work on building new abstractions for non-sharing explicitly-communicating state machines is really wonderful. I bet you'd enjoy the book Communicating Sequential Processes by C.A.R. Hoare, which starts from bare bones state machines and works out some very elegant (and practical) ways to make them communicate and synchronize.

New Computer

    I got a new laptop yesterday, quite an upgrade from my 3 year old thinkpad: 5x CPU clock speed, 5x memory, 10x disk space at 3x speed, 1600x1200 vs. 1024x768, etc.

    But where's the fun in new computers these days? The hardware all works fine in Linux without trouble shooting, and is even copiously documented; being used to distcc means "make" output doesn't scroll any faster than on the old machine; and with Debian it only takes 10 seconds to install j-random-program when you notice you haven't got it. What used to be a month's distraction only took one evening this time. Maybe the time is ripe for another operating system to take off with a similar rallying cry to the original Linux announcement :-)

18 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!