Recent blog entries for nikodemus

29 Jun 2008 »

ELS 2008 report

Thank You, for everyone who worked to make The 1st European Lisp Symposium the roaring success it was.

For me, the symposium started by having dinner with other attendees (and some organizers) on Wednesday. This was obviously the right thing to do, because the whole conference went in the same vein: great people, good food, excellent wine. No presentations on Thurdsay -- but if you ask me, they are not really the thing that makes a conference tick for me.

Thirsday was the first day on the official program. After the initial welcome session, and kibbitzing over coffee people split into two tracks: the work-in-progress track, and birds-of-feather sessions. Not having submitted a WIP paper I did not attend that track, but from what I heard it was definitely worth attending.

  • The BOF track started off with a session on concurrent and distributed processing by Sebastin Gonzlez. Questions of code mobility, identity, and structure of communication between nodes were talked about. Sebastin is working on a distributed and concurrent processing system, and has obviously spent quite some time thinking about the issues. From my perspective the project will be interesting to follow, and it is especially interesting to see what sort of things they will want from the underlying Lisp -- but generally speaking my own interests are on a slightly lower level.
  • Next session was on image processing APIs, chaired by Matthieu Villeneuve -- working towards possibly consolidating Imago and ch-image into a single library. I admit I spent most of the session hacking on SBCL and checking my email, but I gather that those who participated properly found it engaging. Not quite sure were it ended up.
  • Third was the CLIM (or perhaps rather McCLIM) session. I'm sorry to say it did not go as well as I would have hoped. The McCLIM hackers who were prepared for it had prepared to talk about McCLIM internals and future directions with each other, but due to the largely non-cognoscenti audience it turned into an impromptu CLIM tutorial/demonstration, which no-one was prepared for: demo-effects were plentiful, and direction was a bit lacking. Could have been worse, but could have been a lot better. It would be really great if someone could make a screencast to showcase the stuff that should have been in the demos -- but time being in a globally short supply I totally understand if that doesn't happen anytime soon.
  • Lunch in a nearby resteraunt was simply excuisite -- and part of the symposium fee. Quite probably the mellowing effects of good food had a lot to do with the success of the following session... (Actually, maybe lunch was before the CLIM session? I forget.)
  • Final, quite impromptu, session was on a future successor to ASDF. Andreas Fuchs, I, and several other intrested parties had realized this might be a good time to hash some things out face to face, and took the opportunity. Several conclusions were made -- in no particular order: Purely declarative system definitions are good: loading a system definition from a file should not use LOAD, just READ, and then some sort of an EVAL-SYSTEM. System definition extensions required for doing operations on a given system should live in a separate file. Manually managed fine grained dependencies are too brittle, and complete automated dependency management is too hard. Therefore encourage module-level dependencies (with stronger dependecy semantics then ASDF) as a half-way house between purely serial definitions and brittle file-level dependencies. Protocols are good: extending existing operations to new kinds of components is good and simple. Adding new operations for existing components is also good, but trickier. Need to examine several existing ASDF systems and extensions to see what kinds of things are being done with the protocol. Plan-then-execute is nasty: makes a lot of things harder, and doesn't buy very much. No need to globally replace ASDF as long as the "future successor" can depend on ASDF systems (and as long as we can extend ASDF to depend on the new kinds of systems.) No project plans were made, but after everyone has had a bit of time to reflect on this, maybe something will happen.

Syndicated 2008-05-27 14:31:36 from Nikodemus Siivola

5 May 2008 »

Quick Notes

The only valid measurement of code quality: WTFM

Getting Git series will continue later this week, but in the meanwhile I would like to bring to your attention an oft-forgotten output operator in Common Lisp: WRITE. It is perfect for both REPL and code in many cases, since you don't need to bind printer control variables around it -- just pass the ones you care about as keywords. Similarly, you don't have to worry about it munging variables your callers may care about.

Presenting ESRAP 0.1. It is a simple packrat parser for Common Lisp. It's been almost a year since I wrote it, and it seems unlikely that I'll work more on it in near future. In its current state it is neither particularly optimized or polished, nor does it have a great deal of fancy features, but it did what I needed it to do at the time, and I figured someone else might find it a more useful starting point for their own needs then CL-PEG. The feature list reads:

  • dynamic redefinition of nonterminals
  • inline grammars
  • semantic predicates
  • simple introspective facilities

Examples:

(parse '(or "foo" "bar") "foo") ⇒ "foo", NIL

(add-rule 'foo+ (make-instance 'rule 
                 :expression '(+ "foo"))) 
  ⇒ FOO+

(parse 'foo+ "foofoofoo") 
  ⇒ ("foo" "foo" "foo"), NIL

(add-rule 'decimal
           (make-instance 'rule
            :expression '(+ (or "0" "1" "2" "3" 
                                "4" "5" "6" "7" 
                                "8" "9"))
            :transform 
            (lambda (list) 
              (parse-integer (format nil "~{~A~}" 
                                     list)))))
  ⇒ DECIMAL

(parse '(oddp decimal) "123") ⇒ 123, NIL

(handler-case
    (parse '(oddp decimal) "124")
  (error (e)
    (format t "~&oops: ~A~%" e))) ⇒ NIL
; output
oops: Expression (ODDP DECIMAL) failed at 0.

(parse 'foo+ "foofoofoobar" :junk-allowed t) 
  ⇒ ("foo" "foo" "foo"), 9

(parse '(evenp decimal) "123" :junk-allowed t) 
  ⇒ NIL, 0

(add-rule 'foos-or-decimal
          (make-instance 'rule
           :expression '(or foo+ decimal)))
  ⇒ FOOS-OR-DECIMAL

(describe-grammar 'foos-or-decimal) ⇒ NIL
; output
Grammar FOOS-OR-DECIMAL:
   FOOS-OR-DECIMAL <- (OR FOO+ DECIMAL)
   FOO+    <- (+ "foo")
   DECIMAL <- (+ (OR "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"))

Existence of bugs is guaranteed. Licence is zero-clause MIT.

Syndicated 2008-02-05 14:57:29 from Nikodemus Siivola

5 May 2008 »

Getting Git, part 3

Note: updated to correct a logical error. When development converges, a child will have multiple parents -- not vice-versa. Kudos to Johannes Grødem for sharp eyes and a heads-up.

Intermission</b>

I've learned that some people are reading this and wondering if they really need to know all this to use Git?.

No.

If all you're using Git for is "edit-diff-commit, edit-diff-commit" cycle, you don't. This seems to be 99% of what many people use a VCS for, and there is nothing wrong with that. You can even go quite a bit beyond that, and still you don't need to know anything about what's really going on -- just follow the a simple recipe, and you're good to go.

It's when you move beyond the recipe level that you need to understand the model the VCS uses, just like you need to with CVS, Subversion, Darcs, or any other VCS. If you don't think that's true, you've either internalized the model without noticing, or you're just hammering out recipes.

Specifically, this series is written to teach enough of the Git model to be able to look at a bunch of disparate branches you need to merge somehow, figure out the kind of history you want to build, and then do that. No recipe in the world is going to to this for the general case: you need to know what is going on before you can decide what you want.

What's In A Commit

So, where were we? Ah, storing history. If you haven't read it yet, here's where you can read the story so far.

Commits are the third kind of object stored in the object database. You could call commit "a moment in history", but let's see exactly what it contains:

  • Tree: the entire contents of the directory tree associated with the commit.

  • Parent(s): a commit has one or more parent commits. A parent commit is the "previous" commit: the changes introduced by a commit can be seen by comparing the trees of the commit and its parent(s).

    The common case is a single parent: this represents normal linear development.

    Multiple parents represent converging lines of development. A commit with multiple parents is called a merge commit. Two parents is the norm, but multiple branches can be merged with a single commit.

  • Comment: some text describing the commit.

  • Committer: the person who actually created the commit, and the date this was done.

  • Author: the person responsible for the change represented by the commit, and the date. Often the author and the committer are the same, but when eg. submitting patches by email Git automatically preserves information about the original author.

Commits form a DAG via parents. When development diverges, multiple children will share that same parent. When development converges, a single child will have multiple parents. If this is not clear, get a piece of paper, and draw a few dags -- it's more effective then any fancy graphic I might cook up.p>So, if you have hold of a commit object, you have hold of the entire history up to that point -- but you don't know anything about the future. In other words: History is the DAG rooted at any given commit.

<s>Back to our regularly scheduled Erlang envy.</s> Review: What is a commit object? Can multiple commits refer to a single tree; if so, what does it mean; if not, why not? Can multiple commits share parents; if so, what does it mean; if not, why not? If the tip of a branch is a commit, can you guess what the history of the branch is?

Next time: tagsoup.

Also, here's some moral support for me: Git is the next Unix says apenwarr. I may not agree with the metaphor, but it's a nice read:

Git was originally not a version control system; it was designed to be the infrastructure so that someone else could build one on top. And they did; nowadays there are more than 100 git-* commands installed along with git. It's scary and confusing and weird, but what that means is git is a platform. It's a new set of nouns and verbs that we never had before. Having new nouns and verbs means we can invent entirely new things that we previously couldn't do.

Syndicated 2008-02-03 06:18:20 from Nikodemus Siivola

5 May 2008 »

Getting Git, part 2

Read part 1 first.

At the core of Git is the object database. It is not an implementation detail, but a fundamental part of the whole. Don't ignore it, and don't be scared of it. You don't have to use it directly, but knowing the basics makes life a lot easier.

So:

The object database lives somewhere under .git/ in each and every repository clone -- nevermind where exactly.

The object database is garbage collected: the only way to delete an object is to remove all roots that point to it, and let the GC reclaim it. Roots also live under .git/, but are distinct from the database.

Since content is immutable, there is no way to mutate anything in the database -- you can only add new objects.

Now, there are four kinds of objects in the database. Today we will cover just two of them -- the lower level, if you will. This is content at its most content-seeming:

Blobs are binary content. They don't contain any pointers. Blobs are used to store file contents. Not file names, etc -- just contents. If you have files x/foo.txt and y/bar.txt, which both contain just the string "foobar", then in the object database there will be a blob that stores the string "foobar" -- representing the content of both files. Remember: content is identity.

Trees are lists of entries. The entries represent other objects in the database: for each entry the tree stores the object type, the pointer/SHA1 for the object, the object name, and the mode (the executable bit, really.) A single tree object represents a single directory with its files and subdirectories; in normal circumstances a tree willl only contain blob and tree entries. Again, content is identity: if you have to two directories containing identically named files with identical contents and executable bits, both will be represented by the same tree object.

Consider directories and the file here: x/y/z.txt

We have:

  • Blob-object B<sub>z</sub> for the contents of z.txt.
  • Tree-object T<sub>y</sub> for y/, containing the name z.txt, and a pointer to B<sub>z</sub>.
  • Tree-object T<sub>x</sub> for x/, containing the name y, and a pointer to T<sub>y</sub>.

Now, if we change the contents of z.txt, and commit the new content to the object database -- what happens to the object graph as a whole?

Remember: Content is identity, and not just for blobs, but all objects. If you change the contents of a file, you need a new tree object containing a pointer to the new blob, etc. This is a really important bit, so make sure you understand this: any pointer into the object database is a unique identifier for the whole object graph reachable from that point.

So, you will have:

  • Blob-object B<sub>z2</sub> for the new contents of z.txt.
  • Tree-object T<sub>y2</sub> for y/, containing the name z.txt, and a pointer to B<sub>z2</sub>.
  • Tree-object T<sub>x2</sub> for x/, containing the name y, and a pointer to T<sub>y2</sub>.

The old versions are still there: content is immutable -- as long as GC hasn't reclaimed them, we can get at them.

Now, as long as you remember that a tree object represents the whole state of the whole directory structure under it, including file contents, you can forget about blobs. Just think of trees, and you will be fine.

Review: <s>Why did the hacker cross the road?</s> Where does Git store content? How are files and directories stored? Can you mutate stored content; if so, how; if not, why not? Can you deleted stored content; if so, how; if not, why not? What does a tree object represent?

Next time: how history is content, and how it is stored.

Syndicated 2008-01-27 13:10:42 from Nikodemus Siivola

5 May 2008 »

Getting Git, part 1

Many people -- me included -- find Git really nice, but equally many seem somewhat confused by it. I think this is due to Git being conceptually different compared to other VCS we're used to: unless you have an at least mostly correct theory of Git your expectations based on experiences with other systems will lead you astray.

This is the first part in a series of posts that tries to address this issue, by providing the aforementioned theory. In this part I talk about general concepts: the terms I use are not Git terms; I'm trying to start you thinking in a Git-compatible way as opposed to whatever CVS and others have taught you over the years. Details and real terms will start in the next episode.

Ready? Buckle up!

Git stores content, not metadata. Some of the data stored by Git may very well describe some other data also stored by Git, but it is all content to Git.

Git stores content, not changes. It can reason about changes in content, but only content is stored.

Content is identity. If two "things" have identical content, they are the same thing for Git.

Content is immutable. If you think you are mutating something stored by Git, think again: what you're doing is making an altered copy and possibly throwing away the original. Since content is identity, you cannot mutate an object and have it retain its identity.

That's it.

I hope this was simple enough, but if you've had trouble understanding Git before, please review the points above a time or two to make sure you understand what I'm saying. You don't yet need to understand how the points above relate to Git -- just try to grok das Ding an sich. When you think you have it, ask yourself:

<s>Conan! What is best in life?</s> What does Git store? How would you describe this stuff that Git stores?

Next time we'll pop open the hood and see what's inside.

Syndicated 2008-01-24 08:28:30 from Nikodemus Siivola

5 May 2008 »

This makes me sad

A short while ago, I wrote a piece parodying a few common memes oft seen in programming blogs:

  • Primarily, valuing terseness over readability and long-term maintainability. Terseness and conciseness do not imply expressiveness. While it is reasonable to expect for the reader to be familiar with the language used and various common idioms, it is not reasonable to make abstractions that are larger then the problems they solve.
  • Secondarily, the tendency to use trivial operations over small sets to demonstrate benefits of a given approach for non-trivial operations over large sets. Granted, blogs are too small a margin for many things, but we should all pay more attention to this.
  • Finally: "the utter clarity that functional programming delivers". Just like code that depends on intricate side-effects is hard to understand, code that depends on intricate flow of values can be hard to understand. Saying "It's Functional!" is not an excuse for impenetrable code: sometimes your personal aesthetics and clarity of expression are at odds, and you must find a balance that does not unduly stress future generations that must live with the code we write today.

In other words, the arrow macros I presented were intentionally horrible. If you thought they were neat, think again.

I hope the fact that my parody elicited more responses then anything I've ever written before -- some liking, others disliking, but all taking it seriously -- is more indicative of my meager skills as a humorist, then it is of the state of the programming blogs.

*sigh* -- I thought disclaimers are for wussies.

Syndicated 2008-01-22 14:29:17 from Nikodemus Siivola

5 May 2008 »

Experiments in Fun[ctional] Programming

WARNING: May Contain Trace Elements of Parody

I mentioned the possibility to use <- instead of #L a while ago. Then I got to talking with Troels Henriksen of the Climacs (which is Emacs of tomorrow -- like Arc is to Common Lisp, except that it actually exists!) and McCLIM fame about related matters. Like so many others, I've always been intrigued by the utter clarity functional programming delivers -- on thing led to another, and now I'd like to introduce you to my three new best friends: <-, <---, and --->.

The core idea is to be able to express lazy and partial evaluation succintly -- because, as we all know, terseness is a virtue: code that is never written is the best code. Naysayers will claim that overly terse code can be hard to understand, but as long as there is less of it, that really doesn't count, does it?

Firstly, we have the idea of "streams" or "pipes": functions called repeatedly to produce different values: such a function can easily express eg. any infinite series. Of course, for this to to be of any utility, we need a utility to do this calling in a terse-yet-expressive manner. This is what ---> does:

(defmacro ---> (size composition &rest arguments)
  (let ((express (gensym "EXPRESS"))
        (unexpressed (gensym "UNEXPRESSED"))
        (expressed (gensym "EXPRESSED"))
        (expression-arguments (gensym "EXPRESSION-ARGUMENTS")))
    `(let ((,expression-arguments (list ,@arguments)))
       (labels ((,express (,expressed ,unexpressed)
                  (if (zerop ,unexpressed)
                      ,expressed
                      (,express (append ,expressed (list (apply ,composition ,expression-arguments))) 
                                (- ,unexpressed 1)))))
         (,express nil ,size)))))

This macro expands into an elegant recursion that constructs a list by repeatedly calling a function with the results of evaluating arguments. The last bit is the key to its power:

;;; no need for this
(let ((stream (make-stream))) (---> *n* (composition) stream))

;;; just this is enough
(---> *n* (composition) (make-stream))

Next, we need tools for writing streams. What we would like to be able to do is:

(defun natural-number-stream () (<--- 0 1+))

Happily, <--- is simple enough:

(defmacro <--- (start step)
  (let ((value (gensym "VALUE")))
    `(let ((,value ,start))
       (lambda () (prog1 ,value (setf ,value (,step ,value)))))))

Finally, we want to be able to express our composition nicely. Say, we would like to put (lambda (x) (- (funcall x)) in a better way.

;;; This would be nice: no extra parens!
(<- -)

Again, easy enough. This is a slight extension to my earlier <-, which deals with the _ implicitly, and automatically handles functions as arguments, so that it can be used with streams like above.

(defmacro <- (operation &rest arguments)
  (let ((processed-arguments (if (member '_ arguments) arguments (append arguments '(_)))))
    `(lambda (_) 
       (setf _ (if (functionp _) (funcall _) _))
       (,operation ,@processed-arguments))))

That's all there is to it! Now we can express mappings over natural numbers in a clear and concise manner:

(---> 3 (<- -) (<--- 0 1+)) ; => (0 -1 -2)

Where are <-- and --> you might ask, naturally enough. I'm reserving them for implementing general purpose lazy evaluation...

Syndicated 2008-01-20 14:07:25 from Nikodemus Siivola

5 May 2008 »

This Calls for Dispute

Eric Normand writes about expressing higher-order functions. His case study is basically:

(mapcar (lambda (x) (* x 2)) list)

One of the expressions this can be given is:

(mapcar (curry #'* 2) list)

and another is

(mapcar #L(* 2 _) list)

I realize that some people are really fond of #L. Fine, it's fine -- use it. Let's just be fair about the comparison...

Problems of #L according to Eric:

  • It's not widely used, so it might be hard for someone to understand at first glance.
  • Can only be used for functions of one argument.

The same for CURRY:

  • Not very common, so it's not readable by all.
  • #' syntax gets in the way.
  • I would not use it since the form is a bit clunky.</lu>

Huh? #' gets in the way and #L does not? How is the CURRY klunky? I just so do not get this. That said, I generally prefer the explicit lambda -- though there are exceptions.

Finally, I really don't understand why to use a reader-macro for this when regular macros work for similar purposes more then fine. Use macros when fuctions won't do, and use reader-macros when macro's won't do is a good rule of the thumb in my books.

(defmacro <- (&body body) `(lambda (_) ,@body))

...with the added benefit that the user can macroexpand the form for instant comprehansion.

Syndicated 2007-12-17 19:50:51 from Nikodemus Siivola

5 May 2008 »

SBCL 12.5 months after 1.0

It is now a bit over a year since SBCL 1.0 was released.

Author: William Harold Newman
Date:   Thu Nov 30 02:36:43 2006 +0000

    1.0:
        release, will be tagged as sbcl_1_0

What's changed since then? I'll list my own favorites here (in reverse order of appearance).

  • Hash-tables with synchronization support over multiple accesses.
  • Compare and swap support on several kinds of places.
  • Support for user-defined subclasses of SPECIALIZER.
  • Code coverage support.
  • XREF support.
  • Support for user-defined subclasses of SEQUENCE.

Not counting optimizations or bugfixes here at all. If I was, I'd make a big deal about how SBCL threads are finally starting to mature. Or list Big-Oh / order of magnitude optimizations. Or rant about the old chestnut, interrupt-safety...

Maybe it's time to start thinking about 1.1...

Syndicated 2007-12-17 18:54:22 from Nikodemus Siivola

5 May 2008 »

In Merry Old England

Last Sunday I moved to Leiceister, UK -- though only for till the end of the year: my partner has an honorary research fellowship for the fall term here, in the De Montfort University.

Last week has been chaotic and rather stressful: it turns out that almost no-one in the UK is happy with taking tenants for less then 6 months; the hotel internet is a bad joke; we both have sore throats and have been feeling a bit feverish. ...but things are settling down. We did find a place to live, and can move in next tuesday, and I have found a decent internet cafe to work from -- essential as our place will not be connected until some time in October.

So, if there is anyone in the area who feels like saying hello, feel free to drop me a line.

Syndicated 2007-09-20 11:21:58 from Nikodemus Siivola

3 older entries...

New Advogato Features

FOAF updates: Trust rankings are now exported, making the data available to other users and websites. An external FOAF URI has been added, allowing users to link to an additional FOAF file.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!