Name: Nikodemus Siivola
Member since: 2003-04-17 09:27:17
Last Login: 2008-05-06 00:10:48
Homepage: http://random-state.net/
Notes: I finally have access to my Advogato account again after several years... Let's see if I can make this thing get updates from my actual blog.
Quick Notes
The only valid measurement of code quality: WTFM
The Getting Git series will continue later this week, but in the meantime I would like to bring to your attention an oft-forgotten output operator in Common Lisp: WRITE. It is perfect for both the REPL and code in many cases, since you don't need to bind printer control variables around it -- just pass the ones you care about as keywords. Similarly, you don't have to worry about it munging variables your callers may care about.
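A minimal sketch of the point, using only standard WRITE keywords. The function names HEX-STRING and PLAIN-STRING are made up for this illustration; they are not part of any library.

```lisp
;; HEX-STRING and PLAIN-STRING are hypothetical helper names.

(defun hex-string (n)
  "Return the integer N printed in base 16, without a radix prefix."
  (with-output-to-string (s)
    ;; No LET around the call: the printer settings are passed as keywords
    ;; and apply only for the duration of this WRITE.
    (write n :stream s :base 16 :radix nil)))

(defun plain-string (object)
  "Return OBJECT printed without escape characters, much like PRINC would."
  (with-output-to-string (s)
    (write object :stream s :escape nil :readably nil :pretty nil)))

;; The same thing with special bindings, for comparison.
(defun hex-string/bindings (n)
  (with-output-to-string (s)
    (let ((*print-base* 16)
          (*print-radix* nil))
      (prin1 n s))))

(hex-string 255)      ; ⇒ "FF"
(plain-string "foo")  ; ⇒ "foo"
```

After either call the global value of *PRINT-BASE* is whatever it was before, so callers that rely on it are unaffected.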
Presenting ESRAP 0.1. It is a simple packrat parser for Common Lisp. It's been almost a year since I wrote it, and it seems unlikely that I'll work more on it in the near future. In its current state it is neither particularly optimized nor polished, nor does it have a great deal of fancy features, but it did what I needed it to do at the time, and I figured someone else might find it a more useful starting point for their own needs than CL-PEG. The feature list reads:
Examples:
(parse '(or "foo" "bar") "foo") ⇒ "foo", NIL
(add-rule 'foo+ (make-instance 'rule
                               :expression '(+ "foo")))
⇒ FOO+
(parse 'foo+ "foofoofoo")
⇒ ("foo" "foo" "foo"), NIL
(add-rule 'decimal
          (make-instance 'rule
                         :expression '(+ (or "0" "1" "2" "3"
                                             "4" "5" "6" "7"
                                             "8" "9"))
                         :transform
                         (lambda (list)
                           (parse-integer (format nil "~{~A~}"
                                                  list)))))
⇒ DECIMAL
(parse '(oddp decimal) "123") ⇒ 123, NIL
(handler-case
    (parse '(oddp decimal) "124")
  (error (e)
    (format t "~&oops: ~A~%" e))) ⇒ NIL
; output
oops: Expression (ODDP DECIMAL) failed at 0.
(parse 'foo+ "foofoofoobar" :junk-allowed t)
⇒ ("foo" "foo" "foo"), 9
(parse '(evenp decimal) "123" :junk-allowed t)
⇒ NIL, 0
(add-rule 'foos-or-decimal
          (make-instance 'rule
                         :expression '(or foo+ decimal)))
⇒ FOOS-OR-DECIMAL
(describe-grammar 'foos-or-decimal) ⇒ NIL
; output
Grammar FOOS-OR-DECIMAL:
  FOOS-OR-DECIMAL <- (OR FOO+ DECIMAL)
  FOO+ <- (+ "foo")
  DECIMAL <- (+ (OR "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"))
Existence of bugs is guaranteed. Licence is zero-clause MIT.
Getting Git, part 3
Note: updated to correct a logical error. When development converges, a child will have multiple parents -- not vice-versa. Kudos to Johannes Grødem for sharp eyes and a heads-up.
Intermission
I've learned that some people are reading this and wondering whether they really need to know all this to use Git.
No.
If all you're using Git for is the "edit-diff-commit, edit-diff-commit" cycle, you don't. This seems to be 99% of what many people use a VCS for, and there is nothing wrong with that. You can even go quite a bit beyond that, and still you don't need to know anything about what's really going on -- just follow a simple recipe, and you're good to go.
It's when you move beyond the recipe level that you need to understand the model the VCS uses, just like you need to with CVS, Subversion, Darcs, or any other VCS. If you don't think that's true, you've either internalized the model without noticing, or you're just hammering out recipes.
Specifically, this series is written to teach enough of the Git model to be able to look at a bunch of disparate branches you need to merge somehow, figure out the kind of history you want to build, and then do that. No recipe in the world is going to do this for the general case: you need to know what is going on before you can decide what you want.
What's In A Commit
So, where were we? Ah, storing history. If you haven't read it yet, here's where you can read the story so far.
Commits are the third kind of object stored in the object database. You could call a commit "a moment in history", but let's see exactly what it contains:
Tree: the entire contents of the directory tree associated with the commit.
Parent(s): a commit has zero or more parent commits; only the very first commit in a history has none. A parent commit is the "previous" commit: the changes introduced by a commit can be seen by comparing the trees of the commit and its parent(s).
The common case is a single parent: this represents normal linear development.
Multiple parents represent converging lines of development. A commit with multiple parents is called a merge commit. Two parents is the norm, but multiple branches can be merged with a single commit.
Comment: some text describing the commit.
Committer: the person who actually created the commit, and the date this was done.
Author: the person responsible for the change represented by the commit, and the date. Often the author and the committer are the same, but when e.g. submitting patches by email, Git automatically preserves information about the original author.
Commits form a DAG via parents. When development diverges, multiple children will share the same parent. When development converges, a single child will have multiple parents. If this is not clear, get a piece of paper and draw a few DAGs -- it's more effective than any fancy graphic I might cook up.
So, if you have hold of a commit object, you have hold of the entire history up to that point -- but you don't know anything about the future. In other words: History is the DAG rooted at any given commit.
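If paper is not at hand, the same idea can be sketched in a few lines of Lisp. This is a toy model, not how Git is implemented: the COMMIT structure and the HISTORY function are invented for the illustration, commits are named by symbols instead of SHA1s, and the tree, comment, committer and author are left out.

```lisp
;; Toy model only: COMMIT and HISTORY are hypothetical names.
(defstruct commit
  name            ; a symbol standing in for the commit's SHA1
  (parents '()))  ; list of parent COMMIT structures

(defun history (commit)
  "Return the names of all commits reachable from COMMIT via parent links,
COMMIT itself included. Each commit is visited only once."
  (let ((seen '()))
    (labels ((walk (c)
               (unless (member c seen :test #'eq)
                 (push c seen)
                 (mapc #'walk (commit-parents c)))))
      (walk commit))
    (mapcar #'commit-name seen)))

;; Development diverges at A (children B and C share it as a parent)
;; and converges at D (a merge commit with two parents).
(defparameter *a* (make-commit :name 'a))
(defparameter *b* (make-commit :name 'b :parents (list *a*)))
(defparameter *c* (make-commit :name 'c :parents (list *a*)))
(defparameter *d* (make-commit :name 'd :parents (list *b* *c*)))

(history *d*)  ; contains A, B, C and D, in some order
(history *b*)  ; contains only A and B: B knows nothing about C or D
```

Parent links only point backwards, which is why holding B tells you nothing about D: there is no way to walk from a commit to its children.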
<s>Back to our regularly scheduled Erlang envy.</s> Review: What is a commit object? Can multiple commits refer to a single tree; if so, what does it mean; if not, why not? Can multiple commits share parents; if so, what does it mean; if not, why not? If the tip of a branch is a commit, can you guess what the history of the branch is?
Next time: tagsoup.
Also, here's some moral support for me: Git is the next Unix says apenwarr. I may not agree with the metaphor, but it's a nice read:
Git was originally not a version control system; it was designed to be the infrastructure so that someone else could build one on top. And they did; nowadays there are more than 100 git-* commands installed along with git. It's scary and confusing and weird, but what that means is git is a platform. It's a new set of nouns and verbs that we never had before. Having new nouns and verbs means we can invent entirely new things that we previously couldn't do.
Getting Git, part 2
Read part 1 first.
At the core of Git is the object database. It is not an implementation detail, but a fundamental part of the whole. Don't ignore it, and don't be scared of it. You don't have to use it directly, but knowing the basics makes life a lot easier.
So:
The object database lives somewhere under .git/ in each and every repository clone -- never mind where exactly.
The object database is garbage collected: the only way to delete an object is to remove all roots that point to it, and let the GC reclaim it. Roots also live under .git/, but are distinct from the database.
Since content is immutable, there is no way to mutate anything in the database -- you can only add new objects.
Now, there are four kinds of objects in the database. Today we will cover just two of them -- the lower level, if you will. This is content at its most content-seeming:
Blobs are binary content. They don't contain any pointers. Blobs are used to store file contents. Not file names, etc -- just contents. If you have files x/foo.txt and y/bar.txt, which both contain just the string "foobar", then in the object database there will be a blob that stores the string "foobar" -- representing the content of both files. Remember: content is identity.
Trees are lists of entries. The entries represent other objects in the database: for each entry the tree stores the object type, the pointer/SHA1 for the object, the object name, and the mode (the executable bit, really). A single tree object represents a single directory with its files and subdirectories; in normal circumstances a tree will only contain blob and tree entries. Again, content is identity: if you have two directories containing identically named files with identical contents and executable bits, both will be represented by the same tree object.
Consider directories and the file here: x/y/z.txt
We have:
Now, if we change the contents of z.txt, and commit the new content to the object database -- what happens to the object graph as a whole?
Remember: Content is identity, and not just for blobs, but all objects. If you change the contents of a file, you need a new tree object containing a pointer to the new blob, etc. This is a really important bit, so make sure you understand this: any pointer into the object database is a unique identifier for the whole object graph reachable from that point.
So, you will have:
The old versions are still there: content is immutable -- as long as GC hasn't reclaimed them, we can get at them.
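The x/y/z.txt example can also be played through in a toy object database. This is only a sketch under loud assumptions: STORE, SNAPSHOT and *OBJECTS* are invented names, entries are simplified to (name mode type id) lists with made-up :FILE and :DIR modes, and a per-content counter stands in for the SHA1 that Git computes from the content. What it preserves is the property that matters here: identical content always gets the same id, and nothing already stored is ever modified.

```lisp
;; Toy stand-in for the object database; all names here are hypothetical.
(defvar *objects* (make-hash-table :test #'equal)
  "Maps (TYPE CONTENT) to an id. Objects are only ever added.")

(defun store (type content)
  "Add an object and return its id. Identical content yields the same id.
A counter stands in for the SHA1 Git would compute from the content."
  (let ((key (list type content)))
    (multiple-value-bind (id found) (gethash key *objects*)
      (if found
          id
          (setf (gethash key *objects*) (hash-table-count *objects*))))))

(defun snapshot (z-contents)
  "Store x/y/z.txt with the given contents. Return the ids of the blob,
the tree for y, the tree for x, and the root tree, in that order."
  (let* ((z    (store :blob z-contents))
         (y    (store :tree (list (list "z.txt" :file :blob z))))
         (x    (store :tree (list (list "y" :dir :tree y))))
         (root (store :tree (list (list "x" :dir :tree x)))))
    (list z y x root)))

(defparameter *before* (snapshot "hello"))    ; four new objects
(defparameter *again*  (snapshot "hello"))    ; same content: same four ids
(defparameter *after*  (snapshot "goodbye"))  ; four more new objects

(equal *before* *again*)             ; ⇒ T
(intersection *before* *after*)      ; ⇒ NIL, every id up to the root changed
(hash-table-count *objects*)         ; ⇒ 8, the old versions are still there
```

Changing the blob changes the entry in the tree for y, which changes that tree's identity, which changes the entry in the tree for x, and so on up to the root. That is the sense in which one pointer identifies the whole graph reachable from it.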
Now, as long as you remember that a tree object represents the whole state of the whole directory structure under it, including file contents, you can forget about blobs. Just think of trees, and you will be fine.
Review: <s>Why did the hacker cross the road?</s> Where does Git store content? How are files and directories stored? Can you mutate stored content; if so, how; if not, why not? Can you delete stored content; if so, how; if not, why not? What does a tree object represent?
Next time: how history is content, and how it is stored.
Getting Git, part 1
Many people -- me included -- find Git really nice, but equally many seem somewhat confused by it. I think this is due to Git being conceptually different from the other VCSs we're used to: unless you have an at least mostly correct theory of Git, your expectations based on experience with other systems will lead you astray.
This is the first part in a series of posts that tries to address this issue, by providing the aforementioned theory. In this part I talk about general concepts: the terms I use are not Git terms; I'm trying to start you thinking in a Git-compatible way as opposed to whatever CVS and others have taught you over the years. Details and real terms will start in the next episode.
Ready? Buckle up!
Git stores content, not metadata. Some of the data stored by Git may very well describe some other data also stored by Git, but it is all content to Git.
Git stores content, not changes. It can reason about changes in content, but only content is stored.
Content is identity. If two "things" have identical content, they are the same thing for Git.
Content is immutable. If you think you are mutating something stored by Git, think again: what you're doing is making an altered copy and possibly throwing away the original. Since content is identity, you cannot mutate an object and have it retain its identity.
That's it.
I hope this was simple enough, but if you've had trouble understanding Git before, please review the points above a time or two to make sure you understand what I'm saying. You don't yet need to understand how the points above relate to Git -- just try to grok das Ding an sich. When you think you have it, ask yourself:
<s>Conan! What is best in life?</s> What does Git store? How would you describe this stuff that Git stores?
Next time we'll pop open the hood and see what's inside.
This makes me sad
A short while ago, I wrote a piece parodying a few common memes oft seen in programming blogs:
In other words, the arrow macros I presented were intentionally horrible. If you thought they were neat, think again.
I hope the fact that my parody elicited more responses than anything I've ever written before -- some liking, others disliking, but all taking it seriously -- is more indicative of my meager skills as a humorist than it is of the state of the programming blogs.
*sigh* -- I thought disclaimers are for wussies.