Older blog entries for johnw (starting at number 8)

Using Archiveopteryx on the Mac

The following instructions are for a Mac running OS X 10.4; your mileage may vary. The process is not much different on Linux, which I’ve done as well.

Archiveopteryx is quite a wonderful little database store, which holds e-mail in a PostgreSQL database and lets you access it via the IMAP protocol. It’s aimed at long-term storage and high volume.

Why would you want to keep your mail in such a thing? Well, it scales well, for one. I have tens of thousands of e-mails right now — since I don’t like deleting them — and the collection is only going to keep growing. Also, using a real database to keep your mail is a solution that stays practical up into the millions of messages, since what we lack is not disk space, but consistent and careful organization and indexing. I think every e-mail system I’ve used has corrupted my data at least a few times, mainly because the data had grown too large for me to “keep clean”. Archiveopteryx, however, uses database constraints and checks to ensure that whatever goes into the mail store is, and remains, compliant with a standard, RFC 822-style structure.

Set up PostgreSQL

To use Archiveopteryx, first you will have to install PostgreSQL. Why PostgreSQL and not MySQL? Well, Archiveopteryx only supports PostgreSQL, for one. This is a Good Thing. MySQL is handy when you need a place to keep data, but it’s not really engineered from the ground up to keep your data sane. Foreign key constraints were only added in 5.0 — and then only if you use InnoDB tables, which have their own issues. PostgreSQL has consistency checking, transactional support, and journaling. It cares about your data more than almost anything else.

The easiest way to run PostgreSQL on your Mac is to install it using MacPorts. If you’re not a MacPorts user yet, let this be your gentle introduction. It’s a very nice way to install free software projects on your Mac.

Once it’s installed, just run these commands in the Terminal:

sudo port install postgresql82 postgresql82-server
sudo /usr/local/lib/postgresql82/bin/initdb \
    /usr/local/var/db/postgresql82/defaultdb
sudo launchctl load -w \
    /Library/LaunchDaemons/org.macports.postgresql82-server.plist

Linux users need to do something similar. This is for CentOS/Red Hat-type systems:

yum install postgresql postgresql-server
service postgresql start
chkconfig postgresql on

Now PostgreSQL is running, and it will start again automatically whenever you reboot.

Patch the Archiveopteryx sources

This is for Mac users only. Unfortunately, OS X seems to have a broken implementation of setregid and setreuid: they return EPERM even for the super-user. So apply this diff to the sources before you build, by copying its contents to a file and then running this command from the top level of the Archiveopteryx source tree:

cat patch-file | patch -p0

Here’s that diff:

--- /Users/johnw/dl/archiveopteryx-2.03/server/server.cpp       2007-09-10 09:31:24.000000000 -0400
+++ server/server.cpp 2007-10-10 23:53:38.000000000 -0400
@@ -542,8 +542,16 @@
}
File::setRoot( root );

- if ( setregid( gr->gr_gid, gr->gr_gid ) ) {
- log( "Cannot secure server " + d->name + " since setregid( " +
+ if ( setgid( gr->gr_gid ) ) {
+ log( "Cannot secure server " + d->name + " since setgid( " +
+ fn( gr->gr_gid ) + ", " + fn( gr->gr_gid ) + " ) "
+ "failed with error " + fn( errno ),
+ Log::Disaster );
+ exit( 1 );
+ }
+
+ if ( setegid( gr->gr_gid ) ) {
+ log( "Cannot secure server " + d->name + " since setegid( " +
fn( gr->gr_gid ) + ", " + fn( gr->gr_gid ) + " ) "
"failed with error " + fn( errno ),
Log::Disaster );
@@ -557,8 +565,16 @@
exit( 1 );
}

- if ( setreuid( pw->pw_uid, pw->pw_uid ) ) {
- log( "Cannot secure server " + d->name + " since setreuid( " +
+ if ( setuid( pw->pw_uid ) ) {
+ log( "Cannot secure server " + d->name + " since setuid( " +
+ fn( pw->pw_uid ) + ", " + fn( pw->pw_uid ) + " ) "
+ "failed with error " + fn( errno ),
+ Log::Disaster );
+ exit( 1 );
+ }
+
+ if ( seteuid( pw->pw_uid ) ) {
+ log( "Cannot secure server " + d->name + " since seteuid( " +
fn( pw->pw_uid ) + ", " + fn( pw->pw_uid ) + " ) "
"failed with error " + fn( errno ),
Log::Disaster );

As you can see, I’m simply replacing the calls to setregid and setreuid with calls to setgid, setegid, setuid, and seteuid.

Create an aox user and group, and add yourself to it

Again, for Mac users only, since we lack convenient useradd and groupadd commands. You’ll have to run Netinfo Manager, go into the “users” and “groups” directories, and copy the postgres user to a new aox user. For the uid and gid, pick the next number after the postgres user. Then, in the aox group, create a users property, and add two values: aox and your username. Isn’t Netinfo Manager lovely?

Build and install Archiveopteryx

Now, from the Archiveopteryx sources, you can just run:

sudo make install

After that’s done, run this:

export AOX=/usr/local/archiveopteryx
sudo $AOX/lib/installer

This will create the necessary tables in the PostgreSQL database and get things ready.
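
If you want to double-check that the installer did its work, you can list the databases. This is only a sanity check; it assumes the database superuser is named postgres, and it uses the MacPorts path from above:

sudo -u postgres /usr/local/lib/postgresql82/bin/psql -l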

Start it running

Once the stage is set, it’s time to put the players in action:

sudo $AOX/bin/aox start

If you see no output from this command, that’s a very good thing. I get a warning about cryptlib failing to allocate memory, but it doesn’t seem to cause a problem — and it only started happening after I rebooted. Go figure.
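
A quick, low-tech way to verify that the IMAP server is actually listening (this assumes the default IMAP port, 143):

telnet localhost 143

If all is well, you should see a greeting line beginning with “* OK”. Type “a logout” to disconnect.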

Create your first user

Creating users and mailboxes in Archiveopteryx is trivial:

sudo $AOX/bin/aox create user myuser pwd my@email.com

This creates a user named “myuser”, with password “pwd”, that will accept mail for “my@email.com”. You can create aliases if you want a user to be able to accept mail for other addresses. When you create the alias, you can even specify which mailbox the mail should be delivered to:

sudo $AOX/bin/aox create alias workbox my@workemail.com

NOTE: Once you’ve logged out and back in again after your Netinfo Manager changes (see above, for Mac users only), you won’t have to use sudo anymore. Here’s a quick way to suck in a big UNIX mailbox:

formail -s $AOX/bin/deliver my@email.com < big.mbox

Configuring fetchmail to deliver to Archiveopteryx

There are many ways to import old e-mail into Archiveopteryx. The simplest way is to just copy it there using an IMAP client. If you’re a sysadmin type, there’s the aoximport command, which understands UNIX mailboxes, Maildirs, etc.

For importing new mail, I use a combination of fetchmail and procmail. If you want to use fetchmail only, use the lmtp and smtphost directives. Archiveopteryx can receive mail over an LMTP socket, or via the deliver command that comes with it.
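
As a sketch of the fetchmail-only route, something like this in ~/.fetchmailrc should work. The host and credentials are placeholders, and 2026 is, if I remember right, Archiveopteryx’s default LMTP port, so check your configuration:

poll pop.example.com protocol pop3
    username "joeuser" password "secret"
    smtphost localhost/2026 lmtp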

Configuring procmail to deliver to Archiveopteryx

I like to use procmail to deliver my mail, after suitable massaging and filtering, to eliminate duplicates and catch out special e-mails. Here’s the basic procmail file I use, in its entirety:

PATH=<... set this for your system ...>
MAILDIR=$HOME/Mail
LOGFILE=$MAILDIR/Library/Logs/procmail.log
#VERBOSE=yes
DELIVER=/usr/local/archiveopteryx/bin/deliver
MYADDR=my@email.com

######################################################################
#
# Backup the last 32 e-mails
#
######################################################################

:0 c: backup.lock
backup

:0 ic
| cd backup && /bin/rm -f dummy `ls -t msg.* | sed -e 1,32d`

######################################################################
#
# GNUS must have unique message headers, generate one if it isn't
# there. By Joe Hildebrand <hildjj@fuentez.com>
#
######################################################################

:0 fhw
| formail -a Message-Id: -a "Subject: (None)"

######################################################################
#
# Remove messages with duplicate Message-ID's
#
######################################################################

:0 Whc: msgid.lock
| formail -D 32767 msgid.cache

:0 a:
dups

######################################################################
#
# Remove the bogus >From header inserted by formail via fetchmail
#
######################################################################

:0 fhw
| perl -ne 'print unless /^>From johnw/;'

######################################################################
#
# Immediately drop unwanted garbage we can't stop
#
######################################################################

:0:
* <... put your rule here ...>
/dev/null

######################################################################
#
# Separate out mailing list messages
#
######################################################################

:0
* ^TO_<... mailing list address ...>
| $DELIVER -t "Mailing Lists" $MYADDR

######################################################################
#
# Catch out mail notices before checking for SPAM
#
######################################################################

:0
* ^Return-Path:.*apache@myserver.com
| $DELIVER -t Notices $MYADDR

######################################################################
#
# Remove SPAM
#
######################################################################

:0
* <... your SPAM rules here ...>
| $DELIVER -t Junk $MYADDR

######################################################################
#
# Notify via Growl if significant mail comes through
#
######################################################################

SENDER=`formail -rtzxTo:`
SUBJECT=`formail -zx Subject:`

:0 cwir
| growlnotify -a "Mail.app" -n "Mail.app" -t "$SENDER" -m "$SUBJECT"

######################################################################
#
# Split for known targets
#
######################################################################

:0
* ^From:.*<... your work e-mail here ...>
| $DELIVER -t Work my@workemail.com

# All the rest goes into the INBOX

:0
| $DELIVER $MYADDR
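
In case you’re wondering how fetchmail hands mail to this procmail file, the mda directive in ~/.fetchmailrc does the job. A minimal sketch, with placeholder host and credentials:

poll pop.example.com protocol pop3
    username "joeuser" password "secret"
    mda "/usr/bin/procmail -d %T"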

Connecting using Apple Mail

You may now connect to your new mail store using Apple Mail. Create an IMAP account on “localhost” with the username and password you told Archiveopteryx. For the SMTP server, also use “localhost”, but without a username or password. Apple Mail is a handy client for creating and deleting mailboxes, and for moving mail around. You can also have it store a copy of the mail outside the mail store for the purposes of Spotlight searching. Yes, this more than doubles the amount of space your mail takes up on disk, but the searching and indexing advantages are worth it. And you know that no matter how sketchy Apple Mail can sometimes get with tons and tons of e-mail, the mail kept in your store is there to last a lifetime.

Connecting using Emacs Gnus

In a Gnus group buffer, use the “B” key and enter “nnimap” for the server. Then pick “localhost”, and tell it the username and password you specified. You can now move around to the groups you want to subscribe to, and type “u” to add them to your group buffer. Now just type “g” and it will read the groups and present you with the latest and greatest.

I tend to use Apple Mail as my browser, and Gnus as my reading and writing tool.

Syndicated 2007-10-11 08:19:56 from johnw@newartisans.com

Applescript and UTF-8 arguments

The following tip is based on a hint by mzs found on MacOSXHints.com.

Although the Mac has been a great environment for working with UTF-8 text (8-bit Unicode), I’ve found a few corners where it’s rather difficult to preserve the encoding of my text. One of these is passing UTF-8 arguments to Applescripts on the command-line, using the osascript utility.

To step back for a second: the reason I need UTF-8 support everywhere is that I sometimes work with Persian texts, which use an Arabic alphabet. In general, most Cocoa applications display Arabic text fairly well (though a large number of them have no clue when it comes to properly formatting right-to-left text; this means that when I type an exclamation mark, it often appears to the right of my entered text, rather than to the left). But in the non-Cocoa world, which includes Carbon apps and the command-line, UTF-8 support is either non-existent or very poor.

For example, as a result of my work in Persian, I have files that both contain Persian text and have Persian filenames. The default setup for the Mac is pretty well suited for handling this at the Cocoa-level of things, such as the Finder, TextEdit, and so on. But on the command-line, things are a bit different. For one, Terminal.app must be reconfigured to properly display Unicode characters. Then, you have to pass the -w flag to /bin/ls to get Unicode bytes in filenames to render correctly.

If you want to pass a Persian filename to a script, many programs do not handle it at all. Some work transparently — they pass the encoded bytes right along to the underlying filesystem calls, which works great. But others convert the encoded filenames to their own encoding (usually MacRoman), which completely destroys the UTF-8 characters. osascript is one of these.

If you write an Applescript with an “on run” handler, and call it with osascript, passing a UTF-8 encoded filename, your “on run” handler’s argument list will look nothing like what you passed in. But there is a trick for getting around this limitation. It appears that osascript does not translate data passed in via a pipe. We can use this knowledge to trick osascript into reading its argument list from a pipe instead of through “on run”.

To do this requires making a shell script with two forks. The data fork is a regular shell script whose job is to package the argument list into a string that can be piped directly to osascript. The resource fork is the Applescript itself, compiled to read and unpackage those arguments from the other side of the pipe.

First, the script template, which is always the same:

#!/bin/sh

case $# in
    0)
        echo "Usage: ${0##*/} file [ file... ]" >&2
        exit 1 ;;
esac

{ arg=$1
  echo -nE "$arg"
  shift

  for arg in "$@"; do
      echo -ne '\x00'; echo -nE "$arg"
  done
} | /usr/bin/osascript -- "$0"

Next, the Applescript template. After this header, refer to your argument list using the argv list:

set argv to do shell script "/bin/cat"

set AppleScript's text item delimiters to ASCII character 0
set argv to argv's text items
set AppleScript's text item delimiters to {""}

-- The rest of your script follows here...

To bind these pieces together, we’ll assume you’ve called the shell script “template.sh”, and your Applescript “myscript.script”. First you need to compile the Applescript:

osacompile -o myscript.scpt -- myscript.script

Then bind the compiled Applescript to the resource fork of the final script:

ditto -rsrc myscript.scpt myscript

Next, copy the shell script template to the data fork of the final script:

cat -- template.sh > myscript

And finally, mark the script executable and delete the byproducts:

chmod 755 myscript
rm myscript.scpt

Now you can run “myscript” and pass it a UTF-8 encoded filename, and the Applescript will see it as a properly encoded string of type “Unicode text”.
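
For example, assuming a Persian-named file in the current directory:

./myscript نامه.txt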

Syndicated 2007-10-01 21:16:30 from johnw@newartisans.com

A few remarkable Mac apps

The following is a list of some of the Mac applications I live in every day. They are the precious few I couldn’t live without, and that contribute to my Joy of Mac each day. They are listed roughly in order of affection.

LaunchBar

First on any list is LaunchBar. This little utility single-handedly revolutionizes the way I use my Mac. It is an application launcher, quick file finder, phone dialer, e-mail starter, and more. If there’s something I need my system to do, LaunchBar will usually get me there in less than five keystrokes.

For freeware addicts, there is Quicksilver, which is also excellent. In fact, Quicksilver is somewhat better designed than LaunchBar. I switched to Quicksilver for six months, diving into all its arcane and nifty features. However, as time wore on I gradually used less and less of Quicksilver, until finally the feature set I used daily was the same in both. And LaunchBar can do that subset much faster, and more reliably, than Quicksilver can. I got tired of resetting Quicksilver a minimum of twice a day, dealing with random stalls, missing icon previews, and so on. LaunchBar is fast and almost never fails me.

DEVONthink Pro

DEVONthink Pro is an information database. You can drag almost any common type of textual file into it, and it will index it and provide a built-in viewer for the contents. This makes it easy for me to keep together all the PDFs, web pages, and text files that relate to my recent research. But that’s not where the power ends. This app is too deep to do justice to in one paragraph; suffice it to say that in terms of my data, I practically live in DEVONthink Pro, using it to hold, search, and correlate almost everything I collect from various places around the Web.

RapidWeaver

RapidWeaver is what I made this website with. It’s easy to use, but very customizable. I need a tool like this because I’ve found that unless something is nearly effortless to do, I can’t “keep up” with it. RapidWeaver is almost entirely responsible for me actually writing new content for this website.

1Passwd

The next runner up for sheer joy of use is 1Passwd, a password manager for most web browsers on the Mac. It can also be used to conveniently store “secret notes” which are viewed using the 1Passwd application. I’m currently using 1Passwd to manage 145 web passwords and 137 secret notes. Even though I have a different, random password for nearly every service I use on the web, 1Passwd makes them all equally effortless. And better: when I come to a form that needs my name, billing address and credit card info, 1Passwd can fill in all those fields with a single keypress. The daily utility of this little app is amazing, considering I first thought it would be superfluous next to Apple’s own Keychain application.

ChronoSync

I’m a backup maniac. At any given time, I like to have about five different backups of my data, here, there and everywhere. ChronoSync has the best interface for backing up my data that I’ve found so far. It’s still not my ideal, but it’s fairly easy to use, keeps generational archives of replaced backup files, and has a good status indicator. Its main downside is that it’s unusably slow on network volumes, but I’m hoping they get that fixed someday.

Minuteur

Minuteur would win every award I have for just plain cuteness and intelligence of interface on a small scale. It’s just an egg timer, but I use it any time I need an upcoming alarm or countdown timer. You just run the app, type in some numbers, and it starts. Almost all its functionality can be driven by the space bar, return key and the number keys, in very intelligently thought-out ways. After using it for several months, I found it well worth the donation the author asks for.

QuicKeys

QuicKeys is one of the more expensive utilities I use, but given that I use it literally thousands of times a day, I can’t complain. The whole point of money is to translate your efforts into efforts made on your behalf; by that metric, QuicKeys is worth every penny of its hefty price tag. At the moment I have nearly fifty macros active on my system, customizing all kinds of applications that otherwise would require lengthy, repetitive mousing and keyboarding.

SSHKeychain

I use SSH a lot, and SSHKeychain makes password management with SSH completely effortless. Even better, it integrates with Apple’s Keychain, so that if I shut my laptop’s lid, even my SSH keys are locked. And even better, it’s completely free!

Little Snitch

I’m a bit of a security nut, which is why I love Little Snitch, a network monitoring app by the same people who created LaunchBar. Whenever an application tries to talk to the Internet for the first time, Little Snitch gives me the opportunity to allow or deny the connection. It has been extremely helpful in learning what goes on inside my machine when I connect to the Net. You’d be surprised sometimes. For its educational value alone it’s worth running during its three-hour trial period.

Path Finder

Path Finder is a Finder replacement for the Mac that is far nicer to use than what the Mac comes with. I find the default Finder pretty unusable, which had forced me to do all my file manipulation in the Terminal or Emacs. But Path Finder brought me back to a world of efficient mousing and easy-to-read graphical displays.

VMware Fusion

VMware Fusion is a virtual machine emulator for the Mac. It lets you run other Intel-based operating systems, like Windows XP, Linux, BSD, Solaris — even OS X itself, if you do enough searching on the Web to figure it out. I love virtualization technology, and I use this app every day. Sometimes I have multiple virtual machines running as pseudo-servers so that I can test out changes in client/server type code.

Merlin

Merlin is a project manager in the style of Microsoft Project. Until I found Merlin, I never had a good way of making estimates for clients, keeping on task with those estimates, or of providing regular updates of the current projected date based on work done so far. Merlin, despite its initial complexity, made it truly enjoyable to manage all of this detail in a way that was easy to communicate to my clients.

Syndicated 2007-10-01 00:37:32 from johnw@newartisans.com

OpenSSH connection mastering

I just discovered a very cool feature of SSH today: control mastering. It lets you multiplex a single SSH connection so you don’t have to open multiple TCP connections to the remote host; instead, all your SSH/SCP commands “share” the initial connection. This speeds up subsequent connections to the same host, and also means you don’t have to enter your password more than once for hosts that don’t know your public key yet. I use this feature to implement a script for setting up new remote accounts.

To use control mastering from the command line, first you need to open a connection which will act as the “master”. If you plan on keeping an interactive session open on the remote host, you might do something like this, if you use the wonderful “screen” utility:

ssh -Mt -S /tmp/ssh USER@HOST exec screen -DR

This command creates a master connection to the remote host, and invokes the screen command — causing it to re-establish any previous session that might exist, but telling it to create a new session if none does. I usually make this into a shell script for quickly opening new screen sessions to remote hosts.
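
A minimal sketch of such a wrapper script, assuming the socket naming convention above and a single USER@HOST argument:

#!/bin/sh
# scr: open a master connection and resume (or create) a screen session
if [ -z "$1" ]; then
    echo "usage: ${0##*/} user@host" >&2
    exit 1
fi
exec ssh -Mt -S "/tmp/ssh.$1" "$1" exec screen -DR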

Now that I have the master open, I can “piggy back” on the connection to run a quick command on the same host, without having to actually create a new TCP connection:

ssh -S /tmp/ssh USER@HOST ls -l

Now, of course, this is a little too verbose to be very useful. But OpenSSH supports some extremely cool directives in your ~/.ssh/config file. Here’s all you need to make use of “opportunistic” connection mastering:

Host *
    ControlMaster auto
    ControlPath /tmp/%r@%h:%p.sock

What this says is that whenever an SSH connection is made, if there is no master for the connection already, the new connection becomes a master. However, if there is a master available, a channel on the master’s connection is used instead of initiating a new one. You’ll notice when a connection is a “slave” because it will be established much, much faster than a regular master connection. Of course, once the master quits, all of the slaves will be terminated, so be careful if you use this kind of setup!
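
You can also ask whether a master is currently alive for a given host, using the -O option (the hostname here is an example):

ssh -O check example.com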

As another example of how this can be used, here is my “connection prep” script, which I use to set up my basic shell environment on a brand new account for which I have only password-based SSH access. The first thing I want it to do is install my public key, and then configure and change my shell to zsh. By using connection mastering, I only need to type my password twice: once for the initial SSH master connection, and a second time for the chsh command.

#!/bin/sh

user=$2
server=$1

if [ "$user" = "" ]; then
    user=johnw
fi

ssh -MNf -S /tmp/sshsock.$$ $user@$server

if ! ssh -S /tmp/sshsock.$$ $user@$server test -d .ssh; then
    ssh -S /tmp/sshsock.$$ $user@$server mkdir .ssh\; chmod 700 .ssh
fi

if ! ssh -S /tmp/sshsock.$$ $user@$server test -f .ssh/authorized_keys; then
    scp -o ControlPath=/tmp/sshsock.$$ \
        ~/.ssh/id_dsa.pub $user@$server:.ssh/authorized_keys
    ssh -S /tmp/sshsock.$$ $user@$server \
        chmod 600 .ssh/authorized_keys
fi

scp -p -o ControlPath=/tmp/sshsock.$$ \
    ~/.screenrc ~/.zshenv ~/.zshrc $user@$server:.
ssh -S /tmp/sshsock.$$ $user@$server ln .zshenv .zlogin
ssh -S /tmp/sshsock.$$ $user@$server chsh -s /bin/zsh $user

ssh -S /tmp/sshsock.$$ -O exit $user@$server

Syndicated 2007-09-26 00:43:55 from johnw@newartisans.com

Stateful directory scanning in Python

The problem to be solved

This article describes how to utilize the stateful directory scanning module I’ve written for Python. But why did I write it? Understanding the problem I aimed to solve will help show why such a module is useful, and give you ideas how you might put it to use for yourself.

The problem I faced was a growing Trash folder on my Macintosh laptop. I’m rather OCD about the stuff in my Trash folder. Every time I see a “full trash-bin” icon on my desktop, I yearn to empty it. It became an obsessive thing. I thought to myself, “Either I should delete everything straight away, or I should never delete it — well, or delete it monthly.” But I knew I would forget to delete it monthly, and then the same problem would nag at my mind: was my Trash over-full?

It seems like a silly thing, but it troubled me. I like having the option of undeleting things, but I also hate endless clutter. It began to feel as though the Macintosh Trash-bin were a poisoning uncle running rampant through my neat, Danish kingdom.

The answer, I realized, is that I should be able to constrain the items in my Trash-bin to a certain number of days. After X days an item should leave the Trash, silently, on its own, as though it had realized it was time to vacate. But who was going to monitor my Trash, and who would do the cleanup? What about files that need special privileges to be deleted?

There exist applications for the Mac to do this, and also scripts I’ve seen for GNU/Linux. But I wanted a robust, extremely reliable script in Python that also has the facility to be used for other, similar tasks — not just cleaning my Trash. But since cleaning the Trash is something this code does well, the rest of my article will show how to use it to do just that.

Installing the Python module

The trash cleaner script uses a directory scanning module called dirscan.py, which may be obtained from my Downloads section. Once downloaded, put it anywhere along your PYTHONPATH.

Creating a Trash cleaner script

Dirscan’s main entry point is a class named DirScanner. This class is used to keep state information about a directory in memory, so that this information doesn’t need to be reloaded if you choose to perform multiple scans in a single run. The constructor for this class takes many arguments, which are described in a later section. For now, I’ll use an example which shows just a few of them: my Trash cleaner.

from os.path import expanduser
from dirscan import DirScanner, safeRemove

d = DirScanner(directory        = expanduser('~/.Trash'),
               days             = 7,
               sudo             = True,
               depth            = 0,
               minimalScan      = True,
               onEntryPastLimit = safeRemove)

Here’s what this object does: It scans my trash directory (~/.Trash) looking for entries older than 7.0 days. It only looks at entries in the top-level, meaning if a directory is older than 7.0 days, its children are removed all at once. Also, the Trash is only scanned if its modtime is newer than the state database, since this accurately reflects whether new objects have been added recently. Lastly, it removes old files using my safeRemove function, which understands the sudo option, and hence can use the sudo command to enforce removal of files needing special privileges.

To make the removals happen, all I have to do now is to scan the directory:

d.scanEntries()

Because of the minimalScan and depth settings, these two lines of code are extremely efficient if no changes to the Trash have actually occurred. If there are old files, the scanner knows just by looking at the state database. As a result, I can safely run this script hourly without worrying about excessive resource consumption when it runs. Even for lots and lots of entries, the state database loads very fast, since I’m using Python’s cPickle module.

That’s all it takes to write a script that keeps your Trash squeaky clean, removing all entries beyond seven days in age! I don’t have any options to delete files beyond a set size, since I couldn’t come up with a reliable algorithm for deciding what should be deleted in that case.

For example, if the limit were 5 GB, and I deleted a file 5.1 GB in size, does that mean I should remove everything else but that file to stay near the target size, or should I delete the 5.1 GB file right away? Either way, it doesn’t give me the safety of having several days to decide whether that or another file shouldn’t have been deleted. So my choice was to prefer time over size, since disk space is not as much at a premium as knowing I have seven days to reverse any hasty decisions.

Running Trash cleaner with launchd

You could run the Trash cleaner as a cron job, or you can be all Macsy and run it as a launchd service. All you have to do in that case is create the following file in your ~/Library/LaunchAgents directory (you may have to create it), under the name com.newartisans.cleanup.plist. Be sure to change any pathnames in the file to match your environment. I’ve assumed here that you’ve created my cleanup script under the name /usr/local/bin/cleanup:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
          "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>EnvironmentVariables</key>
  <dict>
    <key>PYTHONPATH</key>
    <string>/usr/local/lib/python</string>
  </dict>
  <key>Label</key>
  <string>com.newartisans.cleanup</string>
  <key>LowPriorityIO</key>
  <true/>
  <key>Nice</key>
  <integer>20</integer>
  <key>OnDemand</key>
  <true/>
  <key>Program</key>
  <string>/usr/local/bin/cleanup</string>
  <key>StartInterval</key>
  <integer>3600</integer>
</dict>
</plist>
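
To make launchd pick up the new job without logging out and back in, load it by hand, much as with the PostgreSQL example earlier:

launchctl load -w ~/Library/LaunchAgents/com.newartisans.cleanup.plist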

Turning command-line options into arguments

The dirscan.py module has a handy function for processing command-line arguments into DirScanner constructor options. To use it, just change the script above to the following:

#!/usr/bin/env python
# This is my Trash cleaner script!

import sys
from os.path import expanduser
from dirscan import DirScanner, safeRemove, processOptions

opts = {
    'days': 7
}
if len(sys.argv) > 1:
    # merge the command-line options over the defaults; keeping 'days'
    # out of the constructor call below avoids passing it twice
    opts.update(processOptions(sys.argv[1:]))

d = DirScanner(directory        = expanduser('~/.Trash'),
               sudo             = True,
               depth            = 0,
               minimalScan      = True,
               onEntryPastLimit = safeRemove,
               **opts)
d.scanEntries()

Now the user can turn on sudo themselves by typing cleanup -s. Or they can watch what the script is doing with cleanup -u, or watch what it would do with cleanup -n -u. Of course, for more sophisticated processing you’ll probably want to write your own options handler, and create your own dictionary to pass to the DirScanner constructor.

Options for DirScanner’s constructor

Here is a run-down of the options which may be passed to the DirScanner constructor:

directory
Sets the directory to be scanned. If not set, an exception is thrown.
ages
Setting this boolean to True changes the behavior of the scanner dramatically. Instead of doing its normal scan, it will print out the known age of every item in the directory. This is really just a debugging option.
atime
If True, use the last access time to determine an entry’s age, rather than the recorded “first time seen”.
check
If True, always scan the contents of the directory to look for changed entries. The default is False, which means that if the modtime of the directory has not changed, it will not be scanned. If you care about entries within the sub-directories of the main directory, definitely set this to True.
database
The name of the state file to use as a database. The default is .files.db, which is kept in the scanned directory itself. It may also be an absolute or relative pathname, if you’d like to keep the scan data separate.
days
The number of days (as an integer or floating point value) after which an entry is considered “old”. What happens to old entries is up to you; what it really means is that the onEntryPastLimit handler is called.
depth
How deep in the hierarchy should DirScanner go to find changes? If depth is 0, only the top-level is scanned. If it is -1, all sub-levels are scanned. If it’s a number, only that many levels are scanned beyond the top-level.
dryrun
If True, no changes will be performed. This option gets passed to your handler, so you can know whether to avoid making changes.
ignoreFiles
A list of filenames which should not be monitored for changes. It defaults to just the name of the state database itself.
minimalScan
If True, and if the directory to be scanned’s modtime is not newer than the state database, it’s assumed that no changes have occurred and no disk scan will be performed. Old entries are still checked for, however, by scanning the dates in the state database.
mtime
If True, use the modtime of directory entries to determine whether they are “old”, rather than the recorded “first seen time”.
onEntryAdded
This is a Python callable object, or a string, which is called when new entries are found. If it’s a string, it will be executed as a shell command.
onEntryChanged
Handler called whenever an entry has changed (meaning its modtime has changed).
onEntryRemoved
Handler called whenever an entry is removed.
onEntryPastLimit
Handler called whenever an entry is found to be “old”. DirScanner knows of three ages for a file: time since its atime, time since its modtime, and time since the first time DirScanner saw it (the time when the entry was added to the state database).
pruneDirs
If True, directories with no entries are removed.
secure
If True, entries are removed using the command srm instead of rm or Python’s unlink. This only works on systems which have srm installed, such as Mac OS X.
sort
If True, directory entries are acted on in name order. Otherwise, they are acted upon in directory order (essentially random).
sudo
If True, and if a file cannot be removed because of a permissions issue, the same command (either rm or srm) will be tried again using the sudo command. Only use this option if your sudo privilege does not require entering a password!

How else can you use it?

The directory scanner can be used for many things other than just cleaning out old files. I use it, for example, for moving files from one directory to another after a certain length of time (such as moving older downloaded files from a local Downloads cache to an offline archive whenever the offline drive is connected). Or you could use it to trigger an e-mail alert whenever files in a directory tree change, or if a file is ever removed.

Extending the scanner using Python types

It’s also possible to extend DirScanner using custom entry types. This is the most powerful way to use the scanner, and allows you to define things like alternative meanings for “age” and so on. This would be the approach to take if you wanted to enforce size-based limits. Here’s how it’s done:

import dirscan
import time

class MyEntry(dirscan.Entry):
    def getTimestamp(self):
        "This is my custom timestamp function."
        return time.time()      # useless, nothing will ever be "old"

d = dirscan.DirScanner(...)
d.registerEntryClass(MyEntry)
d.scanEntries()

The difference between this example, and the previous Trash cleaner script, is that dirscan.py will now store instances of MyEntry in its state database rather than its own entry class. You can use your own class to maintain whatever kind of state you want about an entry, and have it respond based on that information. The following are the methods you can override to provide such custom behavior:

contentsHaveChanged
Return True if the contents of the file have changed. This check is only performed if the modtime has changed from the last scan. The default implementation does nothing, but you could use this method to store an MD5 checksum, for example.
getTimestamp
Return the timestamp that will be used to determine the “age” of the file.
setTimestamp(stamp)
Called to forcibly set the timestamp for a file. The argument is of type datetime from the Python datetime module.
timestampHasChanged
Return True to indicate that the timestamp has changed.
isDirectory
Return True if the entry represents a directory.
shouldEnterDirectory
Return True if the scanner should descend into this directory. The default implementation checks the user’s depth setting to answer this question.
onEntryEvent
Called whenever something notable has happened. This is a low-level hook, called by all of the other hooks.
onEntryAdded
Called when an entry is first seen in a directory.
onEntryChanged
Called when an entry has been observed to change, either because of its timestamp or its contents.
onEntryRemoved
Called when the entry has been removed from disk.
onEntryPastLimit
Called when the entry has become “old”, based on its timestamp and the user’s days argument passed to the DirScanner constructor.
remove
Method called to completely remove the current entry. The default implementation goes to great lengths to ensure that whatever type of entry it is — be it a file or a directory — and whatever permissions it has, it’s gone by the time this method returns.
__getstate__
If you need to keep your own instance data in the state database, override this method but be sure to call the base class’s version.
__setstate__
The same goes for this method, as with __getstate__.

Syndicated 2007-09-24 05:26:50 from johnw@newartisans.com

How to administer OpenVPN

Introduction

This document describes how to administer OpenVPN on a Debian GNU/Linux server. It does not cover installing a new OpenVPN service from scratch, since that is already covered in the official OpenVPN 2.0 HOWTO. In particular, this document covers:

  1. Logging in via OpenSSH to administer the system.
  2. Creating X.509 certificates for new OpenVPN users.
  3. Installing the OpenVPN client on a user’s machine.
  4. Re-configuring OpenVPN and restarting the daemon.
  5. Re-installing OpenVPN on a new Debian GNU/Linux server, in case the old server dies or is compromised.

If you haven’t installed OpenVPN on your server yet, please visit the official HOWTO and complete the steps there. Then you can return to this document. I originally wrote this to show co-administrators how to work with an already-running OpenVPN installation.

Logging in with OpenSSH

To administer OpenVPN you must have a user account on the server (obviously). Typically, this involves logging in using OpenSSH. If interactive — i.e., password-based — SSH logins are disabled, the server must already have your SSH public key installed for you to get in.

Let’s say you already have a login and know how to access the machine through SSH, and you want to create a new account for a new administrator. These steps can also be forwarded to someone who does have root access, so they can create a new account for you.

This is what to do: first, login to the machine using your SSH client. Once you come to a prompt, run:

$ sudo useradd joeuser

(These examples assume the new user’s name is “joeuser”). Next, create the home directory for this user and the user directory needed for SSH:

$ sudo mkdir /home/joeuser
$ sudo chown joeuser:joeuser /home/joeuser
$ cd /home/joeuser
$ sudo mkdir .ssh
$ sudo chown joeuser:joeuser .ssh
$ sudo chmod 700 .ssh

The “sudo” command allows you to more safely run commands as root. The “chmod” command ensures only joeuser can access his own .ssh directory.

Next you need joeuser’s SSH public key. This will be a text file, generated by PuttyGEN or ssh-keygen, beginning with the text “ssh-dss”. The default location for this file on Linux is either ~/.ssh/id_dsa.pub or ~/.ssh/id_rsa.pub.

If you are creating a public/private key pair for a Windows user, I recommend the utility PuttyGEN.EXE. Select an RSA or DSA key — using the radio buttons at the bottom of the dialog — and a key length of 2048 bits or higher. Then press the “Generate” button and save the private key to disk as a file named joeuser.ppk.

In a text box at the top of the resulting window will be the public key needed for logging in. Select the entire key (which begins with “ssh-dss”) and copy/paste it to a file named joeuser.pub. It is not good enough just to press “Save Public Key”, as this does not save the public key in the correct format for the OpenSSH daemon. After copying this data to a file, move the file joeuser.pub to the VPN server and execute the following commands:

$ sudo cp joeuser.pub ~joeuser/.ssh/authorized_keys
$ sudo chown joeuser:joeuser ~joeuser/.ssh/authorized_keys
$ sudo chmod 600 ~joeuser/.ssh/authorized_keys

Now joeuser will be able to login to the OpenVPN machine using his SSH private key. Once he is able to login, he needs to be added as a “sudo” user, which can be done with this command:

$ sudo visudo

Edit the end of the displayed file so it reads:

joeuser ALL=(ALL) ALL

This should follow the pattern of other lines near the end of the file. By adding joeuser here, he is now able to use the “sudo” command to run other commands as root.

NOTE: In order to disable password authentication in OpenSSH — thus requiring that all users log in using their public key — edit the file /etc/ssh/sshd_config and make sure it contains these lines:

PasswordAuthentication no
PermitRootLogin no
PermitEmptyPasswords no

This will go a long way toward securing your OpenSSH service from brute-force password attacks.
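
For the new settings to take effect, restart the SSH daemon. On Debian, that looks like this:

$ sudo /etc/init.d/ssh restart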

Creating new X.509 certificates

How it works

OpenVPN authenticates users using an X.509 certificate exchange. This is the same way SSL works when you connect to websites whose URL begins with “https://”. It’s one of the most widely used and secure ways of establishing an encrypted connection on the Internet.

For X.509 certificate exchange to work, several elements are involved:

  1. The user’s own private and public certificates. His private certificate lives only on his own machine and is used for decrypting data received from the VPN server. His public certificate lives both on his machine and on the server, and is used by the server to encrypt data intended for his machine. If you’re not familiar with public/private key encryption, the theory is that anything encrypted with a public key can only be decrypted by the corresponding private key. Therefore, anyone who has your public key can freely send you data that only you can read, without involving the exchange of sensitive passwords. For our sample user, these two files are named joeuser.crt and joeuser.key. (There is also a third file, which records the original certificate signing request, named joeuser.csr.)
  2. The server’s private and public certificates. After a connection to the OpenVPN server is established, the server will transmit a copy of its public certificate, which is used to encrypt data only the server can read. Once each side has the other’s public key, two-way secure communication becomes possible.
  3. The server’s TLS authentication key. A copy of this key resides on both machines and is used to help avoid “man-in-the-middle” attacks during the initial negotiation process.
  4. The Certificate Signing Authority’s public certificate. This is used to authenticate the “digital signatures” applied to all of the above keys, so both sides can verify the other’s validity. There exists one private/public Certificate Authority pair for all of Cronus, named ca.crt and ca.key.

To summarize, the user will end up with five files on his machine:

joeuser.crt ::

The user’s public certificate.

joeuser.key ::

The user’s private certificate.

joeuser.csr ::

The user’s original “certificate request”.

ca.crt ::

The public certificate for the Certificate Signing Authority, which is a single certificate that represents the entire CEG organization. All certificates created by CEG get signed by this certificate, which means that any certificate claiming to come from CEG can be verified as such.

ta.key ::

The TLS authentication key, used to strengthen the security of initial handshaking.

NOTE: The security of the entire system rests on the physical security of each user’s private certificate. This means it should not be kept on removable media that can easily get lost! If anyone should get hold of your private certificate, they could listen in on OpenVPN connections coming in from the Internet and easily compromise the entire system. Keep it secret, keep it safe!

Creating a certificate pair

When joeuser is first created, he has no certificates. Adding a new certificate is easy: just login via SSH and run the following commands:

$ sudo su -
# cd /etc/openvpn
# . ./vars
# ./build-key joeuser

You will now be asked a series of questions relating to joeuser. Answer them to the best of your knowledge, and select Yes when it asks if you want to sign and commit the new key. The user’s keys now exist in the “keys” subdirectory, and need to be handed over to joeuser securely. Here’s how I do it:

# cp keys/ca.crt ~joeuser
# cp keys/ta.key ~joeuser
# cp keys/joeuser.* ~joeuser
# chown joeuser:joeuser ~joeuser/*
# chmod 400 ~joeuser/*

Now ask joeuser to copy these five files from his home directory using his SSH client. Once the transfer is complete, wipe them from his home directory on the server:

# wipe -f ~joeuser/*

joeuser now has all the certificates he needs to connect to the VPN.

Revoking a certificate

If a certificate pair is ever compromised—or you think it might have been—it should be revoked on the server and a new certificate pair issued. This is also quite easy to do:

$ sudo su -
# cd /etc/openvpn
# . ./vars
# ./revoke-full joeuser

Now if joeuser tries to connect again using his old key, the connection will be immediately dropped. Note that for this to work, the CRL list must be checked on the server. This is indicated by the crl-verify option in the server’s configuration file. If this has not been set up yet, see the official HOWTO for instructions.
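
For reference, enabling CRL checking is a single line in server.conf. The path is an example, though keys/crl.pem is where the stock easy-rsa scripts put it, as far as I know:

crl-verify keys/crl.pem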

Installing the OpenVPN client

Once joeuser has his certificates, he needs the OpenVPN client to connect. This can be installed on a Windows or Mac machine using installer packages that are kept on the server. Clients for either OS can be downloaded from the following locations:

Windows http://openvpn.se/
Mac OS X http://www.tunnelblick.net/

After installing OpenVPN on Windows, for example, it will install a set of programs under the user’s Start menu. One of these is named “Open the OpenVPN configuration directory”. Select this to open an explorer window pointed at the user’s config directory. There should be a single file already in this directory named README.txt.

Into this directory copy all of the user’s certificate files, as well as the standard client configuration file for your server. This client configuration file must be edited after you copy it, to reflect the user’s keyfile names. I usually create a client configuration template and keep it here:

/etc/openvpn/client.conf
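
A minimal sketch of such a template follows. The remote hostname is a placeholder, the cert and key lines are the ones to edit for each user, and the tls-auth direction (1 on the client, 0 on the server) and cipher must match the server configuration described below:

client
dev tun
proto udp
remote vpn.example.com 1194
nobind
persist-key
persist-tun
ca ca.crt
cert joeuser.crt
key joeuser.key
tls-auth ta.key 1
cipher AES-256-CBC
verb 3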

Once these six files are installed in the user’s configuration directory, right-click on the OpenVPN icon in the system tray at the lower right of the screen. There will now be a “Connect” option, which will connect you to the VPN. Once the connection succeeds, the user’s assigned IP address is displayed. Their machine can now be seen by other machines and clients at that IP address.

Congratulations! You are now connected to the VPN.

Configuring the server

At some point in time, you may need to reconfigure the OpenVPN server. For example, let’s assume the DNS server has been changed to something else. Here’s how you’d make that change:

Login to the VPN server using SSH, become root with sudo su -, and then edit this file (assuming you named it server.conf):

/etc/openvpn/server.conf

Here is what this file looked like at the time of writing, with comments interspersed. When I get to the section describing the DNS server, I’ll show how to change it.

The configuration file

This section describes the current server configuration, found in server.conf. This is the only file that needs changing to alter the behavior of OpenVPN.

local 63.251.4.10
port 1194
proto udp

These lines define the external (Internet-accessible) address of the VPN server, and the port number it can be accessed by via UDP. If you change the port number you must also change your firewall script to allow connections to the new port.

cd /etc/openvpn

This directive changes the working directory of the OpenVPN daemon so that we can specify relative pathnames in the rest of the file.

ca keys/ca.crt
cert keys/cronus.crt
key keys/cronus.key
dh keys/dh2048.pem
tls-auth keys/ta.key 0
cipher AES-256-CBC

These are the “security” parameters for the VPN server, specifying the location of the Certificate Authority’s public certificate, the VPN server’s private and public certificates, the Diffie-Hellman parameters used for key exchange, the TLS authentication key, and the cipher algorithm used to encrypt the flow of data. AES-256-CBC means that we are using 256-bit AES (the Advanced Encryption Standard) in Cipher-Block-Chaining mode, where each block of data sent alters the encryption of the following block.

dev tun0
server 10.8.0.0 255.255.255.0

push "redirect-gateway"
push "dhcp-option DNS 1.2.3.4"
push "dhcp-option DNS 1.2.3.5"

These options specify that we are using IP Tunneling mode, where all IP traffic flows across the VPN. The other possible mode is Bridging, where all Ethernet traffic flows across the VPN. The advantage to Ethernet Bridging is that although it is noisier (it lets NetBIOS traffic through, for example), it allows clients to map network drives in Windows.

keepalive 10 120

This statement causes the server to “ping” all clients every ten seconds; if a client does not respond to a ping within 120 seconds, it is considered down and the link is terminated. A similar statement is used by clients to make sure that their connection to the remote side has not been terminated.

;comp-lzo
;fragment 1400

These two statements are disabled for now. If enabled, they would cause all traffic over the VPN link to be compressed, and all packets over 1400 bytes in size to be broken up into smaller packets. These options may become useful in the future, which is why I haven’t deleted them.

user nobody
group nogroup

For security purposes, the OpenVPN daemon sets its effective user id after initialization to the user nobody and its group to nogroup. This ensures that the daemon has effectively zero privileges on the system while it’s running.

persist-key
persist-tun
ifconfig-pool-persist ipp.txt

These options cause all information relating to client connections to be persisted to data files, so that if the server gets restarted (within 120 seconds) existing connections will not need to be terminated. That is, if OpenVPN is restarted, no one currently connected will notice, except for a temporary pause in service.

status /var/log/openvpn-status.log
verb 3

These statements influence the amount of logging performed by OpenVPN. The status directive causes a list of all current connections to be written to /var/log/openvpn-status.log. This file gets updated by the server every minute.

chroot /var/run/openvpn

When the server is running — even though its effective user and group ids have no privileges whatsoever — we don’t want it to have any more access to the system than is necessary. To this end, the above directive sticks the OpenVPN server process into a “chroot jail” in the directory /var/run/openvpn, which means that the running daemon cannot access any files outside of this directory. So even if an attacker somehow compromises the daemon and forces it to run a command on the system, there is no system file the command will be able to read or change.

mlock

This security-related option forces all memory related to OpenVPN to remain in system memory. It is never paged or written into swap files, where it might be possible for an attacker on the system to sniff out temporary keys or passwords.

Restarting OpenVPN

Once the configuration file has been changed, OpenVPN needs to be restarted. This can be done with the following command:

$ sudo /etc/init.d/openvpn restart

At the end of this command it should say:

OpenVPN: client(FAILED) server(OK).

It’s OK for the client to fail because we are not running in client mode on the server. You can see the most recent informational messages from the server using this command:

$ sudo tail -30 /var/log/daemon.log

You can also use “tail -f” instead of “tail -30” if you want to “watch” new messages output by the server while people are trying to connect. Each new connection generates several messages during the process of certificate negotiation.

Recreating the server

NOTE: These instructions are for Debian GNU/Linux 4.0, but should be fairly easy to translate to the other Linux variants.

It may happen at some point in time that your current OpenVPN installation crashes or becomes unstable. In that case, it may be necessary to recreate a new one from scratch. The following steps will guide you through the process of creating a new OpenVPN server, whether or not you still have access to the security files on the old server.

First, a new machine is needed with two network cards: one with access to the internal network, and one with access to the Internet.

Step 1: Install Debian GNU/Linux 4.0. This should be done from the netinst CD-ROM image that can be downloaded from ftp://ftp.debian.org.

Step 2: Once Debian is installed, login as root and type:

# apt-get update
# apt-get dist-upgrade

This will upgrade your Debian installation to the latest stable version.

Step 3: Install the necessary security packages:

# apt-get install openvpn openssl bridge-utils
# apt-get install ssh iptables iproute rsync

Step 4: Uninstall certain default packages which are not needed and pose potential security risks, such as the Apache HTTP server:

# apt-get remove apache2

Step 5: Copy over the configuration files and scripts from the old server:

# rsync -e ssh -av <oldserverip>:/etc/openvpn/ /etc/openvpn/
# scp <oldserverip>:/etc/init.d/openvpn /etc/init.d
# scp <oldserverip>:/etc/ssh/sshd_config /etc/ssh
# mkdir /var/run/openvpn

If you do not have an old server, or if you believe the old server has been compromised, you should create a new OpenVPN environment. This will mean resetting the authentication certificates for all clients and is not a trivial operation for a sizable organization. To do this, follow the steps in the official OpenVPN 2.0 HOWTO.

Step 6: Configure the network interfaces for the new machine. Edit the file /etc/network/interfaces and assign the IP address, gateway and netmask details for your two network interfaces.

Step 7: Reboot the server to incorporate all the above changes. If everything was copied correctly, it should mention in the boot log that the OpenVPN server was started OK.

Security details

The OpenVPN server uses SSH with public key authentication only, and OpenVPN via X.509 certificate exchange. It runs OpenVPN in a “chroot jail” (in /var/run/openvpn), meaning that after initialization the server daemon cannot see anything on the system except what is in the /var/run/openvpn directory. Lastly, the OpenVPN server is configured with the mlock option, which prevents it from ever writing data to the swap volume, so that temporary keys cannot be sniffed out even if an attacker compromises the system and is able to log in.

For maximum security, the file /etc/openvpn/keys/ca.key should not be kept on the server, but moved to a physically secure device that only the system administrator has access to. This is the Certificate Authority private certificate, and is required for creating new certificates. Without it, only pre-existing certificates would ever be allowed by the system. If it is copied to a new location, say /mnt/private/ca.key, the configuration file /etc/openvpn/server.conf will have to be changed to refer to this new location. Another possibility would be to copy the contents of /etc/openvpn from the OpenVPN server to another machine which is not accessible to the network, and then to use this machine only for creating new keys.
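
Relocating the key is just a move and a tightening of permissions; the mount point here is only an example:

$ sudo mkdir -p /mnt/private
$ sudo mv /etc/openvpn/keys/ca.key /mnt/private/ca.key
$ sudo chmod 400 /mnt/private/ca.key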

The OpenVPN server also runs a firewall using the iptables facility in the Linux kernel. You can see the current state of the firewall by using this command:

$ sudo iptables -nL

The firewall should leave at least two ports open to the Internet: TCP port 22 for SSH, and UDP port 1194 for OpenVPN. It must also accept incoming traffic from authenticated OpenVPN clients to the internal network, and from the internal network to all connected clients. Here is a set of iptables commands that reflects this policy:

INET=eth0

# Allow inbound SSH (TCP 22) and OpenVPN (UDP 1194) from the Internet
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p udp --dport 1194 -j ACCEPT

# Accept and forward traffic arriving on routed (tun) VPN interfaces
iptables -A INPUT -i tun+ -j ACCEPT
iptables -A FORWARD -i tun+ -j ACCEPT
iptables -A FORWARD -o tun+ -j ACCEPT

# Accept and forward traffic arriving on bridged (tap) VPN interfaces
iptables -A INPUT -i tap+ -j ACCEPT
iptables -A FORWARD -i tap+ -j ACCEPT
iptables -A FORWARD -o tap+ -j ACCEPT

# Masquerade VPN traffic leaving via the Internet-facing interface
iptables -t nat -A POSTROUTING -o $INET -j MASQUERADE

It is also advisable for clients of the VPN to always have a firewall running and active. The built-in firewall that comes with Windows XP is just fine for this purpose.

Syndicated 2007-09-24 00:15:52 from johnw@newartisans.com

An SVK primer

Today’s entry is a little primer I wrote for some co-workers at CEG, on setting up SVK to do remote development. We continue to use a central Subversion repository, but I often find myself working in cafés where I don’t have immediate access to the server. Also, I like to branch and check-in much more frequently than would be sane to do with Subversion — I also like the distinction between a “check-in” being a simple, quick snapshot, and an “svk push” as the real deal.

Setting up remote mirrors

SVK is a simple wrapper around Subversion that provides some of the better features of Distributed Version Control systems without a lot of the complexity that such systems usually involve. This primer aims at giving a moderate to seasoned Subversion user quick access to the better features of SVK.

Creating “depots”

The first step to using SVK is to create a local mirror of a remote repository. But even before that, you need a local depot to track them in. Here is the first command you need to run to get started:

svk depotmap -i

This initializes a private depot named “//” in ~/.svk/local. All of the projects you mirror locally will be tracked here, and any projects you create on your own machine are kept here. ~/.svk is where SVK “lives”. When SVK asks if you want to create the missing directory, just say yes.

It’s also possible to have multiple depots — you might have one for personal projects, one for work, and one for tracking free software. Here’s how you’d go about creating a depot named “/CEG/” for tracking CEG projects, in the directory ~/CEG/.svk:

svk depotmap -i /CEG/ ~/CEG/.svk

SVK asks to create the directory for you, and then initializes it so it’s ready for importing/creating projects within it.

Mirroring projects

To recap, every project you track lives in a depot. The depot name occurs at the start of every directory string you use to identify projects. The default depot name is “//”; if you created a CEG depot, that one is named “/CEG/”. Here’s an example of how to mirror a personal project in //, and a CEG project in /CEG/:

svk mirror https://ledger.svn.sourceforge.net/svnroot/ledger/trunk \
//mirror/ledger/trunk

svk mirror svn://svnhost.3dex.com/project/MyProject/trunk \
/CEG/mirror/MyProject/trunk

The path “/CEG/mirror/” is just a convention, but it will be very helpful later on. Also, rather than mirroring the entire project “MyProject”, it’s much better just to mirror the trunk and any specific branches you need. With SVK, it’s easy to integrate mirrors of other branches later on. Let’s quickly add a mirror for the a-new-port branch to the CEG depot:

svk mirror svn://svnhost.3dex.com/project/MyProject/branches/a-new-port \
/CEG/mirror/MyProject/branches/a-new-port

Voila! The a-new-port branch is now being mirrored as well, alongside the trunk.

Getting remote changes

Now that you have mirroring set up, you must “sync” to get all the latest changes — which on the first run means all changes. This first run will take a long time, so don’t be dismayed, or abort the process thinking that it’s hung. The command to sync all of your mirrors is:

svk sync --all

To sync a specific mirror, name the depot path. You can also sync all the mirrors for a particular depot:

svk sync /CEG/mirror/MyProject/trunk

svk sync --all /CEG/

To get a list of your depots, use:

svk depotmap -l

To see a list of all mirrored projects, use:

svk mirror -l

As time goes by, you can periodically update your mirrors using svk sync --all, which downloads all the changes that have been committed since the last time you ran it.

Using SVK, Subversion-style

It’s possible to use SVK solely as a mirroring Subversion client. In this form of usage, check-ins are committed to the Subversion repository immediately, just as if you were using Subversion itself. The only benefit gained by using SVK in this mode is that you have full access to the repository’s history, even when you’re not connected.

Checking out a working tree from a mirror is a lot like svn checkout, except that you give the depot path, not the Subversion URL:

svk checkout /CEG/mirror/MyProject/trunk MyProject

This creates a local working tree named “MyProject”, following the remote trunk.

Now let’s say you disconnect from the network. You will still be able to run the following command, showing the differences between revision 200 and HEAD:

svk diff -r200:HEAD META-INF/ejb-jar.xml

With Subversion, this command would need access to the remote repository to succeed; with SVK, it always happens at the speed of local access.

(NOTE: SVK uses its own revision numbers, which are not identical to those used in the Subversion repository. This is because SVK revision numbers track the number of changes that have occurred in your depot, whereas Subversion tracks the number of changes that have happened to the remote repository overall (including changes in branches you may not be tracking). So it always helps to use svk log to determine the correct revision numbers of the changes you’re looking for.)
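For instance, to see the last few depot revisions that touched a file before picking one for svk diff (the -l option limits how many log entries are printed):

svk log -l 5 META-INF/ejb-jar.xml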

Updating your working tree

If someone commits a change to the remote Subversion tree, you can get it by doing an svk sync --all, followed by an svk update in your working tree:

svk sync --all
svk update

Checking in changes

To check in changes, just do an svk sync --all, followed by an svk update; resolve any merge conflicts — just as you would with Subversion — and then commit the changes:

svk sync --all
svk update
# <Now resolve possible conflicts…>
svk commit -m "My commit comment"

The changes are posted immediately to the remote Subversion repository, and your local mirror is updated at the same time.

Using SVK, Distributed-style

Using SVK in distributed mode requires only one extra step: creating a local branch of the remote mirrored project. This local branch lives on your own machine, and all your future commits are made against it. To get changes down from the server, or push them back up to the server, SVK provides the commands “push” and “pull”. Here is a quick guide to setting up a local branch for distributed development:

# Create the branch by doing a cheap copy
# (this is identical to creating a branch in Subversion)

svk cp -p -m "Created branch" \
/CEG/mirror/MyProject/trunk \
/CEG/local/MyProject/trunk

This command creates a local branch in the /CEG/ depot, with almost the same name as the mirror of the remote repository. The mirror path begins with /CEG/mirror to show that its contents track the remote; the local branch begins with /CEG/local to show that its contents live only on the local machine.

Once we’ve created the local branch, we can checkout a copy exactly as we did above, only using the new local branch path instead:

svk checkout /CEG/local/MyProject/trunk MyProject

Now we have our local working tree again, whose contents (at the moment) are identical to what a regular Subversion checkout would have produced.

Updating your working tree

Let’s say someone checks in changes to the Subversion repository. We need to: 1) synchronize our mirror, 2) merge the changes from the mirror to our local branch, and 3) merge these new changes from the local branch into our working tree. Fortunately, SVK has rolled all these commands into one:

svk pull

That’s it. It will sync the mirror, merge the new changes into the local branch, and then update our local working tree. If you had wanted to do it manually, the steps would have been:

svk sync --all
svk smerge /CEG/mirror/MyProject/trunk \
/CEG/local/MyProject/trunk
svk update

The smerge command is described later. Most of the time, all you will need is svk pull. You won’t even have to run a separate sync anymore!

Checking in changes

Checking in changes to a local branch is the best part about SVK, because they don’t have to go to the remote repository right away. This means you can do multiple, quick checkins during a large work in progress without breaking any builds.

You can commit to the local branch in the same way as any Subversion commit:

svk commit -m "First change"
svk commit -m "Second change"
svk commit -m "Third change"

These commits are quick and cheap, since they all go to a local branch on your own machine. When you next do an svk pull, it will merge in any changes from the remote repository “underneath” your new changes, meaning it’s easy to keep up-to-date with the latest trunk revision without interrupting your workflow. This is the real beauty of distributed version control.

Posting your changes

Because we’ve only committed our changes locally, we now have to “post” them back to the remote repository. SVK has an equally easy command for this:

svk push

The push command can work in one of two modes: it can “replay” each local commit on the remote server, in order to preserve all your commit history; or it can post all your local changes into one big commit, with all the merge comments glommed together in one comment:

svk push      # push each local commit as a remote commit
svk push -l   # "lump" all local changes into one remote commit
svk push -C   # don't actually commit; show if it would conflict

For interest’s sake, the individual steps of the push command in this example would look like this:

svk sync --all
svk smerge /CEG/local/MyProject/trunk \
/CEG/mirror/MyProject/trunk

The process of merging into /CEG/mirror sends those commits immediately to the remote Subversion repository, since SVK maintains the mirror in perfect sync with the remote. We are now back in line with the main trunk!

Creating a local topic branch

Let’s say you’re doing some heavy work, and you want to experiment with a possible optimization. This means you want to pause current development in your local branch — but you want to do your test work on top of these local changes, without having to push them to the remote repository first. In SVK this is a breeze.

First, fully commit your current work into the local branch. Then, make a snapshot of your local branch to a local topic name:

svk commit -m "Committing work to begin topic branch"
svk cp -p -m "Created topic branch" \
/CEG/local/MyProject/trunk \
/CEG/local/MyProject/branches/optimization-test

Now switch your local working tree to track “optimization-test”:

svk switch /CEG/local/MyProject/branches/optimization-test

The changes you commit from this point onward are committed to the “optimization-test” topic branch. If you ever need to switch back to the main local branch for any reason, just commit all current changes into your topic branch and say:

svk switch /CEG/local/MyProject/trunk

As long as you commit before switching, you can switch back and forth as much as you like. Plus, using svk pull in either working tree will pull in whatever recent changes have been made to the remote repository. This lets you work on multiple branches of local development easily, without ever getting out of sync with the main trunk.

If you end up not liking your changes to the optimization-test branch, just switch back to your main local branch and delete the topic branch:

svk switch /CEG/local/MyProject/trunk
svk rm -m "Bad code" \
/CEG/local/MyProject/branches/optimization-test

If instead you really liked the changes and want to integrate them into your main local branch (to prepare them for committing to the remote), use the powerful smerge command to copy the changes over:

svk switch /CEG/local/MyProject/trunk
svk smerge /CEG/local/MyProject/branches/optimization-test .

The smerge command says to merge all changes committed in the optimization-test branch into the current working tree (.). If you like the result, svk commit the changes back into your local branch. Then you can svk push to reflect them up to the remote repository.

The power of “smerge”

svk smerge can be used not only for merging branch changes into a working tree, but also for merging changes directly from repository to repository, without involving any working tree at all. However, it’s easier to test the results of a merge if you use a clean working tree as the “staging area”.

You can also use the -C option to smerge to do a “merge check”. This doesn’t actually do any merging, but instead tells you what would have happened, and if any conflicts would have resulted from the merge.
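For example, to find out whether pushing your local trunk back up to the mirror would produce conflicts, without touching anything:

svk smerge -C /CEG/local/MyProject/trunk \
/CEG/mirror/MyProject/trunk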

Further, the smerge command maintains a historical state of all past merge operations, using regular Subversion properties. This means that if you merge in changes from a topic branch one week, and then merge in later changes from the same branch a week later, only the new changes get merged in the second time. smerge knows that it already has the older changes.
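If you’re curious, that bookkeeping is visible as a regular property named svk:merge on the branch root, and you can inspect it (though you should never edit it by hand):

svk propget svk:merge /CEG/local/MyProject/trunk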

Here’s how you would successively merge changes from the MyProject Subversion trunk into the a-new-port branch, using SVK. I personally run this command every time I see new changes committed to the trunk:

svk smerge /CEG/mirror/MyProject/trunk \
/CEG/mirror/MyProject/branches/a-new-port

By running this command regularly, the “a-new-port” branch in the remote repository stays up to date with changes in the trunk.

On the day when a-new-port is finally ready for prime time, the reverse command will merge all those changes back into the trunk — without duplicating any changes from those previous smerge runs:

svk smerge /CEG/mirror/MyProject/branches/a-new-port \
/CEG/mirror/MyProject/trunk

Of course, with a command like this, it’s MUCH safer to stage the merge results into a working tree for verification first. Here’s how such a session might play out:

svk checkout /CEG/mirror/MyProject/trunk
svk smerge /CEG/mirror/MyProject/branches/a-new-port .

# resolve conflicts and/or correct any breakages
svk commit -m "Merged in a-new-port"

svk rm -m "Removed SVK mirror; we don't need it anymore!" \
/CEG/mirror/MyProject/branches/a-new-port

svn rm -m "Removed Subversion branch; we don't need it anymore!" \
svn://svnhost.3dex.com/project/MyProject/branches/a-new-port

Syndicated 2007-09-22 04:38:57 from johnw@newartisans.com

14 Nov 2003 (updated 14 Nov 2003 at 08:23 UTC) »

Finally updated my user info with a link to my new home page.

Also, I've moved to a Mac as my base development platform. The move to Darwin has been interesting, and mostly pretty fun. Some of my discoveries are documented at http://www.newartisans.com/johnw/MacTips.html.

This Advogato system seems like an interesting setup. I've always wondered about remote diary systems, though. For me, I use records-mode in Emacs, and then publish them (via a Python script) to my website. To see thoughts I've been having recently about free software or social freedom, look under the category "freedom". I wonder if there's a way to export that data here, rather than having to enter it through a separate mechanism?
