Older blog entries for johnw (starting at number 33)

Script of the week: linkdups

It’s been a while since I’ve posted a script; life has been distracting lately. I also wanted to let this current script mature a lot more before sharing it, as it has the potential to be destructive. Use wisely!

It’s name is linkdups, and it’s a Python program to recursively walk through a directory tree and hard-links any files together whose contents match exactly. That means that if you have two files, each taking up 10 Kb, afterwards they will be linked to the same contents for a total savings of 10 Kb.

This type of space savings only works for media that never changes. It’s great for use inside of archival disk images, website mirrors, backup directories, etc.

It seems that most programmers, at some point or another, writes a script to do exactly this kind of thing. My brother was telling me the other day about one he’d written.

The main purpose of this script is to be extremely efficient. I tested it against a directory hierarchy containing over 40 million entries, many of which required hard-linking. My requirements were that memory consumption never grew beyond a small, startup footprint, and that it not waste cycles computing unnecessary details (like checksumming everything and then looking for matches).

Here’s how the algorithm works:

  1. First, I divide the problem between large files (easy gains), and small files. The default cutoff is 16K. This division means you get the biggest savings up front. It also cuts down on memory usage for internal tables.
  2. Next, a table is created of how big each file in the target set (large or small) is. Groups are then made for like-size files; singletons are dropped.
  3. Within the like-size groups, MD5 is used for a quick content check. Files in a size group with matching checksums are considered candidates for linking.
  4. The candidates from step 4 are byte-wise compared to ensure an exact match. If any two match, the second one is removed and the first is hard-linked to the second’s name. A tally is then added of how many bytes were saved. This is repeated for all other files in the group. (Note: byte-wise comparisons are made only against the first member of a group, on the assumption that MD5 most likely guarantees a match).
  5. At the end of the run — or after Control-C has been pressed — the total byte savings is displayed in human readable form. Use the --verbose option to watch the algorithm do its work.

The script is also designed to be abortable. You can hit Control-C any time if you get bored, and your files will be left in a good state. It does as much linking as it can, but once Control-C is pressed, it stops wherever it is, wraps up the last task, and ends safely.

It can be downloaded from my FTP server: linkdups.py.

Syndicated 2008-03-28 05:40:15 from johnw@newartisans.com

Defragmentation and disk images

There has been a small debate among some Mac users about whether defragmenting your disks is necessary for the smooth operation of OS X. I’ve always been in the camp of those who do it regularly, because I’ve seen what my disk ends up looking like after a few weeks of not doing it (and how pretty the graph looks afterwards). It could all be psychological, but I find the progress bar rather hypnotizing, and my wife has been known to find me staring at it for hours on end. Ok, very psychological.

Well, I’ve found one circumstance where defragmentation is definitively helpful: compressing sparse disk images.

I have a habit of creating encrypted disk images for every client I work for. A lot happens in those images, which tend to grow and shrink quite a bit. OS X has a command to squeeze out the unused space, which looks like this:

$ hdiutil compact DiskImage.sparseimage

I just ran this command on one of my work images, and it reported a savings of 786 Mb out of 3 Gb. Just for interest’s sake, I ran iDefrag on the same volume, which eliminated the small amount of fragmentation that had built up, and compacted the files down toward the beginning of the image.

Running hdiutil compact on the same disk image again resulted in a further savings of 518 Mb! Given that the image itself is now 1.4 Gb, that’s about a 30% further reduction.

So, if you’re like me and you use lots of virtual disk images — and you’ve always wondered if defragmentation tools were worth anything at all — here’s one reason to consider it.

Syndicated 2008-03-20 06:17:05 from johnw@newartisans.com

A Ledger success story

Over the Christmas break I went on Bahá’í pilgrimage to the holy sites in Haifa, Israel. It was a nine day stay in Haifa, with three extra days spent traveling and resting in Tel Aviv.

Three of us went: my wife and I, and my mother-in-law. We wanted to avoid the confusion of who pays for what, so we just pooled all our money together, assigned me as the accountant, and after the trip I was supposed to figure out exactly how much each person owed, less what they originally put into the pot. These finances were further complicated by having to deal with three currencies (USD, Euros, and New Israeli Shekels), and the use of both cash in two currencies, Traveler’s Checks in US dollars, and my MasterCard.

This is just such a case where my double-entry accounting tool, Ledger, ought to shine. Imagine keeping track of all those details in Quicken — which doesn’t even support multiple currencies in a “cash” type account! (At least, it didn’t the last time I used Quicken).

Anyway, not only did my general ledger happily balance to zero at the end (and, because of the double-entry book-keeping, found many errors in my paper register), but I knew exactly how much to pay each person back, and even how much money I lost due to the conversion from dollars to shekels and back again. All this accounting work was done today using the Common Lisp version of Ledger (CL-Ledger), and took just under 3 hours to complete with 65 entries total and 172 transactions.

Syndicated 2008-01-09 00:37:26 from johnw@newartisans.com

Script of the week: verify

This week’s script uses Leopard’s new “xattr” tool to store MD5 checksum information alongside any file you wish. Later, you can run the same script to ensure that the checksum has not changed.

For example, I run it on my Applications directory like this:

find /Applications -print0 | sudo xargs -0 verify

The first time it’s run, verify will print a lot of warnings about how none of the files currently have checksum information. But the second time you run it, it silently verifies that none of the files have changed. If there are changes, an error is printed to standard error, and an exit code of 1 is returned. It’s like a poor man’s Tripwire, but no external databases or configuration files are necessary, since it stores the checksum data in an extended attribute called “checksum”. This means that if you later move the file (even to another HFS+ volume), the verify script will still verify the current contents against the initial checksum.

If the file has legitimately changed, you must first delete the old checksum:

xattr -d checksum FILE

I recommend using this script on archival data, where changes are never expected. I created this script because I recently discovered some data corruption in a set of ISO files, and found to my dismay that ISO offers no built-in scheme to determine which files had been corrupted and which hadn’t, forcing me to throw out almost the entire lot. Now I use verify to ensure that from this point on, the ISO never changes.

On a security note: This scheme offers no security, unlike systems like Tripwire. A hacker who has permissions to change the file will also have permissions to reset the extended attribute containing the checksum.

Here’s the bash script:

#!/bin/bash

ATTRNAME=checksum

verbose=false
if [[ "$1" == "-v" ]]; then
verbose=true
shift 1
fi

error=false

for file in "$@"; do
name=$(basename "$file")
if [[ -f "$file" && \
"$name" != ".DS_Store" && "$name" != ".localized" ]]; then
CHKSUM=$(xattr -p $ATTRNAME "$file" 2> /dev/null)
CURSUM=$(md5 -q "$file")
if [[ -z "$CHKSUM" ]]; then
echo "Warning: No existing checksum for $file"
xattr -w $ATTRNAME $CURSUM "$file"
CHKSUM=$CURSUM
elif [[ $CURSUM != $CHKSUM ]]; then
echo "Error: Checksum mismatch for $file: $CURSUM != $CHKSUM" \
> /dev/stderr
error=true
fi

if [[ $verbose == true ]]; then
echo "$CHKSUM $file"
fi
fi
done

[[ $error == true ]] && exit 1

Syndicated 2008-01-07 04:09:31 from johnw@newartisans.com

FP techniques in Lisp: Data sharing

Common Lisp has often been called a “multi-paradigm” language, in that it allows you to program in many different styles, sometimes simultaneously: imperative, object-oriented, functional, statically typed, etc. It depends on what style you want to adopt, how your code will look.

Recently I’ve been porting a C++ accounting system to Common Lisp. And after only six weeks, the port is nearly complete — a feat I credit to the power of the Lisp language and the facilities it offers I’d been forced to replicate in C++.

But as the port nears completion, I find myself questioning some of the design decisions. Did C++ force me down a path where Lisp can offer a better alternative?

The imperative approach

One thing C++ does not support well at all is Functional Programming, or the idea that a program is simply a group of functions, each taking inputs and producing outputs, with none of them changing the environment. This has powerful implications for things like concurrency, an area where Erlang is currently experiencing considerable success.

There are many facets to FP, but one of its core tenets is this: That all inputs to a function are immutable; that all outputs from a function must remain immutable; and that no environmental side-effects can occur in any function.

In a language like C++, this approach incurs considerably data copying. In cases where data often cannot be changed — such as strings — it requires the evolution of “copy on write” strategies, to defer the copying until the last possible moment. Take for example a function which receives a vector, and must add a new element to the beginning of that vector, afterwards returning the new version:

std::vector<int> push_element(std::vector<int> foo, const int x) {
foo.push_front(x);
return foo;
}

Of course for longer lists this gets extremely expensive, causing most C+ programmers coders to switch to using references, modifying the list in place. While more efficient, the question then becomes: who else holds a reference to the list, and do they need to be informed of the insertion? If the vector had been one of objects, rather than integers, full copying would not be an option for non-trivial lists. In that case, pointers would have to be used to avoid the cost of all the additional copies. But then how do we know when the last pointer is freed? So not pointers, really, but rather thin objects of type shared_ptr which use reference counting to track object lifetimes. And believe me, the complexity has only just begun.

This level of complexity is avoided in C++ by carefully drawing lines between code boundaries, and safely allowing direct modification of data. It’s simple, straightforward and efficient. But it also places the burden on the author to know when data can be shared, and when it can’t. If the program is multi-threaded, certain ranges of code must be guarded against concurrent access, leading to sophisticated strategies involving multiple-reader, one-writer gates, with concomitant algorithms to prevent resource exhaustion. In other words, the simplicity gained in the single-threaded case in almost entirely lost in the multi-threaded case.

FP and maximal sharing

FP languages get around these two extreme by using a technique I call “maximal sharing”. Instead of the first case above, where the entire list was copied; and the second case, where an existing list was modified; FP creates a third alternative: create a new list by sharing one new value with the contents of the old. In this scenario, only the first element of the new list is truly “new”. The remainder of the old list is pointed to by the new list, meaning that all of its contents are shared between the two lists.

The type of scheme relies on memory algorithms like garbage collection to work well, because it becomes very important to know when an item is no longer referenced. In a typical FP-style program, a single element might be shared by hundreds of other objects, since each time that element is passed to a function, a new value involving that element might be created.

This sort of approach can be used quite readily in Common Lisp. Here’s a simple version of the push routine above which makes the single assumption that both lists must remain immutable after the call:

(defun push-element (list element)
(cons element list))

Note two things about this implementation: 1) It does not change the input list; and 2) the output list results required the allocation of only a single cons cell. As long as neither list is ever changed, it works beautifully. It’s fast, memory efficient, and as straightforward as possible (though granted, it’s a trivial example).

But can such an approach pan out when we’re modifying elements further down, in longer lists?

A new function: apply-to-list

To test this theory, I’ve created an algorithm called apply-to-list, as part of a functional programming library I’m working on. The job of apply-to-list is to generate a new list from an old one, while sharing as much structure as possible from the original list. Using apply-to-list, I’ve implemented a function called mapcar-if, which can be used just like mapcar to selectively generate new list elements, but with the new advantage of maximal sharing. Here’s the documentation for apply-to-list, which describes how it works:

(defmacro apply-to-list (list predicate function &key (first-only nil)
(skip-to-next t) (lookahead t))
...)

APPLY-TO-LIST: Given an input LIST, replicate as much of its structure as possible while applying some kind of transform on its value.

For every member of the list where PREDICATE returns T, FUNCTION is called with the list whose CAR is that member; FUNCTION should return the list which will be substituted at that point (this makes it possible to remove, change or insert the matched cell). If FIRST-ONLY is NIL, this is only done for every cell that matches. Note: Be very careful when setting FIRST-ONLY to T, for if FUNCTION returns a new list that also matches PREDICATE, an infinitely recursive loop can occur. If SKIP-TO-NEXT is T, scanning the function contains with the CDR of the value returned by FUNCTION, which can never lead to infinite recursion. If LOOKAHEAD is T, the list is prescanned to see if PREDICATE matches, otherwise copying is done regardless of whether there is a match in LIST or not; this mimics the behavior of CL:MAPCAR and is better for very short lists.

This function depends on the following contract with the caller:

  1. The input LIST is immutable after any call to APPLY-TO-LIST until the end of the program.
  2. The returned LIST is likewise immutable.

The memory savings offered by this function comes at two costs: The first is the subsequent immutability of the input data, and the second is an increase in functional complexity. Specifically, while CL:MAPCAR is O(N) for a given list, FPROG:APPLY-TO-LIST — when used to implement a sharing form of MAPCAR, such as FPROG:MAPCAR-IF — has complexity O(N) in the best case, and O(2N) in the worst case when LOOKAHEAD is T, otherwise it is also O(N) (where an element to be substituted occurs at the very end of the list).

Now, the cost of speed in the worst case can lead to dramatic improvements in memory usage in the average case, with an attendant speed advantage. Take the case of a list which is 500 elements long. In my environment, here are the timings for using MAPCAR to generate a new list from an old one where only one cons cell needs to be changed. These times were determined by calling the same code repeatedly 1,000,000 times (that code is near the end of this file, in the function TIMING-TESTS):

Evaluation took:
8.367 seconds of real time
7.931782 seconds of user run time
0.342331 seconds of system run time
[Run times include 2.679 seconds GC run time.]
0 calls to %EVAL
0 page faults and
4,024,029,056 bytes consed.

That’s 4 gigabytes of memory, probably to be expected. The only reason this doesn’t blow the heap is because all of the intermediate results are being thrown away, making a lot of the cons’ing "free". If the results are kept, the MAPCAR solution becomes impossible without dramatically increasing Lisp’s heap size.

The memory and time costs of using MAPCAR in this example are constant no matter whether the cons cell is substituted at the beginning, middle or end of the 500 element list. To compare, here are the time and memory statistics from FPROG:MAPCAR-IF for the same data, in all three cases (best, average, worst):

Evaluation took:
3.478 seconds of real time
3.474324 seconds of user run time
0.003887 seconds of system run time
[Run times include 0.026 seconds GC run time.]
0 calls to %EVAL
0 page faults and
40,007,952 bytes consed.

In the best case, memory usage is reduced by two orders of magnitude, with an appreciable boost in speed. If the results of this case are saved (using COLLECT in the LOOP instead of DO), the speed savings can become dramatic. Note also that except for the immutability constraints, the results from the two different approaches are EQUAL.

Evaluation took:
7.495 seconds of real time
7.272269 seconds of user run time
0.173947 seconds of system run time
[Run times include 1.416 seconds GC run time.]
0 calls to %EVAL
0 page faults and
2,032,015,008 bytes consed.

In the average case (middle of the list), memory usage is cut in half, while runtime speed is still faster. The cons’ing of CL:MAPCAR also gets more expensive the more the results are kept, so this trivial speed tests — where no results are saved — is not exactly fair between the two. But even still FPROG:MAPCAR-IF is doing well.

Evaluation took:
11.343 seconds of real time
10.969349 seconds of user run time
0.327477 seconds of system run time
[Run times include 2.679 seconds GC run time.]
0 calls to %EVAL
0 page faults and
4,024,030,568 bytes consed.

Finally, the pathological case, where MAPCAR-IF degenerates into an exact duplicate of MAPCAR. Memory use is the same, but speed is much slower because the call to MEMBER-IF is searching the entire list before we decide that all of it needs duplication.

The functionality offered by APPLY-TO-LIST is that every cons cell from the original LIST, after the last matching member, is shared entirely. This is quite different from COPY-LIST, which creates new cons cells for every position — even those that do not require a unique structure. For example, consider the following list:

(defparameter *alist* '((a . 1) (b . 2) (e . 3) (f . 6) (g . 7)))

The idea is to return another version of this immutable list, while sharing as much structure as possible — because the return value is also considered immutable. The following function call achieves this, using the Modify pattern from above:

(apply-to-list *alist* #'(lambda (member) (eq 'e (car member)))
#'(lambda (list) (cons (cons (caar list) 5)
(cdr list))))
=> '((a . 1) (b . 2) (e . 5) (f . 6) (g . 7))

In the returned list, 15 atoms are shared with the original, while one new cons cell and one new atom are created:

1, 2, 3:         (a . 1)
4, 5, 6: (b . 2)
7: e
8, 9, 10 11: ((f . 6) ...)
12, 13, 14, 15: ((g . 7))

The usual practice of calling MAPCAR and changing the incorrect element would have result in sharing only 13 atoms. That code might have looked like this:

(mapcar #'(lambda (cell)
(if (eq 'e (car cell))
(cons (car cell) 5)
cell))
*alist*)

Further, while a discrepancy of 2 cons cells may not seem like much in this example, the difference increases by one for every cell beyond the cell that matches. Thus, if the input list had contained 100 cells beyond (e . 3), the difference would have been 102 cells, and not merely 2.

Finally, in our example exactly 4 new cons cells and 1 new atom were created as a result of the call:

1: ((a . 1) ...)
2: ((b . 2) ...)
3: ((e . 5) ...)
4: (e . 5)
5: 5

This is the minimum amount of new information required to represent a new structure where the only change is that ‘e’ is paired with 5 instead of 3.

Conclusion

The idea of APPLY-TO-LIST is to support efficient functional programming, wherein immutable outputs are derived from immutable inputs by efficiently sharing as much structure as possible — resulting in the least new memory allocated. In cases where no references are held, this offers only a little gain over advanced generational garbage collection (such as lists passed within a recursive function); but if the results are held over the longer term, such as a series of computed values stored in a result list, the savings of this function become quite substantial. It was exactly this sort of sitation which motivated the creation of APPLY-TO-LIST: it made it possible to reduce overall memory consumption by a factor of 20, without introducing any additional complexity in the calling code.

Syndicated 2007-12-13 22:02:29 from johnw@newartisans.com

New version of Ready Lisp for Mac OS X available

There is a new version of Ready Lisp for Mac OS X available.  This version is based on SBCL 1.0.12.17, and requires OS X Leopard 10.5.  The most notable change from the previous version is that it is now fully universal, supporting PowerPC and 32- bit and 64-bit Intel machines.  Also, threading has been turned on for Intel processor.  See the NEWS below.

What is Ready Lisp?  It’s a binding together of several popular Lisp packages for OS X, including: Aquamacs, SBCL and SLIME.  Once downloaded, you’ll have a single application bundle which you can double-click — and find yourself in a fully configured Common Lisp REPL.  It’s ideal for OS X users who want to try out Lisp with a minimum of hassle.  The download is approximately 87 megabytes: ftp://ftp.newartisans.com/pub/lisp/ReadyLisp-1.0.12-10.5.1-2.dmg

There is a GnuPG signature for this file in the same directory; append “.asc” to the above filename to download it.  To install my public key onto your keyring, use this command:

$ gpg --keyserver pgp.mit.edu --recv 0x824715A0

Once installed, you can verify the download using the following command:

$ gpg --verify ReadyLisp-1.0.12-10.5.1.dmg.asc

Below is a full rundown of what’s new.

Now fully universal

Ready Lisp is now fully universal, and runs on the following platforms:

  • Intel 64-bit
  • Intel 32-bit
  • PowerPC 32-bit

There is no port of SBCL to 64-bit PowerPC. Experimental threading has been enabled for both Intel platforms.

Updated versions

The following pieces were updated:

  • SBCL, to version 1.0.12.17
  • SLIME, to CVS version 2007-12-06

Aquamacs remains at version 1.2a.

Full Info documentation

Info documentation for the Common Lisp pieces is now bundled in. Just type C-h i to read it. Also, when editing Common Lisp files, you can type C-h f to instantly access the HyperSpec index. In Emacs Lisp files, C-h f will get you help on Emacs Lisp functions.

There is also HTML and PDF versions of all documentation in:

  • Ready Lisp.app/Contents/Resources/html
  • Ready Lisp.app/Contents/Resources/doc

More libraries

There are a few more Common Lisp libraries bundled in the core file with this release:

  • CL-FAD
  • LOCAL-TIME
  • SERIES
  • MEMOIZE
  • CL-PPCRE

I find these libraries very handy, but mainly I’m including them because the upcoming release of my CL-Ledger accounting tool depends on them, so it will work for Ready Lisp users out-of-the-box. See the “doc” subdirectory above for documentation on how to use these libraries (except memoize, which does not have separate documentation; use memoize:memoize-function to mark a function as memoized).

Syndicated 2007-12-06 20:47:04 from johnw@newartisans.com

6 Dec 2007 (updated 8 Dec 2007 at 23:10 UTC) »

Fixed a few bugs in Ready Lisp

A couple of pathname issues were discovered in the release of Ready Lisp that was posted yesterday, leading to the inability to load asdf-install (or use it). These have been fixed in the new release uploaded today. If you now use asdf-install and choose a “system-wide” installation, the installed packages get saved in your Application bundle. However, due to the way that asdf-install itself works, if you then move your application bundle to another directory, symbolic links in the systems directory will get broken. So I recommend installing new packages into your home directory instead.

Also, the sources for SBCL are now included, meaning that if you use M-. (jump to definition) and pick a function like mapcar, it will drop you into the source code for SBCL’s mapcar implementation.

The new version is available here (the old link still works, it is now a reference to this URL): ftp://ftp.newartisans.com/pub/lisp/ReadyLisp-1.0.12-10.5.1-2.dmg

Lastly, I’ve created a new home page for the Ready Lisp project, which now lives here: http://www.newartisans.com/software/readylisp.html

Syndicated 2007-12-06 20:42:21 (Updated 2007-12-08 23:10:35) from johnw@newartisans.com

Common Lisp web servers

Recently at work my manager asked me to create a server solution that was both, “Fun, and easy for me to get things running quickly in.” Well, to me that’s a roundabout way of saying Common Lisp, so I started looking at possible solutions based on that platform. The solution will run as a webserver in a potentially high-demand scenario, so I figured it would pay to compare the options available to me.

Unfortunately, I did not have equal platforms on which to create these test results. If this were my only task this week, I would have setup a dedicated Linux machine and created identical tests for each setup described below. Because I do not have this luxury, some of the results may be due solely to where and how I ran them. So please take these numbers with a huge grain of salt, since you may see very different results in your own environment.

First, I wanted to test Apache serving static files, to get a feel for what Lisp is up against. On my MacBook Pro, running OS X 10.5.1 and Apache 2.2.6, Apache is able to serve a simple HTML file at roughly 4000 requests per sec. This was determined by repeatedly running ApacheBench, using 1000 accesses and 10 concurrent threads.

Next I tried Hunchentoot, serving the exact same file. I’m using SBCL 1.0.12.6 on Intel OS X (32-bit), with threading enabled. Note that threading is not officially supported on this platform, and was something I had to explicitly enable during the build process. What you get from MacPorts will not have threading enabled. It could be this that led to some of the results I saw.

Anyway, Hunchentoot in this scenario serves between 250-400 pages per second. Honestly, I’m not sure if that’s good or bad for a pure-Common Lisp implementation, but I did find myself a bit disheartened. Also, Hunchentoot on OS X does not handle worker thread exhaustion very well. I’m certain that part of this is due to the preliminary thread support on OS X, but then again, the same tests has troubles on Linux as well (see below). Using 1 concurrent thread in ApacheBench, Hunchentoot does fine. At 5 threads it does fine most of the time. At 10 threads, Hunchentoot has a tendency to drop requests, causing ApacheBench to stop working altogether. I had to try several times in a row before I could get all 1000 page loads to complete.

Next I tried Portable AllegroServe with the same SBCL version. I used paserve 1.2.47-vbz-0.1.4, a version custom-fitted for use with SBCL. The numbers here were a bit more promising: on average, 1000 pages/sec. The real advantage, though, is that listener thread exhaustion only causes AllegroServe to wait until another thread is available. I was able to run ApacheBench with 1000 concurrent connections and AllegroServe didn’t bat an eye.

Since I don’t have mod_lisp running on my OS X box, the next tests all used Linux virtual machines running under VMware Fusion. I used 64-bit CentOS 5 (x86-64) for all tests.

The Apache baseline is much slower in this case, as expected. Now Apache serves direct files at ~1200 pages/sec. Hunchentoot running behind mod_lisp serves pages at around 190 pages/sec. The nice thing, though, is that threading problems are non-existent. What’s strange, however, is that if I increase the number of concurrent access, performance actually goes up! At 100 concurrent accesses (as opposed to 10), Hunchentoot behind mod_lisp is able to serve pages at around 330 pages/sec. I have no idea why this would be the case. Also, I’m not doing static file tests this time, but using the dynamically generated default page for Hunchentoot.

If I access Hunchentoot in this setting directly, it doesn’t break off connections the way it does on OS X, but it does stall them. As a result, with 10 concurrent accesses, it serves up data at around 30-50 pages/sec. So using mod_lisp is definitely a huge win. Also, at 100 concurrent accesses, I was able to send the SBCL process into a 100% CPU burn, from which it took over a minute to recover — even after aborting ApacheBench. At 5 concurrent access, there was less contention and it served data at around 200 pages/sec.

For the last scenario, I switched to another Lisp implementation altogether, this time to Armed Bear Common Lisp. I figured I would try running a Java servlet under Tomcat, thereby leveraging the stability and great threading support I’ve come to expect from the Java VM. Also, I can easily integrate with other Java code we have in house, which is sure to become a requirement if the project succeeds. So I created another 64-bit CentOS 5 virtual machine and loaded Tomcat5 using JPackage. I also built my own tomcat-native package, so I could have Tomcat running as fast as possible. This setup used Sun’s JDK5 JVM, Tomcat 5.5, and Armed Bear version 0.0.10.

Rendering a dynamic Lisp page, the Tomcat solution sees speeds on average of 330 pages/sec, in the same scenario where Hunchentoot behind mod_lisp offers 30-50 pages/sec (due to thread contention). Also, concurrency with the Tomcat servlet is no problem at all; with 100 concurrent accesses, speed is simply not affected. Even at 1000 concurrent accesses, nothing really changes. This could be due to smart caching on Tomcat’s part, or just to great thread handling.

I did not try AllegroServe on Linux, because I think running on the Java VM will be smart for other reasons not relating to Common Lisp. I was a bit disheartened by Hunchentoot’s performance and threading behavior, but at the same was quite pleasantly surprised by Armed Bear. I do hope the author of ABCL continues to put effort into his project; and I’m crossing my fingers that basing code on a 0.0.10 release won’t end up being a huge mistake. If it does become a problem, though, I should be able to switch over to Groovy pretty quickly, without changing any of the framework at all. It’s just that using Common Lisp would make this task such a joy.

Syndicated 2007-12-01 20:49:48 from johnw@newartisans.com

CL HyperSpec Info pages in Emacs

I just discovered the following blog article by Bill Clementson, from way back in 2003. Luckily, the links still worked, so I was able to get Info pages today for the Common Lisp HyperSpec courtesy of the GCL project.

Once installed, I found I could not easily lookup documentation for, say, mapcar, because it’s actually on the page for mapc. But SLIME’s hyperspec.el contained the indexing info I needed to write a new module which fires up the Info system on the correct section for the symbol you want defined.

This new module is called cl-info.el and is available from my Lisp repository. It rebinds the standard Emacs key for function help (C-h f) to lookup help in the HyperSpec instead, if you’re in a lisp-mode buffer.

Syndicated 2007-11-16 03:03:48 from johnw@newartisans.com

Script of the week: redirect

This week’s script of the week is so simple, it doesn’t really deserve to be called a script. But since it’s highly useful and comes as a surprise to many people that it can be done so easily, here it is.

The purpose of this script is to create momentary TCP routes. TCP routing is also called Layer 4 routing. That is, one machine momentarily serves as a transparent gateway between two TCP ports on two other machines. The advantages to layer 4 routing are:

  1. It’s “port to port” (you aren’t opening up subnets to each other, or even whole machines).
  2. It doesn’t require complicated routing tables entries, or IP forwarding, or NAT.
  3. It can be done entirely in userspace. No strange kernel drivers required!

Here’s an example: Let’s say you use a web server sitting on a private network which you access over VPN. You can see the server just fine by typing it’s address in your web browser. One day, however, you find a bug on the server, but it only happen on that server like for your friend — who knows about such servers — to see what’s happening, but you obviously can’t grant him access to your secured network.

What would be really cool is if your friend could connect to your machine instead, and have your machine transparently proxy the connection into the VPN and over to that web server. It would also proxy responses back, so that from your friend’s point of view: your machine becomes the web server for as long as you keep the link up.

Here’s the command to do this, assuming I expose port 8080 on my machine for my friend to connect to, and I’m linking him to port 80 on the VPN’s web server:

$ tcpserver <MY-PUBLIC-IP> 8080 nc <VPN-WEB-SERVER-IP> 80

Did I mention that this doesn’t even require root priveleges to work?

Syndicated 2007-11-12 08:59:25 from johnw@newartisans.com

24 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!