When running Postgres in EXEC_BACKEND mode on Linux, make sure to do:
echo 0 > /proc/sys/kernel/randomize_va_space
Preferably before spending a while wondering about the non-deterministic process startup errors that will otherwise occur.
I gave a revised version of the "Introduction to Hacking PostgreSQL" tutorial at PgCon earlier today. I've posted the slides, handouts, and example patch here; this version uses a completely new example patch, and much of the introductory material has been revised. You can also find the slides at the PgCon page for the talk, which also includes a link to give me feedback.
I've been busy at school recently, finishing my undergraduate thesis and taking my final exams. I've got one more exam on Thursday, but once that's finished, I should be pretty much done with my undergraduate degree (fingers crossed).
In a complexity theory class I'm taking, I recently gave a talk on Kolmogorov complexity, introducing the basic ideas and discussing some notable applications. I'm no expert at AIT, so take the contents with a grain of salt.
Work
I've decided to take a full-time position at Amalgamated Insight, working on their PostgreSQL-based data stream management system (DSMS). I interned at AI last summer and I was very impressed, so I'm excited at the chance to work for them again. This also means I'll be moving to the Bay Area in early June.
Speaking of stream processing, I'll be giving a talk about data stream query processing at PgCon 2007. Gavin and I are also doing an introduction to modifying the PostgreSQL source code, similar in spirit to the "Introduction to Hacking PostgreSQL" tutorial we gave at the Anniversary Summit. The details are still a little hazy, but the goal is to make the session rewarding to both newcomers and to those who attended the previous tutorial.
The Kolmogorov complexity of a string x is the length of the shortest computer program that outputs x, relative to a given model of computation. Complexity thus measures how much structure, or redundancy, the string contains: a highly redundant string can be produced by a short program, while a string whose complexity approaches its length is essentially random. (A string's complexity can exceed its length by at most an additive constant, since we can always emit the string by embedding it in a Turing machine that just prints it out.)
I'd like to write more about Kolmogorov complexity later, but for now I'll just point to an interesting application of this idea to studying biological life. Zhanna Reznikova has applied information theory to the study of communication between animals, in particular ants. This paper describes an interesting result: an experiment was set up in which a "scout" ant communicates the location of some food to a group of "forager" ants. The path from the ants to the food is a binary maze, so a path can be represented as a sequence of left or right turns (e.g. "RRR", "RLR", etc.). Reznikova measured the time it took the scout to transmit the location of the food to the other ants, varying the path to see how transmission time related to the complexity of the route. As it turns out (p. 9, table 2), the ants communicated paths more efficiently when those paths could be encoded as low-complexity strings. For example, the path "RRRRRR" took a mean of 88 seconds to transmit, whereas "RLRLRL" took 135 seconds.
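Kolmogorov complexity itself is uncomputable, but the compressed size of a string gives a computable upper bound on it, which is good enough to see the structured-versus-random distinction. Here's a minimal sketch (my own illustration, not from Reznikova's paper) using java.util.zip.Deflater to compare a regular turn sequence against a pseudo-random one:

```java
import java.util.Random;
import java.util.zip.Deflater;

public class ComplexityDemo {
    // Upper-bound a string's Kolmogorov complexity by its deflate-compressed size.
    static int compressedSize(String s) {
        Deflater deflater = new Deflater();
        deflater.setInput(s.getBytes());
        deflater.finish();
        byte[] buf = new byte[s.length() + 64]; // ample room, even for incompressible input
        int n = deflater.deflate(buf);
        deflater.end();
        return n;
    }

    // A pseudo-random path of left/right turns (seeded, for repeatability).
    static String randomPath(int len) {
        Random rng = new Random(42);
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            sb.append(rng.nextBoolean() ? 'R' : 'L');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String regular = "RL".repeat(100); // highly structured: "a hundred RLs"
        String random = randomPath(200);   // little structure to exploit
        System.out.println("regular: " + compressedSize(regular) + " bytes");
        System.out.println("random:  " + compressedSize(random) + " bytes");
    }
}
```

The structured string compresses to a much smaller size than the random one of the same length, mirroring the ants' shorter transmission times for regular paths.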
Simon Peyton-Jones and Simon Marlow's work on software transactional memory offers a very interesting alternative to traditional lock-based or lock-free approaches to implementing concurrent programs. Their paper "Composable Memory Transactions" is a fascinating read: it isn't often that you see research that is simultaneously so practical and so beautiful.
I had read, and been impressed by, the STM paper a year or so ago. So I was happy to notice recently that:
"Hello, World Considered Harmful" Considered Harmful
Ralph Westfall's "Hello, World Considered Harmful" (CACM, 2001) begins by making a sensible point: the classic K&R example program "Hello, world" is not a very useful example for an introductory object-oriented programming course in Java:

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("hello, world");
    }
}
Fair enough: "Hello, world" is fundamentally procedural, and doesn't teach students anything about object-oriented programming (it might also teach them that Java is a bondage-and-discipline language that forces them to endure plenty of unnecessarily verbose syntax, but they are going to learn that eventually regardless).
Unfortunately, Westfall's solution to the problem falls short of the mark:

class HelloWorld {
    public static void printHello() {
        System.out.println("hello, world");
    }
}

public class UseHello {
    public static void main(String[] args) {
        HelloWorld myHello = new HelloWorld();
        myHello.printHello();
    }
}
Now, we can raise a few minor objections: for example, it is pointless to create an instance of a class that contains only static methods, and it is bad style to invoke a static method on an instance, IMHO.
But the essential point is that Westfall's solution merely applies some irrelevant syntactic sugar. The reason that "hello, world" isn't a good example of object-oriented programming is that it isn't object-oriented in the first place. The intuitive appeal of object-oriented programming stems from the fact that objects resemble objects in the "real world": both combine data and behavior. An instance of the HelloWorld class has neither data nor behavior. In fact, the modified example serves mostly to showcase an annoying property of Java: the dependence on static methods as a way of emulating global functions.
So why not apply a little creativity, and use a fresh example that actually illustrates the intuitive appeal of objects? Model a few real-world objects as classes, and give them some behavior: for example, simple geometric shapes like Triangle and Circle, which can then eventually be developed into a lesson on inheritance.
(Another improvement would be to use a better teaching language than Java, but that's another story.)
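One possible version of such an example (the class names and details here are my own sketch, not Westfall's): a small Shape hierarchy whose instances carry both data (their dimensions) and behavior (computing an area), and which leads naturally into a later lesson on inheritance and dynamic dispatch:

```java
abstract class Shape {
    abstract double area(); // behavior shared by all shapes
}

class Circle extends Shape {
    private final double radius; // data
    Circle(double radius) { this.radius = radius; }
    double area() { return Math.PI * radius * radius; }
}

class Triangle extends Shape {
    private final double base, height; // data
    Triangle(double base, double height) {
        this.base = base;
        this.height = height;
    }
    double area() { return 0.5 * base * height; }
}

public class Shapes {
    public static void main(String[] args) {
        Shape[] shapes = { new Circle(1.0), new Triangle(3.0, 4.0) };
        for (Shape s : shapes) {
            System.out.println(s.area()); // dynamic dispatch picks the right area()
        }
    }
}
```

Unlike HelloWorld, each instance here actually has state and behavior of its own, so creating objects and calling methods on them is the natural thing to do rather than empty ceremony.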
pgmemcache 1.2 beta is out. This release makes some notable changes to the pgmemcache API; the release notes have more information. Note that I haven't yet had a chance to update all the documentation for these changes (e.g. seanc's slides on pgmemcache are becoming increasingly out of date).
Also, there's a bug in libmemcache that affects the new SRF memcache_server_list: connected servers were not properly marked as such by libmemcache. I've fixed this in the latest libmemcache SVN, but I haven't had a chance to roll 1.4.0rc3 yet.
Download: pgmemcache_1.2beta1.tar.gz, Release Notes
I released pgmemcache 1.1 final today. It contains a few minor fixes and additions relative to the recent 1.1 beta release.
Download: pgmemcache_1.1.tar.gz, Release Notes
"Randomness and Proof" is a nice, introductory-level article on algorithmic information theory. It's by G. J. Chaitin, who has personally done a lot of the foundational work in this area. A random, interesting snippet:
Solomonoff represented a scientist's observations as a series of binary digits. The scientist seeks to explain these observations through a theory, which can be regarded as an algorithm capable of generating the series and extending it, that is, predicting future observations. For any given series of observations there are always several competing theories, and the scientist must choose among them. The model demands that the smallest algorithm, the one consisting of the fewest bits, be selected. Stated another way, this rule is the familiar formulation of Occam's razor: Given differing theories of apparently equal merit, the simplest is to be preferred.

Thus in the Solomonoff model a theory that enables one to understand a series of observations is seen as a small computer program that reproduces the observations and makes predictions about possible future observations. The smaller the program, the more comprehensive the theory and the greater the degree of understanding. Observations that are random cannot be reproduced by a small program and therefore cannot be explained by a theory. In addition the future behavior of a random system cannot be predicted. For random data the most compact way for the scientist to communicate his observations is for him to publish them in their entirety.
I'm writing a paper on Kolmogorov complexity for a class in complexity theory I'm taking this semester -- if I get a chance, I'll try to blog about some of the fascinating ideas in this field.
/proc/sys/vm/drop_caches could be useful — to clean the kernel's buffer cache between runs of a benchmark, for example. Available in recent Linux kernels (>= 2.6.16).
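For reference, a typical invocation looks like the following (run as root). Writing 1 drops the page cache only, 2 drops dentries and inodes, and 3 drops both; running sync first flushes dirty pages so the drop is as complete as possible:

```shell
# Flush dirty pages to disk first, so the subsequent drop is complete.
sync
# 1 = page cache; 2 = dentries and inodes; 3 = both.
echo 3 > /proc/sys/vm/drop_caches
```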