5 Sep 2003 (updated 7 Sep 2003 at 01:47 UTC) »

I just lost a page-long entry because the Post took so long that my browser timed out. And dammit I'm not going to rewrite it!

Update: Lost entry restored! Thanks Nymia!

This commentary on fundamental OS research is pretty amusing. The author motivates his discussion with some silly statistics, such as the time to read an entire hard disk going from under a minute in 1990 to about an hour in 2003. Going from there to demanding more CS research is like demanding better transportation technology because it took your grandfather 10 minutes to walk to school but you had to sit through a 40-minute bus ride.

Then he goes on to list areas that need more research. Leave it to a kernel hacker to think a page replacement algorithm is a fundamental area of research. Let me tell you, operating systems is one of the least fundamental areas of computer science research, and the making-your-computer-faster side of OS research (needed because the ratio of memory to CPU has changed once again) is some of the most short-lived of it.

This piece did make me think of something I wrote once, while taking an intro class on Operating Systems. Here is my solution to the swapping problem. It could be titled We Don't Need Another Page Replacement Algorithm.

Disk i/o is such an expensive operation these days that it can render interactive applications unusable, and for batch processes i/o can be the sole determining factor of throughput. This implies that we want to avoid disk i/o as much as possible. And when disk i/o is absolutely necessary, we want to give applications complete control over how it happens, so that they can be tuned to minimize it.

I propose that it would be better to enforce hard limits on the physical memory usage of each process, rather than the current abstraction in which each process thinks it has the entire virtual address space. It would work as follows. When a process requests memory from the system, it is always granted physical memory. If the process has surpassed its hard limit, the memory request fails and the process has three options: it can cease to function, it can make do without the additional memory, or it can explicitly request that some of its pages be swapped out in exchange for the new memory. If the process later tries to access data that has been swapped out of its physical memory, it will again be given the options of exiting or swapping out some other data to make room.
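To make the three options concrete, here is a minimal sketch in C of the accounting involved. It is not a real kernel interface: the `proc_mem` struct and the functions `request_pages`, `swap_out`, and `request_with_swap` are all hypothetical names invented for illustration, and "pages" are just counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process accounting; every name here is illustrative. */
typedef struct {
    size_t limit_pages;     /* hard limit on resident pages */
    size_t resident_pages;  /* pages currently in physical memory */
    size_t swapped_pages;   /* pages the process chose to swap out */
} proc_mem;

/* Grant n physical pages, or fail if that would exceed the hard limit. */
bool request_pages(proc_mem *p, size_t n)
{
    assert(p->resident_pages <= p->limit_pages);
    if (n > p->limit_pages - p->resident_pages)
        return false;   /* over the limit: the process must decide what to do */
    p->resident_pages += n;
    return true;
}

/* Explicitly move n resident pages out to disk. */
bool swap_out(proc_mem *p, size_t n)
{
    if (n > p->resident_pages)
        return false;
    p->resident_pages -= n;
    p->swapped_pages += n;
    return true;
}

/* The third option: trade just enough old pages for the new ones. */
bool request_with_swap(proc_mem *p, size_t n)
{
    if (request_pages(p, n))
        return true;
    if (n > p->limit_pages)
        return false;   /* can never fit, no matter what is swapped out */
    size_t free_pages = p->limit_pages - p->resident_pages;
    return swap_out(p, n - free_pages) && request_pages(p, n);
}
```

A process that calls `request_pages` and gets `false` back can exit, carry on without the memory, or call `request_with_swap`. The point is that the choice is made by the process, not behind its back.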

The benefit of this would be that each process is guaranteed to always be resident in memory. With the current abundance of RAM it is reasonable to assume that ALL the processes running on a machine can fit in memory at once. The exception, which I will address later, is when an unusually large number of processes are running at once. The downside of this system is the increased work for the application programmer. But I argue that this complexity is essential to the applications, and will be gladly embraced by the programmers.

In cases where an application's working set can be larger than the available physical memory, the performance of the application will depend primarily on the careful management of disk i/o. Many of the applications that face this problem, such as large databases and high resolution image/video manipulation, already subvert the operating system's normal memory management services.

I have been intentionally vague on how the system decides which of a process's pages get swapped out as it requests more memory than it has been allotted. There is a trade-off between simplicity and degree of control for the application programmer. One option is to use a traditional page replacement algorithm (LRU, MRU, etc.), but on a per-process basis. This can either be completely transparent to the application, or the application can select which page replacement algorithm to use, or even provide its own. The next level of programmer control comes from allowing the process to allocate memory in pools. The memory in each pool is grouped together on the same pages. Then the process can select which data gets swapped out by selecting one of the pools. The two approaches can be used together: the application can specify a different page replacement algorithm for each pool.
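Here is a rough sketch in C of what pools with a per-pool replacement policy might look like. Everything in it is an assumption made for illustration: the `pool` struct, the `evict_policy` function pointer type, and the fixed `POOL_MAX_PAGES` size are hypothetical, and a page is represented only by the logical time it was last touched.

```c
#include <assert.h>
#include <stddef.h>

enum { POOL_MAX_PAGES = 8 };   /* arbitrary size for the sketch */

/* Hypothetical policy hook: returns the index of the page to evict. */
typedef size_t (*evict_policy)(const unsigned long *last_used, size_t n);

typedef struct {
    unsigned long last_used[POOL_MAX_PAGES]; /* logical time of last access */
    size_t npages;                           /* pages currently in the pool */
    evict_policy policy;                     /* chosen by the application */
} pool;

/* Least recently used: evict the page with the oldest access time. */
size_t evict_lru(const unsigned long *last_used, size_t n)
{
    size_t victim = 0;
    for (size_t i = 1; i < n; i++)
        if (last_used[i] < last_used[victim])
            victim = i;
    return victim;
}

/* Most recently used: evict the page with the newest access time. */
size_t evict_mru(const unsigned long *last_used, size_t n)
{
    size_t victim = 0;
    for (size_t i = 1; i < n; i++)
        if (last_used[i] > last_used[victim])
            victim = i;
    return victim;
}

/* Ask the pool's own policy which page should go next. */
size_t pool_pick_victim(const pool *p)
{
    assert(p->npages > 0 && p->npages <= POOL_MAX_PAGES);
    return p->policy(p->last_used, p->npages);
}
```

An application might put data it scans sequentially in a pool that uses `evict_mru` and its index structures in a pool that uses `evict_lru`, or supply its own function with the same signature.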

In the case where the system is faced with too many processes to keep in memory, and any other time the working set is greater than physical memory, most current systems fail spectacularly. Not only does the Nth process cease to function, but all processes grind to a halt when the system starts swapping. I have seen this behavior on systems ranging from desktop machines to high availability servers. Usually the solution to this problem is for a user to intercede and manually kill off the "least essential" processes, or the "pig". Certainly it would be better if the system avoided going into such a state at all. The system I've proposed would refuse to start a process if it does not have the physical memory available to support it.

8 Aug 2003 (updated 9 Aug 2003 at 03:12 UTC) »
MichaelCrawford: When people with outdated browsers visit your site, I think you would be better served by linking to an explanation of how to upgrade, rather than a diatribe about standards compliance. I would guess that people running Netscape 4.7, IE 5, or the like, fall into one of two categories. The first is people who don't understand the process of upgrading their browser. Probably the most effective approach to getting them to do it is explaining how, emphasizing that many sites will look better afterwards. The other category of people with old browsers are those who don't have direct control of what software is running on their machine. You can suggest what they might say when requesting an upgrade from whoever maintains the computer they are using.
26 Jun 2003 (updated 27 Jun 2003 at 14:46 UTC) »
Perl 6 Design Philosophy

Scroll down to The Principle of Distinction in the Perl 6 Design Philosophy for a lucid discussion of how to name API functions, the topic of my last entry.

24 Jun 2003 (updated 5 Sep 2003 at 06:11 UTC) »
EWD 1044: To hell with "meaningful identifiers"!

lindsey: Despite the title of the article, Dijkstra isn't arguing against using identifiers that have meaning for the reader, such as his negative example "disposable." In the very same article he co-opts the term "plural" to mean an integer greater than or equal to 2, because of its analogous common meaning. The difference between the two examples, Dijkstra states, is that the first term is used without giving it a precise definition, relying on the reader to make assumptions about what it means, while the latter term is precisely defined when it is used.

Similar things should look different

On the topic of choosing names for API functions that do almost the same thing as each other, the rule of thumb is that the more similar two things are, the more different their names should be. This is counter-intuitive. Shouldn't the similarity of the names reflect the similarity of their meanings? The answer is no. If both the names are similar and the meanings are similar, it is very hard to remember which name goes with which meaning. I learned this from Larry Wall, and I assume he learned it through the hard experience of mistakes in Perl's past (chomp, chop).

Similar things should look the same

Sigh. Life is never simple.

7 Feb 2003 (updated 24 Jun 2003 at 16:04 UTC) »
Stupidest Misuse of the C Standard Library

char *name;
name[strlen(name)] = '\0';

Found in a codebase that will remain nameless (variable names have been changed to protect the innocent).
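For anyone wondering why this is so bad: strlen() works by counting bytes up to the first '\0', so name[strlen(name)] is already '\0' by definition. If name is a valid string the assignment does nothing, and if it isn't (as with the uninitialized pointer above) the strlen() call itself is undefined behavior. If the intent was to guarantee termination, the code needs the buffer's capacity, not its current length. Here is a small sketch of that; the helper names `terminate_buffer` and `copy_name` are made up for this example and have nothing to do with the codebase in question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: force termination using the buffer's known capacity,
   rather than strlen(), which already depends on a terminator being there. */
void terminate_buffer(char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);
    buf[cap - 1] = '\0';
}

/* Copy at most cap - 1 bytes of src into dst and always terminate it. */
void copy_name(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);
    strncpy(dst, src, cap - 1);   /* may leave dst unterminated on its own */
    terminate_buffer(dst, cap);
}
```

With a four-byte buffer, copy_name(buf, sizeof buf, "graydon") leaves buf holding the three characters "gra" plus the terminator.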

3 Feb 2003 (updated 24 Jun 2003 at 18:35 UTC) »
Attempt to contribute to glibc

I wrote a patch to give glibc support for profiling multiple shared libraries at once. Right now if you set the environment variable LD_PROFILE to the name of a shared library, glibc will generate gprof-style profiling information for that library without you recompiling anything. But that only works with one library at a time. I sent a patch to fix this limitation to the glibc maintainer before the New Year and I've heard nothing back from him yet. Bummer... I don't know if there was something wrong with it, or if it's just not something he's interested in.

I guess I can put the patch up on a web page and let people find it through Google. But it seems almost pointless, as it's only a matter of time before the official codebase moves on and the patch gets stale.


kwoo: If your only problems with Scheme are that the library is too small and the function names are too long, then getting where you want to go could be as easy as writing a few macro definitions and some glue code to hook up the missing libraries you want.

6 Nov 2002 (updated 31 Dec 2002 at 21:12 UTC) »
The Development of Weblogs

The way the weblog space (blogspace) is developing reminds me of the web back in 1994. In the very beginning the web was composed of weakly connected islands of pages. The way you found new pages was through external links, like posts on usenet, or addresses published in print mags like WIRED. Likewise when people first started keeping weblogs, you found them through web pages that were not weblogs themselves. There was not any weblog space to speak of yet.

The web quickly formed its own internal entry points; this was the "Cool Site of the Day" model. Weblogs paralleled this, but in a more distributed manner, when they started linking to each other. The way you found new weblogs was when the ones that you already read linked to others that they found interesting. This created a "browsing" user experience.

The next step the web took was with directory sites like Yahoo. Although directory sites were originally intended to provide direct links to the information you wanted, they ended up augmenting the browsing process by making it faster and more efficient, instead of replacing browsing completely. The weblog space parallels this with backlinks, blogrolling, and other technology enabled by RSS. This is the present state of weblogs.

15 Sep 2002 (updated 21 Oct 2003 at 06:33 UTC) »
In the eating

graydon speaks as if a test is a boring type of proof. I disagree that a test is a proof, and here is why. He formalizes the notion that testing some case is equivalent to generating a proof for that case:

any test can be translated into a proof in a silly logic easily: the proof is simply the trace of your processor executing your program's code on your test's input, and the logic is one in which each machine transition that happened is an axiom. but that proof is boring

But remember that a test can fail to terminate. We are not converting the test to a proof, but the execution trace of the test. And that is an impossible task for tests that do not terminate.

Although this could seem like a silly technicality, it is big enough that you should not consider a test itself any kind of proof unless it comes with a proof of termination. To me, what distinguishes a proof from anything else is that there is a totally routine way of checking whether it is valid or not. A process that might fail to terminate does not qualify as totally routine.

9 Sep 2002 (updated 31 Jul 2003 at 00:39 UTC) »
Testing Software Can Make it Easier to Prove Correctness

raph mentions Dijkstra's quote that testing can only show the presence of bugs, never their absence.

Implied in this is that if you develop a formal proof that a program is correct, then the testing becomes superfluous. But in fact, some types of testing can reduce the burden of developing the proof.

Instead of proving that a program is always correct, you prove a weaker condition. That is: if the program is correct in one case then it is correct in all cases. Then you write a test to establish that the one case you did not prove actually works.

One specialized version of this technique is using a proof to establish the induction step of a proof by induction, and then writing a test to establish the base case.
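As a toy illustration of that split, consider the sketch below. The function `sum_to` and the test `test_base_case` are hypothetical examples of mine, and the reasoning in the comments assumes the arithmetic does not overflow. The claim to establish is that sum_to(n) equals n(n+1)/2.

```c
#include <assert.h>

/* Claim: sum_to(n) == n * (n + 1) / 2, assuming no overflow.
 *
 * Induction step (argued on paper, not tested):
 *   if sum_to(n - 1) == (n - 1) * n / 2, then
 *   sum_to(n) == n + (n - 1) * n / 2 == n * (n + 1) / 2.
 *
 * Base case (left to the test below): sum_to(0) == 0.
 */
unsigned long sum_to(unsigned long n)
{
    if (n == 0)
        return 0;               /* base case: established by the test */
    return n + sum_to(n - 1);   /* induction step: established by proof */
}

/* The one case the proof leaves open. */
void test_base_case(void)
{
    assert(sum_to(0) == 0);
}
```

Here the base case is simple enough to read off the code, so the test buys little. The split is more useful when the base case goes through something that is awkward to reason about formally but easy to run once.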

