5 Sep 2003 jdybnis   » (Apprentice)

I just lost a page-long entry because the Post took so long that my browser timed out. And dammit I'm not going to rewrite it!

Update: Lost entry restored! Thanks Nymia!

This commentary on fundamental OS research is pretty amusing. The author motivates his discussion with some silly statistics, like: the time to read the entire hard disk has gone from <1 minute in 1990 to about an hour in 2003. Going from there to demanding more CS research is like demanding better transportation technology because it took your grandfather 10 minutes to walk to school but you had to sit through a 40-minute bus ride.

Then he goes on to list the areas that need more research. Leave it to a kernel hacker to think a page replacement algorithm is a fundamental area of research. Let me tell you, operating systems is one of the least fundamental areas of computer science research, and the making-your-computer-faster (because the ratio of memory to CPU has changed once again) side of OS research is among the shortest-lived of the bunch.

This piece did make me think of something I wrote once, while taking an intro class on Operating Systems. Here is my solution to the swapping problem. It could be titled We Don't Need Another Page Replacement Algorithm.

Disk i/o is such an expensive operation these days that it can render interactive applications unusable, and for batch processes i/o can be the sole determining factor of throughput. This implies that we want to avoid disk i/o as much as possible. And when disk i/o is absolutely necessary, we want to give applications complete control over how it happens, so that they can be tuned to minimize it.

I propose that it would be better to enforce hard limits on the physical memory usage of each process, rather than the current abstraction in which each process thinks it has the entire virtual address space to itself. It would work as follows. When a process requests memory from the system, it is always granted physical memory. If the request would push the process past its hard limit, it fails, and the process has three options: it can cease to function, it can make do without the additional memory, or it can explicitly request that some of its pages be swapped out in exchange for the new memory. If the process later tries to access data that has been swapped out of its physical memory, it is again given the choice of exiting or swapping out some other data to make room.
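
To make the idea concrete, here is a little user-space toy model of the control flow. Nothing in it is a real kernel interface: HARD_LIMIT, mem_request, and mem_swap_out are all made up, and "swapping out" is only simulated by freeing the memory.

    /* Toy model of the proposal.  The "kernel" is a fixed byte budget,
       and mem_swap_out only pretends to push data to disk. */

    #include <stdio.h>
    #include <stdlib.h>

    #define HARD_LIMIT (64 * 1024)          /* per-process physical memory cap */

    static size_t used;                     /* bytes currently "resident" */

    /* Grant physical memory only while the process is under its hard limit. */
    static void *mem_request(size_t nbytes)
    {
        if (used + nbytes > HARD_LIMIT)
            return NULL;                    /* request fails; no silent paging */
        used += nbytes;
        return malloc(nbytes);
    }

    /* The process explicitly gives up residency of a region.  A real system
       would write the pages to disk and let the process fault them back in
       later, again under its own control; here we just free the memory. */
    static void mem_swap_out(void *addr, size_t nbytes)
    {
        free(addr);
        used -= nbytes;
    }

    int main(void)
    {
        void *old_buf = mem_request(48 * 1024);
        void *new_buf = mem_request(32 * 1024);   /* would exceed the cap */

        if (new_buf == NULL) {
            /* Over the limit: exit, make do without, or trade old pages for new. */
            mem_swap_out(old_buf, 48 * 1024);
            new_buf = mem_request(32 * 1024);
        }
        printf("resident bytes: %zu\n", used);
        free(new_buf);
        return 0;
    }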

The benefit of this would be that each process is guaranteed to always be resident in memory. With the current abundance of RAM it is reasonable to assume that ALL the processes running on a machine can fit in memory at once. The exception, which I will address later, is when an unusually large number of processes are running at once. The downside of this system is the increased work for the application programmer. But I argue that this complexity is essential to the applications, and will be gladly embraced by the programmers.

In cases where an application's working set can be larger than the available physical memory, the performance of the application will depend primarily on the careful management of disk i/o. Many of the applications that face this problem, such as large databases and high resolution image/video manipulation, already subvert the operating system's normal memory management services.

I have been intentionally vague about how the system decides which of a process's pages get swapped out when it requests more memory than it has been allotted. There is a trade-off between simplicity and degree of control for the application programmer. One option is to use a traditional page replacement algorithm (LRU, MRU, etc.), but on a per-process basis. This can be completely transparent to the application, or the application can select which page-replacement algorithm to use, or even provide its own. The next level of programmer control comes from allowing the process to allocate memory in pools. The memory in each pool is grouped together on the same pages. The process can then choose which data gets swapped out by naming one of the pools. The two approaches can be used together: the application can specify a different page replacement algorithm for each pool.
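
Here is a rough sketch of what the pool idea might look like from the application's side. Again everything is made up and "swap out" is only simulated; the toy only shows the grouping and the manual-victim case, not per-pool replacement policies.

    /* Toy model of pools: data allocated from the same pool is tracked
       together, and the pool is the unit that gets "swapped out" (here
       just freed) when physical memory runs short. */

    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_CHUNKS 128

    typedef enum { POLICY_LRU, POLICY_MRU, POLICY_MANUAL } swap_policy_t;

    typedef struct {
        swap_policy_t policy;               /* how this pool's pages get replaced */
        void         *chunks[MAX_CHUNKS];
        size_t        sizes[MAX_CHUNKS];
        int           count;
        size_t        bytes;
    } mem_pool_t;

    /* Allocate from a pool; the data is grouped with that pool's other data. */
    static void *pool_alloc(mem_pool_t *p, size_t nbytes)
    {
        if (p->count == MAX_CHUNKS)
            return NULL;
        void *mem = malloc(nbytes);
        p->chunks[p->count] = mem;
        p->sizes[p->count]  = nbytes;
        p->count++;
        p->bytes += nbytes;
        return mem;
    }

    /* "Swap out" an entire pool: a real system would write its pages to
       disk; here we just release them and report how much was freed. */
    static size_t pool_swap_out(mem_pool_t *p)
    {
        size_t freed = p->bytes;
        for (int i = 0; i < p->count; i++)
            free(p->chunks[i]);
        p->count = 0;
        p->bytes = 0;
        return freed;
    }

    int main(void)
    {
        mem_pool_t index_pool   = { POLICY_LRU };
        mem_pool_t scratch_pool = { POLICY_MANUAL };

        pool_alloc(&index_pool, 256);           /* keep this data hot */
        pool_alloc(&scratch_pool, 64 * 1024);   /* expendable working data */

        /* Under memory pressure the process names the victim itself. */
        printf("freed %zu bytes from the scratch pool\n", pool_swap_out(&scratch_pool));
        return 0;
    }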

In the case where the system is faced with too many processes to keep in memory, and any other time the working set is greater than physical memory, most current systems fail spectacularly. Not only does the Nth process cease to function, but all processes grind to a halt when the system starts swapping. I have seen this behavior on systems ranging from desktop machines to high-availability servers. Usually the solution is for a user to intervene and manually kill off the "least essential" processes, or the "pig". Certainly it would be better if the system avoided getting into such a state in the first place. The system I've proposed would simply refuse to start a process if it does not have the physical memory available to support it.
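
And a toy version of that admission check, with made-up names and numbers: a process declares its memory reservation up front, and starting it simply fails if the machine cannot back the reservation with physical pages.

    #include <stdio.h>
    #include <stdbool.h>
    #include <stddef.h>

    static size_t phys_free = 512 * 1024 * 1024;   /* pretend free RAM */

    /* Admit a process only if its reservation fits in physical memory. */
    static bool admit_process(const char *name, size_t reservation)
    {
        if (reservation > phys_free) {
            fprintf(stderr, "%s refused: would overcommit physical memory\n", name);
            return false;                  /* fail cleanly instead of thrashing */
        }
        phys_free -= reservation;
        return true;
    }

    int main(void)
    {
        admit_process("editor", 64 * 1024 * 1024);
        admit_process("huge-simulation", 900 * 1024 * 1024);  /* refused */
        return 0;
    }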
