Older blog entries for lindsey (starting at number 20)

16 Jul 2002 (updated 16 Jul 2002 at 14:58 UTC) »
MPEG Video Decoder project

I've recently started an MPEG-2 video decoder project; our goal is to make analysis and processing of the video stream easy. Thus, we're using a modern approach to the interpretor, by having various parser/decoder classes which publish decoded objects as they appear on the bitstream.

For example, an external object could subscribe to the Macroblock events which occur in the bitstream, or it could subscribe to the Picture event to get whole, decoded pictures from the stream.

We're intentionally not spending a lot of effort on efficiency; in fact, we chose to write a new decoder specifically for analysis purposes because the other software decoders tend to be obscure and optimized for speed -- frame-rate processing on 1996-era computers, e.g..

Contact me if you're interested in participating. The project (me!) is funded currently through UNC-CH Computer Science. I'm going to have to find something better than CVS to manage my code, though.

Computer Scientists' Identity Crisis

We don't really fit well into any of the common, established categories, you know. And it bothers me.

Part of what we do is definitely craft -- we make tools out of primitive components; we hone and refine until we have reliable instruments which fit nicely into the minds of the users. We're particularly good at it when we also use the tools, and can un-selfconsciously improve them to suit us better. So we're craftsmen (and craftschicks).

And we also engineer systems -- we compare large components, their interactions, and their suitability for long-term use. We analyze how they (virtual machines, compilers, operating systems, compilers, utilities, applications, protocols, algorithms) will be used, and how they will withstand the pressures on them. We balance the various pressures of simplicity, useability, flexibility, performance, cost, reliability. We apply scientific methods to objectively discern which of the best known techniques should be applied; and we consider failures and weaknesses of our systems to optimize our future designs. We're all engineers, at heart.

But we also do science -- we reason about our methods for encapsulating information and procedures; we seek to discover how to cause our Computing Machinery to do things which machines have never done. We discover better ways of thinking about automated information processing, and we test our theories by making software machinery which exploits our ideas. (For example, programming languages are fun because with each new language, we can experiment with new ways of modeling information and procedural structures). Some of us even do scientific, empirical studies of systems and algorithms to compare algorithms and protocols. So, while we're not doing it in the quite the same way that Chemists and Physicists do it, we discover knowledge just like any Scientist.

Yet many of us still strive for elegance, and for completeness. We seek theories which model our current machines(/algorithms), and which model machines which could be. We prove properties of these machines -- especially what the machines cannot do. To a large extent, we don't really care how useful a model is -- because it can be beautiful and fascinating devoid of application. The acorn doesn't fall far from the tree, and the Computer Scientist is not far from the Mathematician.

I wonder how I'd look with a long beard, a labcoat, toolbelt, and a pocket protector.

XP and Patterns

Pattern-oriented design provides highly-flexible, well-structured, cleanly-decoupled designs, but the approach can lead to overengineering -- i.e., providing flexibility that's really not necessary quite yet. XP encourages solving today's problem and refactoring code when it gets crufty.

Initially these ideas seem to be in opposition, but according to one of my professors, they're really not. An XP developer who appreciates the value of design patterns wouldn't make the mistake of trying to anticipate the whole design of a large system -- he could work on a small part of it until it smelled right, applying patterns as they seem appropriate. Then when tomorrow's problems appear, he adds the new functionality, then refactors it until it smells right, possibly adding other patterns that fit the circumstance better.

So why think about patterns at all? Each design pattern fits nicely in your head, possibly together with other patterns. Patterns provide building blocks of considerable scale, as opposed to the most basic object-oriented constructs of classes, fields, interfaces, composition, etc.

Depriving the Craftsman

As a long-time programmer and `Computer Scientist', I think of myself as a craftsman; as Fred Brooks puts it, I'm just a toolsmith. I comprehend a problem, I understand my tools, and I craft something which helps to solve that problem.

But the problem with this craft is that I'm never allowed to do anything that I'm good at. An artisan becomes great by great talent and much thoughtful repetition -- but we programmers are not normally allowed to repeat anything! I'm always pushing the boundaries of my current skills, trying to re-use as many existing components as are available. Unfortunately, doing so is quite unsatisfying. I never get to do anything significant over and over, re-thinking my approach until I finally get something great.

28 May 2002 (updated 29 May 2002 at 00:28 UTC) »
On Academic Computer Science

Whilst I was back home a couple of weeks ago, cmiller asked whether I was as dissatisfied with Computer Science Academia as my advogato entries seem to indicate.

I've just completed my first year of graduate school in Computer Science at UNC Chapel Hill. It's been a lot of fun, but I was alarmed to realize a few facts:

  • CS grad students aren't like top CS undergrads. I was surprised to learn that many of the grad students I've met didn't study CS as undergraduates; that many of them have only been using computers for a few years; and that many of them don't even particularly like programming. My theory to explain the difference between CS undergrads and CS grad students is this: most CS undergrads who are genuinely good at making computers do things find gainful and satisfying employment.

  • ``Computer Science is embarrassed by the Computer'' There is a strong current through CS that is at once irritated by the difficulties of real-world implementations, and seeks to trivialize the value of making computers do things. In the worst cases, PhD Computer Scientists can be little more than mathematicians, ready and willing to burn years on thought experiments and research based on apparently- arbitrary assumptions. To these theoreticians, it's more important to fully explore every point along every dimension of an old problem than it is to find a new problem which might have a helpful solution. (Some other Computer Scientists claim that this isn't really CS.) Theoreticians seem to be afraid of tackling the details of real-world systems.

    I've noticed that if a professor carries a laptop, he's likely to be more realistic and do more-interesting things than otherwise.

  • Computer Science trivializes important problems. For example, Computer Security has not been considered a significant issue within CS. Nor has large-scale system configuration. These problems, CS people have said to me, are just issues of proper software engineering practice. But both of these are leading causes of operational failure in existing systems.

    Lesson: Some fields which deserve concerted CS research have not been embraced by the CS community. My theory is that CS has really solved a problem when it's not causing a problem any more for real systems and their real users. That is, it's not enough for CS types to publish some abstract descriptions of how the world should be.

So, then, why did I still enjoy a year of it, and intend to continue on further?

  • I get to have great stuff explained to me. In my first year, I've had professors carefully explain to me how some of the best-designed systems in all of CS work: the domain name system (DNS); the network file system (NFS), the Andrew File System (AFS), HTTP, TCP Congestion Control, IP, TCP, and UDP protocol operation, DiffServ, IntServ, Class-Based Queueing (CBQ), online Real-Time scheduling algorithms, MPEG audio (mp3) and video encoding, JPEG video, RTP, NTSC video encoding -- and others.

    Of course, I could have read about these myself -- and I already knew about some of them -- but there's something great about having someone carefully explain a system or an algorithm with the aid of slides, and to know that they're explaining something that's actually useful because I use them.

  • CS course provide motivation to do good things. Even CS practitioners like me can fall for the trap of believing that something is easy because we can conceptually imagine an approach to it. These are all of the programs that I imagine that I know how to write, but I don't actually go to the effort to write them. CS professors actually make me wrote some of those programs, and wrestle with the details inherent.

    I learned from undergrad CS programming assignments, but rarely very much. They weren't very hard; most of them took four hours or less, even in the senior-level classes. By constrast, graduate level programming assignments can be rough -- some take weeks of daily effort. But I implemented some neat stuff -- many parts of an operating system including distributed IPC, scheduling, and process management; a framework for distributed, fault-tolerant applications; two real-time scheduling simulators; a web server (in C); an RPC-accessible cache server; and a model checker for software validation (in ML).

  • I'm learning about scientific evaluation. CS is about identifying problems, developing solutions, and showing that the solutions actually solve the problem. This later part helps to make CS a science, but it's not something that we do naturally.

    It's about claims: if I claim that this program does something, I need to be prepared to prove it. And not just prove it to a willing accomplice. I need to be able to show that a system necessarily has certain properties. This is done with analytical proof (by developing theoretical models about which properties can be logically deduced); with simulations, and with actual experiments.

    The quality of the simulation or of the experiment matters, too -- even a skeptic should not be able to argue that my results are invalid. And the quality of your evaluation matters, too: I'm learning to be skeptical, and to require that you substantiate your claims with quality evaluation.

    Having good ideas is just the beginning.

Polished courseness

The MacOS X user interface, `Aqua', is very nicely polished; rendering seems fast and beautiful. But underneath the glitz, it leaves some things to be desired.

Examples related to debugging the system: man pages are not provided for quite everything, and the man pages that are provided are not entirely consistent with the system as sold.

No strace or truss utility is provided.

When Mac programs started with the "open" utility (i.e., gui programs) die, they don't dump core, they don't seem to log anything to the syslog facility, and there's no accessible stderr.

For example: the "Help Viewer" program seems to die any time it's opened from within another program, but I don't know of any way to figure out why it's dying. If it were an X program, I look at its stdout, or strace it to see where it's croaking.

Probably the root cause for this is that it's all relatively new to the Apple folks, and so they're having to work out the bugs. But I'm a tad bit further irritated that Apple doesn't make it easy to contact them in order to suggest that something might be a bug. I may be the only person who's tickled some of the bugs I've run into -- will they ever be fixed? How will Apple ever know that these bugs exist?

A beautiful UI is nice, but I'm not sure that it's worth the hassle of using a poorly-evolved system.

A Theory of Research

Most of the proofs that I've seen lately really aren't very startling, once explained. It just appears that a scientist or mathematician was thinking carefully about something, and derived something, then wrote a proof convince the rest of us.

Perhaps this sort of research is what happens when you focus your mind on one specific little topic, allocating all of your registers to that one topic. It's relatively rare, then, because we're rarely willing to focus on one tiny little topic in all of its excruciating detail.

26 Apr 2002 (updated 10 Oct 2002 at 17:00 UTC) »
Theoretical Computer Science: useful, for something

I've commented previously about my skepticism with theoretical Computer Science. My first personal, real encounter with it came in the form of a Real-Time Systems Course which has a heavy theoretical component.

My basic issue with it boils down to this: CS Theory predicts that some things will be possible when they're not possible on real machinery. It works in a world of algorithms and structures which have no real manifestation.

For example, Preemptive Earliest-Deadline-First scheduling can produce a schedule for any real-time task set that can be scheduled. But, in reality, preemptive EDF may have oodles of context switching which means that some task sets declared "feasible" cannot actually run on any real machines. I like to make things run, and this thing seems to be lying to me. See 2002 October 10 ammendment to this comment.

However, I've come to realize one big benefit from theoretical analysis: even though some things are possible in theory that are not possible in practice, nothing that's impossible in theory is going to be possible in practice. I.e., theoretical analysis is useful for telling us what we cannot do.

MacOS X -- maybe a little buggy

Averaged over the last two weeks, my MacOS X machine has hiccuped once every three days. Network activity stops and it won't start some programs. That's an annoyingly-high failure rate, and because there's no readily-available working shell (like a text console outside of the windowing environment) I haven't been able to analyze the running system to see what's happening.

While it's in this weird state, some programs won't start even in a running shell window. ping doesn't run; netstat does. 'ps' with no args does, 'ps auxw' won't run.

So I can't tell if it's just a kooky GUI thing (forgiveable, since GUIs are always kooky) or if it's something more fundamental (like a broken linker).

Grr

I had to reboot my MacOS X machine because I couldn't get to a real terminal and clean things up.

It somehow went nuts; I couldn't start a shell, and, even in the shell window that I did have, some commands didn't seem to work; e.g., netstat did work, ping didn't work, ps didn't work. I couldn't start anything new from the "Dock", and the "Finder" program (which seems to be somehow related to a window manager) was only partially working -- as if some of its threads had locked up.

Now, in an X11 world, I'd just switch over to a real terminal (Ctrl-Alt-F1, on a suitably-configured FreeBSD or Linux system) kill the offending programs, and restart the UI. But in MacOS X I had to (a) manually "force kill" each of the programs (Ctrl-Option-Esc), then (b) restart.

Fortunately, the force-kill thing worked reliably. It seems to behave like a SIGKILL (`kill -9'). And there was enough of the environment still running to shutdown cleanly, so at least I didn't have to remove the battery.

Apple's mother-may-I approach to user environments runs contrary to my Unix, remount-readonly, kill-and-restart-as-necessary, anything-but-reboot way.

I want a text console!

16 Apr 2002 (updated 16 Apr 2002 at 17:14 UTC) »
Initial Experience with MacOS X

MacOS X does indeed appear to be Unix, although it's not entirely Unixy. For example: conventional MacOS applications live under a directory /Applications -- this is sort-of semantically similar to /opt on some Unixes.

It can boot either to login to a specific user by default (normal single-user system behavior), or to an xdm-esque login manager.

The normal database files exist -- e.g., /etc/passwd, /etc/group -- but, apparently, they're only used by naive programs and while in single-user mode. Programs are supposed to use NetInfo, which appears to be an NIS-like thing.

Man pages are provided for many of the `normal' Unix utilities, but -- unlike other commercial Unixes, like Solaris -- man pages are not provided for all of the proprietary programs. For example, there's a program called "SecurityServer" that runs all the time, but there is no SecurityServer man page. Apparently the Apple guys know about it, and they don't expect us users to tweak SecurityServer's parameters. (IMHO, if Apple wants to be taken seriously as a Unix vendor, they'll need to address this mindset. Of course, shipping MacOS X at all may have been a higher-priority than shipping MacOS X with complete man pages.)

MacOS X doesn't ship with compilers, but they're freely available in the 220MB "Developer Tools" package. The Developer Tools package contains at least a C compiler (but not gcc) and a relatively-recent JDK.

Apparently, the Darwin Folks have lots of stuff available for folks like me. I'll be learning about them.

11 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!