Older blog entries for lindsey (starting at number 25)

Scripting != Rapidly-Changing Applications

Several of my respected friends/colleagues view Perl, PHP, and their ilk as great -- for "fast" jobs. We've had experiences with these scripting languages that make us shy away from them for "serious" work. We all like scripting, but we don't want to be stuck hacking on the same scripts for several months.

What does "fast" or "serious" mean? Some of the problems we've had with scripting include --

  • Difficulty finding bugs. These languages are flexible -- very flexible. Logic bugs and even syntax bugs can be hard to find, because the language catches so few simple mistakes; the programmer has to check every assumption everywhere, himself.

  • Difficulty reading the code. Perl is like "interpreted line noise", according to jcv, one of my friends. Paul Graham notes the same phenomenon in his essay Hackers and Painters:
    Many a hacker has written a program only to find on returning to it six months later that he has no idea how it works. I know several people who've sworn off Perl after such experiences.

  • Inconsistent language/library design. PHP, for example, seemed to be a mish-mash of useful routines, each with its own special calling convention and partial documentation.

It's as if we have a love/hate relationship with scripting languages. They're great -- until you have to start maintaining them. "All programming is maintenance programming."

Maintenance becomes a problem with scripting languages because I can't fit the entire system into my head. I don't remember all of the rules and assumptions that are intrinsic to my application. (I'm speaking mostly of the gnarly vertical-market business applications here.) I'm crafting new software, yes; and I need to be able to change it rapidly -- therefore, I need system support to keep me in the sandbox of my application.

But Tim O'Reilly's comments on Why Scripting Languages Matter run at variance with this:

The reason why dynamic languages like Perl, Python, and PHP are so important is key to understanding the paradigm shift. Unlike applications from the previous paradigm, web applications are not released in one to three year cycles. They are updated every day, sometimes every hour. Rather than being finished paintings, they are sketches, continually being redrawn in response to new data.

I'll buy this comment about rapidly-changing applications -- but I don't understand why scripting languages are perceived as being more suitable for this.

Indeed, to make changes rapidly, you need a system that helps you find where you've violated assumptions -- this is why agile programming methods are so often associated with testing. And this is why we need strong language support for limiting the behavior of a system -- especially for rapidly-changing applications.

I haven't found this kind of support in scripting languages, by and large. But what attracts us to scripting languages in the first place? Maybe it's the fact that we don't have to define a class and stick a method in it just to do a simple job (a sketch of that ceremony follows). Ironically, this suggests that what attracts us to scripting is precisely the absence of the constraint-checking machinery we later wish we had.
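To make the ceremony concrete, here's the shape of a "simple job" in a language like C# -- everything outside the two working lines exists only to satisfy the compiler (the class and names are mine, purely for illustration):

    using System;

    public class GreetingPrinter
    {
        public static void Main(string[] args)
        {
            // The actual job is these two lines; the rest is ceremony.
            string name = (args.Length > 0) ? args[0] : "world";
            Console.WriteLine("Hello, " + name);
        }
    }

In Perl or PHP, the same job is a line or two with no wrapping required.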

Coercion by Programming Languages

etrepum noted that "programming languages should do what you want" when I questioned the wisdom of C#'s designers in not forcing you to explicitly handle exceptions. In Java terms, all C# exceptions are effectively "unchecked".

In practical terms, this just means that the runtime environment (the "CLR") provides an exception handler at the bottom of the stack, as Java does. It also means that the programmer may be unaware of the exceptions that can be thrown from the routine he's writing. The question for debate is whether that's a bad thing.
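A minimal sketch of what I mean, in C# (the names here are hypothetical): nothing in the signature warns the caller what can go wrong, and the compiler demands no handler.

    using System;
    using System.IO;

    public class Config
    {
        // This can raise FileNotFoundException or FormatException,
        // but C# has no "throws" clause to surface that fact.
        public static int ReadThreshold(string path)
        {
            using (StreamReader r = new StreamReader(path))
            {
                return int.Parse(r.ReadToEnd());
            }
        }

        public static void Main()
        {
            // Compiles with no try/catch anywhere; Java's checked
            // exceptions would have forced us to catch or declare.
            Console.WriteLine(ReadThreshold("threshold.txt"));
        }
    }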

It might be a bad thing in cases where the programmer could have recovered from the exception; e.g., if a DBMS transaction fails due to deadlock, then maybe we should just try it again. Maybe this is what Drayton, Albahari, and Neward mean in C# in a Nutshell by saying that "Application exceptions should be treated as nonfatal."
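Here's roughly what that recovery might look like, sketched in C# against SQL Server (where, as I understand it, error number 1205 marks a deadlock victim; TransferFunds is a hypothetical transactional routine):

    using System;
    using System.Data.SqlClient;

    public class DeadlockRetry
    {
        // Run one transaction's worth of work, retrying if the
        // DBMS picks us as a deadlock victim.
        public static void RunWithRetry(int maxAttempts)
        {
            for (int attempt = 1; ; attempt++)
            {
                try
                {
                    TransferFunds();   // hypothetical: BEGIN ... COMMIT
                    return;            // success -- no retry needed
                }
                catch (SqlException e)
                {
                    // 1205 is SQL Server's deadlock-victim error;
                    // anything else (or too many tries) is rethrown.
                    if (e.Number != 1205 || attempt == maxAttempts)
                        throw;
                }
            }
        }

        static void TransferFunds() { /* one transaction's work */ }
    }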

OTOH, as a programming culture, we really haven't made peace with exceptions as they appear in Java and in the .net languages. Are exceptions supposed to be used as a fancy means to control flow during normal execution? Or are they intended to make it easy to detect errors? Recall that in system programming, virtually every important function/system call returns a value that must be checked. Exception handling seems to have been intended to factor all of those checks out to the "catch" clauses, so that the kernel of logic within a routine can be evident.
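The contrast, sketched in C#: a systems programmer would check the result of every open, read, and write by hand; with exceptions, those checks collect in the catch clauses, and the kernel of the logic reads straight through.

    using System;
    using System.IO;

    public class CopyTool
    {
        public static void CopyUpperCase(string inPath, string outPath)
        {
            try
            {
                // The kernel of the logic, uncluttered by checks:
                using (StreamReader r = new StreamReader(inPath))
                using (StreamWriter w = new StreamWriter(outPath))
                {
                    w.Write(r.ReadToEnd().ToUpper());
                }
            }
            catch (FileNotFoundException e)
            {
                Console.Error.WriteLine("no such file: " + e.FileName);
            }
            catch (IOException e)
            {
                Console.Error.WriteLine("copy failed: " + e.Message);
            }
        }
    }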

.net programming

My current situation has afforded me the chance to use Microsoft's .net-brand collection of programming tools. I'm glad to make this report: they've started to learn a lot from us.

For example, the language "C#" is clearly modeled largely on Java. Good for them! I'm glad to see that they recognize good ideas when they see them. There are some notable differences, though.

  • C# doesn't force you to keep track of all the types of exceptions that can be thrown and handle each one. This seems bad.

  • In fact, programmer-defined exceptions are supposed to subclass "ApplicationException"; according to some normally reputable source, such exceptions are supposed to be non-fatal. The reasoning is left to mystery.

  • "delegates" are a new way of identifying a static method; it seems to be a grown-up replacement for function pointers, in that you can be sure that the delegatee adheres to a specific call signature. A single delegate reference can refer to multiple functions; when you call the delegate reference all of the attached functions are called, and in a deterministic order.

  • C# allows crazy stuff like pointer arithmetic (though only inside blocks explicitly marked "unsafe").

There are other differences; these are just the ones that come to mind.
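A minimal sketch of multicast delegates as I currently understand them (all the names are mine):

    using System;

    public class DelegateDemo
    {
        // A delegate type pins down the call signature, the way a
        // C function-pointer typedef would -- but type-safely.
        delegate void Logger(string message);

        static void ToConsole(string m)      { Console.WriteLine(m); }
        static void ToUpperConsole(string m) { Console.WriteLine(m.ToUpper()); }

        public static void Main()
        {
            // One delegate reference can hold several methods...
            Logger log = new Logger(ToConsole);
            log += new Logger(ToUpperConsole);

            // ...and one call invokes them all, in the order added.
            log("hello");   // prints "hello", then "HELLO"
        }
    }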

My primary desktop platform at work is Windows. I shuddered at first, but with cygwin and a dose of humility, I'm making peace with it. It's too bad, though -- shuddering was great exercise.

Need for rapid web development -- your help needed

I like programming in Java. But I can't find any tools for rapidly developing simple database applications in Java -- even if I'm willing to spend money doing it.

Maybe there are some out there -- has anybody seen any? I really just need something that'll help me with form validation, and creating/modifying/deleting rows from a database. I need more than just a one-shot code generator; I'll need to modify the web forms and reports quite often. I.e., code generators are okay -- but I must be able to specify all of my page markup and such in the input language to the code generator.

Foray into Microsoft

Given the need for a rapid web-application development environment, my boss and I have started looking at Microsoft ways of doing it. To be honest, the asp.net stuff seems to come pretty close to doing what we need.

I have three initial impressions about the world of MS-oriented software development:

  • There are lots of fuzzy, feel-good, technical terms for establishing a relationship between two entities; e.g., linking and wiring. The late Edsger Dijkstra, in his EWD 1044: To hell with "meaningful identifiers"!, discusses the danger of using such fuzzy terms to mean specific things. I buy his argument -- that it can be misleading to use terms that appear to have an intuitive meaning. However, I don't completely buy his conclusion; carefully chosen identifiers can be helpful without implying too much.

  • There seem to be lots of related ways to do the same thing, and the explanations of precisely what each one does seem vague. E.g., I'm accustomed to a single locale specification in the Unix/Java world; but in asp.net there's Culture, UICulture, and LCID (Locale ID) -- all of which seem to have something to do with changing the language and locale of the page. I'm sure each of them is distinct, and there's probably a good explanation for splitting them up; I just wish I knew that explanation. (My current guess is sketched after this list.)

  • MS devotees seem to be heavy on marketing. It's as if Microsoft programmers feel the need to validate and be validated in their choice of MS development environments. Everything is splendiferous to them.
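About those culture settings: my working guess -- and it is only a guess at this point -- is that Culture governs formatting (dates, numbers, currency) while UICulture selects which localized resources get loaded, with LCID as a numeric spelling of the same idea. A page can apparently declare both (<%@ Page Culture="de-DE" UICulture="de" %>), and the same pair shows up on the current thread:

    using System;
    using System.Globalization;
    using System.Threading;

    public class CultureDemo
    {
        public static void Main()
        {
            // Culture appears to drive formatting:
            Thread.CurrentThread.CurrentCulture = new CultureInfo("de-DE");
            Console.WriteLine(DateTime.Now.ToString("D"));  // German-style long date

            // UICulture appears to drive resource lookup -- which
            // localized satellite assemblies get consulted for UI strings:
            Thread.CurrentThread.CurrentUICulture = new CultureInfo("de");
        }
    }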

Correction to EDF comment

Back in April, I posted a comment about the Earliest-Deadline-First (EDF) scheduling discipline that was not fully accurate. I had commented that the original, overly-simplified explanations of EDF claimed that task sets would be `feasible', even though the explanations did not include any accounting for context switching.

Sanjoy, a Theoretician here in my department, corrected my assertions, saying

It can be shown [...] that an edf schedule on n jobs will have <= 2n-1 context switches, rather than oodles. In analyzing a system, this context-switch overhead is accounted for by "inflating" (in the analysis) the execution requirement parameters of each job by the amount of time taken to perform 2 context switches.

Of course, he's right. He's published several important papers in the Real-Time area and knows it all far better than I do.
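To restate his point in Liu-and-Layland-style notation (the notation is mine: C_i and T_i are task i's execution requirement and period, and delta is the cost of one context switch), the classical utilization test

    \sum_{i=1}^{n} \frac{C_i}{T_i} \le 1

simply becomes, with the switching overhead folded in,

    \sum_{i=1}^{n} \frac{C_i + 2\delta}{T_i} \le 1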

In retrospect, I was really commenting on the methodology of the proof more than on EDF per se; the original proofs of EDF's optimality (from Liu and Layland) do not account for context-switching cost. As such, I found the proofs unconvincing -- at least for practicable definitions of feasible.

16 Jul 2002 (updated 16 Jul 2002 at 14:58 UTC)
MPEG Video Decoder project

I've recently started an MPEG-2 video decoder project; our goal is to make analysis and processing of the video stream easy. Thus, we're taking a modern approach to the interpreter: various parser/decoder classes publish decoded objects as they appear in the bitstream.

For example, an external object could subscribe to the Macroblock events which occur in the bitstream, or it could subscribe to the Picture event to get whole, decoded pictures from the stream.
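The subscription idea, sketched in C# event syntax -- the entry above doesn't fix our actual language or class names, so everything here is illustrative:

    using System;

    public class Macroblock { }
    public class Picture { }

    public delegate void MacroblockHandler(Macroblock mb);
    public delegate void PictureHandler(Picture pic);

    public class Mpeg2Decoder
    {
        // Parser/decoder stages publish decoded objects as events.
        public event MacroblockHandler MacroblockDecoded;
        public event PictureHandler PictureDecoded;

        public void Decode()
        {
            // Real code would parse the bitstream; this stub just
            // shows where the publish steps would happen.
            if (MacroblockDecoded != null) MacroblockDecoded(new Macroblock());
            if (PictureDecoded != null) PictureDecoded(new Picture());
        }
    }

    public class Analyzer
    {
        public static void Main()
        {
            Mpeg2Decoder decoder = new Mpeg2Decoder();
            // Subscribe only to the events this tool cares about:
            decoder.PictureDecoded += new PictureHandler(OnPicture);
            decoder.Decode();
        }

        static void OnPicture(Picture pic)
        {
            Console.WriteLine("got a whole decoded picture");
        }
    }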

We're intentionally not spending a lot of effort on efficiency; in fact, we chose to write a new decoder specifically for analysis purposes because the existing software decoders tend to be obscure and optimized for speed -- e.g., for frame-rate playback on 1996-era computers.

Contact me if you're interested in participating. The project (me!) is funded currently through UNC-CH Computer Science. I'm going to have to find something better than CVS to manage my code, though.

Computer Scientists' Identity Crisis

We don't really fit well into any of the common, established categories, you know. And it bothers me.

Part of what we do is definitely craft -- we make tools out of primitive components; we hone and refine until we have reliable instruments which fit nicely into the minds of the users. We're particularly good at it when we also use the tools, and can un-selfconsciously improve them to suit us better. So we're craftsmen (and craftschicks).

And we also engineer systems -- we compare large components, their interactions, and their suitability for long-term use. We analyze how they (virtual machines, compilers, operating systems, utilities, applications, protocols, algorithms) will be used, and how they will withstand the pressures on them. We balance the various pressures of simplicity, usability, flexibility, performance, cost, and reliability. We apply scientific methods to objectively discern which of the best known techniques should be applied; and we consider failures and weaknesses of our systems to optimize our future designs. We're all engineers, at heart.

But we also do science -- we reason about our methods for encapsulating information and procedures; we seek to discover how to cause our Computing Machinery to do things which machines have never done. We discover better ways of thinking about automated information processing, and we test our theories by making software machinery which exploits our ideas. (For example, programming languages are fun because with each new language, we can experiment with new ways of modeling information and procedural structures). Some of us even do scientific, empirical studies of systems and algorithms to compare algorithms and protocols. So, while we're not doing it in quite the same way that Chemists and Physicists do it, we discover knowledge just like any Scientist.

Yet many of us still strive for elegance, and for completeness. We seek theories which model our current machines(/algorithms), and which model machines which could be. We prove properties of these machines -- especially what the machines cannot do. To a large extent, we don't really care how useful a model is -- because it can be beautiful and fascinating devoid of application. The acorn doesn't fall far from the tree, and the Computer Scientist is not far from the Mathematician.

I wonder how I'd look with a long beard, a labcoat, toolbelt, and a pocket protector.

XP and Patterns

Pattern-oriented design provides highly-flexible, well-structured, cleanly-decoupled designs, but the approach can lead to overengineering -- i.e., providing flexibility that's really not necessary quite yet. XP encourages solving today's problem and refactoring code when it gets crufty.

Initially these ideas seem to be in opposition, but according to one of my professors, they're really not. An XP developer who appreciates the value of design patterns wouldn't make the mistake of trying to anticipate the whole design of a large system -- he'd work on a small part of it until it smelled right, applying patterns as they seemed appropriate. Then when tomorrow's problems appear, he adds the new functionality and refactors until it smells right again, possibly adding other patterns that fit the new circumstances better.

So why think about patterns at all? Each design pattern fits nicely in your head, possibly together with other patterns. Patterns provide building blocks of considerable scale, as opposed to the most basic object-oriented constructs of classes, fields, interfaces, composition, etc.

Depriving the Craftsman

As a long-time programmer and `Computer Scientist', I think of myself as a craftsman; as Fred Brooks puts it, I'm just a toolsmith. I comprehend a problem, I understand my tools, and I craft something which helps to solve that problem.

But the problem with this craft is that I'm never allowed to do anything that I'm good at. An artisan becomes great by great talent and much thoughtful repetition -- but we programmers are not normally allowed to repeat anything! I'm always pushing the boundaries of my current skills, trying to re-use as many existing components as are available. Unfortunately, doing so is quite unsatisfying. I never get to do anything significant over and over, re-thinking my approach until I finally get something great.

28 May 2002 (updated 29 May 2002 at 00:28 UTC)
On Academic Computer Science

Whilst I was back home a couple of weeks ago, cmiller asked whether I was as dissatisfied with Computer Science Academia as my advogato entries seem to indicate.

I've just completed my first year of graduate school in Computer Science at UNC Chapel Hill. It's been a lot of fun, but I was alarmed to realize a few facts:

  • CS grad students aren't like top CS undergrads. I was surprised to learn that many of the grad students I've met didn't study CS as undergraduates; that many of them have only been using computers for a few years; and that many of them don't even particularly like programming. My theory to explain the difference between CS undergrads and CS grad students is this: most CS undergrads who are genuinely good at making computers do things find gainful and satisfying employment.

  • ``Computer Science is embarrassed by the Computer'' There is a strong current through CS that is at once irritated by the difficulties of real-world implementations, and seeks to trivialize the value of making computers do things. In the worst cases, PhD Computer Scientists can be little more than mathematicians, ready and willing to burn years on thought experiments and research based on apparently-arbitrary assumptions. To these theoreticians, it's more important to fully explore every point along every dimension of an old problem than it is to find a new problem which might have a helpful solution. (Some other Computer Scientists claim that this isn't really CS.) Theoreticians seem to be afraid of tackling the details of real-world systems.

    I've noticed that if a professor carries a laptop, he's likely to be more realistic and do more-interesting things than otherwise.

  • Computer Science trivializes important problems. For example, Computer Security has not been considered a significant issue within CS. Nor has large-scale system configuration. These problems, CS people have said to me, are just issues of proper software engineering practice. But both of these are leading causes of operational failure in existing systems.

    Lesson: Some fields which deserve concerted CS research have not been embraced by the CS community. My theory is that CS has really solved a problem when it's not causing a problem any more for real systems and their real users. That is, it's not enough for CS types to publish some abstract descriptions of how the world should be.

So, then, why did I still enjoy a year of it, and intend to continue on further?

  • I get to have great stuff explained to me. In my first year, I've had professors carefully explain to me how some of the best-designed systems in all of CS work: the Domain Name System (DNS); the Network File System (NFS) and the Andrew File System (AFS); HTTP; IP, TCP, and UDP protocol operation, including TCP congestion control; DiffServ, IntServ, and Class-Based Queueing (CBQ); online real-time scheduling algorithms; MPEG audio (mp3) and video encoding; JPEG image encoding; RTP; NTSC video encoding -- and others.

    Of course, I could have read about these myself -- and I already knew about some of them -- but there's something great about having someone carefully explain a system or an algorithm with the aid of slides, and knowing that they're explaining something actually useful, because I use these systems myself.

  • CS courses provide motivation to do good things. Even CS practitioners like me can fall into the trap of believing that something is easy because we can conceptually imagine an approach to it. These are all the programs I imagine I know how to write but never actually go to the effort of writing. CS professors actually made me write some of those programs, and wrestle with their inherent details.

    I learned from undergrad CS programming assignments, but rarely very much. They weren't very hard; most of them took four hours or less, even in the senior-level classes. By contrast, graduate-level programming assignments can be rough -- some take weeks of daily effort. But I implemented some neat stuff -- many parts of an operating system, including distributed IPC, scheduling, and process management; a framework for distributed, fault-tolerant applications; two real-time scheduling simulators; a web server (in C); an RPC-accessible cache server; and a model checker for software validation (in ML).

  • I'm learning about scientific evaluation. CS is about identifying problems, developing solutions, and showing that the solutions actually solve the problem. This latter part helps make CS a science, but it's not something that we do naturally.

    It's about claims: if I claim that this program does something, I need to be prepared to prove it. And not just prove it to a willing accomplice: I need to be able to show that a system necessarily has certain properties. This is done with analytical proofs (developing theoretical models from which properties can be logically deduced), with simulations, and with actual experiments.

    The quality of the simulation or the experiment matters, too -- even a skeptic should not be able to argue that my results are invalid. And it cuts both ways: I'm learning to be skeptical, and to require that you substantiate your claims with a quality evaluation.

    Having good ideas is just the beginning.
