29 May 2008 elanthis   » (Journeyer)

Design of PHP

Having worked with PHP professionally for some 9 years now, I’ve slowly acquired a very dismal view of the language. Don’t get me wrong - it works, it gets the job done, and in a lot of cases there simply isn’t a sensible alternative. But let’s be honest with ourselves: PHP sucks, and it could have been a far better language.

The shortest summary of my complaints against PHP is that absolutely no real design effort was put in before the language was written. Or before any major version release after the first. Or in the next version release, PHP 6.

PHP has a large number of well-known “oopses” built up over the years. Auto globals and magic quoting are perhaps two of the best known. Auto globals might seem like a good idea at first, but even a little bit of thought would have shown how problematic they would turn out to be. Magic quoting is another idea that might seem good if given only a few seconds’ thought, but if one sits down and really thinks about the problem (developers not quoting the strings they’re concatenating together to form SQL queries), it’s not hard to come up with a handful of better solutions.

Aside from the glaring mistakes, PHP also suffers from what I call the Misplaced Generality Tendency. PHP, like many other languages, is designed with a very general-purpose syntax and library despite the fact that PHP was written for one purpose: web applications. Web applications for the most part do two things very frequently (database queries and HTML output) and a lot of other things very infrequently. It would have made a lot more sense to specialize PHP towards DB queries and SQL as well as easy and safe HTML output rather than designing this general-purpose C/Java/Perl-like language that doesn’t really do anything better than those languages other than having an Apache module (which time has shown us is a bad idea for security reasons - hello FastCGI and suexec) and being easily embeddable in HTML (which most of us don’t even do anymore - we use domain-specific languages like Smarty or PHP Sugar for output).

It would be nice to think that PHP is slowly getting better, but that does not appear to be the case. When MDB2 was released as the replacement for PEAR::DB, we found that using SQL was just as much of a pain in the ass and just as easy to get wrong as before. With MDB2 you have to jump through extra hoops to get the placeholder syntax (which, in turn, uses prepared statements even for one-off queries) instead of it being the default, recommended way of doing things. You’re still forced to treat all SQL as just a regular string, same as all that dangerous user input, instead of treating SQL as a first-class citizen of the language. Imagine for a moment that you could write something like:

$result = $dbh->query({{ SELECT column FROM table WHERE id=$_REQUEST['id'] }});

The {{ }} syntax is for illustration only: there are a number of alternatives that might be more aesthetically pleasing. Now, imagine that PHP not only recognized the SQL expression, but also knew to automatically quote the $_REQUEST['id'] variable appropriately. Sometimes you do need to build up your SQL queries like they were strings - it’s rare, but it happens. The syntax above makes it trivial to support this.

$conditions = {{ }};
if ($_REQUEST['name']) $conditions .= {{ WHERE name={$_REQUEST['name']} }};
$result = $dbh->query({{ SELECT column FROM table $conditions }});

When a SQL value is inserted into or concatenated with another SQL value, the result is just what you’d expect.

The query method on the $dbh object would reject any input that isn’t a SQL value. It would not automatically coerce a string into a SQL value or anything like that.

In those instances where you really do want to take user input and turn it into a query (for software like PHPMyAdmin), you can provide a simple method to convert a string into a SQL value. Make it sound scary if you want, or just document it well if you trust your users to read (I don’t).

$sql = unsafe_string_to_sql($user_input);
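
To make the idea concrete, here is a rough library-level approximation in present-day PHP (7.4 or later). It is only a sketch under stated assumptions: the Sql class, its of() and unsafeFromString() methods, and the FakeDbh stand-in are names made up for illustration, and none of them exist in PHP, PEAR::DB or MDB2. The quoting rule (doubling single quotes) is likewise an assumption; a real driver would use its own escaping. A library can only get part of the way, because the template passed to of() is still an ordinary string that a careless developer could concatenate user input into. That gap is exactly what language-level support would close.

```php
<?php
// Hypothetical sketch: Sql, FakeDbh and these methods are illustrative
// names, not part of PHP, PEAR::DB or MDB2.

final class Sql
{
    private string $text;

    private function __construct(string $text)
    {
        $this->text = $text;
    }

    // Build a SQL value; each ? in the template takes one quoted parameter.
    public static function of(string $template, ...$params): Sql
    {
        $parts = explode('?', $template);
        if (count($parts) - 1 !== count($params)) {
            throw new InvalidArgumentException('placeholder count mismatch');
        }
        $text = array_shift($parts);
        foreach (array_values($params) as $i => $param) {
            $text .= self::quote($param) . $parts[$i];
        }
        return new Sql($text);
    }

    // The deliberately scary escape hatch for tools that run raw user SQL.
    public static function unsafeFromString(string $raw): Sql
    {
        return new Sql($raw);
    }

    public function text(): string
    {
        return $this->text;
    }

    private static function quote($value): string
    {
        if ($value instanceof Sql) {
            return $value->text;          // SQL nested in SQL: no quoting
        }
        if ($value === null) {
            return 'NULL';
        }
        if (is_int($value)) {
            return (string) $value;
        }
        if (is_string($value)) {
            // Assumption: standard SQL literal escaping (double the quote).
            // A real driver would apply its own database-specific rules.
            return "'" . str_replace("'", "''", $value) . "'";
        }
        throw new InvalidArgumentException('unsupported parameter type');
    }
}

function unsafe_string_to_sql(string $raw): Sql
{
    return Sql::unsafeFromString($raw);
}

final class FakeDbh
{
    // The Sql type declaration makes PHP throw a TypeError for plain strings.
    public function query(Sql $sql): string
    {
        return $sql->text();  // a real handle would send this to the database
    }
}

$dbh = new FakeDbh();
$name = "O'Brien";  // stands in for $_REQUEST['name']

$conditions = Sql::of('');
if ($name !== '') {
    $conditions = Sql::of('WHERE name = ?', $name);
}
echo $dbh->query(Sql::of('SELECT column FROM table ?', $conditions)), "\n";
```

The last lines mirror the $conditions example above: user input is quoted when it enters a SQL value, a SQL value nested inside another is inserted untouched, and handing query() a plain string fails immediately instead of running.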

Life would be so much easier if PHP actually decided to support SQL in the language, like any language that works 90% with SQL should.

HTML output with PHP isn’t much better. Tools like echo or print are almost as dangerous as the existing SQL libraries in PHP. The problem lies with XSS attacks and malicious content injection attacks. Simply spitting user data out to the page allows users to inject JavaScript, Flash, ActiveX, Silverlight, or other potentially harmful data into a page. If the site stores requests from users and then displays that data to other users, you’ve got yourself a big problem. Almost as big of a problem as if you just let users inject raw SQL into your database queries.

Sure, you can make your output safe using htmlentities() or htmlspecialchars(), but that’s kind of a pain — just as much of a pain as having to add $dbh->quote() all over the place in your SQL string concatenations. At least MDB2 allows you to use placeholders after enabling them; PHP has no such feature for output.
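
For comparison, this is roughly what placeholder-style output could look like if it were bolted on as a library today. Treat it as a sketch, not an existing feature: render() and RawHtml are hypothetical names invented for this example, while htmlspecialchars() and vsprintf() are the real PHP functions doing the work. Every value is escaped unless the caller explicitly marks it as trusted markup, so the safe path is the short one.

```php
<?php
// Hypothetical helpers: RawHtml and render() are illustrative names only.

final class RawHtml
{
    public string $html;

    public function __construct(string $html)
    {
        $this->html = $html;
    }
}

// Fill the %s placeholders in $format, HTML-escaping every value
// unless the caller explicitly wrapped it in RawHtml.
function render(string $format, ...$values): string
{
    $safe = array_map(
        static function ($value) {
            if ($value instanceof RawHtml) {
                return $value->html;
            }
            return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
        },
        array_values($values)
    );
    return vsprintf($format, $safe);
}

$comment = '<script>alert(1)</script>';  // stands in for stored user input
echo render("<p>%s</p>\n", $comment);
echo render("<p>%s</p>\n", new RawHtml('<em>trusted markup</em>'));
```

The first call prints the script tag as inert text; only the second, which opts in through RawHtml, emits real markup. Nothing stops a developer from bypassing render() with a bare echo, which is why this belongs in the language rather than in a helper.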

Of course, most of us use a templating engine anyway. It’s really not the core application’s job to be spitting out HTML. Really, there’s not even any good reason for the embedding syntax (the <?php ?> stuff) in PHP. It may have made sense back when PHP was just a hopped-up server-side includes mechanism, but in the days of PHP6 (or even PHP4) it’s practically useless.

Several design flaws are illustrated by the templating needs among PHP developers. First is the fact that PHP is not at all intended to be a templating engine and yet still tries to pretend that it is one. Second is the fact that PHP is not intended to be a templating engine at all. Think about it: why is Smarty or PHP Sugar necessary? Why not just include another PHP file?

Well, for starters, there is no sandboxing mechanism in PHP. There’s no way to include a file and guarantee that that file can’t access dangerous library routines or modify application data. The PHP syntax is also relatively unfriendly to Web design professionals (PHP is focused on generalism and the C/Java/Perl syntax instead of being focused on its core domain, remember?). Finally, simply including PHP scripts as your templates would provide you with a separation of core business logic and presentation but would not grant you any other niceties such as easier output of safe content.

Unfortunately, just as MDB2 doesn’t do nearly enough to offset PHP’s non-existent SQL integration, projects like the Smarty engine (which by all appearances is the semi-official PHP templating solution) don’t do nearly enough to offset the numerous design flaws PHP has regarding its HTML output facilities.

For example, even though Smarty makes function calls look more like HTML tags (good for Web designers), it also chose a code delimiter that conflicts with JavaScript and CSS; exposes pointless internal complexities to the user, like forcing them to know when to use -> and when to use . to access value properties; makes it easier to output user data unsafely than safely by requiring explicit escaping; and focuses on generalism instead of offering as many of the tools that designers frequently need as possible. In most ways, Smarty seems to have been designed the same way as PHP, which is to say it wasn’t really “designed” at all.

PHP could alleviate many of these problems. It could have used a syntax more familiar to Web developers (which might include trying to look more like JavaScript than Perl). It could have offered a sandboxing facility. It could allow an include mechanism that enabled the embedding syntax while leaving it disabled by default for regular files. It could make sure that echo statements or <?= ?> statements automatically HTML-escape their output by default and make outputting raw data require the extra steps instead.

Web developers would be best supported by a language actually designed for Web development, which is not the same as a language that was haphazardly slapped together with the intent of using it for Web development.

In general, I believe that it’s often better to learn and use an assortment of domain-specific languages rather than trying to make one language fit all needs. It’s often a lot easier to learn a small language specific to a single problem domain with syntax and functionality targeted exactly at what a developer is trying to accomplish rather than trying to learn large and complicated tricks to get a general purpose language to do what is needed. I often hear the mantra that developers love being able to do all their work in one language — such as being able to code both the server logic and the client behavior in C# using ASP.NET and Silverlight — but I contend that developers ask for this only because they have already been forced to build up a large assortment of C#-specific tricks and hacks to accomplish oft-repeated goals and that the alternatives to Silverlight are themselves general-purpose languages shoe-horned into a relatively simple problem domain (UI behavior).

It’s easy to see that this philosophy has been in effect amongst true computer scientists for at least 30 years just by looking at one of the core UNIX design principles: many small tools that each do a specific job and do it very well. Even in terms of language, UNIX has plenty of examples: awk, sed, tr, regular expressions, C, shell script, yacc/lex, and so on. It may take some time for someone already familiar with C to learn the awk syntax, but for very specific yet frequently occurring problem domains it will take a programmer less time to learn awk and then write an awk script than to develop the equivalent logic in C. Tools like Perl may all but make languages like awk obsolete (a design goal of Perl, if I recall) but there are still plenty of other problem domains where Perl barely does better than C.

Unfortunately for all of us working with PHP professionally, the only thing PHP has going for it that other languages don’t is that PHP comes as standard with pretty much every web hosting provider out there. After that convenience is granted, we have to start struggling in order to overcome PHP’s misdesigns and do our job: writing maintainable, secure, stable, efficient web applications. PHP is a toolbox full of interesting and useful tools, just not the ones we need to use most often.

Syndicated 2008-05-29 01:23:52 from Sean Middleditch
