30 Nov 2000 apm   » (Journeyer)

I've been working on documentation for the last little while and I needed a break and some real coding. So I decided to make semicolons optional in the Suneido language. I've been toying with this idea for a while and over my coffee at Tim Hortons one morning, I worked out how to do it and it seemed pretty straightforward. In the end, it took me about half a day to get it 99% working, and another day and a half, and several complete rewrites, to get the last 1%. Typical. On the positive side, the current version is a lot cleaner than the initial version.

One thing I was afraid of was that the changes would "break" a bunch of the existing code in stdlib. So I wrote a quick function to syntax-check all the records in stdlib.

QueryApply('stdlib')
    {|x| try x.text.Eval(); 
    catch (e) Print(x.name " - " e); }
Then I put this in a text file so I could run it even if there were too many errors to get to the WorkSpace (with the Print changed to output to a text file).
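Here is a rough Python analogue of the same whole-library check, for anyone who wants to try the idea elsewhere. It is a sketch under assumptions, not the Suneido code. The hypothetical `syntax_check` takes any iterable of (name, source) pairs standing in for the stdlib records. It uses Python's built-in `compile()`, which parses and compiles without running anything, unlike `Eval`, which also executes the code. Errors can optionally be written to a file object, echoing the text-file variant.

```python
import sys


def syntax_check(records, out=None):
    """Return [(name, message)] for every record that fails to compile.

    `records` is a hypothetical iterable of (name, source_text) pairs.
    compile() only parses and compiles the source; it never executes it.
    """
    errors = []
    for name, text in records:
        try:
            compile(text, name, "exec")
        except SyntaxError as e:
            msg = f"{e.msg} (line {e.lineno})"
            errors.append((name, msg))
            if out is not None:
                # e.g. an open text file, so results survive a broken IDE
                print(name + " - " + msg, file=out)
    return errors


if __name__ == "__main__":
    lib = [("Good", "x = 1\n"), ("Bad", "x = (1\n")]
    syntax_check(lib, out=sys.stdout)
```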

My first stab at the changes resulted in about 500 records with syntax errors. After fixing a few blatant issues, it was down to about 30. A few more fixes and voilà, no errors. But that was on code that had semicolons on every statement. When I started to test examples without semicolons I ran into a bunch more problems that took quite a bit longer to fix.

I tried to follow a good refactoring approach, although technically this wasn't refactoring, since I was changing functionality. Still, all the existing functionality was preserved. The basic plan was:

1. Change the scanner to return a NEWLINE token instead of a WHITESPACE token for any run of whitespace that contained a linefeed or return. Then change the parser to ignore NEWLINE tokens.

This should not have changed anything - and it didn't. All my tests ran, and no syntax errors were introduced.

2. Add a variable to track nesting of (), [], and {} and ignore NEWLINE tokens when inside one of these. Also skip NEWLINEs after binary and trinary operators.

It seems simple, but this ended up taking quite a bit of fiddling to get right. At first, I was adjusting the nesting counter in various parsing methods. (Suneido uses a recursive descent parser, so there is a method for each grammar construct.) But this got pretty ugly, so I ended up counting (), [], and {} in the method (match) that reads new tokens from the scanner. After this breakthrough (which, of course, seems obvious now) it went fairly smoothly.
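The two steps can be sketched in Python. This is only an illustration under stated assumptions. The token set, the `BINARY_OPS` list, and the `scan`/`statements` names are hypothetical, and a flat statement splitter stands in for Suneido's recursive descent parser. `scan` does step 1: a whitespace run containing a line break becomes NEWLINE. `statements` does step 2, keeping the nesting count in the one loop that pulls tokens, in the spirit of counting in match(). The post doesn't say how statements inside a `{}` body are separated, so this sketch only splits top-level code.

```python
import re

# Hypothetical token set for illustration; a real scanner would also handle
# strings, comments, multi-character operators, keywords, etc.
TOKEN_RE = re.compile(r"(?P<WS>\s+)|(?P<NUM>\d+)|(?P<ID>[A-Za-z_]\w*)"
                      r"|(?P<OP>[-+*/=.?:;,()\[\]{}])|(?P<BAD>.)")
# Assumed set of operators after which a newline does not end a statement
# ("?" and ":" make up the trinary operator).
BINARY_OPS = {"+", "-", "*", "/", "=", ".", "?", ":"}
OPEN, CLOSE = "([{", ")]}"


def scan(src):
    """Step 1: whitespace containing a line break becomes NEWLINE."""
    for m in TOKEN_RE.finditer(src):
        kind, text = m.lastgroup, m.group()
        if kind == "BAD":
            raise SyntaxError(f"unexpected character {text!r}")
        if kind == "WS":
            kind = "NEWLINE" if ("\n" in text or "\r" in text) else "WHITESPACE"
        yield kind, text


def statements(src):
    """Step 2: split tokens into statements.

    ';' always ends a statement. A NEWLINE ends one only at nesting
    depth 0 and not directly after a binary operator. The depth is
    tracked here, in the single token-fetching loop.
    """
    stmts, cur = [], []
    depth, prev = 0, None
    for kind, text in scan(src):
        if kind == "WHITESPACE":
            continue
        if kind == "NEWLINE":
            if depth > 0 or prev in BINARY_OPS:
                continue  # inside brackets or after an operator: just whitespace
        elif text in OPEN:
            depth += 1
        elif text in CLOSE:
            depth -= 1
        if kind == "NEWLINE" or text == ";":
            if cur:
                stmts.append(" ".join(cur))
            cur = []
        else:
            cur.append(text)
        prev = text
    if cur:
        stmts.append(" ".join(cur))
    return stmts


if __name__ == "__main__":
    print(statements("x = f(1,\n   2)\ny = 3"))
    print(statements("a = b ?\n  c :\n  d"))
```

Keeping the bookkeeping in one place means no individual grammar method has to remember to adjust the depth. That is presumably why moving it into match() cleaned things up.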

When I released this version internally, we found two records in stdlib, and a few more in other libraries, that had no syntax errors but whose new interpretation differed from the old one. For example:

return
    ... ;
used to be one statement, but was now two, i.e. return ; ... ;

Another case was:

s = s
    .Replace(...)
    .Replace(...);
This was now three statements instead of one. The solution is to put the operators at the end of the lines instead of at the beginning, which is our normal style guideline anyway:
s = s.
    Replace(...).
    Replace(...)
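Finding the records whose meaning silently changed could be partly automated. Below is a hypothetical heuristic in Python; the post does not describe any such tool. It flags the two patterns above: a continuation line that starts with `.`, and a `return` alone on a line with more code on the next line. It works on raw lines, so it can miss cases or raise false alarms. A thorough check would compare the old and new parses instead.

```python
import re

# Hypothetical line-based heuristic, not part of Suneido.
LEADING_DOT = re.compile(r"^\s*\.(?!\d)")  # ".Replace(...)" but not ".5"
BARE_RETURN = re.compile(r"^\s*return\s*$")


def risky_lines(src):
    """Return 1-based line numbers whose meaning may change without semicolons."""
    lines = src.splitlines()
    flagged = []
    for i, line in enumerate(lines, start=1):
        if i > 1 and LEADING_DOT.match(line):
            flagged.append(i)  # continuation line starts with an operator
        elif (BARE_RETURN.match(line) and i < len(lines)
              and lines[i].strip() not in ("", "}")):
            flagged.append(i)  # 'return' alone, expression on the next line
    return flagged


if __name__ == "__main__":
    print(risky_lines("s = s\n    .Replace(a)\nreturn\n    x\n"))
```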
Overall, I'm pretty happy with this change. Personally, I'm so used to semicolons in C and C++ that they're not really an issue for me. But I have noticed they can be a problem for beginners. And if you don't need them, why require them?

The next step on this path is to make braces optional and use indenting instead, like Python. One of these days when I need another fix of serious hacking...

Andrew McKinlay
Suneido Software
