7 Mar 2005 Ohayou   » (Apprentice)

Javascript

I've seriously started digging into Venkman now, the Mozilla project's javascript debugger and profiler. I'm sure it's a great tool if it's your own baby or if you have someone initiated around to teach you its ways, but short of that, you need to find good webpages to help you get anywhere, such as figuring out how to set a simple breakpoint. It's a bit like learning to make good use of Emacs, though in a GUI application. Striking.

Anyway, once you acquire some basic working skills, it has a lot to offer. I fell in love with the profiling tools, not so much because I tend to write javascript code in need of optimisation, but for being beautifully done. (Once you are sitting with the profiling data and have left Venkman's GUI safely behind, anyway.)

As it happens, though, I read through Svend Tofte's good guide referred above, and ended up at the BrainJar JavaScript Crunchinator. "Hey, cool hack!" I thought to myself, and tried feeding it my present work project, an application weighing in at 32 kilobyte, and it sat there for a long time grinding on it. Minutes later, it spat out a big chunk of code that started much like my own code and ended in mid air, three kilobyte short of the end of the file. Weird.

So I inspected my own source code, and found that I had commented out a block with /*...*/ just before where the crunchinator had given up, and the block ended in a // comment, inside the block comment -- and lo, the mystery was solved.

As I was curious to see if the crunched code would actually work, once that issue was resolved, I peeked on the comment stripper, decided it was beyond fixing and decided to run my own instead. After trying a regexp cut and paste approach, I was again annoyed at Javascript RegExps, for some reason not eating entire input strings (why does (.*) not match the rest of my input data? M'kay, I suppose I will read find the answer myself in ECMA-262 next time I'm bit by this and sufficiently annoyed to learn from the specification).

On the other hand, a regexp cut-and-paste solution is by rule of thumb always the wrong solution, for one reason or another, and after having given the matter some thought and made a brief inventory of the search methods on offer in javascript (thank you so much for Javascript: the Definitive Guide, David Flanagan!), I found a much more aesthetic solution built from String.search, String.indexOf and Array.join:

function removeComments( s )
{
  var found, code = [], commentStart = /\x2f[\x2f\x2a]/, commentEnd;
  while( (found = s.search( commentStart )) >= 0 )
  {
    code.push( s.substring( 0, found ) );
    if( s[++found] == '*' )
      commentEnd = '*/';
    else
      commentEnd = '\n';
    if( (found = s.indexOf( commentEnd, found )) >= 0 )
      s = s.substring( found + commentEnd.length );
    else
      s = '';
  }
  s = code.join(' ') + s;
  return s.replace( /\n/g, ' ' );
}

I paste it back into the crunchinator, fire away, and in mere seconds, the result pops up this time, no truncation to be seen. Surprised, I test it again. Sure enough, a speedy weasel indeed. I apply my newfound Venkman knowledge, sleep through the original code's 154.34 seconds worth of heavy processing (81 of which were spent in the original removeComments function), run my own version and get a lean 4.70 seconds for running the entire script. That's some mean garbage collection gains. Just to be sure I'm not measuring something irrelevant, I run the tests again in the other order. No difference worth mentioning.

I sumbit my improvements to the original author, notice that my additions just feel into the GPL (you know where to find the license, folks) and figure it's been a decent hack. Maybe someone could even learn from it. For reference, here is the original source code (don't do this at home):

function removeComments(s) {
  var lines, i, t;
  // Remove '/* ... */' comments.
  lines = s.split("*/");
  t = "";
  for (i = 0; i < lines.length; i++)
    t += lines[i].replace(/(.*)\x2f\x2a(.*)$/g, "$1 ");
  // Remove '//' comments from each line.
  lines = t.split("\n");

t = ""; for (i = 0; i < lines.length; i++) t += lines[i].replace(/([^\x2f]*)\x2f\x2f.*$/, "$1"); // Replace newline characters with spaces. t = t.replace(/(.*)\n(.*)/g, "$1 $2"); return t; }

"The results?" I hear you asking. Well, my original 32 kilobyte application weighed in at 19, a pleasing 59.8% of its original weight, without resorting to variable renaming and similar destructive modifications. It worked, after fixing only six slight misses, half of which were my own (missing end-of-line semicolons and a case of an operator on both sides of a newline). The other half were inside regexps -- two related to apostrophes and quotation marks, the last one being the regexp /  /, which had been optimised to // (...ow! -- and the rest of the line , or the rest of the script if you so prefer, was thus effectively cut off :-).

I suppose that means that another healthy exercise would be to rewrite the string literal parsing code too, but I would suspect that any improvements over the present would mean to parse by language grammar rather than crude string matching, and somehow it doesn't feel like very gratifying work. Not that I have peeked at the code, though.

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!