Older blog entries for Ohayou (starting at number 13)

11 Apr 2005 (updated 20 Jun 2005 at 20:32 UTC) »
Javascript: Using XPath as an XML query tool

Long story short: Mozilla: sure. IE / MSXML2 (3.0): well, given a strong stomach, perhaps.

I've been toying with a set top box user interface at work for some time now, all HTML, images, CSS and Javascript. Very non-web, non-computer feely, remote control rules all sort of thing. The target environment is very slightly spiced-up OEM, run of the mill wintel machines, IE the browser to run it on. I'm not overly happy about that last bit.

My development environment is rather Mozilla bound, mostly because there are tools useful when debugging and because IE6 gets error messages wrong, lacks the toSource method and lots of other very useful things that add up to make it a really unfriendly, sluggish environment providing no leverage.

On adding an EPG system to this beast, I decided on using the XMLTV format for the data, and threw together some sketchy parsing using the XPathEvaluator APIs kindly provided by the Mozilla people. It worked beautifully -- here's a comfy sample:

function query_xpath( node, xpath ) {
  var document = node.ownerDocument || node;
  var evaluator = new XPathEvaluator(), ANY = XPathResult.ANY_TYPE;
  var resolver = evaluator.createNSResolver( document.documentElement );
  return document.evaluate( xpath, node, resolver, ANY, null );
}

with which I count nodes matching some given criterion or criteria, for instance, or call any other XPath function. I did this happily enough and left the IE bit for later. Bad move. Do you think I'd be given that freedom with MSXML? I did. I am not there yet, after several hours of reading MSDN and net resources. As it turns out, to do anything but node set operations you must add some XSLT indirection to the brew, or so I gathered from an MSXML based tool page that unkindly left out the only interesting bits: the source code. Most annoying of them to mark the page up well enough to draw Google hits from poor sods trying to get MSXML to cooperate, in my grumpy opinion.
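To make the counting case concrete, here is a minimal sketch of how such a query might look against XMLTV data. The helper name countProgrammes and the channel id are made up for illustration; the /tv/programme path and the channel attribute come from the XMLTV format, and the result type constant mirrors XPathResult.NUMBER_TYPE from the DOM XPath API.

```javascript
// Sketch only: assumes a browser whose documents implement
// document.evaluate (DOM Level 3 XPath), as Mozilla does.
var XPATH_NUMBER_TYPE = 1; // same value as XPathResult.NUMBER_TYPE

// Hypothetical helper: count the programmes listed for one channel.
function countProgrammes( doc, channelId ) {
  var xpath = 'count(/tv/programme[@channel="' + channelId + '"])';
  // Asking for a number up front saves inspecting resultType afterwards.
  var result = doc.evaluate( xpath, doc, null, XPATH_NUMBER_TYPE, null );
  return result.numberValue;
}
```

Called as countProgrammes( xmltvDocument, 'some.channel.id' ), it should hand back a plain number. Channel ids containing a double quote would break the expression, so this is not something to feed untrusted input.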

This might be on the right track though, should I choose to still rely on doing any XPath based queries on the raw XMLTV data (untested):

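function query_xpath( node, xpath ) {
  var xsl = new ActiveXObject( 'Msxml2.FreeThreadedDOMDocument.3.0' );
  // A stylesheet needs its version and the XSLT namespace to be accepted;
  // text output keeps an XML declaration out of the result.
  xsl.loadXML('<xsl:stylesheet version="1.0" xmlns:xsl='+
              '"http://www.w3.org/1999/XSL/Transform">'+
              '<xsl:output method="text"/><xsl:template match="/">'+
              '<xsl:value-of select="'+ xpath +'"/>'+
              '</xsl:template></xsl:stylesheet>');
  var xslt = new ActiveXObject( 'Msxml2.XSLTemplate.3.0' );
  xslt.stylesheet = xsl;
  var xml = node.ownerDocument || node;
  var proc = xslt.createProcessor();
  proc.input = xml;
  proc.transform(); // proc.output stays empty until the transform has run
  return proc.output;
}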

Not pretty. Doesn't comply with the API of processing a node set only, but rather needs the entire document, and it reeks of needless computron waste.
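For the plain counting case there may be a cheaper way out that stays within node set operations: MSXML nodes offer selectNodes, and the length of the returned list is the count. The sketch below is as untested on real IE as the XSLT version; countNodesMsxml is a made-up name, and the SelectionLanguage property is set because MSXML 3.0 otherwise defaults to its older XSLPattern syntax.

```javascript
// Sketch only: assumes node comes from an MSXML 3.0 DOM document.
function countNodesMsxml( node, xpath ) {
  var doc = node.ownerDocument || node;
  // Switch the document from XSLPattern to real XPath 1.0 expressions.
  doc.setProperty( 'SelectionLanguage', 'XPath' );
  // selectNodes evaluates relative to node and returns a node list.
  return node.selectNodes( xpath ).length;
}
```

This only covers expressions that return node sets, so string and number functions would still need the XSLT detour.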

End of rant. I feel a little better already.

7 Mar 2005 (updated 7 Mar 2005 at 00:58 UTC) »

I've seriously started digging into Venkman now, the Mozilla project's javascript debugger and profiler. I'm sure it's a great tool if it's your own baby or if you have someone initiated around to teach you its ways, but short of that, you need to find good webpages to help you get anywhere, such as figuring out how to set a simple breakpoint. It's a bit like learning to make good use of Emacs, though in a GUI application. Striking.

Anyway, once you acquire some basic working skills, it has a lot to offer. I fell in love with the profiling tools, not so much because I tend to write javascript code in need of optimisation, but for being beautifully done. (Once you are sitting with the profiling data and have left Venkman's GUI safely behind, anyway.)

As it happens, though, I read through Svend Tofte's good guide referred to above, and ended up at the BrainJar JavaScript Crunchinator. "Hey, cool hack!" I thought to myself, and tried feeding it my present work project, an application weighing in at 32 kilobytes, and it sat there for a long time grinding on it. Minutes later, it spat out a big chunk of code that started much like my own code and ended in mid air, three kilobytes short of the end of the file. Weird.

So I inspected my own source code, and found that I had commented out a block with /*...*/ just before where the crunchinator had given up, and the block ended in a // comment, inside the block comment -- and lo, the mystery was solved.

As I was curious to see if the crunched code would actually work once that issue was resolved, I peeked at the comment stripper, decided it was beyond fixing and decided to roll my own instead. After trying a regexp cut and paste approach, I was again annoyed at Javascript RegExps, for some reason not eating entire input strings (why does (.*) not match the rest of my input data? M'kay, I suppose I will find the answer myself in ECMA-262 next time I'm bitten by this and sufficiently annoyed to learn from the specification).
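A likely explanation for the (.*) puzzle, offered as a guess since the regexp in question isn't shown: in Javascript regexps, the dot matches any character except line terminators, so (.*) stops at the first newline. A character class such as [\s\S] has no such restriction. The input string here is invented for the demonstration.

```javascript
var input = 'first line\nsecond line';

// The dot refuses to cross the newline, so only the first line is captured.
var dotMatch = /(.*)/.exec( input )[1];

// [\s\S] means "whitespace or non-whitespace", which covers every character.
var anyMatch = /([\s\S]*)/.exec( input )[1];
```

Here dotMatch ends up holding only 'first line', while anyMatch holds the whole input.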

On the other hand, a regexp cut-and-paste solution is by rule of thumb always the wrong solution, for one reason or another, and after having given the matter some thought and made a brief inventory of the search methods on offer in javascript (thank you so much for Javascript: the Definitive Guide, David Flanagan!), I found a much more aesthetic solution built from String.search, String.indexOf and Array.join:

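function removeComments( s ) {
  // commentStart matches either comment opener: a slash, then / or *
  var found, code = [], commentStart = /\x2f[\x2f\x2a]/, commentEnd;
  while( (found = s.search( commentStart )) >= 0 ) {
    code.push( s.substring( 0, found ) ); // keep the code before the comment
    if( s.charAt( ++found ) == '*' )      // charAt, as IE lacks s[i] indexing
      commentEnd = '*/';
    else
      commentEnd = '\n';
    // search from found + 1, so the * of the opener can't also end the comment
    if( (found = s.indexOf( commentEnd, found + 1 )) >= 0 )
      s = s.substring( found + commentEnd.length );
    else
      s = ''; // unterminated comment: nothing more to keep
  }
  s = code.join(' ') + s;
  return s.replace( /\n/g, ' ' );
}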

I paste it back into the crunchinator, fire away, and in mere seconds, the result pops up this time, no truncation to be seen. Surprised, I test it again. Sure enough, a speedy weasel indeed. I apply my newfound Venkman knowledge, sleep through the original code's 154.34 seconds worth of heavy processing (81 of which were spent in the original removeComments function), run my own version and get a lean 4.70 seconds for running the entire script. That's some mean garbage collection gains. Just to be sure I'm not measuring something irrelevant, I run the tests again in the other order. No difference worth mentioning.
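For a rougher comparison without Venkman, wall-clock timing around repeated calls goes a long way. This is a minimal sketch under that assumption, not what the profiler does; timeIt is a made-up helper name, and the figures it gives include whatever else the browser happens to be busy with.

```javascript
// Hypothetical helper: run fn( input ) a number of times and
// return the elapsed wall-clock time in milliseconds.
function timeIt( fn, input, runs ) {
  var start = new Date().getTime();
  for( var i = 0; i < runs; i++ )
    fn( input );
  return new Date().getTime() - start;
}
```

Timing both candidates, and then timing them again in the opposite order, is the same sanity check as the one described above.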

I submit my improvements to the original author, notice that my additions just fell under the GPL (you know where to find the license, folks) and figure it's been a decent hack. Maybe someone could even learn from it. For reference, here is the original source code (don't do this at home):

function removeComments(s) {
  var lines, i, t;
  // Remove '/* ... */' comments.
  lines = s.split("*/");
  t = "";
  for (i = 0; i < lines.length; i++)
    t += lines[i].replace(/(.*)\x2f\x2a(.*)$/g, "$1 ");
  // Remove '//' comments from each line.
  lines = t.split("\n");
  t = "";
  for (i = 0; i < lines.length; i++)
    t += lines[i].replace(/([^\x2f]*)\x2f\x2f.*$/, "$1");
  // Replace newline characters with spaces.
  t = t.replace(/(.*)\n(.*)/g, "$1 $2");
  return t;
}

"The results?" I hear you asking. Well, my original 32 kilobyte application weighed in at 19, a pleasing 59.8% of its original weight, without resorting to variable renaming and similar destructive modifications. It worked, after fixing only six slight misses, half of which were my own (missing end-of-line semicolons and a case of an operator on both sides of a newline). The other half were inside regexps -- two related to apostrophes and quotation marks, the last one being the regexp /  /, which had been optimised to // (...ow! -- and the rest of the line , or the rest of the script if you so prefer, was thus effectively cut off :-).

I suppose that means that another healthy exercise would be to rewrite the string literal parsing code too, but I would suspect that any improvements over the present would mean to parse by language grammar rather than crude string matching, and somehow it doesn't feel like very gratifying work. Not that I have peeked at the code, though.

3 Mar 2005 (updated 3 Mar 2005 at 17:44 UTC) »
Bookmarklet (and calendar rant)

I absolutely detest all numeric non-ISO date formats, M/D/Y probably most of all. So when I encountered the Kingdom of Loathing calendar some benevolent (albeit calendrally challenged) person had published, I did not track down said person to tell him how glad I was at finding what I was looking for and how I felt about the format in which it was published. The meld of feelings would just not make any sense, and after all, the information was both there and fairly easily deciphered. I just strongly feel that deciphering is best left to computers.

Enter today's bookmarklet (feel free to bookmark it). It will ask you which (numeric) date format to convert from, harvest all frames for dates in that format and reformat them to readable ISO YYYY-MM-DD dates. If you go with the default M/D/Y, it will find 3/2/5, being sillyspeak for yesterday, and turn it into 2005-03-02. Short dates in the future (such as 3 / 3 / 6) will be assumed to mean the corresponding date from last century. Run it a year from now and you will see 2006-03-03, though. Unless your clock is off by a lot.
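The conversion rule can be sketched roughly as below. This is not the bookmarklet's source, just one way of getting the behaviour described: toISODate is a made-up name, the format is given as a string like 'M/D/Y', and the current time is passed in as a parameter (an assumption made here so the century guess can be tried out with different clocks).

```javascript
// Hypothetical helper: convert one numeric date, e.g. '3/2/5', to ISO form.
function toISODate( text, format, now ) {
  function pad( n, width ) {
    n = String( n );
    while( n.length < width ) n = '0' + n;
    return n;
  }
  var keys = format.split( '/' );          // e.g. ['M', 'D', 'Y']
  var parts = text.split( /\s*\/\s*/ );    // tolerate '3 / 3 / 6'
  if( parts.length != 3 ) return null;
  var field = {};
  for( var i = 0; i < 3; i++ )
    field[keys[i]] = parseInt( parts[i], 10 );
  var y = field.Y, m = field.M, d = field.D;
  if( isNaN( y ) || isNaN( m ) || isNaN( d ) ) return null;
  if( y < 100 ) {
    // Short year: try the current century first...
    y += Math.floor( now.getFullYear() / 100 ) * 100;
    // ...and fall back a century if that lands in the future.
    if( new Date( y, m - 1, d ) > now ) y -= 100;
  }
  return pad( y, 4 ) + '-' + pad( m, 2 ) + '-' + pad( d, 2 );
}
```

With the clock at 3 March 2005, toISODate( '3/2/5', 'M/D/Y', now ) gives 2005-03-02 and '3 / 3 / 6' gives 1906-03-03; a year later the latter comes out as 2006-03-03.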

Upon googling for date tables to try it out on, I found a hilarious hallmark of stupidity - an excel sheet featuring the column "Employee Start date", "m/d/y or y/m/d e.g. 5/17/2 or 2002/5/17". The web is a silly place. Let's not go there.


Okay, the last procrastinational activity for today: a very (stone axe) crude kind of javascript indent tool built on GNU indent. So it doesn't really reindent Javascript code, but C. Silly me. But while it does trash the occasional RegExp literal, it's less of a hassle than doing all the work manually.

Besides, most C users wouldn't use web tools to indent their code. Not me, anyway. But if you would, feel free to use this one. The worst thing that could happen is that it might at some time in the future also work with keywords and literals you do not use in your C code, where they are not supported.

(I wish I had some tool that turns javascript input to an abstract syntax tree, which I could then just traverse to cook my own Function.prototype.toSource(). ...Okay, back to work now.)


How does one intercept modal browser alerts about connection problems, file transfer failures and the like from javascript?

Most of all, I want to trap the "The operation timed out when attempting to contact www.kingdomofloathing.com" dialog, which pops up very frequently (typically at least once every session) when I load a new frame from my javascript code. Of course it has nothing to do with which method I employ to load it, but it's mostly when I initiate the transfers myself when I'm really interested in trapping the alert to handle it in a more constructive and less user-involving manner.

Even a solution requiring me to be a Mozilla plugin would be useful, though what I really want is something that could live in a scriptlet or web page and handle the call, ideally cross-browser. With browser Javascript not being nearly as wing-clipped today as it was back in the nineteen hundreds, I've come to expect to be able to solve issues like these. (Let's hope I'm not being overly optimistic.)

Hmm. I suppose due to the lack of talkback features here, I'd better leave a mail address -- feedback to oyasumi at gmail dot com would be very welcome indeed. I'll be sure to write some kind of article about the solution once I find or get pointed to it.

24 Feb 2005 (updated 24 Feb 2005 at 21:07 UTC) »


My signature-uncluttering phpBB Firefox extension seems to have been listed recently at Extension Room. And as proof of someone already using it, a few hits in the access log show that browsers are eagerly polling for updates -- and I received a second feature request, not counting my own. (With a bit of luck, I'll learn how to get a callback once just the HTML bits of a loading page have been fully loaded, so I can address my own issues with it. After all, it doesn't make any sense loading lots of graphics just to purge them a second later.)

The Macromedia security team got back to me reporting that they have reproduced the problem and that it fortunately only feeds mouse events to the background flash application (not including data on which window should have seen the events). I hope the Macromedia and Mozilla people get to share intelligence on the subject; I'd be delighted if the former nailed their bug and the latter found a way of fixing the entire problem class from their side. Since IE for once seems unaffected, I have hope it's possible to somehow alter the plugin interface sandbox to catch this.


Oops, I just encountered a solution Net doesn't consider valid. Minesweeper addicts beware: this game carries the same mind tying properties that your former brain virus did.

Ooo, shiny! Completed the master level of Net in 13:37 (pun not intended) m:s. What an extraordinarily geeky personal record. One probably ought to quit when encountering such divine hints that there are better ways of wasting time.

I could, for instance, finish that object oriented javascript+HTML set top box interface I have a deadline to complete before next Friday and get actual pay for doing, by the hour. That last somehow makes the code very pretty, easy to debug, extend, explain and reuse in a way my pet projects seldom seem to. Why isn't personal hobby hacking paid by the hour?

(Yeah, I know. It's amusing how some questions make sense in feelings but come out as inane traps of logic short circuit if spelled out in words. Life can be so much fun if you are as easily amused as I am at times.)

22 Feb 2005 (updated 22 Feb 2005 at 05:36 UTC) »

Hmm. My Firefox 1.0 lets through mouse clicks in my advogato tab to the flash game running in its own tab. When clicking places in my advogato tab where the game tab had a widget, I hear the characteristic clicking noise evoked by the game. I wonder if there's a bugzilla ticket on this yet.

...Well, now there is, anyway. (It also seems bug id:s have more than doubled in the almost three years that passed since my first and prior issue was registered.)

Upon closer inspection, that's hardly a mozilla issue at all, since XEmacs and Photoshop exhibit the same behaviour. Oh, well, now it's reported to the hopefully security minded engineers at Macromedia too.

And thank god for Google to help bypass the many tightly woven webs of frustration on icky corporate websites and get to where you want to go. I tried to find that page unaided at first, and was very close to dying in the process. Never again.

21 Feb 2005 (updated 22 Feb 2005 at 02:01 UTC) »

Today's addictive game encounter (pointed out by Jesse Ruderman): Net by Pāvils Jurjāns. It's one of these really good basic, well designed mind games where you have a complete overview of the board, and set it straight in as few turns (and, I presume, as little time) as possible, to the set of logic rules dictated by the game. I'm not yet sure if it's as addictive as Minesweeper which, despite suffering from the misfeature of occasionally forcing the element of chance on your completion of a stage, through similarly basic rules, comes out a great time sink. Or, as might be argued, that may well be what makes Minesweeper so compelling to some.

Either way, I started out trying to find the rules, to soon be puzzled finding my optimal solutions were a few turns more than the reported minimum required amount of turns. As it turns out, the turn count was not how many times a tile had been rotated 90 degrees, but how many turns a tile had been rotated any angle you well please. (Okay, so let's not make necessary partial rotations until we know for sure just which rotation is the correct configuration of every tile.) Results crept down to suggested figures. Until I suddenly found myself having completed a minimum 44 turn stage in 43 turns. (Yay! :-) Wish I'd had a screenshot of it before I started, too, but I suppose I'll bug report it, either way. Not that it matters much to game play, but the author might want to know (assuming he isn't already aware of the issue).

Maybe this bug actually adds to the game more than had it not been there in the first place; it's very rewarding to not only complete a game, but to actually beat it. I'll feel much better leaving the game in a mental state of victory over leaving it after a lengthy session of having just played it successfully, beating my time scores for as many times as that remains interesting. In a free game, this might be a feature indeed, whereas it would probably be an economic set-back in a pay per play time arcade or internet game.

(Seems it can be off by more than one turn, too; I just completed a nine by nine board in 55 turns of 58. :-)

21 Feb 2005 (updated 21 Feb 2005 at 02:25 UTC) »
Javascript tip of the day:

For all the annoying cases where javascript gives you an object with numeric indices but which isn't a proper Array object, you can upgrade it to become one using Array.prototype.slice.apply(object). It's more cruft than had the language already provided us with true arrays (in cases like the arguments variable bound in functions, the window.frames list and the various arrayish return values of numerous DOM methods, to name a few), but it's one less needless javascript level loop to do the transition.
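A small sketch of the trick applied to the arguments object; toArray and collect are names made up for the example. Whether it also works on DOM lists depends on the browser (IE's host objects are known to object), so treat that part as something to test rather than assume.

```javascript
// Hypothetical wrapper around the slice trick.
function toArray( arrayish ) {
  // slice only needs a length and numeric indices on its this value.
  return Array.prototype.slice.call( arrayish );
}

// arguments looks like an array but has no join, sort and friends.
function collect() {
  return toArray( arguments );
}

var letters = collect( 'c', 'a', 'b' ).sort();
```

After this, letters is a true Array holding 'a', 'b' and 'c' in sorted order.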

And why doesn't javascript come with your basic set of higher level methods like mapcar and friends? Sure we can cook our own, but it's as if the language tries to imitate a blunt stone age axe, when it's capable of being a real power tool, with just a few touches here and there to get a proper class library. (Is this some misguided attempt at making the language more graspable to a web developer in diapers?)
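Cooking our own takes only a few lines. The sketch below is one home-made take on mapcar for a single list; the name map and the argument order (function first, as in Lisp) are choices made for this example, not anything the language provides.

```javascript
// Hypothetical home-cooked mapcar: apply fn to each element of list
// and collect the results in a new array.
function map( fn, list ) {
  var out = [];
  for( var i = 0; i < list.length; i++ )
    out.push( fn( list[i], i ) );
  return out;
}

var squares = map( function( n ) { return n * n; }, [1, 2, 3] );
```

Since it only relies on length and numeric indices, it works on the arrayish objects mentioned above as well as on real arrays.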
