3 Apr 2001 technik   » (Journeyer)

Work:
Go, Webmonkey! Go! A corporate update is underway, and my team is retrofitting the new look-and-feel onto existing products. The old stuff was ugly, and the new design is beautiful and easier to use, but rehabilitating pages built partly by WYSIWYG editors is hell. The markup violates the HTML 4 spec, renders differently in each browser, and is peppered with unnecessary tags and attributes.
Facing fifty-odd static and dynamic pages myself, and knowing that there are a few hundred in total distributed across the team, I reached for Perl (ask for it by name!) and Sean Burke's updated HTML::Tree module, which I recalled reading about in The Perl Journal a few months ago. (Aside: does anyone at liberty to say know with certainty what is going on with TPJ?) A little preprocessing, a little postprocessing, a run through the parser, and I had reduced the amount of manual editing by more than half. Compare the code snippets below to running search and replace in your favorite editor. Compare it for 50 files, or 100.

use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new();
$tree->parse_file($filename) or die "Cannot parse $filename: $!";
...
# Delete font elements, keeping their content in place
for my $tag ($tree->look_down('_tag', 'font')) {
    $tag->replace_with_content()->delete();
}
...
# Unwrap bold elements; inside a table, move the styling to a CSS class
for my $tag ($tree->look_down('_tag', 'b')) {
    if ($tag->is_inside('table')) {
        # apply style to the parent, assumed to be the <td> that holds this element
        $tag->parent()->attr('class', 'tblBold');
    }
    $tag->replace_with_content()->delete();
}
...
# More madness... illegal WIDTH="%" attribute must be removed
for my $tag ($tree->look_down('width', '%')) {
    $tag->attr('width', undef);    # setting to undef deletes the attribute
}
...
I'm going to have to buy him a beer sometime.
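
The snippets above skip the per-file loop and the write-back step. A minimal sketch of how that part might look, assuming the files are named on the command line and each cleaned copy is written next to the original with a hypothetical ".new" suffix (the strip_fonts helper, the suffix, and the as_HTML options are illustrative choices, not the script actually used):

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: parse an HTML string, unwrap every <font>
# element, and return the cleaned markup as a string.
sub strip_fonts {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new();
    $tree->parse($html);
    $tree->eof();                      # tell the parser the input is complete
    for my $tag ($tree->look_down('_tag', 'font')) {
        $tag->replace_with_content()->delete();
    }
    # undef: default entity encoding; '  ': indent; {}: emit every end tag
    my $clean = $tree->as_HTML(undef, '  ', {});
    $tree->delete();                   # free the tree explicitly
    return $clean;
}

# Process each file named on the command line.
for my $file (@ARGV) {
    my $in;
    unless (open $in, '<', $file) {
        warn "skipping $file: $!\n";
        next;
    }
    my $html = do { local $/; <$in> };  # slurp the whole file
    close $in;

    open my $out, '>', "$file.new" or die "cannot write $file.new: $!\n";
    print {$out} strip_fonts($html);
    close $out;
}
```

Writing to a separate file rather than overwriting in place leaves the originals available for a diff before anything is committed.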

Play:
Attended the New York Perl Mongers meeting last week. Met a few of the local mongers and Randal Schwartz, who was visiting. Nice time.

Gave an older computer to my 92-year-old grandmother. She had expressed interest in getting online, and I had the pile of parts my wife had used for running MS-Word. My grandmother is a remarkably quick learner and was enthusiastic about it. In a little over an hour, she had gone from never having used a PC to managing quite well. She was satisfied for the time being with Solitaire and FreeCell (yep, you know what's coming), but hopefully she will get comfortable and go online. I seriously thought about putting Debian GNU/Linux and Mozilla on the box, but my wife pointed out that:
  1. She probably needs as much help as possible.
  2. We live two hours away and don't have a "normal" schedule.
  3. I'm the only person in the family who can support Unix.
So I knuckled under and gave her a system running Win98, IE5, and AOL. The horror.
