Older blog entries for chromatic (starting at number 236)

13 Nov 2005 (updated 13 Nov 2005 at 01:19 UTC) »

Or Possibly a Helicopter:

Being an editor, I find bad writing painful (except for leading dangling participles). Don't get me started on most technical writing. (If you've written for me, you likely know at least two of my pet peeves, though I try to be gentle about explaining them.)

If there's anything worse than technical writing -- and I'm leaving out weblogs here -- it's most newspapers. My mother sent me a story this afternoon about an accident where a crane fell off a bridge near my hometown. It amuses me greatly to read:

Another crane will likely be used to remove the debris, or possibly a helicopter...

I vote to remove the debris -- not the helicopter. (Unfair quoting? Not really -- the rest of the sentence is worse.)

6 Nov 2005 (updated 6 Nov 2005 at 23:12 UTC) »

Resumable Exceptions (sort of):

Sometimes it's nice to have a single exception block guarding several potentially fatal operations. This lets you re-use error-handling code. However, sometimes you can recover from a single exception and move on to the next operation. Unfortunately, Perl doesn't really support this.

That's why I wrote some very simple code. Here's what it looks like from the Perl side:


use strict;
use warnings;

use Runops::Resume;

my $text;

eval
{
    $text = 'before';
    print $text, "\n";
    $text = 'after';
    die "Goodbye!\n";
    print $text, "\n";
};

warn "Died '$@'" if $@;
resume();

As you ought to expect, the output is:

Died 'Goodbye!
' at example.pl line 17.

Note that declaring lexicals within the eval block doesn't quite work correctly. That would be scary code.

Between Perl and C, it's fewer than 70 lines of well-spaced code. It's reasonably trivial to make a stack of resumable exceptions, too -- probably fewer than ten more lines of C. The code is Runops::Resume.
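For contrast, here is a rough sketch of the closest plain-Perl approximation: guard each operation separately, but share one handler. This is not how Runops::Resume works. The helper name run_guarded() and its calling convention are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: run each operation in its own eval, sharing one
# error handler.  If the handler returns true, move on to the next
# operation; otherwise rethrow the error.
sub run_guarded
{
    my ($handler, @operations) = @_;
    my @results;

    for my $operation (@operations)
    {
        my $ok = eval { push @results, $operation->(); 1 };
        next if $ok;

        my $error = $@;
        die $error unless $handler->( $error );
    }

    return @results;
}

my @errors;
my @results = run_guarded(
    sub { push @errors, $_[0]; return $_[0] =~ /^Goodbye/ },
    sub { 'before' },
    sub { die "Goodbye!\n" },
    sub { 'after' },
);

print "$_\n" for @results;
```

The limitation is the point of the entry: this moves on to the next operation, but it cannot continue with the statement after the die inside a single block.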

Trace Your Ops:

I've been writing some weird code lately.

As I was falling asleep last night, I thought about pluggable runloops and figured out a way to instrument the standard runloop in such a way as to provide loads of debugging information to pure-Perl code written by people who don't want to have to dive into XS to learn interesting things.

Perl, not generating machine code directly, has a runloop that takes the optree generated by the compiler and walks it in order. It's a very simple subroutine in run.c that basically says "Call the function pointer of the current op, which returns the next op. If that next op exists, repeat."
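To make that loop concrete, here is a toy model of it in Perl. It only illustrates the shape of the loop; the real one is C and works on real ops. The hash layout is an assumption for this sketch, with the key ppaddr loosely named after the op_ppaddr function pointer in the real op structure.

```perl
use strict;
use warnings;

my @output;

# Each toy "op" holds a code reference that does the op's work and
# returns the next op, or undef at the end of the program.  Build the
# ops back to front so each one can point at its successor.
my $say_bye = {
    name   => 'say_bye',
    next   => undef,
    ppaddr => sub { push @output, 'bye'; return $_[0]{next} },
};

my $say_hi = {
    name   => 'say_hi',
    next   => $say_bye,
    ppaddr => sub { push @output, 'hi'; return $_[0]{next} },
};

# The whole runloop: call the current op's function, which returns the
# next op.  If that next op exists, repeat.
sub runops
{
    my $op = shift;
    $op = $op->{ppaddr}->( $op ) while $op;
    return;
}

runops( $say_hi );
print "@output\n";
```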

Instrumenting that turns out to be pretty easy. The only tricky part is making sure that the tracing code doesn't itself get traced, which would lead to all sorts of infinite recursions.
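The recursion problem is easy to show with a toy loop written in Perl. This sketch assumes nothing about how Runops::Trace guards itself; the flag $Tracing, the tracer() callback, and the op hashes are all hypothetical.

```perl
use strict;
use warnings;

my @trace;
our $Tracing = 0;    # true while the tracer itself is running

# Hypothetical tracer.  It runs an op of its own through the same loop,
# so without the guard in traced_runops() it would trace itself forever.
sub tracer
{
    my $op = shift;
    push @trace, $op->{name};

    traced_runops( {
        name   => 'tracer_internal',
        next   => undef,
        ppaddr => sub { return $_[0]{next} },
    } );
}

sub traced_runops
{
    my $op = shift;

    while ($op)
    {
        # Only call the tracer if it is not already running.
        unless ($Tracing)
        {
            local $Tracing = 1;
            tracer( $op );
        }

        $op = $op->{ppaddr}->( $op );
    }

    return;
}

my $second = { name => 'second', next => undef,   ppaddr => sub { $_[0]{next} } };
my $first  = { name => 'first',  next => $second, ppaddr => sub { $_[0]{next} } };

traced_runops( $first );
print "@trace\n";
```

Using local on the flag means it resets itself even if the tracer dies.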

Okay, another tricky part is making sure not to leave any residue of the calls to the tracing code on the stack without requiring that non-XS programmers always end their code with bare return statements. (If you've ever used call_method(), you probably know that G_VOID doesn't cut it here. I know that too, but I wasted a few minutes not knowing that.)

Anyway, here's Runops::Trace 0.01. It's not spectacular or even useful, but it totally works as a proof of concept. There's a little more design to do on the user side, such as "Which parts of the op should the user be able to see?" and "How should users be able to trace certain ops but not others?", so let me know if you have any opinions.

If you don't care, I'll give you a moral for this story anyway: if you're poking at the Perl interpreter and your code doesn't segfault, it must be fairly correct.

Secrets of Contextual Analysis:

I'm analyzing the content of some documents in order to find potential correlations between them. Breaking each document into individual words, stemming those words, and throwing out the stopwords gave me some 18,000 unique words from a 600-document corpus, with over 40% of words appearing only once in the corpus and almost 80% of the words appearing fewer than ten times.
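Here is a small sketch of that kind of count, assuming a tiny stand-in stopword list and skipping the stemming step entirely. The function names count_words() and fraction_below() and the two sample sentences are invented for illustration.

```perl
use strict;
use warnings;

# A tiny stand-in stopword list; a real one would be much longer.
my %stopword = map { $_ => 1 } qw( a an and of the to in is it );

# Count how often each non-stopword appears across all documents.
# Stemming is omitted here; a real pipeline would stem each word first.
sub count_words
{
    my @documents = @_;
    my %count;

    for my $document (@documents)
    {
        for my $word (map { lc } $document =~ /([A-Za-z']+)/g)
        {
            next if $stopword{$word};
            $count{$word}++;
        }
    }

    return \%count;
}

# Fraction of unique words that appear fewer than $limit times.
sub fraction_below
{
    my ($count, $limit) = @_;
    my @words = keys %$count;
    return 0 unless @words;

    my $rare = grep { $count->{$_} < $limit } @words;
    return $rare / @words;
}

my $count = count_words(
    'The crane fell off of the bridge.',
    'A helicopter may remove the crane.',
);

printf "%d unique words, %.0f%% appear only once\n",
    scalar( keys %$count ), 100 * fraction_below( $count, 2 );
```

Calling fraction_below( $count, 10 ) gives the "fewer than ten times" figure in the same way.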

I knew my existing list of stopwords was insufficient, but I really don't want to pick out the top 1000 or 2000 useful words from a list of 18,000, especially because this is a test corpus of perhaps 7% of the actual corpus.

Now I start to wonder if some of the lexical analysis modules would be useful in picking out only the nouns (unstemmed) and verbs (stemmed) from a document, rather than taking all of the words of a document as significant. The correlation algorithm appears sound, but if I can throw out lots of irrelevant data, I can improve the performance and utility of the application.

Any thoughts?

31 Jul 2005 (updated 31 Jul 2005 at 06:50 UTC) »

1500 Lines Today:

I wrote around 1500 lines of code and documentation today, all in Test::Builder for Parrot. Geoff Young wanted it, and how can you say no to a code request from him? Leo also seemed really impressed at the idea of running Parrot tests a fair bit more quickly than they currently do. Okay, 1500 lines isn't as impressive when I mention that it's PIR code, but it's a fair bit of code. 1500 lines of anything is good, whether code, non-fiction, or fiction.

In other news, Test::Builder for Perl 6 works completely. Okay, there are a couple of workarounds for missing features in both ports for Pugs and Parrot, but they both work.

Unfortunately, I haven't finished the PseudoPod to PDF converter for my day job, as I'm trying to write it well and maintainably. (Creating pixel-perfect layouts is also a tremendous pain, especially as I don't care; I just want to put something on the page.)

At least now I can bask in the glow of "Hey, I wrote something tremendously cool!" at OSCON next week, rather than promising to write something cool and then, well, not.

Bootstrapping Is Harder Than It Looks:

In my copious free time not taken up by a dozen other projects, I've been working on Module::Build::TestReporter to make version 1.0 available before Perl Testing: A Developer's Notebook comes out. (I still have yet to bundle Test::Kwalitee.)

After the first release, Stig Brautaset pointed out that there's a bootstrapping problem for people who want to use MBTR. It depends on a few non-core modules, but if you're using Module::Build::TestReporter for your distribution, it won't load without those dependencies -- and you want to mark the build dependencies in your Build.PL file. (You also need to ship MBTR with your distribution, but I can't do much besides document that fact.)

The approach I chose was to attempt to load the necessary modules, trapping any errors. If something fails, I install a different constructor that adds the necessary modules to the build_requires parameter and calls Module::Build::new(), which handles the dependencies appropriately.
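A rough sketch of the idea follows. It is not the actual Module::Build::TestReporter code: My::Builder stands in for Module::Build so that the example runs anywhere, My::Reporter and its dependency list are hypothetical, and the sketch decides inside new() rather than swapping in a different constructor at load time.

```perl
use strict;
use warnings;

# Stand-in for Module::Build so that this sketch runs anywhere.
package My::Builder;

sub new
{
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# Hypothetical reporter class that depends on non-core modules.
package My::Reporter;

our @ISA = 'My::Builder';

# Assumed dependency list; the second name is deliberately not installable.
our @Dependencies = qw( File::Spec Surely::Not::Installed );

# Return the names of any dependencies that fail to load.
sub missing_dependencies
{
    my $class = shift;
    return grep { ! eval "require $_; 1" } @Dependencies;
}

sub new
{
    my ($class, %args) = @_;
    my @missing        = $class->missing_dependencies();

    # Normal path: everything loaded, so build a full reporter.
    return $class->SUPER::new( %args ) unless @missing;

    # Fallback path: declare the missing modules as build requirements
    # and hand back a plain builder, which knows how to report them.
    my %requires = (
        %{ $args{build_requires} || {} },
        map { $_ => 0 } @missing,
    );

    return My::Builder->new( %args, build_requires => \%requires );
}

package main;

my $build = My::Reporter->new( module_name => 'Some::Module' );

print ref( $build ), ' needs: ',
    join( ', ', sort keys %{ $build->{build_requires} } ), "\n";
```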

Testing that logic was a bit of fun, too. Sometimes I wonder if we test module writers test our test modules just to make us less smug.

Making It Easier For Users to Report Test Failures:

Perl's module tools make it easy to bundle, distribute, and install software, most of the time. Other tools make it fairly easy to test software too. Of course, the tools for end-users lag somewhat behind the tools for developers, especially in places where developers are happy with their tools.

If you're a Perl-savvy developer and tests fail in a module you're installing, you know what your options are. Consider what a non-developer could do, though. Yet developers who ship their tests for end-users to run rely on receiving feedback about failures so that they can fix the code, the tests, the assumptions, or whatever's not right.

I've just released Module::Build::TestReporter, which runs the tests as usual, hijacks their output, keeps a log of any failures and their diagnostics, and tells users what to do to report any failures to the developers. If you think this will solve a problem for you, give it a whirl. (I'd love to have feedback before I release it to the CPAN in a week or so.)
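To give a feel for the "keeps a log of any failures and their diagnostics" step, here is a minimal sketch that picks failures out of captured test output. It makes no claim about how Module::Build::TestReporter does this; failures_from_tap() and the sample output are invented. It also assumes the diagnostics were captured together with the test results and ignores TODO tests.

```perl
use strict;
use warnings;

# Collect failed tests, and the diagnostic lines that follow them, from
# captured "ok" / "not ok" test output.
sub failures_from_tap
{
    my $tap = shift;
    my (@failures, $current);

    for my $line (split /\n/, $tap)
    {
        if ($line =~ /^not ok\s+(\d+)(?:\s+-\s+(.*))?/)
        {
            $current = {
                number      => $1,
                description => defined $2 ? $2 : '',
                diagnostics => [],
            };
            push @failures, $current;
        }
        elsif ($line =~ /^ok\b/)
        {
            # A passing test ends the previous failure's diagnostics.
            $current = undef;
        }
        elsif ($current and $line =~ /^#\s?(.*)/)
        {
            push @{ $current->{diagnostics} }, $1;
        }
    }

    return @failures;
}

my $captured = <<'END_TAP';
1..3
ok 1 - loads
not ok 2 - parses input
#   Failed test 'parses input'
#   at t/parse.t line 12.
ok 3 - cleans up
END_TAP

for my $failure (failures_from_tap( $captured ))
{
    print "Test $failure->{number} failed: $failure->{description}\n";
    print "    $_\n" for @{ $failure->{diagnostics} };
}
```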


Why call walkoptree() yourself (see B.pm) when you have the power of XPath (at least as much as Class::XPath supports)?


use strict;
use warnings;

use B::XPath;

use vars qw( $foo $bar );

sub some_sub
{
    my $x = shift;
    $foo  = $x;
    print "\$x is $x\n\$foo is $foo\n";
}

my $node = B::XPath->fetch_root( \&some_sub );

for my $bar ($node->match( '//gvsv[@NAME="foo"]' ))
{
    printf( "Found global '%s' at %s:%d\n  (defined at %s:%d)\n",
        map { $bar->$_ } qw( NAME find_file find_line FILE LINE ) );
}

I'm sure you're on the edge of your seat for the output:

$ perl find_global_name.pl
Found global 'foo' at find_global_name.pl:13
  (defined at /usr/lib/perl5/5.8.6/vars.pm:35)
Found global 'foo' at find_global_name.pl:14
  (defined at /usr/lib/perl5/5.8.6/vars.pm:35)

There are two drawbacks (besides the fact that it's a proof of concept and not releasable yet): Class::XPath has little axis support and you have to know an awful lot about the structure of the optree for which you want to search. I think the latter is solvable, but it will require more thought.
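For comparison, here is roughly what calling walkoptree() yourself looks like, using only the core B module. It is a sketch, not part of B::XPath: the method name note_gvsv is made up, it reports only line numbers (not variable names), and it assumes each statement still has its nextstate op in the tree.

```perl
use strict;
use warnings;

use B ();

our $foo;

sub some_sub
{
    my $x = shift;
    $foo  = $x;
    print "\$x is $x\n\$foo is $foo\n";
}

my (@found, $line);

# B::walkoptree() calls the named method on every op it visits, so the
# method has to live in B::OP, the base class of all op objects.
sub B::OP::note_gvsv
{
    my $op   = shift;
    my $name = $op->name();

    # nextstate ops carry the line number of the statement that follows
    $line = $op->line() if $name eq 'nextstate';

    # gvsv ops fetch a global scalar
    push @found, $line if $name eq 'gvsv';
}

B::walkoptree( B::svref_2object( \&some_sub )->ROOT(), 'note_gvsv' );

print "Found a global scalar access on line $_\n" for @found;
```

Even this small version shows the second drawback: you have to know that gvsv and nextstate are the op names to look for.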

Crueler Months:

In the past month, outside of my busy day job, I:

  • Finished writing my novel. It's 72,500 words. Strangely, it's neither science fiction nor fantasy. It's just a modern novel. Now I'm letting it sit for a bit before I edit it. I can't promise when it'll be in stores though.

  • Finished writing a book with Ian Langworth. It's good stuff. It'll be out before OSCON.

  • Started applying documentation patches like mad to Parrot. At least, I've been bolder about it than before. (I also fixed a segfault, which was nice.)

  • Released Class::StorageFactory. Has it been a month already? Note that I renamed load() to fetch() and save() to store().

  • Released a (new maintainer!) version of SUPER. I volunteered to take over this module because I wanted to fix it to work with Class::Roles. That's one part down.

  • Adopted two cats. My plants downstairs are suffering a bit.

  • Learned enough about rake to write a Rakefile for Pacuby. (If that doesn't make any sense, consider that the first time I had an upcoming book deadline, I wrote tests for a fair swath of the Perl core.)

  • Wrote two actually useful testing modules I plan to release to the CPAN very soon.

  • Started brainstorming another book project.

Maybe now I can sleep again.

A Simply Serializing Factory:

I've found a nice pattern of using objects and classes for configuration information, but I don't want to go all of the way to Class::DBI with them because I like the simplicity of working with flat files.

The third or fourth time I found myself writing a factory to load and to save objects to and from YAML files, I resolved someday to write a module that does that and refactor the other attempts to use the module. Today, instead of working on one of two books, cleaning my house, or doing either of the programming projects that crossed my mind, I wrote Class::StorageFactory (and Class::StorageFactory::YAML, which is what I wanted).

If it seems useful and no one objects, I'll upload it to the CPAN in a couple of days. I looked for prior art but didn't find much in a few minutes.

I have new versions of SUPER and Test::MockObject to upload soon too.
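As a rough illustration of the load-and-save factory pattern from this entry, here is a sketch that stores objects as flat files. It is not the Class::StorageFactory API: the class names, constructor arguments, and path_for() are hypothetical, and it uses the core JSON::PP module in place of YAML so that it runs without extra modules. Only the method names fetch() and store() come from the entries above.

```perl
use strict;
use warnings;

package My::StorageFactory;    # hypothetical, for illustration only

use JSON::PP   ();
use File::Spec ();

sub new
{
    my ($class, %args) = @_;
    return bless { storage => $args{storage}, type => $args{type} }, $class;
}

sub path_for
{
    my ($self, $id) = @_;
    return File::Spec->catfile( $self->{storage}, "$id.json" );
}

# Load the flat file for $id and turn its data into an object.
sub fetch
{
    my ($self, $id) = @_;

    open my $in, '<', $self->path_for( $id )
        or die "Cannot read '$id': $!\n";
    my $json = do { local $/; <$in> };

    return $self->{type}->new( %{ JSON::PP->new->decode( $json ) } );
}

# Write the object's data back to its flat file.
sub store
{
    my ($self, $id, $object) = @_;

    open my $out, '>', $self->path_for( $id )
        or die "Cannot write '$id': $!\n";
    print {$out} JSON::PP->new->canonical->pretty->encode( { %$object } );
    close $out or die "Cannot close '$id': $!\n";

    return;
}

# A minimal hash-based configuration class to store.
package My::Config;

sub new
{
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

package main;

use File::Temp ();

my $dir     = File::Temp::tempdir( CLEANUP => 1 );
my $factory = My::StorageFactory->new( storage => $dir, type => 'My::Config' );

$factory->store( site => My::Config->new( name => 'example', port => 8080 ) );

my $config = $factory->fetch( 'site' );
print "$config->{name} listens on port $config->{port}\n";
```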
