Older blog entries for fxn (starting at number 238)

17 Sep 2003 (updated 17 Sep 2003 at 14:43 UTC) »

The last Ruby one-liner I posted yesterday is one of the prettiest I've seen. It was sent by Kurt M. Dresner to ruby-talk:

    ruby -pe 'gsub!(/\B\w+\B/){$&.split(//).sort_by{rand}.join}'
This is how it works, in case anyone is interested.

We want to filter a given text, shuffling the letters of each word except the first and the last. For instance, given

    This is an example
    text.
a possible output could be
    Tihs is an elampxe
    txet.

Note that punctuation, whitespace, etc. have to be preserved, and words one, two, or three letters long must remain untouched.

OK, in the first place, since a word cannot contain a newline we can safely write a line-oriented loop. The -p flag does this, and we can imagine it wraps the code like this:

    while there are more lines
        assign next line to $_
        execute the code
        print $_
    end
so we can take advantage of that to modify each line through $_ and have it automatically printed afterwards.
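The loop above can be imitated in plain Ruby. This is only a rough sketch of the idea, not what the interpreter really does: the method name filter_lines is made up for illustration, and the line is passed to a block instead of living in $_.

```ruby
require 'stringio'

# Rough emulation of `ruby -pe CODE`: read each line, run some code on it,
# then print the (possibly modified) line. The block plays the role of CODE.
# filter_lines is a hypothetical name used only for this sketch.
def filter_lines(input, output)
  input.each_line do |line|
    line = yield(line)   # stands in for "execute the code" on $_
    output.print(line)   # -p prints the line after the code has run
  end
end

out = StringIO.new
filter_lines(StringIO.new("one\ntwo\n"), out) { |l| l.upcase }
puts out.string
```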

This is what gsub! does: it modifies $_ in place. In the form we call it (gsub! is a method of the Kernel module), it receives a regular expression and a block of code. The g in gsub! means global: we are going to perform a global substitution in $_, and for each match the matched substring of $_ will be replaced by the value returned by the block.
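To see the block form of gsub in isolation, here is a small sketch that uses explicit String receivers instead of the implicit $_, so it runs outside of -p as well. The method names are invented for the example.

```ruby
# Each match is handed to the block, and the block's value replaces it.
# upcase_words and reverse_words! are hypothetical helper names.
def upcase_words(text)
  text.gsub(/\w+/) { |word| word.upcase }
end

# The bang version changes the string in place; inside the block,
# $& holds the substring that has just been matched.
def reverse_words!(text)
  text.gsub!(/\w+/) { $&.reverse }
end

puts upcase_words("This is an example")

line = "This is an example".dup
reverse_words!(line)
puts line
```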

The regular expression \B\w+\B means match word letters in a row, but only if it is the case that to the left and to the right of the chain there are NOT word boundaries. Given the word foobar we can visualize the regex engine working like this:

  1. f matches \w, does it have a word boundary to its left? Yes, so forget about it.
  2. o matches \w, does it have a word boundary to its left? No, so keep on matching.
  3. The quantifier makes the engine advance until the end of the word, that is, oobar.
  4. Do we have a word-boundary to the right? Yes, so backtrack one word letter.
  5. Now, do we have a word boundary to the right? No, so we've got a match, which is ooba.

You see, that regex matches exactly the part of the word we want to munge. In addition, note that words with just one or two letters do not match, because every letter in them has a word boundary on one side.
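The walk-through can be checked with String#scan, which returns every match of a pattern. A minimal sketch (inner_parts is just a name chosen for this example):

```ruby
# Collect every run of word characters that has no word boundary
# on either side, i.e. the "inner" part of each word.
def inner_parts(text)
  text.scan(/\B\w+\B/)
end

p inner_parts("foobar")                     # => ["ooba"]
p inner_parts("This is an example text.")   # => ["hi", "xampl", "ex"]
```

The one- and two-letter words contribute nothing, and a three-letter word would contribute only its middle letter.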

In the block $& refers to the matched substring, and $&.split(//) splits it using the empty pattern, which results in the array of its letters. In the example above that's ['o', 'o', 'b', 'a'].

And this is the shining star of the one-liner: the Array class does not have a method to shuffle a given array, so we need some short code to emulate it. Array#sort_by {|a| a.method} sorts a given array according to the value returned by a.method for each element a of the array. The clever trick is to plainly ignore the item passed to the block and just call rand. Awesome!
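Here is the trick on its own, as a sketch. The block ignores its argument, so each element gets a random sort key and the result comes out in random order. Later Ruby versions do ship a built-in Array#shuffle, shown for comparison; the helper name is made up.

```ruby
# Shuffle by sorting on random keys: the block never looks at the item.
# shuffle_by_random_keys is a hypothetical name for this sketch.
def shuffle_by_random_keys(items)
  items.sort_by { rand }
end

letters = %w[o o b a]
p shuffle_by_random_keys(letters)   # some permutation of the letters
p letters.shuffle                   # built-in equivalent in later Rubies
```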

The array returned by Array#sort_by contains the letters we wanted to shuffle in random order, so we just need to join them back to get the string we want ooba to be substituted with.
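Putting the pieces together outside of -p might look like the sketch below. The method name scramble and the optional rng argument are additions for the example (the rng makes runs repeatable); the regex and the split/sort_by/join chain are the ones from the one-liner.

```ruby
# Shuffle the inner letters of every word in text, leaving the first and
# last letter, punctuation and whitespace where they are.
# scramble and its rng parameter are assumptions made for this sketch.
def scramble(text, rng = Random.new)
  text.gsub(/\B\w+\B/) do |inner|
    inner.split(//).sort_by { rng.rand }.join
  end
end

puts scramble("This is an example\ntext.")
```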

16 Sep 2003 (updated 17 Sep 2003 at 06:12 UTC) »

Trehe is this ietetnrnsig actlire tdaoy at Shlosdat taht eixapnls taht aicdcnorg to smoe Eigslnh uveirnitsy the oredr of lteters in words is not relaly rnlaeevt for rdaieng as long as the fsrit and the lsat are at the rghit place. (It semes taht does not hold in Spisanh).

Trehe is a link to a Perl feitlr that megnus a gvien text taht way and I watned to write a mock-up as a one-lnier, but in Ruby this tmie. Hree it is (but see Updtae 2):

    ruby -pe 'gsub!(/\w+/){|w|r=1..w.size-2;w[r]=w[r].split(//).sort_by{rand}.to_s;w}'

Udpate: And tihs is ahotner one in Perl (but see Utdape 3):

    perl -MList::Util=shuffle -pe 's:(?<=\w)\w+(?=\w):join"",shuffle+split//,$&:ge'    

Updtae 2: And that in Ruby bemoces (but see Update 4):

    ruby -pe 'gsub!(/(\w)(\w+)(?=\w)/){$1+$2.split(//).sort_by{rand}.to_s}'

Utdape 3: Wihch in trun gevis:

    perl -pe 's:\B\w+\B:join"",sort{.5<=>rand}split//,$&:ge'

Update 4: This improvement comes from ruby-talk:

    ruby -pe 'gsub!(/\B\w+\B/){$&.split(//).sort_by{rand}.join}'

Hey, what a fun time!

15 Sep 2003 (updated 15 Sep 2003 at 22:29 UTC) »

Alright, I give up flirting with literate programming. It just doesn't work for me. My head is gonna explode trying to remember the relations between chunks, figuring out what needs to be modified to refactor something, variable scopes... I think I can honestly say it has stressed me. I was supposed to be enjoying the beauty of Ruby, so I'm gonna go back to using standard layout and RDoc.

One of the problems I see is that you are using flat files to represent a tree. In Emacs, noweb-outline.el can help a bit, but after this exercise I think I would need a powerful outline editor to feel relaxed and productive.

Another problem is that I have a bad memory, and I think memory is important for the way you work in literate programming. Exercising my memory is something I should do seriously sometime in the future.

How on earth could Knuth figure all this out, write the tools, and write TeX and METAFONT? What a brain, man.

13 Sep 2003 (updated 13 Sep 2003 at 06:51 UTC) »

There was no answer to my post in news:comp.programming.literate, so I've tried to find a style that I like. The current DVI file is twelve pages long, with almost all documentation chunks still empty.

I am not yet convinced that writing Halk as a literate program is worth the effort, but I'll keep on trying. This is a new methodology for me, so it is normal that I am not efficient yet. I think I am even feeling a bit stressed, but then stuff like

    << halk.rb >>=
    << require used libraries >>
    << define the Halk module >>
    if << we are being called as a program >>
      << launch the presentation >>
    end
or
    << get file content >>=
    if @use_cache
      if << file in cache needs update >>
        << get file from disk >>
        << update cache metadata >>
      end
      << get file from cache >>
    else
      << get file from disk >>
    end
is so utterly clear and self-documented!

9 Sep 2003 (updated 9 Sep 2003 at 20:06 UTC) »

My coworker Juanma González has kindly made a nice logo for Halk. Thank you Juanma!

Now I am a bit stuck in the development. I am writing Halk as a literate program because that's a paradigm I like very much and want to try. But since this is my first serious attempt, some difficulties arise. I have written a message to news:comp.programming.literate asking for some guidelines.

6 Sep 2003 (updated 6 Sep 2003 at 17:33 UTC) »

Listings in Halk

The design of the interface for listing inclusion, which is the main feature of Halk, is almost finished. Authors should be able to express what they want in a single call. For instance:

<%= halk.include(:src       => 'foo.c',
                 :bin       => 'foo',
                 :args      => 2,
                 :max_lines => 10,) %>
would mean we want the listing of foo.c included in the current slide. That listing would correspond to an executable file named foo that would receive two arguments, so Halk would put a pair of boxes and a button for execution in the page. When the user pressed that button the engine would chdir to the slide's directory, execute foo, capture at most 10 lines of its output, and return a page with it below the listing. The result would look more or less as in this example that was running in the prototype, except for the number of arguments.

That's just an example; more parameters are supported and most of them are optional.

In general the method will try to follow DWIM. If there were no :src, no listing would be included, though the form and execution would still work. No :max_lines would mean slurping the whole output. If :exec were true and no :bin were specified, we'd call an interpreter if the source had a known extension, or execute the program itself otherwise. If :src were given but :exec were false, just the listing would be included... you get the idea.
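One way to picture those DWIM rules is as a small function from options to a plan. This is purely an illustrative sketch, not Halk's code: the name resolve_inclusion, the INTERPRETERS table, the shape of the returned hash, and the guess that giving :bin implies execution when :exec is absent are all assumptions.

```ruby
# Hypothetical table of known extensions; not taken from Halk.
INTERPRETERS = { '.rb' => 'ruby', '.pl' => 'perl', '.py' => 'python' }

# Turn the options of an include call into a plan: which listing to show
# and which command (if any) to run. Everything here is an assumption.
def resolve_inclusion(opts)
  src = opts[:src]
  bin = opts[:bin]
  # Assumption: when :exec is not given, naming a :bin implies execution.
  run = opts.fetch(:exec) { !bin.nil? }
  interpreter = src && INTERPRETERS[File.extname(src)]

  command =
    if !run           then nil                 # listing only
    elsif bin         then [bin]
    elsif interpreter then [interpreter, src]  # known extension
    elsif src         then [src]               # execute the program itself
    end

  { :listing   => src,                         # nil means no listing
    :command   => command,
    :args      => opts.fetch(:args, 0),
    :max_lines => opts[:max_lines] }           # nil means slurp everything
end

p resolve_inclusion(:src => 'foo.c', :bin => 'foo', :args => 2, :max_lines => 10)
```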

Pretty-printing in Halk

As for pretty-printing, I have been pondering different solutions, from documenting ways to recurse a directory tree and generate HTML files from code, to including scripts that supported some beautifiers. But that's unnecessarily complicated.

The current decision about this issue is to support conversion in the engine itself in a simple way: if the author wants pretty-printing or any other transformation (such as removing shebangs) he can just tell Halk to use some filter to generate an HTML version of the source for inclusion. So the recursion is done by the engine itself, and in most cases a simple sh one-liner written by the author will suffice.
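A filter of that kind could be run by feeding the source to a shell command and capturing its output. The sketch below uses Open3.capture2 from the standard library; the method name filtered_source and the way the filter is passed in are assumptions, not Halk's interface.

```ruby
require 'open3'

# Pipe the content of source_path through a shell command and return
# whatever the command prints. filtered_source is a hypothetical name.
def filtered_source(source_path, filter_cmd)
  output, status = Open3.capture2(filter_cmd, :stdin_data => File.read(source_path))
  raise "filter failed: #{filter_cmd}" unless status.success?
  output
end
```

For instance, a filter such as `sed 1d` would drop the first line of a script, which is one way of removing a shebang.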

5 Sep 2003 (updated 5 Sep 2003 at 06:59 UTC) »

A couple of ideas for Halk:

  • It should be possible to assign a timeout to a slide. If a slide had one we would send a refresh pointing to the next one. Good!
  • Since a presentation is run in a single process in stand-alone mode (no CGI mode is planned for 1.0) we can offer the possibility to record the time spent in each slide, to let authors study the timing of their presentations. That time would be defined as the seconds elapsed between the request for the current slide and the next request for a different slide. Simple, but it sounds meaningful enough for the intended purpose.

    That can be munged from the logs in debug mode, but a clean file with timings already computed by the program (and independent of debug mode) would be far better.
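The timing definition in the second bullet can be sketched as follows. The input shape (an ordered list of [time, slide] pairs) and the name slide_timings are assumptions made for illustration; the last slide requested gets no time because no later request closes it.

```ruby
# requests: array of [time_in_seconds, slide_id] pairs in arrival order
# (a hypothetical shape). Returns a hash of slide_id => seconds spent.
def slide_timings(requests)
  timings = Hash.new(0.0)
  current_slide = nil
  entered_at = nil
  requests.each do |time, slide|
    next if slide == current_slide   # same slide requested again: keep counting
    timings[current_slide] += time - entered_at if current_slide
    current_slide = slide
    entered_at = time
  end
  timings
end

p slide_timings([[0.0, 1], [5.0, 1], [12.0, 2], [20.0, 3], [25.0, 2]])
```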

This project is getting exciting!

29 Aug 2003 (updated 29 Aug 2003 at 15:27 UTC) »

diablod3, those numbers are rather old. There are up to date Advogato statistics, maintained by bagder. Have a look at this page and this page.

27 Aug 2003 (updated 27 Aug 2003 at 22:24 UTC) »

I have come across Exerb, which produces a Windows executable from your Ruby program with an interpreter included, so it won't even require Ruby to be installed to run on Windows. Moreover, you can build the binaries on any platform with Ruby.

That's yet another great thing for Halk, because authors will be able to distribute presentations for Windows as a stand-alone .exe file with no further requirements (unless the presentation itself has dependencies; for instance, a Python course would normally require Python installed to let the reader run the code snippets of the slides). I don't know the size of the executable, but at least you have the choice to provide one.

On another front, the look and feel one, a coworker pointed out the use of CSS and JavaScript in some HTML presentations by Brett Merkey. Hey, they have transitions!!!

I don't particularly like transition effects, but it would be a fancy feature, so I quickly thought I could provide them out of the box so authors could use them if they wanted to, just by setting a parameter in the corresponding slide. They worked in Mozilla (Windows and Linux) and Explorer. They didn't in Konqueror (which cooled my enthusiasm down a bit), and I don't know what happens with Galeon or Safari (if anyone tries, please let me know; it would help me very much in making a decision). So I don't know what to do. It might be reasonable to provide them with a warning (the slides themselves would be seen alright in Konqueror anyway, so having the transition won't hurt in that particular browser).

Talking about browsers, another cool thing about all this is that with a non-fancy theme (which I'll certainly provide with the distribution) one will be able to give the presentations in some text-based web browser. I saw a couple of presentations running on terminals at YAPC::Europe and they were way cool. BTW, the engine of one of them, which was about Parrot's performance, was written in Parrot bytecode, how devilish.

I am not coding this week to let the ideas rest a bit.

25 Aug 2003 (updated 25 Aug 2003 at 00:56 UTC) »

Halk

Halk has error handling and logging written. There's now a file cache that can optionally stat files on disk to keep them up to date in memory. Templates are cached in compiled form. Presentations have an associated theme (look and feel), but individual slides can be configured to use some other theme. I don't know whether that will be used, but it looked fun and easy to implement.
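A cache that optionally stats files could look roughly like the sketch below. It is a guess at the general technique, not Halk's implementation: the class name FileCache, its constructor flag and the Entry struct are all made up for the example.

```ruby
# In-memory file cache. When check_mtime is true, every read stats the file
# and reloads it if it changed on disk. All names here are hypothetical.
class FileCache
  Entry = Struct.new(:content, :mtime)

  def initialize(check_mtime = true)
    @check_mtime = check_mtime
    @entries = {}
  end

  def read(path)
    entry = @entries[path]
    if entry.nil? || (@check_mtime && File.mtime(path) > entry.mtime)
      entry = Entry.new(File.read(path), File.mtime(path))
      @entries[path] = entry
    end
    entry.content
  end
end
```

With the flag off, a file is read from disk once and served from memory afterwards, which saves a stat per request at the cost of not noticing edits.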

The executable has a --port option, but if no port is specified the HTTP server silently tries to find an available port in the range 8000..8999. This way Windows users will be able to just double-click and have the default browser launched showing the cover of the presentation.
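The port search can be sketched with TCPServer from the standard library: try each port in turn and keep the first one that can be bound. This is an assumption about how such a search might be written, not a description of Halk's server code, and the method name is invented.

```ruby
require 'socket'

# Return a TCPServer bound to the first free port in the range, or nil
# if every port is taken. first_free_server is a hypothetical name.
def first_free_server(ports = 8000..8999, host = '127.0.0.1')
  ports.each do |port|
    begin
      return TCPServer.new(host, port)
    rescue Errno::EADDRINUSE, Errno::EACCES
      next   # port busy or not permitted: silently try the next one
    end
  end
  nil
end

server = first_free_server
puts "listening on port #{server.addr[1]}" if server
server.close if server
```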

I'll play a bit with the recently released Tar2RubyScript. If the user (presentation writer) had it installed, it would be possible to put the whole dynamic presentation in a single ready-to-run Ruby file, which would be ideal for distribution.

Reading

I have just finished the second edition of Mastering Regular Expressions. It's a really impressive work. Regular expressions are supposed to be a dry topic, but this book treats the subject in a very exciting way.
