17 Sep 2003 fxn   » (Master)

The last Ruby one-liner I posted yesterday is one of the prettiest I've seen. It was sent by Kurt M. Dresner to ruby-talk:

    ruby -pe 'gsub!(/\B\w+\B/){$&.split(//).sort_by{rand}.join}'
This is how it works, in case anyone is interested.

We want to filter a given text, shuffling the letters of each word while keeping its first and last letters in place. For instance, given

    This is an example
    text.
a possible output could be

    Tihs is an elampxe
    txet.

Note that punctuation, whitespace, etc. have to be preserved, and words one, two, or three letters long must remain untouched.

OK, in the first place, since a word cannot contain a newline, we can safely write a line-oriented loop. The -p flag does this, and we can imagine it wraps the code like this:

    while there are more lines
        assign next line to $_
        execute the code
        print $_
    end
so we can take advantage of that to modify each line through $_ and have it automatically printed afterwards.
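
To make the pseudocode concrete, here is a minimal sketch of that loop in plain Ruby. The filter_lines helper and the use of StringIO are my own stand-ins for illustration, not what the interpreter actually runs for -p; the block plays the role of the code given to -e.

```ruby
require "stringio"

# Hypothetical helper imitating the loop that -p wraps around the code:
# take each line, let the block modify it in place, then print it.
def filter_lines(input, output)
  input.each_line do |line|
    yield line          # "execute the code" (the block may mutate line)
    output.print line   # "print $_"
  end
end

out = StringIO.new
filter_lines(StringIO.new("This is an example\ntext.\n"), out) { |line| line.upcase! }
puts out.string
```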

This is what gsub! does: it modifies $_ in place. In the form we call it (gsub! is a method of the Kernel module), it receives a regular expression and a block of code. The g in gsub! means global: we are going to perform a global substitution in $_, and for each match the matched substring of $_ will be substituted by the value returned by the block.
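
A small sketch of the block form of gsub! follows. It calls the method on an explicit string instead of relying on the implicit $_ form, which newer Ruby versions may not provide, and it uses a simpler regex and block than the one-liner so the substitution is easy to follow.

```ruby
line = "This is an example\n".dup

# Every match is replaced, in place, by the value the block returns.
line.gsub!(/\w+/) { $&.upcase }
puts line

# When nothing matches, gsub! returns nil and leaves the string alone.
untouched = "...".dup
no_match = untouched.gsub!(/\w+/) { $&.upcase }
p no_match
```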

The regular expression \B\w+\B means: match word letters in a row, but only if there is NOT a word boundary at either end of the chain. Given the word foobar we can visualize the regex engine working like this:

  1. f matches \w, does it have a word boundary to its left? Yes, so forget about it.
  2. o matches \w, does it have a word boundary to its left? No, so keep on matching.
  3. The quantifier makes the engine advance until the end of the word, that is, oobar.
  4. Do we have a word boundary to the right? Yes, so backtrack one word letter.
  5. Now, do we have a word boundary to the right? No, so we've got a match, which is ooba.

You see, with that regex we match exactly the part of the word we want to munge. In addition, note that words with just one or two letters do not match, because every letter in them has a word boundary on at least one side.
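
The walk-through above can be checked with a few sample words. This is just a quick sketch; the INNER constant and the sample list are names I made up for the illustration. Note that a three-letter word does match, but only its single middle letter, so shuffling leaves it as it was.

```ruby
INNER = /\B\w+\B/

samples = %w[foobar text the is a]

# String#[] with a regex returns the first match, or nil if there is none.
matches = samples.map { |word| [word, word[INNER]] }
matches.each { |word, inner| puts "#{word} -> #{inner.inspect}" }
```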

In the block $& refers to the matched substring, and $&.split(//) splits it using the empty pattern, which results in the array of its letters. In the example above that's ['o', 'o', 'b', 'a'].
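
For illustration, this is the split on its own. The comparison with String#chars is an addition of mine: in current Ruby versions that method returns the same array.

```ruby
letters = "ooba".split(//)
p letters

# In current Ruby versions String#chars gives the same array.
same_as_chars = (letters == "ooba".chars)
p same_as_chars
```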

And this is the shining star of the one-liner: the Array class does not have a method to shuffle a given array, so we need some short code to emulate it. sort_by {|a| a.method}, which arrays get from Enumerable, returns a new array with the elements sorted according to the value returned by a.method for each element a. The clever trick is to plainly ignore the item passed to the block and just call rand. Awesome!
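
Here is the trick in isolation, as a small sketch. Each element gets a random sort key, so the resulting order is random while the elements themselves are untouched. Later Ruby versions do ship Array#shuffle, which would be the direct way to write this today.

```ruby
letters = %w[o o b a]

# The block ignores its argument: every element is sorted by a random key.
shuffled = letters.sort_by { rand }
p shuffled

# Same elements as before, only the order may differ.
p shuffled.sort == letters.sort
```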

The array returned by sort_by contains the letters we wanted to shuffle in random order, so we just need to join them back to get the string we want ooba to be substituted with.
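
Putting the pieces together, the body of the one-liner can be sketched as an ordinary method. The name scramble is hypothetical, and the method uses the non-destructive gsub on an explicit string instead of gsub! on $_, so it can be tried outside of -p.

```ruby
# Hypothetical wrapper around the body of the one-liner.
def scramble(text)
  text.gsub(/\B\w+\B/) { $&.split(//).sort_by { rand }.join }
end

input = "This is an example\ntext.\n"
output = scramble(input)
puts output
```

The output changes from run to run, but punctuation and whitespace stay where they were and each word keeps its first and last letters.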
