Older blog entries for avassalotti (starting at number 30)

Summer of Code Weekly #4

All is well for me and my project. I finished the merge of cStringIO and StringIO, and I am now moving to the more challenging cPickle/pickle merge. During the last two weeks, I mostly spent my time analyzing the pickle module and thinking about how I will clean up cPickle. My current plan is:

  1. Make cPickle’s source code conform to PEP-7.
  2. Remove the dependency on the now obsolete cStringIO.
  3. Benchmark cPickle and pickle.
  4. Add subclassing support to Pickler/Unpickler.
  5. Reduce the size of cPickle’s source code based on the bottlenecks found by the benchmarks.

Hopefully, the cPickle/pickle merge will be as smooth (and as fun) as the cStringIO/StringIO merge.

Syndicated 2007-07-06 19:16:35 (Updated 2007-07-06 19:17:31) from Alexandre Vassalotti

18 Jun 2007 (updated 19 Jun 2007 at 21:07 UTC) »

Pickle: An interesting stack language

The pickle module provides a convenient method to add data persistence to your Python programs. How it does that is pure magic to most people. In reality, however, it is simple. The output of a pickle is a “program” able to create Python data structures. A limited stack language is used to write these programs. By limited, I mean you can’t write anything fancy like a for-loop or an if-statement. Yet, I found it interesting to learn. That is why I would like to share my little discovery.

Throughout this post, I use a simple interpreter to load pickle streams. Just copy-and-paste the following code into a file named pik.py:

import code
import pickle
import sys

# Prompts shown by the interactive console.
sys.ps1 = "pik> "
sys.ps2 = "...> "
banner = "Pik -- The stupid pickle loader.\nPress Ctrl-D to quit."

class PikConsole(code.InteractiveConsole):
    def runsource(self, source, filename="<stdin>"):
        # Keep reading lines until the stream ends with the STOP symbol ('.').
        if not source.endswith(pickle.STOP):
            return True  # more input is needed
        try:
            print repr(pickle.loads(source))
        except Exception:
            self.showsyntaxerror(filename)
        return False

pik = PikConsole()
pik.interact(banner)
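
The listing above is written for Python 2 (note the print statement). As a rough sketch only, a Python 3 port could look like the following. It assumes ASCII input, and the `main` helper is a name introduced here for illustration, not part of the original script.

```python
import code
import pickle
import sys


class PikConsole(code.InteractiveConsole):
    """Python 3 sketch of the Pik loader for protocol-0 text streams."""

    def runsource(self, source, filename="<stdin>", symbol="single", **kwargs):
        # Pickle streams are bytes in Python 3, and pickle.STOP is b".".
        data = source.encode("utf-8")  # assumption: the input is plain ASCII
        if not data.endswith(pickle.STOP):
            return True  # more input is needed
        try:
            print(repr(pickle.loads(data)))
        except Exception as exc:
            # Report the problem on stderr instead of leaving the console.
            self.write("error: %r\n" % (exc,))
        return False


def main():
    # Hypothetical entry point: sets the prompts and starts the console.
    sys.ps1 = "pik> "
    sys.ps2 = "...> "
    PikConsole().interact("Pik -- The stupid pickle loader.\nPress Ctrl-D to quit.")
```

Calling `main()` at the bottom of the file starts the interactive session. Under Python 3 the results are printed with Python 3 reprs, so a Unicode string shows up as 'abc' rather than u'abc'.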

Then, launch it with Python:

$ python pik.py
Pik -- The stupid pickle loader.
Press Ctrl-D to quit.
pik>

So, nothing crazy yet. The easiest objects to create are the empty ones. For example, to create an empty list:

pik> ].
[]

Similarly, you can also create a dictionary and a tuple:

pik> }.
{}
pik> ).
()

Notice that every pickle stream ends with a period. That symbol pops the topmost object from the stack and returns it. So, let’s say you pile up a series of integers and end the stream. Then, the result will be the last item you entered:

pik> I1
...> I2
...> I3
...> .
3
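
The same streams can be fed to pickle.loads directly, without the console. This is a small Python 3 sketch, where the streams have to be written as bytes literals:

```python
import pickle

# Empty containers: one symbol followed by the STOP symbol ('.').
empty_list = pickle.loads(b"].")
empty_dict = pickle.loads(b"}.")
empty_tuple = pickle.loads(b").")
print(empty_list, empty_dict, empty_tuple)  # [] {} ()

# Three integers are pushed; STOP returns only the topmost one.
top = pickle.loads(b"I1\nI2\nI3\n.")
print(top)  # 3
```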

As you see, an integer starts with the symbol ‘I’ and ends with a newline. Strings and floating-point numbers are represented in a similar fashion:

pik> F1.0
...> .
1.0
pik> S'abc'
...> .
'abc'
pik> Vabc
...> .
u'abc'
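
For readers on Python 3, here is a hedged sketch of the same three loads. The session above is from Python 2, where ‘S’ yields a byte string and ‘V’ a Unicode string. In Python 3, pickle.loads decodes ‘S’ strings to str by default and returns raw bytes only when asked to with encoding="bytes".

```python
import pickle

number = pickle.loads(b"F1.0\n.")
text_from_s = pickle.loads(b"S'abc'\n.")                    # decoded to str
bytes_from_s = pickle.loads(b"S'abc'\n.", encoding="bytes")  # kept as bytes
text_from_v = pickle.loads(b"Vabc\n.")

print(number)        # 1.0
print(text_from_s)   # abc
print(bytes_from_s)  # b'abc'
print(text_from_v)   # abc
```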

Now that you know the basics, we can move to something slightly more complex — constructing compound objects. As you will see later, tuples are everywhere in Python, so let’s begin with that one:

pik> (I1
...> S'abc'
...> F2.0
...> t.
(1, 'abc', 2.0)

There are two new symbols in this example, ‘(’ and ‘t’. The ‘(’ is simply a marker. It is an object on the stack that tells the tuple builder, ‘t’, when to stop. The tuple builder pops items from the stack until it reaches a marker. Then, it creates a tuple with these items and pushes this tuple back on the stack. You can use multiple markers to construct a nested tuple:

pik> (I1
...> (I2
...> I3
...> tt.
(1, (2, 3))
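
To make the marker mechanics concrete, here is a toy interpreter for just four symbols (‘I’, ‘(’, ‘t’ and ‘.’). It is only a sketch of the idea, not how the pickle module is implemented, and `toy_loads` and `MARK` are names made up for this example.

```python
MARK = object()  # sentinel playing the role of the '(' marker


def toy_loads(stream):
    """Interpret a tiny subset of the stack language: 'I', '(', 't', '.'."""
    stack = []
    pos = 0
    while pos < len(stream):
        op = stream[pos]
        pos += 1
        if op == "I":    # integer: the digits run up to the newline
            end = stream.index("\n", pos)
            stack.append(int(stream[pos:end]))
            pos = end + 1
        elif op == "(":  # push the marker
            stack.append(MARK)
        elif op == "t":  # pop items back to the nearest marker
            items = []
            while stack[-1] is not MARK:
                items.append(stack.pop())
            stack.pop()  # discard the marker itself
            stack.append(tuple(reversed(items)))
        elif op == ".":  # stop: return the top of the stack
            return stack.pop()
        else:
            raise ValueError("unsupported symbol: %r" % op)
    raise ValueError("stream ended without '.'")


print(toy_loads("(I1\n(I2\nI3\ntt."))  # (1, (2, 3))
```

The inner ‘t’ only consumes items down to the second marker, which is why the nested tuple comes out intact before the outer ‘t’ runs.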

You use a similar method to build a list or a dictionary, with the builders ‘l’ and ‘d’:

pik> (I0
...> I1
...> I2
...> l.
[0, 1, 2]
pik> (S'red'
...> I00
...> S'blue'
...> I01
...> d.
{'blue': True, 'red': False}

The only difference is that dictionary items are packed by key/value pairs. Note that I slipped in the symbols for True and False, which look like the integers 0 and 1, but with an extra leading zero.

Like tuples, you can nest lists and dictionaries:

pik> ((I1
...> I2
...> t(I3
...> I4
...> ld.
{(1, 2): [3, 4]}
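
As a quick check under Python 3, the list and dictionary streams above can be loaded with pickle.loads. The sketch below also confirms that ‘I00’ and ‘I01’ come back as real booleans rather than integers.

```python
import pickle

numbers = pickle.loads(b"(I0\nI1\nI2\nl.")
flags = pickle.loads(b"(S'red'\nI00\nS'blue'\nI01\nd.")
nested = pickle.loads(b"((I1\nI2\nt(I3\nI4\nld.")

print(numbers)             # [0, 1, 2]
print(flags["red"], flags["blue"])
print(type(flags["red"]))  # <class 'bool'>
print(nested)              # {(1, 2): [3, 4]}
```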

There is another method for creating lists or dictionaries. Instead of using a marker to delimit a compound object, you create an empty one and add stuff to it:

pik> ]I0
...> aI1
...> aI2
...> a.
[0, 1, 2]

The symbol ‘a’ means “append”. It pops an item and a list, appends the item to the list, and finally pushes the list back on the stack. Here is how you build a nested list with this method:

pik> ]I0
...> a]I1
...> aI2
...> aa.
[0, [1, 2]]

If this is not cryptic enough for you, consider this:

pik> (lI0
...> a(lI1
...> aI2
...> aa.
[0, [1, 2]]

Instead of using the empty list symbol, ‘]’, I used a marker immediately followed by a list builder to create an empty list. That is the notation the Pickler object uses, by default, when dumping objects.
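
You can see that notation by asking pickle for a protocol 0 dump. The sketch below uses Python 3. The exact bytes may differ between Python versions (for instance, how integers and memo entries are written), so treat the printed stream as illustrative.

```python
import pickle

data = [0, [1, 2]]
stream = pickle.dumps(data, protocol=0)  # protocol 0 is the text format
print(stream)

# Whatever the exact bytes are, the stream must load back to the same value.
restored = pickle.loads(stream)
print(restored)  # [0, [1, 2]]
```

The dump also contains ‘p’ symbols that the hand-written streams above leave out. The pickler stores every container in its memo so that it can be referenced again later in the stream.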

Dictionaries can be constructed in a similar way:

pik> }S'red'
...> I1
...> sS'blue'
...> I2
...> s.
{'blue': 2, 'red': 1}

However, to set items in a dictionary you use the symbol ‘s’, not ‘a’. Unlike ‘a’, it takes a key/value pair instead of a single item.

You can build recursive data-structures, too:

pik> (Vzoom
...> lp0
...> g0
...> a.
[u'zoom', [...]]

The trick is to use a “register” (or, as it is called in pickle, a memo). The ‘p’ symbol (for “put”) copies the top item of the stack into a memo. Here, I used ‘0’ for the name of the memo, but it could have been anything. To get the item back, you use the symbol ‘g’ (for “get”). It copies an item from a memo and puts it on top of the stack.
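
Here is a short Python 3 sketch that loads the stream above and checks that the list really contains itself. The memo stores a reference, not a copy.

```python
import pickle

loop = pickle.loads(b"(Vzoom\nlp0\ng0\na.")

print(loop[0])          # zoom
print(loop[1] is loop)  # True: the second item is the list itself
```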

But, what about sets? Now, we have a small problem, since there is no special notation for building sets. The only way to build a set is to call the built-in function set() on a list (or a tuple):

pik> c__builtin__
...> set
...> ((S'a'
...> S'a'
...> S'b'
...> ltR.
set(['a', 'b'])

There are a few new things here. The ‘c’ symbol retrieves an object from a module and puts it on the stack. The reduce symbol, ‘R’, applies a function to a tuple of arguments. The semantics are the same as before: ‘R’ pops a tuple and a function from the stack, then pushes the result of the call back on it. So, the above example is roughly equivalent to the following in Python:

>>> import __builtin__
>>> apply(__builtin__.set, (['a', 'a', 'b'],))

Or, using the star notation:

  >>> __builtin__.set(*(['a', 'a', 'b'],))

And, that is the same thing as writing:

  >>> set(['a', 'a', 'b'])

Or, even shorter, using the set notation from the upcoming Python 3000:

  >>> {'a', 'a', 'b'}
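
For reference, here is a Python 3 sketch of the same set-building stream. The only change is the module name: Python 2’s `__builtin__` module is called `builtins` in Python 3.

```python
import pickle

# 'c' pushes builtins.set; the inner '(' ... 'l' builds the list argument;
# the outer '(' ... 't' wraps it in a one-item tuple; 'R' makes the call.
stream = b"cbuiltins\nset\n((S'a'\nS'a'\nS'b'\nltR."
result = pickle.loads(stream)

print(sorted(result))  # ['a', 'b']
```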

These two new symbols, ‘c’ and ‘R’, allow us to execute arbitrary code from the standard library. So, you must be careful to never load untrusted pickle streams. Someone malicious could easily slip into the stream a command to delete your data. Meanwhile, you can use that power for something less evil, like launching a clock:

pik> cos
...> system
...> (S'xclock'
...> tR.
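
One common way to reduce this risk is to subclass the unpickler and override find_class, the hook that the ‘c’ symbol goes through. The Python 3 sketch below refuses every global lookup. The class and function names are made up for the example. It blocks streams like the one above before anything is called, but it should not be read as making untrusted pickles safe in general.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuse every 'c' lookup, so a stream cannot reach any callable."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            "global %s.%s is forbidden" % (module, name))


def safe_loads(data):
    # Hypothetical helper mirroring pickle.loads for the restricted class.
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


print(safe_loads(b"(I1\nS'abc'\nF2.0\nt."))  # plain data still loads

try:
    safe_loads(b"cos\nsystem\n(S'xclock'\ntR.")
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

A more permissive variant would check `(module, name)` against a short whitelist and defer to the parent class for the allowed entries.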

Even if the language doesn’t support looping directly, that doesn’t stop you from using implicit loops:

pik> c__builtin__
...> map
...> (cmath
...> sqrt
...> c__builtin__
...> range
...> (I1
...> I10
...> tRtR.
[1.0, 1.4142135623730951, 1.7320508075688772, 2.0, 2.2360679774997898,
2.4494897427831779, 2.6457513110645907, 2.8284271247461903, 3.0]
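
One caveat when trying this under Python 3: map is lazy there, so the stream produces a map object instead of a list, and the loop only runs once the result is consumed. Here is a sketch, again with `builtins` in place of `__builtin__`:

```python
import pickle

stream = (b"cbuiltins\nmap\n"    # push map
          b"(cmath\nsqrt\n"      # marker, then push math.sqrt
          b"cbuiltins\nrange\n"  # push range
          b"(I1\nI10\ntR"        # call range(1, 10)
          b"tR.")                # call map(math.sqrt, range(1, 10))

lazy_roots = pickle.loads(stream)
roots = list(lazy_roots)  # the square roots are computed here

print(len(roots))  # 9
print(roots[0], roots[-1])
```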

I am sure you could fake an if-statement by defining it as a function, and then loading it from a module:

def my_if(cond, then_val, else_val):
    if cond:
        return then_val
    else:
        return else_val

That works well for simple cases:

>>> my_if(True, 1, 0)
1
>>> my_if(False, 1, 0)
0

However, you run into problems if you mix that with recursion, because Python evaluates both branches before my_if is even called:

>>> def factorial(n):
...     return my_if(n == 1,
...                  1, n * factorial(n - 1))
... 
>>> factorial(2)
RuntimeError: maximum recursion depth exceeded in cmp
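
The recursion never stops because the arguments of my_if are evaluated eagerly, so factorial(n - 1) is computed even when n == 1. One possible workaround, sketched here as a variation rather than anything the pickle language provides, is to pass the branches as zero-argument functions and call only the one that is selected:

```python
def my_if(cond, then_thunk, else_thunk):
    """Lazy variant: the branches are callables, and only one is invoked."""
    if cond:
        return then_thunk()
    return else_thunk()


def factorial(n):
    return my_if(n <= 1,
                 lambda: 1,
                 lambda: n * factorial(n - 1))


print(factorial(5))  # 120
```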

On the other hand, I don’t think you really want to create recursive pickle streams, unless you want to win an obfuscated code contest.

That is about all I had to say about this simple stack language. There are a few things I haven’t told you about, but I am sure you will be able to figure them out. Just read the source code of the pickle module. Also, take a look at the pickletools module, which provides a disassembler for pickle streams. As always, comments are welcome.
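
As a starting point, here is a small Python 3 sketch that runs the pickletools disassembler on the tuple stream from earlier in this post. The output is collected in a string buffer only so that it can be printed or inspected afterwards.

```python
import io
import pickletools

out = io.StringIO()
pickletools.dis(b"(I1\nS'abc'\nF2.0\nt.", out=out)
listing = out.getvalue()

# Each line shows a byte offset, the symbol, its symbolic name and argument.
print(listing)
```

The symbolic names in the listing (MARK, INT, STRING, FLOAT, TUPLE, STOP) are a good way into the source code of the pickle module.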

Syndicated 2007-06-18 18:14:17 (Updated 2007-06-18 21:23:25) from Alexandre Vassalotti

8 Jun 2007 (updated 19 Jun 2007 at 21:07 UTC) »

Summer of Code Weekly #3

During this third week of the Summer of Code, I found it very difficult to concentrate on my work: I have been a lightbulb instead of a laser. The result was little code done. On the other hand, I learned a lot about other things. For example, I now finally understand assembly language, how to use gdb, the basics of the design of the Linux kernel, etc.

I also read the book “Producing Open Source Software”, by Karl Fogel. It is a really good primer on the world of free software. If you have a burning desire to contribute to open source projects, just like me, I highly recommend that you get your own copy, or read it online.

Syndicated 2007-06-08 00:55:14 (Updated 2007-06-08 00:56:31) from Alexandre Vassalotti

1 Jun 2007 (updated 19 Jun 2007 at 21:07 UTC) »

Summer of Code Weekly #2

I can confirm it now: this second week of coding was even better. It was harder on my brain cells, though. I am mostly done with the StringIO merge. I now have working implementations in C of the BytesIO and the StringIO objects. The only thing remaining to do, for these two modules, is polishing the unit tests. And that shouldn’t take me very long to do. So, in basically one week of work, I completed the merge of cStringIO. I am certainly proud of that.

Now, I will need to attack the cPickle and cProfile modules. I don’t know yet which I will work on first. cPickle still seems very scary to me, and unlike cStringIO it’s huge. It’s about five or six times bigger. cProfile, on the other hand, is about the same size as cStringIO and well documented. I even wonder if I need to code anything for cProfile. It will be a piece of cake to merge. Now, one question remains: should I take the cake now, or keep it for the end?

Syndicated 2007-05-31 23:51:29 (Updated 2007-05-31 23:54:36) from Alexandre Vassalotti

25 May 2007 (updated 19 Jun 2007 at 21:07 UTC) »

Summer of Code Weekly #1

During this summer, I will post each week a short summary of what I did, the challenges I encountered and what I learned during my Summer of Code project. I am doing this to help me keep track of my progress.

So how was my first week? It was great. I don’t know why but I love programming in C. It is just plain fun. I thought learning the Python C API was going to be hard, but it is quite easy after all. I just read the code in Python itself and check the reference manual for the things I don’t know. My biggest surprise, this week, was really learning how to do subclassable types. It is strikingly easy, though quite verbose. You can look at my scratch extension module, if you want a minimal working example.

Other than learning the C API, I started working on the cStringIO/StringIO merge. My current plan is to separate the cStringIO module into two private submodules, _bytes_io and _string_io. One will be for bytes literals (ASCII), and the other for Unicode. This will reflect the changes made to the I/O subsystem in Python 3000. These two submodules will provide optional implementations for the speed-critical methods, like .read() and .write().
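
For readers unfamiliar with the bytes/Unicode split mentioned here, this is how it looks from Python code in today’s Python 3, where the two in-memory streams are exposed as io.BytesIO and io.StringIO. It is only an illustration of the split. The private module names in the plan above are working names from this post and are not used below.

```python
import io

raw = io.BytesIO()   # in-memory stream of bytes
raw.write(b"abc")

text = io.StringIO()  # in-memory stream of Unicode text
text.write("h\u00e9llo")

print(raw.getvalue())   # b'abc'
print(text.getvalue())

# The two kinds of data do not mix: writing str to a BytesIO fails.
try:
    raw.write("abc")
except TypeError as exc:
    print("TypeError:", exc)
```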

One of the best things this week was the great feedback I got from other Python developers, and particularly from my mentor Brett Cannon, who cheerfully answers all my questions. Now, I just hope the following week will be as fun as this one, or even more.

Syndicated 2007-05-25 03:14:37 (Updated 2007-05-28 15:57:35) from Alexandre Vassalotti

12 May 2007 (updated 24 May 2007 at 23:26 UTC) »
12 May 2007 (updated 9 Jul 2007 at 21:08 UTC) »

Blogging with Emacs

This is my first blog entry with my brand new toy, the weblogging mode for Emacs. It uses the XML-RPC interface of your favorite blogging platform to manage your blog. In other words, it transforms Emacs into a thermonuclear blog editor.

Even better, the installation is simple and easy. Here are the instructions to get it working. First, check out the source code of weblogger into your .emacs.d directory:

cd ~/.emacs.d/
cvs -z3 -d:pserver:anonymous@cvs.savannah.nongnu.org:/sources/emacsweblogs \
  co -d weblogger weblogger/lisp

Then, make Emacs load this mode on startup by adding these two lines to your .emacs configuration:

(add-to-list 'load-path "~/.emacs.d/weblogger")
(require 'weblogger)

Now, you probably want to reload your configuration with M-x eval-buffer (assuming your .emacs is still open). Finally, set up weblogger for your blog with M-x weblogger-setup-weblog. This command will ask you a few simple questions, like your username and password for your blog. It will also ask you for the location of the XML-RPC interface of your blog. If you’re using Wordpress, it will be somewhere like http://example.com/blog/xmlrpc.php. If you’re using another blog publishing platform like Blogger or MovableType, it will be somewhere else, so check your documentation.

And you’re done! You can now start a new post with M-x weblogger-start-entry. Weblogger also includes a whole set of other commands for managing your blog. Look them up with C-h a weblogger RET. Happy blogging!

Syndicated 2007-05-12 03:37:42 (Updated 2007-05-12 18:14:34) from Alexandre Vassalotti

11 May 2007 (updated 19 Jun 2007 at 21:07 UTC) »

Am I dreaming?

Syndicated 2007-05-11 00:38:06 (Updated 2007-05-11 00:43:53) from Alexandre Vassalotti

4 May 2007 (updated 19 Jun 2007 at 21:07 UTC) »

View plain text emails in fixed font in Gmail

Quick hack: here is a script for Greasemonkey that changes the default proportional font to a fixed-width font in Gmail. I was tired of reading distorted PEPs and code patches. And since Gmail doesn’t allow changing the font style, I had to write this simple script. Enjoy!

Syndicated 2007-05-04 02:31:34 (Updated 2007-06-16 14:40:09) from Alexandre Vassalotti

3 May 2007 (updated 19 Jun 2007 at 21:07 UTC) »

Almost Summer

Birds are singing; the Sun is rising; and rivers are flowing again. In short, another beautiful summer is coming. To me that means the end of classes, and a wave of exams which will crush me for a week. But afterwards, I will be free to do what I love: coding on open-source software, writing, and enjoying the weather while playing sports with my friends. Anyway, enough dreaming for today, I have some news.

I just finished the facelift of my blog’s layout. The new layout still keeps its original simplicity, while being more colorful and appealing. Personally, I am quite satisfied with the result. And thanks to two Firefox add-ons, called Firebug and Web Developer, the whole process was a breeze (and fun too). While I am at it, I would also like to thank Becca Wei for the initial theme, Almost Spring, on which I built my theme. Feel free to comment about what you like or dislike, since it’s you, after all, who will use it (unless, of course, you’re using a feed reader).

A few longer posts will be coming up after I get through my exam sessions. Plus, there will be a weekly post about the status of my Google Summer of Code project. Thanks for reading!

Syndicated 2007-05-03 22:24:47 (Updated 2007-05-03 22:29:35) from Alexandre Vassalotti
