Older blog entries for ralsina (starting at number 492)

Scraping doesn't hurt

I am in general allergic to HTML, especially when it comes to parsing it. However, every now and then something comes up and it's fun to keep the muscles stretched.

So, consider the TED Talks site. They have a really nice table with information about their talks, just in case you want to do something with them.

But how do you get that information? By scraping it. And what's an easy way to do it? By using Python and BeautifulSoup:

from BeautifulSoup import BeautifulSoup
import urllib

# Read the whole page.
data = urllib.urlopen('http://www.ted.com/talks/quick-list').read()
# Parse it
soup = BeautifulSoup(data)

# Find the table with the data
table = soup.findAll('table', attrs= {"class": "downloads notranslate"})[0]
# Get the rows, skip the first one
rows = table.findAll('tr')[1:]

items = []
# For each row, get the data
# And store it somewhere
for row in rows:
    cells = row.findAll('td')
    item = {}
    item['date'] = cells[0].text
    item['event'] = cells[1].text
    item['title'] = cells[2].text
    item['duration'] = cells[3].text
    item['links'] = [a['href'] for a in cells[4].findAll('a')]
    items.append(item)

And that's it! Surprisingly pain-free!
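
The snippet above targets Python 2 and BeautifulSoup 3, and the page layout may well have changed since. As a rough sketch, here is the same row-by-row extraction in Python 3, using the standard library's html.parser instead of BeautifulSoup and a made-up sample table instead of the live site. SAMPLE, TableScraper and scrape are names invented for this example, and the sample HTML is only my assumption about the table's shape. Unlike the original, it does not filter by the table's class.

```python
from html.parser import HTMLParser

# Made-up sample standing in for the real page (an assumption, not a copy).
SAMPLE = """
<table class="downloads notranslate">
  <tr><th>Published</th><th>Event</th><th>Talk</th><th>Duration</th><th>Download</th></tr>
  <tr>
    <td>Feb 2012</td><td>TED2012</td><td>An example talk</td><td>18:03</td>
    <td><a href="http://example.com/low.mp4">Low</a> <a href="http://example.com/high.mp4">High</a></td>
  </tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of each <td>, and the hrefs inside it, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # one list of cells per <tr>
        self.cell = None  # the cell being filled, or None outside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.cell = {"text": "", "links": []}
        elif tag == "a" and self.cell is not None:
            href = dict(attrs).get("href")
            if href:
                self.cell["links"].append(href)

    def handle_endtag(self, tag):
        if tag == "td" and self.cell is not None:
            self.cell["text"] = self.cell["text"].strip()
            self.rows[-1].append(self.cell)
            self.cell = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell["text"] += data


def scrape(html):
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    items = []
    for cells in parser.rows:
        # The header row only has <th> cells, so it ends up empty: skip it.
        if len(cells) < 5:
            continue
        items.append({
            "date": cells[0]["text"],
            "event": cells[1]["text"],
            "title": cells[2]["text"],
            "duration": cells[3]["text"],
            "links": cells[4]["links"],
        })
    return items


items = scrape(SAMPLE)
```

It is clearly more work than the BeautifulSoup version, which is sort of the point of the post.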


Syndicated 2012-02-17 20:34:33 from Lateral Opinion

To write, and to write what.

Some of you may know I have written about 30% of a book, called "Python No Muerde", available at http://nomuerde.netmanagers.com.ar (in Spanish only). That book has stagnated for a long time.

On the other hand, I wrote a very popular series of posts, called PyQt by Example, which has (you guessed it) stagnated for a long time.

The main problem with the book was that I tried to cover way too much ground. When complete, it would be a 500-page book, and that would involve writing half a dozen example apps, some of them in areas where I am no expert.

The main problem with the post series is that the example is lame (a TODO app!) and expanding it is boring.

So, what better way to fix both things at once than to merge them?

I will leave Python No Muerde as it is, and will do a new book, called PyQt No Muerde. It will keep the tone and language of Python No Muerde, and will even share some chapters, but will focus on developing a PyQt app or two, instead of the much more ambitious goals of Python No Muerde. It will be about 200 pages.

I have acquired permission from my superiors (my wife) to work on this project a couple of hours a day, in the early morning. So, it may move forward, or it may not. This is, as usual, an experiment, not a promise.


Syndicated 2012-02-17 03:18:35 from Lateral Opinion

Antisocial Networks

I love http://goodreads.com very much. It has measurably improved my life as a reader. Thanks to it I have read authors I wouldn't have read otherwise and books by those authors I would have ignored, and it keeps track of what I have read, am reading and will read.

What it has never been for me is a social network. I would be about as happy with it if I knew no one else on the site, if it were just me and a bazillion strangers whose taste I can leech off.

Sure, I have a few friends there nowadays, but I hardly ever do anything "social" beyond accepting requests and posting reviews which I have no idea if someone reads.

I love Flickr, where I put most of my pictures (soon: all of my pictures). It's cheap, I can upload an almost infinite amount of pics there, and I can sometimes share them with friends and family (by reposting them to Facebook).

They were even kind enough to store the pictures I uploaded as a free user until I paid for the space to store them 5 years later.

I love Twitter because it's a place to post short things that don't deserve a blog post, to chatter with friends and not-so-friends, to know more people, and to waste some time every day.

One of those things is not like the others. One of those things I use for its social features, the others I use for other reasons, and I don't really care about them being social or not.

I think nowadays, for a social network to succeed, it has to cater to the antisocial, at least at first, when you know no one there. I don't go to Flickr to debate. I don't go to Goodreads to chat. I go there to put pictures and keep my books straight. And that's what kept me there long enough to meet people.


Syndicated 2012-02-16 01:47:18 from Lateral Opinion

The blogs I don't have

  • Things you only like or believe because your mom said so.
  • Tips for Time Travelers.
  • Cute plants and their antics.
  • 1001 ways to peel a cat.
  • Things morticians say.
  • Traveling for Time Tippers.
  • Coins of the world: what do they taste like?
  • Things found in people's noses.
  • Surprise, that is not chicken!
  • Time for Tip Travelers.
  • World of Lint.


Syndicated 2012-02-15 20:36:53 from Lateral Opinion

PyQt Quickie: Don't Get Garbage Collected

There is one area where Qt and Python (and consequently PyQt) have major disagreements. That area is memory management.

While Qt has its own mechanisms to handle object allocation and disposal (the hierarchical QObject trees, smart pointers, etc.), PyQt runs on Python, so it has garbage collection.

Let's consider a simple example:

from PyQt4 import QtCore

def finished():
    print "The process is done!"
    # Quit the app
    QtCore.QCoreApplication.instance().quit()

def launch_process():
    # Do something asynchronously
    proc = QtCore.QProcess()
    proc.start("/bin/sleep 3")
    # After it finishes, call finished
    proc.finished.connect(finished)

def main():
    app = QtCore.QCoreApplication([])
    # Launch the process
    launch_process()
    app.exec_()

main()

If you run this, this is what will happen:

QProcess: Destroyed while process is still running.
The process is done!

Plus, the script never ends. Fun! The problem is that proc is being deleted at the end of launch_process because there are no more references to it.
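
If you want to see that rule at work without Qt in the way, here is a minimal sketch in plain Python 3. FakeProcess is a hypothetical stand-in for QProcess (it does nothing), and the weak reference is only there so we can check on the object without keeping it alive.

```python
import gc
import weakref


class FakeProcess(object):
    """Hypothetical stand-in for QProcess; it only exists to be tracked."""


def launch_process():
    proc = FakeProcess()
    # A weak reference lets us watch proc without keeping it alive.
    return weakref.ref(proc)


proc_ref = launch_process()
gc.collect()  # not needed on CPython, harmless elsewhere

# Once launch_process returns, nothing refers to proc anymore,
# so the weak reference is dead.
collected = proc_ref() is None
print("collected:", collected)
```

With a real QProcess the effect is the same, only noisier: when the Python object goes away, the running process goes with it.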

Here is a better way to do it:

from PyQt4 import QtCore

processes = set([])

def finished():
    print "The process is done!"
    # Quit the app
    QtCore.QCoreApplication.instance().quit()

def launch_process():
    # Do something asynchronously
    proc = QtCore.QProcess()
    processes.add(proc)
    proc.start("/bin/sleep 3")
    # After it finishes, call finished
    proc.finished.connect(finished)

def main():
    app = QtCore.QCoreApplication([])
    # Launch the process
    launch_process()
    app.exec_()

main()

Here, we add a global processes set and add proc there so we always keep a reference to it. Now, the program works as intended. However, it still has an issue: we are leaking QProcess objects.

While in this case the leak is very short-lived, since we are ending the program right after the process ends, in a real program this is not a good idea.

So, we would need to add a way to remove proc from processes in finished. This is not as easy as it may seem. Here is an idea that will not work as you expect:

def launch_process():
    # Do something asynchronously
    proc = QtCore.QProcess()
    processes.add(proc)
    proc.start("/bin/sleep 3")
    # Remove the process from the global set when done
    proc.finished.connect(lambda: processes.remove(proc))
    # After it finishes, call finished
    proc.finished.connect(finished)

In this version, we will still leak proc, even though processes is empty! Why? Because we are keeping a reference to proc in the lambda!
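
Here is a rough plain-Python 3 analogy of that trap, not tested against PyQt itself. FakeProcess is a hypothetical stand-in for QProcess, and the connections list is my stand-in for whatever the signal keeps around for its connected slots.

```python
import gc
import weakref


class FakeProcess(object):
    """Hypothetical stand-in for QProcess; it does nothing."""


processes = set()
connections = []  # stands in for the slots a signal keeps connected


def launch_process():
    proc = FakeProcess()
    processes.add(proc)
    # Same shape as the lambda in the post: it closes over proc.
    connections.append(lambda: processes.remove(proc))
    # Hand back only a weak reference, so the caller does not keep proc alive.
    return weakref.ref(proc)


def emit_finished():
    # Call every connected slot, like a signal would.
    for slot in connections:
        slot()


proc_ref = launch_process()
emit_finished()

set_is_empty = len(processes) == 0
# The set is empty, but the lambda's closure still holds proc.
still_alive = proc_ref() is not None

# Only dropping the lambda itself lets proc go away.
del connections[:]
gc.collect()  # not needed on CPython, harmless elsewhere
freed_after_disconnect = proc_ref() is None
```

So emptying the set is not enough: as long as the connected lambda exists, so does the object it captured.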

I don't really have a good answer for that that doesn't involve turning everything into members of a QObject and using sender to figure out what process is ending, or using QSignalMapper. That version is left as an exercise.


Syndicated 2012-02-10 22:57:35 from Lateral Opinion

It's about Divididos, so it must come in two parts.

Again: this was originally a Spanish-only post!


After yesterday's post about the lyrics of "Paisano de Hurlingham" I received a flood of corrections and explanations, which I list below.

Opalina:

It is a reference to Opalinas Hurlingham, a factory opened in 1948 and abandoned since 1994. There is an interesting collection of photos of its interior on Flickr.

The company went bankrupt mainly because of a pioneering environmental damage lawsuit: it was poisoning the area's groundwater with arsenic.

The train line:

It is the Retiro/Pilar branch of the San Martín railway, which does indeed go through Hurlingham. Mea culpa.

Abejas con ombú:

Bees are workers. Ombú is the best-known brand of work clothes in Argentina. It perhaps speaks badly of me that this didn't occur to me.

Sapo explota:

My mom used to make toads explode by forcing them to smoke, when she was a girl. I have not heard any other reference to exploding toads.

Why the toad explodes on Sundays at 10 remains a mystery.

Berretín de mayor:

Local slang about being a crook.

People have pointed me to Czech rock, though not Slovak.

Thank you very much for your contributions!


Syndicated 2012-02-08 22:49:10 from Lateral Opinion

Seen up close, everything is made of nothing

Sorry, this was originally a Spanish-only post. But you can listen to the song here


Divididos

If I had 5 blogs and the energy to post on all of them, the fourth would be "overanalyzing song lyrics". In homage to that blog that will never exist, this would be its first post: Paisano de Hurlingham, by Divididos (probably my favorite band).

First, the full lyrics:

Paisano de Hurlingham
poda neblina
moneda o botón
ciego bilingüe
paso morales fue
sin la opalina
de Retiro a Pilar
busca el chancho al chabon.
Sapo explota en San Martín
los domingos a las diez
sable recto en la estación
berretin de mayor.
Canilla en el anden
gotea noticias
te grita el titular
mentiras sin picar.
Abejas con ombu
viajando en el panal
va la timba en el furgón.

There seems to be a certain consensus (among the three people I asked) that this song describes a train trip.

Divididos lyrics are not known for lending themselves to linear interpretation. In fact, I suspect most of them are simply a series of words one after another because they "sound good together".

The first line, "Paisano de Hurlingham", is the song's own title and, I suppose, the protagonist of this minimal suburban odyssey. "poda neblina" (roughly, "prunes fog") is interesting. I can't find (thanks, Google) any reference to that phrase outside this song. It is possible that nobody had ever said "poda neblina" until Mollo sang that verse.

To give you an idea of how rare that is, there are two independent references to "navaja desierta" ("deserted razor"), which are two random words pulled from the dictionary. So, what is "poda neblina"? Well, if it's very early, there is fog, and the paisano goes through it, cuts it, prunes it. So, stretching things (which we are going to do quite a bit), we can assume it places us on a foggy early morning.

More obvious is "moneda o botón" ("coin or button"). It is about cheating, about passing off a button where there should be a coin. You can't do that with a cashier, or with a salesman, but you can in the alms for the "ciego bilingüe" (the "bilingual blind man").

"paso morales" is obvious. It is Paso Morales street, in Villa Tesei. In fact, that street crosses the tracks of the train that comes from Chacarita, which confirms that we are talking about a trip along the tracks.

It's hard to justify "sin la opalina" ("without the opaline"). In fact, I won't try. "de Retiro a Pilar / busca el chancho al chabón" is perhaps the most obvious line: it is a train inspector (the "chancho") looking for a passenger (the "chabón") who doesn't pay, and chasing him from Retiro to Pilar. The problem with that obviousness is that:

  1. The Retiro/Pilar train doesn't go through Hurlingham.
  2. The train that crosses Paso Morales leaves from Chacarita (and does go through Hurlingham).

Could it be that "De Chacarita a Hurlingham" is impossible from a metrical point of view? It would be understandable if so.

"Sapo explota en San Martín / los domingos a las diez" ("Toad explodes in San Martín / on Sundays at ten") is obscure. Besides, none of the railway lines mentioned goes through San Martín. But the Retiro/Pilar route belongs to the San Martín line! While we can't work out which toad explodes, why, or where, we do know when. This strengthens the hypothesis that the paisano is, for some reason, going to Pilar.

Also San Martín-related is the reference to the "sable recto en la estación" ("straight saber at the station"), in contrast to the General's famous curved saber. He never, ever held the rank of Major (he was promoted from captain to general), which makes it hard to pin down "berretín de mayor".

"Canilla en el andén / gotea noticias / te grita el titular / mentiras sin picar" is straightforward. A canillita, a canilla, is a newspaper vendor. Canillas (faucets) drip; canillitas drip news. They shout the headlines (although I believe no canillita has shouted headlines since 1947 or so). "Mentiras sin picar" (lies not yet shredded), because the paper isn't shredded, yet, because it's today's paper.

And we arrive at the final stanza, "Abejas con ombu / viajando en el panal / va la timba en el furgón." I resist giving an interpretation, beyond the fact that bees in a honeycomb travel packed tight, and that ombú is a brand of cigarette rolling paper, which the people in the band have surely used in abundance to get high, an ingredient I suspect is important in the writing of their lyrics.

What conclusion can we draw from this analysis? Well, personally, I would rather not understand what they say, have Divididos be a Slovak rock band, and be able to feel the kick in the forehead that this song is without trying to understand what the hell "sapo explota en san martín" is. But that's my problem.


Syndicated 2012-02-07 23:12:07 from Lateral Opinion

I've seen things you people wouldn't believe. Attack ships on fire off the shoulder of Orion.

I am very distracted when I walk down the street. Or rather, I am paying a lot of attention, but it's spread over a whole lot of different things.

http://s1.i1.picplzthumbs.com/upload/img/97/a6/53/97a653243bafc7a94a84ef41c5c5a361ef5d1647_wmeg.jpg

Her name is Faith Popcorn. Seen in Mar del Plata.

My favourite, since I am a compulsive reader, is reading street signs. There is always something off about signs in a foreign country. They are either about things other countries don't care about, or are written in a completely different style.

http://s0.i1.picplzthumbs.com/upload/img/14/b6/0d/14b60d5b36ab730e06a5926ccf72bcdfa435e93a_wmeg_00001.jpg

Fixed hair braiding prices in the Bahamas.

And sometimes you run into things you just have never seen before. Those things can be found anywhere, and can be anything, since ... well, you have never seen them before.

http://farm4.staticflickr.com/3094/5716714548_e2417400f0.jpg

And now, a hydrant wearing a sweater in Budapest.

It doesn't have to be something really strange, it may just be something you have not seen before by chance.

http://farm6.staticflickr.com/5107/5680154086_a29bb67376.jpg

Street sweepers get OCD too. Seen in San Isidro.

Or maybe you just figure something out right there and then.

http://farm4.staticflickr.com/3245/5706390321_dd9ee67e16.jpg

So that's why sugar cubes are better. Seen in Budapest.

Or ... you don't know what to say.

http://farm6.staticflickr.com/5143/5644622284_f35d445242.jpg

Seen at Tigre. I have no idea.

Or things you don't have where you come from.

http://farm6.staticflickr.com/5029/5624290042_a792d40365.jpg

Blimp! Seen in London.

Or they are just so polite to ask.

http://s0.i1.picplzthumbs.com/upload/img/9b/be/1e/9bbe1e8c53e414b0c1c652b6e8684586ddbe9d3e_400r.jpg

Or you don't understand at first.

http://farm3.staticflickr.com/2176/5716716950_fb9287dfd4.jpg

How did that kid get there?

And then you do.

http://farm4.staticflickr.com/3594/5716730390_1bb2c9e200.jpg

It's a trick fountain! Seen in Budapest.

Or maybe it's something you see every day, out of context.

http://farm4.staticflickr.com/3179/5716152091_0ef7f3d92d.jpg

A typical Argentinian milanesa sandwich. Bought on the street in Budapest.

http://farm6.staticflickr.com/5228/5776570777_9d360518f2.jpg

A DIA% supermarket, like the one near my home. In Istanbul.

And sometimes it's something you never suspected even existed, or how it could exist.

http://farm4.staticflickr.com/3504/5751512857_132b97f1cc.jpg

This is a chapter in a Turkish book. It takes place at my wedding.

Or out of context.

http://farm5.staticflickr.com/4140/4930179412_207cc28c56.jpg

Seen around the corner from my house.

Or alien.

http://s0.i1.picplzthumbs.com/upload/img/b5/7a/a0/b57aa073af7ffad86ea3b153d8cd32fd87fcf2cf_wmeg.jpg

Hotel towel, seen in Orlando, Florida.

Or

http://s0.i1.picplzthumbs.com/upload/img/19/bd/83/19bd830ac32416219d08727a50bc7233444a3830_wmeg.jpg

Yes, I did get a haircut. Seen in London.

Or

http://s1.i1.picplzthumbs.com/upload/img/55/46/56/554656e3ab605fa845871cdf621d2f619d86ae58_400r.jpg

Seen in Junín.

Or

http://s1.i1.picplzthumbs.com/upload/img/03/d5/cb/03d5cb310285dc21078f817c94e4dc9041643aed_wmeg.jpg

Ferry in Istanbul

Or

http://s0.i1.picplzthumbs.com/upload/img/53/b9/f2/53b9f2a04f82e7ab38742a6e872ac20debcb585a_400r.jpg

True TV Remote seen in a hotel on Avenida de Mayo, Buenos Aires, in 2004.

This, except for that TV remote, is just a small sample of what I have seen in the last 12 months. These have been a really cool 12 months.


Syndicated 2012-02-06 22:55:10 from Lateral Opinion

Caption contest!

IMAG0345

Yes, that's my son drinking mate out of Donald Duck's skull. Caption?


Syndicated 2012-02-05 21:40:00 from Lateral Opinion
