23 Mar 2008 mikal   » (Journeyer)

The Internet is a strange place

As mentioned previously, I've been downloading web pages over HTTP as part of my survey of Internet mail servers, in order to detect domain parking behaviour. I should have thought a bit harder about that code though, because the implementation is a bit naive. Specifically, the code downloads the source of the web page (to RAM), then base64 encodes it (to RAM), and finally writes it to the log file. Since base64 output is about a third larger than its input, a little more than two copies of a given page's source are held in RAM before the operation completes. However, it hadn't occurred to me that sites such as http://sixela.com/ would exist. That URL returns an endless stream of the word "blah". It took three worker deaths before I figured out what the problem was, mainly because when workers use too much RAM their slice is killed, and often the log files are lost.
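One way to guard against an endless response is to read the body in chunks and give up once it passes a size cap. What follows is only a rough sketch of that idea, not the survey's actual code: the names `read_capped`, `encode_page`, `PageTooLarge` and the 1 MiB `MAX_PAGE_BYTES` limit are all made up for illustration, and it assumes the response is available as a binary file-like object with a `read(n)` method.

```python
import base64
import io

# Hypothetical cap; the right value depends on the worker's memory budget.
MAX_PAGE_BYTES = 1024 * 1024


class PageTooLarge(Exception):
    """Raised when a response body exceeds the configured cap."""


def read_capped(stream, max_bytes=MAX_PAGE_BYTES, chunk_size=8192):
    """Read a whole body from a binary file-like object, up to max_bytes.

    Raises PageTooLarge as soon as the body is known to exceed the cap,
    so an endless response cannot grow the buffer without bound.
    """
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of body
        total += len(chunk)
        if total > max_bytes:
            raise PageTooLarge(f"body exceeded {max_bytes} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


def encode_page(stream, max_bytes=MAX_PAGE_BYTES):
    """Return the base64-encoded body as an ASCII string for the log file."""
    return base64.b64encode(read_capped(stream, max_bytes)).decode("ascii")


if __name__ == "__main__":
    # Small in-memory stand-in for an HTTP response body.
    print(encode_page(io.BytesIO(b"blah" * 10)))
```

The object returned by `urllib.request.urlopen(url, timeout=...)` has a suitable `read(n)` method, so it could be passed straight to `encode_page`. A worker that catches `PageTooLarge` can log the URL as oversized and move on instead of being killed.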

So the moral of this tale? Don't trust the Internets.

Tags for this post: research(S)


Syndicated 2008-03-24 07:14:00 from stillhq.com : Mikal, a geek from Canberra living in Silicon Valley

