28 Dec 2004 titus   » (Journeyer)

Web browsing via Python, refreshed.

The saga continues... after many patches, and diversions, and attempts to grok, I finally got PBP to work for a variety of purposes. In particular, I can now:

  • Use maxq to record browsing sessions to PBP scripts, and run those scripts successfully in PBP;
  • Use an only slightly altered HTMLParser.HTMLParser class to parse the moderately crappy SourceForge/mailman output;
  • Grok RFC 2965 cookies well enough to actually log into and play with mailman admindb pages;
  • Save, load, and view cookies in PBP.

I've submitted a bunch of patches to PBP & hopefully Cory can take a look at them in the next few weeks.

HTML/HTTP/URL support in Python's base is surprisingly messy, given my past experience with Python modules. I look forward to the day when Python 2.4 or above is the standard and 2.3 is no longer supported -- it will make cookie and URL handling much simpler!

I may spend some quality time refitting the HTMLParser classes in the htmllib and HTMLParser modules; as-is, they don't have good failure modes.

--titus

"He realized the fastest way to change is to laugh at your own
folly -- then you can let go and quickly move on."
-- via Dossy Shiobara

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!