Older blog entries for titus (starting at number 28)

8 Jan 2005 (updated 9 Jan 2005 at 02:19 UTC) »
"Batteries included" may be a bad term

After some of the recent types kerfuffle, I decided I didn't like the "batteries included" description of Python. Let me explain.

Python-the-language is great, and fits most of my needs. Heck, it's done that since before 2.0 came out. It's always nice to discover that some nifty new feature (like list comprehensions) has been added, and often I can take advantage of them to write tighter/neater/more understandable code. Nonetheless, I'm fundamentally a systems & application programmer, and I don't generally write language frameworks, toolkits, etc., so these new features aren't usually all that critical to my work.

What is critical is the fantastic standard library that comes with Python. It has been steadily expanded in the years that I've been a Python programmer, to the point that when I'm looking around on the 'net for some functionality it's already in the Python lib more often than not. This saves me a lot of time & trouble installing new products. (I've also said that I think that the state of the included Python code, as well as the quality of the documentation & example code, plays a big part in educating new Pythonistas about acceptable coding practices with Python. The Python cookbook is a particularly nice collection of code, too. These both play a big role in the success of Python, IMO.)

So, batteries are included, if you think of the library as the "batteries" -- something essential to function, yet not terribly interesting in its own right. Looking at that description of batteries, though, it sounds more like it fits my conception of the Python language. After all, the language that things are written in is essential, yet not always all that interesting; doing things in the real world is more interesting. For that you usually need to interact with users or protocols, and that means some sort of interface, and ... you end up with a library of code that enables a lot of standard interactions. In Python, that code is distributed with the interpreter, which enables a stunningly wide range of functionality in your basic Python installation.

In other words, I think the included library & modules are more interesting than the word "batteries" might imply ;).

There may be a deeper schism lurking behind the recent decorator & types controversy: language vs tools. It's very sexy to work on extending the language: there are plenty of deep problems to be tackled, much thought must be expended, and only really really smart thoughtful people can do it well. As Iwan van der Kleyn pointed out, though, there are other things to do that will extend Python's reach, power, and utility. It might be worth the core team having a look at those, if their goal is to advance Python as a whole. If, instead, the goal is to advance Python-the-language, then I think they're doing a fantastic job of it & should keep it up. Personally I'm more worried about the future of the library.

UPDATE: AMK posted about this months ago, and got a lot of responses. I think that's what got me started...

And, before the naysayers get up & yell at me for my multitude of sins, I do have some specific work and proposals in mind. It's hard to grasp the library as a whole, though. (It's easier to write a criticism than it is to do the work, too, that's for sure.) Watch This Space.

As for the long-term future of Python, Guido's first priority clearly needs to be to grow a beard.

--titus

Python's future -- language, library, or tools?

There's been quite some kerfuffle about Guido van Rossum's proposal to add optional static typing to Python. In particular, Iwan van der Kleyn pointed out that Python seems to be caught between language extenders and application programmers, and that at the moment the language extenders are winning. Iwan points out that for application programmers, there are several library or tool improvements that are probably as, or more, important than any language addition. While I don't agree with all of his feature/app requests, I do agree with the general idea that Python would benefit hugely from investment in something other than optional static typing. Or decorators. Or (insert feature of the week).

There was an ACM Queue article a while back (if someone has the link, I'd appreciate it!) that discussed the divide between language power users & tool power users. As I recall, the article described the split between developers who invest time in learning new and more powerful languages, and developers who invest time in learning IDEs and language-specific tools. What seems to be happening in Python is that GvR & the core team consist of very smart people who believe the language should be extended -- and are busy doing that. IDEs and tool extension are left to others & their efforts are patched into the library system where appropriate.

The real question about language extensions like optional type checking and decorators is how contaminatory they are. Istvan Albert believes that "optional" type checking will soon become effectively non-optional; while he mentions only the social (workplace) pressures, I'm more worried about whether or not the language will require it in places where I don't care to use it. If a library module uses type checking, will any code that uses that library module be required to adhere to the type behavior specified?

I can see three scenarios.

The first scenario is that any code that uses type-checked code will be required to adhere to the type spec. In the medium term, type checking will become mandatory throughout the codebase because people will start using it in the standard library.

The second scenario is that "duck typing" will magically solve the problem & allow untyped code to use typed code. This seems to me to be the best option, and I bet it's what GvR has in mind.

The third scenario is that there will simply be command-line arguments to Python (or internal configuration directives) that disable type checking completely, allowing schlubs like me to go about our day and ignore large portions of the language.

I'd be ok with either #2 or #3. #1 sounds like a disaster to me. Either way, I expect a large amount of energy to be expended on this set of features that, frankly, is not so important to me. (Maybe it should be, but I'm not betting on it... I like Bruce Eckel's take on static typing.)

Regardless, there's the question of whether the core team is focused too much on language development. Is the relentless drive for new, immensely powerful language features detracting from the stability and usability of the language?

It may be time to consider how to "fork" Python development so that it works more like Linux. Why not keep a 2.x branch going that focuses on long-term stability & library expansion, while building a 3.x branch with new language features but fairly complete backwards compatibility? In my traipses through the library it doesn't seem like the library really takes advantage of the new language features in 2.4, which (if true) in itself is a telling trend that suggests that language development and library development are orthogonal. Even if they aren't orthogonal, maybe they could be made more so, and this could be a worthwhile endeavor.

The balance between language stability and new features seems like it's tough to maintain. GvR et al. have done a fantastic job so far. Here's hoping they can keep doing a fantastic job!

--titus

4 Jan 2005 (updated 5 Jan 2005 at 00:41 UTC) »
OCaml and string theory -- how cool is that?!

Open-source vs closed-source Web dev

On our trip to Mammoth over New Year's, we had an interesting discussion about Web development frameworks, specifically open-source vs closed-source. One of the things that made it especially interesting was that apart from me & my wife (your standard OSS advocates) there was an MBA student & software developer as well as an engineer working in management at a construction company. So rather than discussing things with the usual group of my academic friends who buy into OSS by default, I had to actually come up with business arguments for OSS Web frameworks ;).

There were two requirements that came up:

  • The engineer wanted accountability: if the company they hired to build a Web site or CMS defaulted or screwed up, he wanted to be able to sue them and/or recover their assets. Beyond that, he didn't really care whether or not they used OSS.

  • The MBA/programmer wanted better overall documentation. His experience with the Java-based ArsDigita CMS that RedHat bought was extremely frustrating because the documentation was essentially nonexistent. I and others have had similar experiences with Python CMSs.

Beyond these two considerations, I don't think either one had a problem with open-source vs closed source, which was nice to hear. In particular, there was no mention of OSS lacking features, security, or stability -- it seems like FUD on this account isn't reaching these two people.

My responses were basically this:

  • Re accountability, it seems to be difficult to recover assets no matter what. Large companies (e.g. Oracle and MS) that sell software tend to have EULAs that nix accountability at this level. Consulting companies are either heavyweights that have lots of lawyers (e.g. IBM) or small businesses that have no money anyway.

    They both more-or-less agreed to this point & redefined their requirements to be that there was a company, with a reputation to manage, behind the software. When I pointed out that there were now many companies that were building on OSS, they shrugged & said that would be fine.

  • On the better documentation issue I agreed but pointed out that (a) a lot of closed-source software doesn't have great documentation either, and (b) most companies aren't going to care what documentation there is for their CMS if they have a consulting firm running it. Since it's kinda silly for a small non-IT firm to do more than manage their content, this point doesn't really matter for the end-user.

I added that one strong reason to go with an OSS Web CMS supported by multiple companies is company failure. A continual problem (exemplified by ArsDigita) is that companies go out of business & leave their customers high & dry. Maybe the base CMS is stable enough to work without modification, but any future customization or extension will be impossible with a closed CMS if the owners go out of business. With Zope or Plone, other companies (or even individual consultants) can pick up the pieces, if the company you had hired fails.

This seemed to resonate with both of our friends.

Anyway... 'nuff for now.

--titus

3 Jan 2005 (updated 3 Jan 2005 at 20:35 UTC) »
Is development of new software tools "moribund"?

I've always enjoyed Philip Greenspun's take on things; about 50% of what he says goes straight to the point. Unfortunately the other 50% seems to be complete crud. (I'm never sure how seriously he's taking himself, so it's hard to know if he's just trying to be thought-provoking. Even if not, separating out the gems from the crap is enjoyably difficult and hence worthwhile.)

His most recent weird post talked about a fairly simple Perl script to spam friends with invitations. In the comments, someone tweaked Philip about being a "Perl whore" despite his lauding of AOLserver, a Tcl-only (well, mostly) Web server. Philip responded that he had good reasons for liking AOLserver (connection pooling &c.), and since it happened to use Tcl, he used Tcl himself. Philip then felt compelled to say that he thought software tools were moribund, because his friends coding in .NET, his students working with Java, and people working with PHP weren't as productive as people who used AOLserver -- a technology designed over 10 years ago.

In the comments, I remarked that whinging about how Web software development tools haven't moved on since 1992 by citing Tcl, .NET, Java, and PHP was silly. Why not look at languages that support (and support well) a variety of techniques? Python (and by reputation Perl and Ruby) tools have come a loooong way in the last few years. In Python you can now choose between at least 5 different frameworks, all of them at least moderately mature. All of the popular Web programming techniques are represented between the various frameworks: the object publishing paradigm (Zope/Quixote), the object-kit templating paradigm (WebWare/CherryPy), and of course the utility belt that is Twisted.

The real question is, why not use Tcl, PHP, .NET, and Java for Web programming? It comes down to two conflicting considerations:

  1. the "each page is a class method" model encouraged by Java is overwrought for the simple pages that make up 20%-80% of any Web site (depending on the site, of course); it seems to be great for more complicated sites, though.

  2. the "each page is a string of spaghetti code" model encouraged by Tcl and PHP is great for the simple pages (20%-80% of any site) but disastrous for more complicated sites.

Or, to put it another way, straight scripting languages are great for constructing simple pages, while higher-level abstractions are needed for more complex pages. Most Web sites require both kinds of pages, but many languages do not support both kinds of programming well.

Python offers a great intermediate: it is a scripting language that deals well with outputting strings, but which offers nice higher-level ways of building out a framework. (This is likely true of other scripting languages that support object-oriented programming, such as Perl, [incr Tcl], Ruby, OCaml, ... what am I missing? VBA?)

Bas Scheffers pointed out (in the comments, again) that "it's not the tool, it's the programmer". Well, yes, this is the usual last resort of any language anti-advocate: heck, they're all identical at some level of abstraction, right? And you can write spaghetti code in any language, right? So how dare you say that language A is better than language B? Well, my experience (and that of other people you should trust more) tells me that Python is better than both Tcl and Java for many things. And so the question is, why? I'm sure there are many considerations that make sense to people, but one that I haven't seen mentioned before is that of how examples are coded.

Since most programmers liberally "appropriate" code -- from books, from open-source programs, and especially from cookbooks and examples -- the quality of that code has a lot to do with the way they write programs. It also has a huge effect on the way they learn to code in the language. And I think this effect is grossly underestimated.

For Python, there's a nice tutorial, a variety of books (caveat: most of which I haven't read), and many, many examples from both the comp.lang.python newsgroup and the Python cookbook. By and large, the code contained in examples is clean, simple, short, and documented. It also lives at a level of abstraction that fits: classes/objects used when appropriate, and not used when not appropriate. Yes, there's plenty of gunk in Python, but it's not what you first encounter & it's easy to find pretty examples.

For other languages, example code is often much nastier. Spaghetti-code style may not be required by Tcl and PHP, but most of the example code I've seen can't be classed any other way. Overwrought object-oriented gunk may not be required by Java, but most of the example code I've seen certainly fits the description.

It could be that simple: if you write uncomplicated yet useful code examples, and people learn your language from them, then in the end your language will be used to write prettier code. And there will be less unmaintainable spaghetti code, or ugly overcomplex OOgunk, written in that language. And, if you're lucky & you've figured out how to grow your language over time with the help of the community, your language will continually move towards supporting that kind of coding.

Why Python examples are pretty is a different question, and the answer may be more sociological than technical. Perhaps it's as simple as Python fitting my brain better than other languages do, and it's not true for everyone else. Or, perhaps it's more than that -- for example, our beloved BDFL may be particularly good at designing a certain type of language.

In the end, supporting good mechanisms of abstraction may be necessary for good programming, but it is obviously not sufficient. It doesn't do much good to have a language that supports a bunch of mechanisms that don't cleanly fit into example code. Nor will it do the language any good in the long run if the example code is poorly constructed.

So, I don't think that the development of software tools is moribund -- but it may be time to move on from Tcl for doing Web programming.

Philip Greenspun's company died a horrible death several years ago partly because they were trying to transition a hideously complex mess o' Tcl into Java. I wonder what would have happened if they'd chosen to use something more scriptable than Java but a little more supportive of abstraction than Tcl?

--titus

p.s. I'm pretty supportive of using Java for other things, such as GUIs. I'm just not smart enough to make it do data reduction fast... so I switched to C++ / FLTK.

28 Dec 2004 (updated 28 Dec 2004 at 23:31 UTC) »

Odds and ends today...

PBP & SF hackage

SourceForge announce lists full of spam? Try this PBP script, with the rematch.py extension:

pyload rematch.py

go http://lists.sourceforge.net/lists/admindb/listname
fv 1 adminpw pass
submit 1

do set_match --form 1 "\\\\d+$" 3
submit 1

'twill flush out all messages in your queue...

Decorators in Quixote

Kevin Dangoor queried about decorators in Quixote, and here are implementations of his two suggested decorators. They seem to make sense syntactically.

First, one to restrict access to the decorated function to logged-in users:

from quixote.errors import AccessError

def require_login(func):
    """
    decorator: require login to run decorated function.
    """
    def wrapper(request):
        if not request.session.user:
            raise AccessError("you must be logged in!")
        return func(request)

    return wrapper

Use like so:

@require_login
def func(request):
   ...
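To see the decorator's behaviour without a running Quixote site, here is a minimal, self-contained sketch. It is not Quixote code: AccessError is a local stand-in for quixote.errors.AccessError, and FakeSession, FakeRequest, secret_page, and try_page are hypothetical names invented for the demonstration. It is written for current Python, and adds functools.wraps (not in the version above) so the wrapped handler keeps its own name.

```python
import functools


class AccessError(Exception):
    """Stand-in for quixote.errors.AccessError (assumption: Quixote is not installed)."""


def require_login(func):
    """Decorator: refuse to run the wrapped handler unless a user is logged in."""
    @functools.wraps(func)  # keep the handler's __name__ and docstring
    def wrapper(request):
        if not request.session.user:
            raise AccessError("you must be logged in!")
        return func(request)
    return wrapper


# Hypothetical minimal request/session objects, just enough for the demo.
class FakeSession:
    def __init__(self, user=None):
        self.user = user


class FakeRequest:
    def __init__(self, user=None):
        self.session = FakeSession(user)


@require_login
def secret_page(request):
    return "hello, %s" % request.session.user


def try_page(request):
    """Return the page body, or the error message if access is denied."""
    try:
        return secret_page(request)
    except AccessError as exc:
        return "denied: %s" % exc
```

Calling try_page(FakeRequest("titus")) runs the handler, while try_page(FakeRequest()) is stopped by the decorator before the handler body executes.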

A slightly more complex case follows: here are two functions to export names for publication by Quixote.

def export(func):
    """
    decorator; export decorated function under its __name__.
    """
    _q_exports = func.func_globals['_q_exports']
    _q_exports.append(func.__name__)

    return func

def export_names(*names):
    """
    decorator; export decorated function under all given names.
    """

    # build a new function to return; this is what will be called on
    # the following function.
    def export_func(func, names=names):
        _q_exports = func.func_globals['_q_exports']
        for name in names:
            _q_exports.append((name, func.__name__,))

        return func

    return export_func

Use these like so:

@export
def func(request):
   ...

@export_names("name1", "name2")
def func2(request):
   ...

It's a little bit irritating that you have to grab _q_exports from func_globals but *shrug* that's scoping for ya! (If you don't do this, then you can't import the decorators from another module.)
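The scoping point can be checked with a small sketch. It assumes a current Python, where func_globals is spelled __globals__, and it simulates a separate handler module by exec-ing source into its own namespace; HANDLER_SOURCE, handler_namespace, and the index handler are hypothetical names for the demo, not part of Quixote.

```python
def export(func):
    """Decorator: export the decorated function under its own __name__."""
    # func.__globals__ is the namespace of the module that defines func,
    # not the namespace of the module that defines this decorator.
    func.__globals__['_q_exports'].append(func.__name__)
    return func


def export_names(*names):
    """Decorator factory: export the decorated function under each given name."""
    def export_func(func):
        q_exports = func.__globals__['_q_exports']
        for name in names:
            q_exports.append((name, func.__name__))
        return func
    return export_func


# Simulate a handler module that imports the decorators from elsewhere.
HANDLER_SOURCE = '''
_q_exports = []

@export
def index(request):
    return "index page"

@export_names("name1", "name2")
def func2(request):
    return "func2 page"
'''

handler_namespace = {'export': export, 'export_names': export_names}
exec(HANDLER_SOURCE, handler_namespace)

# The decorators wrote into the handler module's own _q_exports list.
exported = handler_namespace['_q_exports']
```

Because the decorators reach _q_exports through the decorated function rather than through their own module globals, the list that gets updated is the one in the module being published.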

--titus

Web browsing via Python, refreshed.

The saga continues... after many patches, and diversions, and attempts to grok, I finally got PBP to work for a variety of purposes. In particular, I can now:

  • Use maxq to record browsing sessions to PBP scripts, and run those scripts successfully in PBP;
  • Use an only slightly altered HTMLParser.HTMLParser class to parse the moderately crappy SourceForge/mailman output;
  • Grok RFC 2965 cookies well enough to actually log into and play with mailman admindb pages;
  • Save, load, and view cookies in PBP.

I've submitted a bunch of patches to PBP & hopefully Cory can take a look at them in the next few weeks.

HTML/HTTP/URL support in Python's base is surprisingly messy, given my past experience with Python modules. I look forward to the day when Python 2.4 or above is the standard and 2.3 is no longer supported -- it will make cookie and URL handling much simpler!

I may spend some quality time refitting the HTMLParser classes in the htmllib and HTMLParser modules; as-is, they don't have good failure modes.

--titus

"He realized the fastest way to change is to laugh at your own
folly -- then you can let go and quickly move on."
-- via Dossy Shiobara
20 Dec 2004 (updated 28 Dec 2004 at 23:10 UTC) »
The joys, trials, and tribulations of open-source work

This is becoming a bad joke.

The story starts innocently enough: when playing with PBP, I found a bug in Python's sgmllib.py. This bug caused PBP (via ClientForm) to fail on my test case, the Quixote demo application. So I went and fixed that bug and submitted a patch to the Python developers.

Then I diverted myself and decided to take up the gauntlet thrown by Martin v. Löwis, in order to see how quickly I could get my patch reviewed. I proceeded to work through first one and then nine add'l Python patches. My python-dev post still hasn't gone through so nothing further has happened there.

Today I put myself back on track and spent some time with maxq, a Java HTTP proxy recording system that outputs Jython code tracking a Web session. I added a script generator module for PBP to maxq so that it would output PBP scripts as well.

In the process I ran across two additional problems in PBP. First I found a bug in the way PBP used shlex (PBP patch 274). Then maxq's behavior of recording all form variables, even hidden ones, led me to discover that PBP crashed when trying to change hidden variables. I changed this to a warning (PBP patch 273). (Not strictly a PBP problem, but the crash behavior was probably inappropriate.)

So, to try out one package (PBP) and modify another to fit my needs (maxq), I ended up crawling through a bunch of nifty packages (mechanize, ClientForm, and four or five Python modules I'd never used before), submitting three different patches to two different projects (not including revised patches I contributed to Python), and writing a small chunk of new code in Java. whee!

The worst part is that one of the PBP patches I'm submitting contains this:

 ...See http://issola.caltech.edu/~t/transfer/quixote-demo.pbp for an
operative example, although you will not be able to run it without
changing ClientForm.py to use the XHTMLCompatibleFormParser, because
of a bug in sgmllib.py (Python patch 1087808).

Yep -- to test this patch, go apply this other patch to the language you're using...

O well. At least now I can get around to the original purpose of writing a testing suite for Cartwheel.

My wife and cat are upset with the amount of time I've spent on this, that's for sure... ;) Of course, my wife doesn't get a vote because she went skiing today while I abused sea urchins. (The cat doesn't get a vote either: he sleeps all day.)

On the plus side, at least I could fix the software myself, so everything works now. I'd be SOL if any of this stuff had been closed-source.

Continued karma-grubbing.

My advice for the Python patches neophyte: start high. I foolishly started in the low-numbered patches (#755660) & spent over an hour trying to figure out the code (which was easy) and the various patches and counterarguments and ... That part was less easy and nowhere near as pleasant. Folks, there's a reason why those patches and bug reports have been lying around for over a year -- no one else wants to deal with them.

Another piece of advice is to stick with modules you already know about or are interested to learn about. In my case I'd just gone through the HTMLParser stuff to find the sgmllib bug, so I decided to focus on HTML/CGI/URL stuff.

Anyhoo, worked my way through 9 add'l patches. Great! Together with my comment on patch 755660, I ran through 10 different patches. I may now pray to The Gods That Be to review patch 1087808, my sgmllib fix.

Here's a list:

  • patch 1055159, a simple docstring/doc patch to CGIHTTPServer. (This was really a documentation "enhancement request" masquerading as a patch, so I verified the behavior described & then wrote the doc string appropriately.)

  • patch 755670, a patch to make HTMLParser parse invalid HTML. Recommended not applying it.

    (I also put in a comment that was meant for a different patch (755660). Very embarrassing. Sigh. Tabbed browsing is dangerous in my hands. ;)

  • patch 1037974, a patch to fix HTTP digest authentication when accessing LiveJournal feeds (and any other feeds that don't listen to RFC 2617, which considers 'Algorithm' notification optional).

    Had to sign up for a livejournal account to test this, too...

    This is my first time dealing with digest authentication, but as I understand it the new behavior is merely verbose and a bit redundant. It certainly shouldn't break anything. Recommend apply.

  • patch 1028908, a bunch of small stuff by John J. Lee of wwwsearch. Apparently much of the code he modified was originally written by him anyway (?) and the regression tests passed, so *shrug*... recommend application.

  • patch 901480, fixing bug 735248. This fixes a bug in the way urllib2.parse_http_list parses unquoted elements. Recommend application, although I submitted a slightly modified patch (against the current CVS) that fixed a doctest string.

  • patch 827559 fixes SimpleHTTPServer to add a trailing '/' to directory names. Recommend application; it does fix the behavior, and I think it's a reasonable way to treat directories, too.

    An analogy: Links to 'http://some.place.or.other:port' get rewritten to 'http://some.place.or.other:port/', so links to '/this/dir' should get rewritten to '/this/dir/'.

  • patch 810023, a very nice patch to fix some reporthook behavior in urllib. The submitter, J. Lewis Muir, wrote regression tests to show that his new urllib worked, so testing this one was pretty easy. Recommend apply, based on the regression test behavior failed by the current source tree.

  • patch 893642, which adds an optional allow_none argument to SimpleXMLRPCServer and classes that use it. I updated the patch & added some documentation.

  • patch 1067760, to bug 1067728 (closed). This patch changes the behavior of seek() on file objects to do float --> long conversion instead of float --> int conversion. This allows 2.0**62 to be used in a seek, just like 2 ** 62. I recommended applying it because it shouldn't lead to new bugs. I'm probably missing something.

Spent a bit of time on Python bugs today, as per Martin v. Löwis's suggestion -- building up karma for my own (5 line) patch to sgmllib ;). Took a lot more time than I thought, sigh; I only took a look at one. Here are my notes:

patch 755660, fixes bug 736428. Comments on bug 917188 (closed) are relevant. May also fix or at least allow amelioration of behavior in bug 683938 (assigned to frdrake) and bug 699079 (closed).

I don't understand why in the comments on bug 736428 it says that the "patch in bug 917188 (closed) may be better" because there's no patch attached. Perhaps kingwood means that you need to pay attention to markupbase.py, too?

My comments:

This patch allows developers to override the behavior of HTMLParser
when parsing malformed HTML.  Normally HTMLParser calls the function
self.error(), which raises an exception.  This patch adds appropriate
return values for situations where self.error has been redefined in
subclasses to *not* raise an exception.

It does not change the default behavior of HTMLParser and so presents no backwards compatibility issues.

The patch itself consists of an added comment and two added lines of code that call 'return' with appropriate values after a self.error call. Nothing wrong with 'em. I can't verify that the "junk characters" error call will leave the parser in a good state, though, if execution returns from error().

The library documentation could be updated to reflect the ability to override error() behavior; I've written a short patch, available at

http://issola.caltech.edu/~t/transfer/HTMLParser-doc-error.patch

More problems exist with markupbase.py, upon which HTMLParser is based. markupbase calls error() as well, and has some stickier situations. See comments in bug 917188 as well.

Comments in 683938 and 699079 suggest that raising an exception is the correct response to the parse errors. I recommend application of the patch anyway, because it (a) doesn't change any behavior by default and (b) may solve some problems for people.

An alternative would be to distinguish between unrecoverable errors and recoverable errors by having two different functions, e.g. error() (for recoverable errors) and _fail() (for unrecoverable errors). By default error() would call _fail() and internal code could be changed to call _fail() where recovery is impossible. This might alter behavior in situations where subclasses override error() but then again that's not legitimate to do anyway, at least not at the moment -- error() isn't in the docs ;).
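A rough sketch of that error()/_fail() split, using a toy parser rather than the real HTMLParser or markupbase code. TinyParser, LenientParser, ParseFailure, and the "<<" rule are all hypothetical, chosen only to show the control flow.

```python
class ParseFailure(Exception):
    """Raised when parsing cannot continue."""


class TinyParser:
    """Toy stand-in for a markup parser; not the real HTMLParser."""

    def error(self, message):
        # Recoverable problems land here; by default we still give up.
        self._fail(message)

    def _fail(self, message):
        # Unrecoverable problems always raise.
        raise ParseFailure(message)

    def feed(self, data):
        tokens = []
        for token in data.split():
            if token == "<<":
                # Toy rule: treat this as damage we cannot recover from.
                self._fail("unrecoverable token: %r" % token)
            if not token.isalnum():
                self.error("junk characters: %r" % token)
                continue  # only reached if a subclass's error() returns
            tokens.append(token)
        return tokens


class LenientParser(TinyParser):
    """Subclass that records recoverable errors instead of raising."""

    def __init__(self):
        self.warnings = []

    def error(self, message):
        self.warnings.append(message)
```

With this split, a subclass that overrides error() keeps parsing past junk, but anything routed through _fail() still stops the parse, so the parser never continues from a state it cannot vouch for.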

If nothing is done, at least close patch 755660 and bug 736428 with a comment saying that this behavior will not be addressed ;).

Python problems: sgmllib/htmllib vs HTMLParser

While playing with PBP, I noticed that tag attributes weren't being correctly parsed. For example,

<option value="Small (10&quot;)"> Small (10&quot;)

was coming through as

<option value="Small (10&quot;)"> Small (10")

This caused problems in two areas: first, trying to set the value of the associated select widget failed unless the entity-encoded string was used (Small (10&quot;) instead of Small (10")). This in turn caused problems on submission of the form to the Web server, because the value was encoded once more for HTTP transmission. cgi.FieldStorage would decode it on the server side and set the select widget value to Small (10&quot;). So overall badness happened on both client and server sides.
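The round trip can be reproduced in a few lines. This is a sketch for current Python, not the original PBP/ClientForm code path: html.unescape stands in for a parser that decodes attribute values, urllib.parse.parse_qs stands in for cgi.FieldStorage on the server, and the field name 'size' is made up.

```python
import html
from urllib.parse import urlencode, parse_qs

# What a parser that does NOT unescape attributes hands to the client code.
raw_attr = 'Small (10&quot;)'
# What a parser that does unescape attributes would hand over instead.
decoded_attr = html.unescape(raw_attr)


def round_trip(value):
    """URL-encode a form value, then decode it the way a server would."""
    body = urlencode({'size': value})
    return parse_qs(body)['size'][0]


# URL encoding is undone on the server, but the entity encoding is not:
server_sees_raw = round_trip(raw_attr)
server_sees_decoded = round_trip(decoded_attr)
```

The URL-encoding layer is symmetric, so whatever string the client submits is exactly what the server gets back; if the client was handed the still-escaped attribute, the server ends up with the literal &quot; in its value.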

I dug deeply into PBP, which led me to mechanize, which in turn led me to ClientForm, which led me to htmllib.HTMLParser. The trail finally ended in sgmllib. Long story short: there are two HTML parsing classes in Python, htmllib.HTMLParser (derived from sgmllib.SGMLParser) and HTMLParser.HTMLParser, which is more-or-less standalone. mechanize can use either, but prefers htmllib because it is present in older versions of Python. And here's the essential clue: the problem goes away if you switch to using HTMLParser.HTMLParser instead of htmllib.HTMLParser.

Once I figured this out, the root cause was easy to find: sgmllib.SGMLParser (and therefore htmllib.HTMLParser) does not unescape tag attributes, while HTMLParser.HTMLParser does. Oddly enough it doesn't use handle_entityref to unescape tag attributes; it uses string.replace to handle a small number of specific entity refs. I'm not sure if this is correct, but it's easy to move the same code over to sgmllib.py.
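For illustration, the replace-based approach looks roughly like the standalone function below. The one subtle point is ordering: the ampersand entity has to be decoded last, or already-escaped text gets decoded twice. unescape_wrong_order is a deliberately broken variant added for comparison and is not part of any library.

```python
def unescape(s):
    """Decode a handful of entity refs using plain string replacement."""
    if '&' not in s:
        return s
    s = s.replace("&lt;", "<")
    s = s.replace("&gt;", ">")
    s = s.replace("&apos;", "'")
    s = s.replace("&quot;", '"')
    s = s.replace("&amp;", "&")  # must be last
    return s


def unescape_wrong_order(s):
    """Same replacements, but with &amp; decoded first (deliberately wrong)."""
    s = s.replace("&amp;", "&")
    s = s.replace("&lt;", "<")
    s = s.replace("&gt;", ">")
    s = s.replace("&apos;", "'")
    s = s.replace("&quot;", '"')
    return s


attr = unescape('Small (10&quot;)')
once = unescape('&amp;lt;')
twice = unescape_wrong_order('&amp;lt;')
```

Here once is the text "&lt;" (one level of escaping removed), while twice collapses all the way to "<", which is the double-decoding that the ordering avoids.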

The diff to sgmllib.py is below. It's pretty small; I'll send it out to the comp.lang.python newsgroup and see what people think, before I waste the time of Python maintainers specifically. It sure is nice to dig deeply into the code and find such a simple fix ;).

--- sgmllib.py  2004-12-16 23:30:51.000000000 -0800
***************
*** 272,277 ****
--- 272,278 ----
              elif attrvalue[:1] == '\'' == attrvalue[-1:] or \
                   attrvalue[:1] == '"' == attrvalue[-1:]:
                  attrvalue = attrvalue[1:-1]
+                 attrvalue = self.unescape(attrvalue)
              attrs.append((attrname.lower(), attrvalue))
              k = match.end(0)
          if rawdata[j] == '>':
***************
*** 414,419 ****
--- 415,432 ----
      def unknown_charref(self, ref): pass
      def unknown_entityref(self, ref): pass

+     # Internal -- helper to remove special character quoting
+     def unescape(self, s):
+         if '&' not in s:
+             return s
+         s = s.replace("&lt;", "<")
+         s = s.replace("&gt;", ">")
+         s = s.replace("&apos;", "'")
+         s = s.replace("&quot;", '"')
+         s = s.replace("&amp;", "&") # Must be last
+
+         return s
+

class TestSGMLParser(SGMLParser):

g'nite.

--titus
