Older blog entries for connolly (starting at number 31)

19 Jul 2005 (updated 20 Jul 2005 at 21:14 UTC) »
advobak.py -- advogato diary incremental backup; thanks to titus for some clues.

v1.2 starts with the highest number and counts down; with -u, it stops when it finds one that's already up to date.
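
The count-down strategy can be sketched roughly as follows. This is a minimal illustration, not the actual advobak.py: the names `backup` and `fetch_entry` and the one-file-per-entry layout are assumptions made for the example, and the fetch function is passed in by the caller so that no particular advogato URL or API is implied.

```python
import os


def backup(latest, fetch_entry, directory, update_only=False):
    """Save entries latest..0 to directory, newest first.

    fetch_entry(n) is a caller-supplied (hypothetical) function that
    returns the text of diary entry n.  With update_only=True, stop at
    the first entry whose saved copy already matches, on the theory
    that everything older was captured by a previous run.
    Returns the list of entry numbers that were written.
    """
    written = []
    for n in range(latest, -1, -1):
        text = fetch_entry(n)
        path = os.path.join(directory, "entry-%d.html" % n)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                if f.read() == text:
                    if update_only:
                        break      # already up to date: stop here
                    continue       # unchanged, but keep checking older ones
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        written.append(n)
    return written
```

Counting down is what makes the early exit safe: new and recently edited entries are the high-numbered ones, so the first unchanged entry is a reasonable signal that the rest were already saved.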

15 Jul 2005 (updated 15 Jul 2005 at 07:17 UTC) »

I finally got a copy of the contents of my personal wiki converted to clean XHTML, one of the few formats I trust. After fixing a bunch of low level escape markup problems with regex foo, I remembered that wiki rendering mixes up tag nesting. So I needed tidy. ElementTree and TidyHTMLTreeBuilder to the rescue! Now the hard part: migrating from the wiki genre to the journal/blog genre.

This work started in a CVS repository that I keep on a mac that goes to sleep at night. I usually etherwake it, but tonight I used hg and committed changes right on the machine I'm working on, and pushed them to another machine when I finished. I missed emacs integration, but not too badly.

I like the idea of all my machines acting as peers, but I wonder if I'll lose track of which changes are where. And I wonder when to add to an existing project/repository and when to make a new one. When it comes to moving files around, hg is less constrained than CVS but still has limitations. And moving things in the web involves redirects, if not broken links. If I want to publish the code, I doubt I'll use an hg server or even an hg CGI; I'd probably use a commit hook to update some static files, or not even bother with a repository on the server but just use rsync.

Tags: python scm

13 Jul 2005 (updated 15 Jul 2005 at 01:21 UTC) »
Bugs on top of bugs on top of paperwork: frustration on so many levels

So I'm trying to send an expense report.

The bane of my existence is doing things that I know the computer could do for me.*

Filling out shipping waybills is definitely one of them. I've tried the fedex online shipping interface, but it's pretty javascripty and klunky, and I've never made it past the form that asks for my credit card number, since my employer always pays to have these expense reports shipped.

But the last time I tried to send an expense report via fedex, I got a bill for $30 because I neglected to check exactly the right "bill to sender" box on the paper waybill. When I called fedex, they could see that I clearly meant to bill the recipient, since I wrote the account number right there, and they fixed the billing problem. So today I'm motivated to invest quite a bit more in getting online shipping working.

So I fill out the account application forms. One of them asks whether I want to create a new fedex account or use an existing one, but it doesn't accept the account number I have for my employer, so I create a new one, even though I don't want to. And there's this confusing stuff about a 10% discount.

I fought thru all that and got to a form that looked just like a paper waybill. Now we're getting somewhere. It has a "bill to recipient" option but when I submit the form, I get "invalid account". So I find the tech support number and get in the hold queue. While I'm waiting on hold, a friendly voice informs me that they have online chat support. I'm a pretty big fan of online chat, so I try it out.

It's java. I've had mixed luck with java applets, but thanks to a great article on installing Sun's java on debian, my Java installation is pretty shiny and I'm willing to give it a try. Just as I was starting to have a meaningful exchange with a fedex support guy, it started doing a peg-the-cpu-and-update-the-scrollbar-a-zillion-times thing. It recovered once, then did it again. I closed the window but the CPU stayed pegged. Uh-oh; I thought I'd have to restart firefox and lose days' worth of state. But eventually the CPU load subsided.

Meanwhile, my turn in the hold queue came up, and in an IRC chat with our admin folks, I discover that I've been putting an extra digit in the account on my waybills. Also, I had used a "verify address" feature, which filled in the full 10 digit zip, but the account I was billing to only had the 5 digit zip. I don't know who thought that "invalid account" was a sufficient diagnostic for that failure mode. With those two problems addressed, I got as far as printing a waybill, only to see my phone number represented as (161)739-5555; their profile forms don't grok international +1-... syntax, so I had to put just the bare 10 digits in there. Finally, I got the computer to fill out the waybill for me.

The next hurdle was getting a priceline receipt printed. I have a .pdf file and a .ps file (derived from the .pdf via pdf2ps, I think). If I ask cups to print the .ps file to my networked HP printer, somewhere along the line it gets scaled up 4x, and I get only the top-left quarter of the receipt filling the whole page. The pdf file looks funny in evince. I check it with acroread and it works, so I try printing it and I win. But evince is otherwise such a nice piece of software that I want to help them fix this problem.

So I try, once again, to figure out how to report problems in debian. The gnome menus aren't much help, but it's pretty easy to find How to report a bug in Debian via the evince package page. I like the idea of an emacs interface, so I apt-get install debbugs-el and type M-x debian-bug. No joy. Since I haven't restarted emacs, it hasn't loaded the package. I have to find the .elc and M-x load-file it. Then it pleasantly guides me thru filling out a bug report. But I want to attach some files to the bug report, and I can't see how to do that in emacs mail mode. Oh well, I'll attach them to follow-up messages.

Emacs reports success when I hit ctrl-c ctrl-c to send the message; but I use an ssh tunnel to send my mail, and I haven't told emacs about it. So where did that message go? It's frozen in an exim4 queue. I don't really want to manage an MTA on this machine, but lots of debian packages that I use require one. There is a "don't do anything" configuration option, and I'm pretty sure I chose that one when I installed exim4. Now I'm trying to remember how to reconfigure it. I see several references to a debconf(7) man page that I can't find. I eventually figure out the magic incantation: sudo /usr/sbin/dpkg-reconfigure -plow exim4-config. But even after I configure it to send mail, the message stays frozen in the queue. I give up at this point and copy the contents of the message to a text file, and use EDITOR=gedit reportbug. I ask in the #debian channel about how to attach files to bug reports; somebody there confirms that attaching them to follow-up messages is a reasonable thing to do, but also suggests just composing the message that reportbug would send in my normal mailer and using it to attach them. "I don't usually bother with reportbug" he says.

argh! The primary interface for reporting debian bugs is so unusable that the developers (ok, one developer) don't even use it?

So I press on. I don't regularly use my ISP's SMTP server, but somehow reportbug knows its address and sends the bug report.

While I'm waiting for the acknowledgement, I try to figure out the relationship between reportbug and bug-buddy. I recall that bug-buddy feeds into the upstream gnome bug system, and I have a vague recollection that debian wants you to file bugs in the debian bug tracking system, not upstream. Plus, bug-buddy doesn't seem to have an interface for attaching files either.

Ah... the acknowledgement is here now: #318122.

@@tags: usability debian linux-printing-swamp

*yours truly, Oct '98

advogato still doesn't grok rss:title. I'm trying a class="title" microformat so that maybe I can recover/convert...

I reviewed my ISP bills and such over the weekend... went to check the mailbox that they provide... an address that I have never given to anyone; the 20MB quota was 99% used; all spam except monthly ISP newsletters.

It used to be that commercial or political spam was sent out by one type of miscreant, while worms and viruses were created and propagated by another type. These days, the two have merged. This is because spammers are using viruses to take control of thousands of people's computers, which the spammers then use to send out their crap. -- 2005 by Jef Poskanzer

I gather that some ISPs would like to automatically put their zombie customers into a sandbox where the only thing they can contact is an ISP web server (not running Windows I hope) where they can download repair tools and security updates, but Microsoft has, depressingly unsurprisingly, been uncooperative when ISPs have asked to redistribute updates on their own servers. -- Taughannock Networks Weblog 23 Jan 2005

Maybe there's hope: Slashdot | FTC Recommends ISPs Disconnect Spam Zombies May 24 2005.

Tags: spam.

29 Jun 2005 (updated 8 Jul 2005 at 04:08 UTC) »

The --xml-output option on darcs commands is something I hope hg picks up; let's not do much more microparsing than we have to.

OpenID looks kinda cool, but I don't quite see how it works; I hope they add a story/example showing what happens if a black hat puts my homepage URL in the field.

Some CVS (related) features I depend on that I wonder if/when hg has: emacs support (ctrl-x-v-v is burned into my medulla), keywords support.

Hmm... PHP is clearly the dominant server-side deployment vehicle these days. I wonder if its namespace management will catch up with Modula-3 and python... or if the .NET/mono CLR stuff has much chance; IronPython and the like look promising. I suppose JSP hosting is a commodity these days... and .Net... I wonder what the prices and trends are; and I wonder if there's mono hosting yet.

Hmm... a bunch more certs... ugh... they're all spam. How do I certify somebody as pest or fraud or leech?

Tags: scm

Using hCard, XSLT, and RDF to sync the family cellphones

Mary got a new Motorola V188, which iSync supports. Buoyed by the success of moving the kitchen calendar from paper to iCal, I took her paper address book and keyed it into Apple's addressbook. Now I want to sync (or at least copy) several of the categories (family, medical) from the contacts in my gizmo to hers.

In palmagent, dangerSync.py produces contact.rdf; I could use N3 rules to convert that to a vCard RDF vocabulary, and then do a syntactic RDF-to-vCard transform, but I'm a little down on N3 rules lately; the rules I use to process my calendar are really slow. I have been thinking about having dangerSync.py spit out .ics format directly. Or maybe having it spit out RDF diffs is the way to go...

Meanwhile, there's hCard, whence comes X2V including xhtml2vcard.xsl. And I'm a big fan of using XHTML to archive important data anyway, so I think I'll write some XSLT to convert contact.rdf to XHTML and then use xhtml2vcard.xsl to get vcard syntax.
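
For a feel of what the intermediate XHTML might look like, here is a small sketch that builds an hCard for one contact. It uses Python and ElementTree rather than the XSLT the plan above calls for, and the function name `hcard` and its arguments are made up for illustration; only the class names (vcard, fn, tel, type, value) come from hCard itself.

```python
import xml.etree.ElementTree as ET


def hcard(name, phones):
    """Return an hCard <div> element for one contact.

    phones is a list of (type, number) pairs, e.g. ("cell", "555-0100").
    """
    card = ET.Element("div", {"class": "vcard"})
    fn = ET.SubElement(card, "span", {"class": "fn"})
    fn.text = name
    for kind, number in phones:
        tel = ET.SubElement(card, "div", {"class": "tel"})
        kind_el = ET.SubElement(tel, "span", {"class": "type"})
        kind_el.text = kind
        kind_el.tail = " "          # keep type and number apart when rendered
        value_el = ET.SubElement(tel, "span", {"class": "value"})
        value_el.text = number
    return card
```

Serializing the result with `ET.tostring(card, encoding="unicode")` gives markup that reads as an ordinary address list in a browser and that an hCard-aware transform should be able to pick the fields back out of.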

grumble... advogato diary entries don't have titles, do they?

Tags: hCard, WearableGizmo, palmagent, RDF diff/sync, DIG

I'm no longer happy with my personal wiki; since upgrading to a version of zwiki that supports dated comments, I find that I'm more comfortable doing "episodic publishing" (i.e. blogging) than collaborating on collected wisdom. A personal wiki is an oxymoron; the wiki genre is all about collaboration.

So I started thinking about how to migrate my content to a blogging system. The first step was to somehow grok all the data I've got in there. I asked in the #zope channel about .zexp format and was discouraged from peeking inside. I was advised to write an external method that runs inside Zope to export my data, but by the time I saw that advice, I already had dumpwiki.xsl converting the zope XML export format to XHTML. The actual contents of the wiki pages were quoted, so I'm undoing that with python and xmltramp. I think I hit a bug in the xmltramp serializer. Gotta look into that.
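
The unquoting step amounts to removing one level of escaping and re-parsing the result. A minimal sketch using only the standard library (not xmltramp, and `unquote_page` is a name invented here) might look like this; it assumes the unescaped content is well-formed, which real wiki output may not be.

```python
from xml.sax.saxutils import unescape
import xml.etree.ElementTree as ET


def unquote_page(quoted):
    """Turn escaped markup (&lt;p&gt;...) back into an element tree.

    The unescaped text is wrapped in a <div> so that a page with several
    top-level elements still parses as one well-formed document.
    """
    markup = unescape(quoted)   # undoes exactly one level of &lt; &gt; &amp;
    return ET.fromstring("<div>%s</div>" % markup)
```

If the page content is not well-formed, `ET.fromstring` raises a `ParseError`, which at least points at the pages that need cleaning up first.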

Anyway... the #zope guys were surprised that I had never done any Zope methods. I explained that I use Zope because it's the only server you can apt-get and write to (with iPhoto or emacs eldav mode) out of the box, with elephant-never-forgets versioning.

That got me thinking... with the "cvs is good enough" orthodoxy eroding, and all the work on subversion and arch and darcs, maybe it's time to take another look at the versioning part of WebDAV. Especially git, with its cryptographically secure history... because the problem with the writeable web is more social than technical. People use sftp rather than HTTP because the social protocol is well known, not because FTP is a better protocol than HTTP for writing.

Hmm... if I had a PAW blog or a DIG blog, this might belong there. Maybe reltag will work...

I sure wish python had more penetration in the blogging world. PHP is clearly the server-side deployment vehicle these days, and javascript is the way to juice up clients, but neither PHP nor javascript meets the unambiguity requirement that I think is critical for software engineering in the large.

Got tired of the manual login-and-download-statement ritual with my bank, and since ClientForm and ClientCookie are such a joy and python has SSL built-in, I cooked up grabst.py that automates it. Heaven forbid the bank website should allow me to just bookmark my statement so I could directly GET (with SSL and password auth) it, without all the frames and javascript malarky.

Glom reports look interesting... using XSLT and CSS for reports... GTK... python API... postgres; that reminds me that in quacken I have code to transfer all my Quicken data (about 10 years' worth) to postgres in one go for use with an old version of saCASH. saCASH development has since gone in a direction that I'm not so interested in, but this glom thing looks like a good match for my FractalAccounting goals.

Grokking Triples from Spreadsheets

Sean notes that there are lots of triples in spreadsheets. Yup. After my Aug 2003 trip to Montreal for Extreme, I used gnumeric as an RDF authoring tool to collect all the gas receipts and such; then the Makefile has this stanza to convert it to RDF:

triplog.rdf: triplog.xml grokSheet.xsl
	$(XSLTPROC) --novalid grokSheet.xsl triplog.xml >$@

I haven't scrubbed the data, so this is somewhat incomplete as a demo.

Yes, this is another GRDDL style transformation.
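
The underlying idea, that each cell is a triple of (row, column header, value), can be shown without XSLT. The sketch below is an illustration in Python over CSV text, not what grokSheet.xsl does with gnumeric's XML; the function name `rows_to_triples` and the choice of one column as the row's identifier are assumptions.

```python
import csv
import io


def rows_to_triples(csv_text, key):
    """Yield (subject, predicate, object) triples from a CSV table.

    Each row becomes one subject, named by the value in the `key` column;
    every other non-empty cell yields a triple whose predicate is the
    column header.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = row[key]
        for column, value in row.items():
            if column != key and value:
                yield (subject, column, value)
```

Turning the column headers into proper property URIs and the subjects into real identifiers is the part a transformation like this still has to decide; the sketch leaves them as plain strings.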

A comment on Sean's blog said "don't forget RDBs". Of course not. See Relational Databases and the Semantic Web; I hope to update my implementation, dbview.py, to use SPARQL before too much longer.

Hmm... where are timbl's slides on RDF, trees, tables, and such?
