Older blog entries for robertc (starting at number 94)

Customer service... and Vodafone. Something fundamentally missing here.

Vodafone's website is slow - 50% of the time you can't even log in. 15-20 simple page clicks.

Ringing customer service - you get an IVR system that is intensely frustrating. It can't handle simple things like "Your website sucks" (it asks if you want to inquire about iPhone 3Gs). 10 minutes later it actually offers a menu. Finally, some way to get through to a human.

Having stopped doing paper bills, you have to sign up to this website; several thousand words of legal terms and conditions later - 99% unrelated to paying bills - and you can't configure it to email you unless it also SMSes you. WTF. Customer service don't know why this is required. And the website doesn't tell you it's required, it just refuses to accept the form unless it's ticked.

Seriously, someone set up a viable, flexible mobile provider in Australia, with good international roaming - and give me a ring.

Well, advertising for the win right?

Was looking up a song I vaguely remembered via YouTube, and I noticed that > 30% of the search results page is taken up by a single advert for a film (Liam Neeson in Taken, ironically) which sits at the top right, leaving the rest of the column blank.

SHEESH

What the git vs bzr discussion is about, IMO, is usability. The following blog post about DTrace on Linux talks about the same issue, and I'd like to use Bryan's words:

"Over and over again, we made architectural and technical design decisions that would yield an instrumentation framework that would be not just safe, powerful and flexible, but also usable. The subtle bit here is that many of those decisions were not at the surface of the system (where the discussion on the Linux list seems to be currently mired), but in its guts."

->

"Over and over again, we made architectural and technical design decisions that would yield a Distributed VCS that would be not just safe, powerful and flexible, but also usable. The subtle bit here is that many of those decisions were not at the surface of the system (where the discussions going on at the moment seem to be currently mired), but in its guts."

I keep running into folk I already knew of who, it turns out, use bzr - I just had no idea that they did.

Right now there is a lot of discussion going on about DVCS in various projects. While I imagine most bzr users just want to get on with their coding (after all, that's what bzr is good at :))... it would be fantastic if you could blog that you use it, and if folk at GUADEC could wear the T-shirt!

Also, I'm at GUADEC, and I'm extremely happy to answer questions from anyone, bzr user, git user, or even svn user :)

4 Jul 2008 (updated 4 Jul 2008 at 10:10 UTC) »

Well, the gauntlet is down (BTW - desktop power integration. Cool!). The use case Ted talks about is actually quite interesting - we were at UDS last month, waiting on an SVN server that was apparently so slow we could have walked to it and copied stuff onto a hard disk more quickly. (Really. No kidding). bzr was idling and blocked on network IO the whole time... kudos for the plugin, Ted!

For my response, may I present a new index format (branch url): 70% smaller than bzr's current default, equally fast at most workloads, and up to 20 times faster at others. I started this this week, and John jumped in in overlapping time periods, but I think it counts!

Note that the performance wins are a component improvement - other things we haven't addressed yet can make the index improvements less visible. But several early adopters have told me that they see a 25-30% reduction in 'time bzr log > /dev/null' or other commands.

To install:

bzr branch http://bazaar.launchpad.net/~lifeless/+junk/bzr-index2 ~/.bazaar/plugins/index2

bzr branch https://bazaar.launchpad.net/~jameinel/+junk/pybloom ~/.bazaar/plugins/pybloom

To use:

cd <repository you want to experiment on>

bzr upgrade --btree-plain

(or --btree-rich-root for bzr-svn users).

A version of this will be going to trunk soon, and it will be able to upgrade from any repository that you have that uses the plugin as long as you keep the plugin installed.

Dear lazyweb number 3.

So far, I've asked:

high latency net simulations - great answers.

python friendly back-end accessible search engines - many answers, none that fit the bill. So I wrote my own :).

Today, I shall ask: is there a python-accessible persistent b+tree (or hashtable, or ...) module? Key considerations:

- scaling: millions of nodes are needed, with low latency access to a node's value and to determine a node's absence

- indices are write once. (e.g. a group of indices are queried, and data is expired or altered by some generational tactic such as combining existing indices into one larger one and discarding the old ones)

- reading and writing are suitable for sharply memory constrained environments. Ideally only a few hundred KB of memory are needed to write a 100K node index, or to read those same 100K nodes back out of a million node index. Temporary files during writing are fine.

- backend access must either be via a well defined minimal api (e.g. 'needs read, readv, write, rename, delete') or customisable in python

- easy installation - if C libraries etc are needed they must be already pervasively available to windows users and Ubuntu/Suse/Redhat/*BSD systems

- ideally sorted iteration is available as well, though it could be layered on top

- fast, did I mention fast?

- stable formats - these indices may last for years unaltered after being written, so any libraries involved need to ensure that the format will be accessible for a long time. (e.g. python's dump/marshal facility fails)

sqlite and bdb already fail this requirements list.

snakesql, gadfly, buzhug and rbtree fail too.
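To make the requirements above more concrete, here is a minimal sketch of a write-once index that needs only sequential writes when building and seek-plus-read when querying. It is not a b+tree and not the format bzr ended up with: it is a flat file of sorted fixed-width records searched by bisection, and every name in it (RECORD, write_index, IndexReader, merge_indices) as well as the 16-byte key and 8-byte value layout are assumptions made for illustration. Keys are assumed to be at most 16 bytes, free of trailing NUL bytes, and unique across the indices being merged.

```python
import heapq
import struct

# Hypothetical record layout: 16-byte NUL-padded key, unsigned 64-bit value.
RECORD = struct.Struct(">16sQ")


def write_index(path, items):
    """Write (key, value) pairs, already sorted by key, in one pass."""
    with open(path, "wb") as f:
        for key, value in items:
            f.write(RECORD.pack(key, value))


class IndexReader:
    """Look up keys by bisection, holding one record in memory at a time."""

    def __init__(self, path):
        self._f = open(path, "rb")
        self._f.seek(0, 2)  # seek to the end to learn the file size
        self._count = self._f.tell() // RECORD.size

    def _record(self, i):
        self._f.seek(i * RECORD.size)
        key, value = RECORD.unpack(self._f.read(RECORD.size))
        return key.rstrip(b"\0"), value

    def get(self, key):
        """Return the value for key, or None if the key is absent."""
        lo, hi = 0, self._count
        while lo < hi:
            mid = (lo + hi) // 2
            if self._record(mid)[0] < key:
                lo = mid + 1
            else:
                hi = mid
        if lo < self._count:
            found, value = self._record(lo)
            if found == key:
                return value
        return None

    def __iter__(self):
        """Sorted iteration falls out of the on-disk order."""
        for i in range(self._count):
            yield self._record(i)

    def close(self):
        self._f.close()


def merge_indices(path, readers):
    """Generational combine: stream several sorted indices into a new one."""
    write_index(path, heapq.merge(*readers))
```

A lookup costs roughly log2(n) small reads, so memory use stays flat however large the index grows; the price is more round trips than a wide-fanout tree would need, which matters a lot on high latency backends.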

Launchpad, please stop mailing me mine own comments on bugs. I know what I said.

kthxbye

14 Jun 2008 (updated 14 Jun 2008 at 10:07 UTC) »

Rethinking annotate: I was recently reminded of Bonsai for querying vcs history. GNOME runs a bonsai instance. This got me thinking about 'bzr annotate', and more generally about the problem of figuring out code.

It seems to me that 'bzr annotate' is, like all annotates I've seen, pretty poor at really understanding how things came to be - you have to annotate several versions, cross reference revision history and so on. 'bzr gannotate' is helpful, but still not awesome.

I wondered whether searching might be a better metaphor for getting some sort of handle on what is going on. Of course, we don't have a fast enough search for bzr to make this plausible.

So I wrote one: bzr-search, in my hobby time (my work time is entirely devoted to landing shallow-branches for bzr, which will make a huge difference to pushing new branches to hosting sites like Launchpad). bzr-search is alpha quality at the moment (though there are no bugs that I'm aware of). It's mainly missing optimisation, plus features and capabilities that would be useful, like meaningful phrase searching/stemming/optional case insensitivity on individual searches.

That said, I've tried it on some fairly big projects - like my copy of python here:


time bzr search socket inet_pton
(about 30 hits, first one up in 1 second)...
real    0m2.957s
user    0m2.768s
sys     0m0.180s

The index run takes some time (as you might expect, though like I noted - it hasn't been optimised as such). Once indexed, a branch will be kept up to date automatically on push/pull/commit operations.
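For readers unfamiliar with how a multi-term query like the one above can be answered quickly once an index exists, here is a minimal in-memory inverted index sketch. It is not how bzr-search is implemented: the class name TinyIndex, the tokenizer, and the revision ids are hypothetical, matching is case-sensitive for simplicity, and a real index would be persisted rather than kept in a dict.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into a set of identifier-like terms."""
    return set(re.findall(r"[A-Za-z0-9_]+", text))


class TinyIndex:
    """Map each term to the set of document ids that contain it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        """Index one document; calling this per new revision keeps it current."""
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, *terms):
        """Return ids of documents containing every one of the given terms."""
        if not terms:
            return set()
        hits = [self._postings.get(term, set()) for term in terms]
        return set.intersection(*hits)


index = TinyIndex()
index.add("rev-1", "import socket; socket.inet_pton(family, address)")
index.add("rev-2", "import socket")
print(index.search("socket", "inet_pton"))
```

The slow part is building the postings, which is why the initial index run is expensive while later queries and incremental updates are cheap.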

I realise search is a long slope to get good results on, but hey - I'm not trying to compete with Google :). I wanted something that had the following key characteristics:

- Worked when offline

- Simple to use

- Easy to install

Which I've achieved - I'm extremely happy with this plugin.

What's really cool, though, is that other developers have picked it up and already integrated it into loggerhead and bzr-eclipse. I don't have a screen shot for loggerhead yet, but here's an old one. This old one does not show the path of a hit, nor the content summaries, which current bzr-search versions create.

Recently I read about a cool bugfix for gdb in the Novell bugtracker on planet.gnome.org. I ported the fix to the ubuntu gdb package, and Martin Pitt promptly extended it to have an amd64 fix as well.

I thought I would provide the enhanced patch back to the Novell bugtracker. This required creating a new Novell login, as my old CNE details are so far back I can't remember them at all.

However, I hit a hard stop when I saw this at the bottom of the form:

"By completing this form, I am giving Novell and/or Novell's partners permission to contact me regarding Novell products and services."

No thank you, I don't want to be contacted. WTF.

8 Jun 2008 (updated 9 Jun 2008 at 00:26 UTC) »

So, the last lazyweb question I asked had good results. Time to try again:

What's a good python-accessible, cross-platform-and-trivially-installable (windows users), flexible (we have plain text, structured data, etc and a back-end storage area which is only accessible via the bzr VFS in the general case), fast (upwards of 10^6 documents) text index system?

pylucene fails the trivially installable test (apt-cache search lucence -> no python bindings), and the bindings are reputed to be SWIG :(. xapian might be a candidate, though I have a suspicion that SWIG is there as well from the reading I have done so far, and we'll have to implement our own BackEndManager subclass back in python. That might be tricky - my experience with python bindings is that folk tend to think of trivial consumers only, not of python providing core parts of the system :(.

So I'm hoping there is a Better Answer just lurking out there...

Updates: sphinx looks possible, but about the same as xapian - it will need a custom storage backend. google desktop is out (apart from anything else, there is no way to change the location documents are stored, nor any indication of a python api to control what is indexed).

It looks like I need to be considerably more clear :). I'm looking for something to index historical bzr content, such that indices can be reused in a broad manner (e.g. index a branch on your webserver), are specific to a branch/repository (so you don't get hits for e.g. the working tree of a branch), have a programmatic API (so that the bzr client can manage all of this), and have no requirement for a daemon (low barrier to entry/non-admin users).
