<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Advogato blog for robertc</title>
    <link>http://www.advogato.org/person/robertc/</link>
    <description>Advogato blog for robertc</description>
    <language>en-us</language>
    <generator>mod_virgule</generator>
    <pubDate>Wed, 9 Jul 2008 06:21:38 GMT</pubDate>
    <item>
      <pubDate>Fri, 4 Jul 2008 10:01:54 GMT</pubDate>
      <title>4 Jul 2008</title>
      <link>http://www.advogato.org/person/robertc/diary.html?start=90</link>
      <guid>http://www.advogato.org/person/robertc/diary.html?start=90</guid>
      <description>&lt;p&gt;Well, the &lt;a href="http://gould.cx/ted/blog/Bazaar_Power_Management" &gt;gauntlet&#xD;
is down&lt;/a&gt; (BTW - desktop power integration. Cool!). The&#xD;
use case Ted talks about is actually quite interesting - we&#xD;
were at UDS last month, waiting on an SVN server that was&#xD;
apparently so slow we could have walked to it and copied&#xD;
stuff onto a hard disk more quickly. (Really. No kidding). bzr&#xD;
was idling and blocked on network IO the whole time... kudos&#xD;
for the plugin, Ted!&lt;p&gt;&#xD;
For my response, may I present a &lt;a href="https://code.edge.launchpad.net/~lifeless/+junk/bzr-index2" &gt;new&#xD;
index format&lt;/a&gt;, (&lt;a href="http://bazaar.launchpad.net/~lifeless/+junk/bzr-index2" &gt;branch&#xD;
url&lt;/a&gt;) 70% smaller than bzr's current default, equally&#xD;
fast at most workloads, up to 20 times faster at others. I&#xD;
started on it this week, and &lt;a href="http://jam-bazaar.blogspot.com" &gt;John&lt;/a&gt; jumped in during&#xD;
overlapping time periods, but I think it counts!&lt;p&gt;&#xD;
Note that the performance wins are a component improvement -&#xD;
other things we haven't addressed yet can make the index&#xD;
improvements less visible. But several early adopters have&#xD;
told me that they see a 25-30% reduction in 'time bzr log&#xD;
&amp;gt; /dev/null' or other commands.&lt;p&gt;&#xD;
To install:&lt;p&gt;&#xD;
bzr branch&#xD;
http://bazaar.launchpad.net/~lifeless/+junk/bzr-index2&#xD;
~/.bazaar/plugins/index2&lt;p&gt;&#xD;
bzr branch&#xD;
https://bazaar.launchpad.net/~jameinel/+junk/pybloom&#xD;
~/.bazaar/plugins/pybloom&lt;p&gt;&#xD;
To use:&lt;p&gt;&#xD;
cd &amp;lt;repository you want to experiment on&amp;gt;&lt;p&gt;&#xD;
bzr upgrade --btree-plain&lt;p&gt;&#xD;
(or --btree-rich-root for bzr-svn users).&lt;p&gt;&#xD;
A version of this will be going to trunk soon, and it will&#xD;
be able to upgrade from any repository that you have that&#xD;
uses the plugin as long as you keep the plugin installed.</description>
    </item>
    <item>
      <pubDate>Fri, 27 Jun 2008 01:20:02 GMT</pubDate>
      <title>27 Jun 2008</title>
      <link>http://www.advogato.org/person/robertc/diary.html?start=89</link>
      <guid>http://www.advogato.org/person/robertc/diary.html?start=89</guid>
      <description>&lt;p&gt;Dear lazyweb number 3.&lt;p&gt;
So far, I've asked:&lt;p&gt;
high latency net simulations - great answers.&lt;p&gt;
python friendly back-end accessible search engines - many answers, none that fit the bill. So I wrote my own :).&lt;p&gt;
Today, I shall ask - is there a python-accessible persistent b+tree (or hashtable, or ...) module? Key considerations:&lt;p&gt;
 - scaling: millions of nodes are needed with low latency access to a node's value and to determine a node's absence&lt;p&gt;
 - indices are write once. (e.g. a group of indices are queried, and data is expired or altered by some generational tactic such as combining existing indices into one larger one and discarding the old ones)&lt;p&gt;
 - reading and writing are suitable for sharply memory constrained environments. Ideally only a few hundred KB of memory are needed to write a 100K node index, or to read those same 100K nodes back out of a million node index. Temporary files during writing are fine.&lt;p&gt;
 - backend access must either be via a well defined minimal api (e.g. 'needs read, readv, write, rename, delete') or customisable in python&lt;p&gt;
 - easy installation - if C libraries etc. are needed they must be already pervasively available to Windows users and Ubuntu/Suse/Redhat/*BSD systems&lt;p&gt;
 - ideally sorted iteration is available as well, though it could be layered on top&lt;p&gt;
 - fast, did I mention fast?&lt;p&gt;
 - stable formats - these indices may last for years unaltered after being written, so any libraries involved need to ensure that the format will be accessible for a long time. (e.g. python's dump/marshal facility fails)&lt;p&gt;
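Illustrative aside: a minimal Python sketch of the write-once, low-memory lookup pattern the list above asks for. It is a toy under stated assumptions, not the format of any library named here: the fixed-width record layout and the names RECORD, write_index, IndexKeys and lookup are all hypothetical.&lt;p&gt;
&lt;pre&gt;
```python
import bisect
import io
import struct

# Hypothetical fixed-width record: 16-byte key, 8-byte value, NUL padded.
RECORD = struct.Struct("16s8s")


def write_index(out, items):
    """Write (key, value) byte pairs once, in key order.

    sorted() holds everything in memory here; a memory-constrained writer
    would sort in chunks via temporary files instead.
    """
    for key, value in sorted(items):
        out.write(RECORD.pack(key, value))


class IndexKeys:
    """Read-only sequence of keys; each access reads one record from the file."""

    def __init__(self, f):
        self._f = f
        f.seek(0, io.SEEK_END)
        self._count = f.tell() // RECORD.size

    def __len__(self):
        return self._count

    def __getitem__(self, i):
        return self.record(i)[0]

    def record(self, i):
        self._f.seek(i * RECORD.size)
        key, value = RECORD.unpack(self._f.read(RECORD.size))
        # Strip the NUL padding added by struct (assumes no trailing NULs in data).
        return key.rstrip(b"\0"), value.rstrip(b"\0")


def lookup(f, key):
    """Return the value stored for key, or None if the key is absent."""
    keys = IndexKeys(f)
    i = bisect.bisect_left(keys, key)  # binary search, about log2(n) reads
    if i != len(keys) and keys[i] == key:
        return keys.record(i)[1]
    return None


index_file = io.BytesIO()  # stands in for a file opened via the backend
write_index(index_file, [(b"rev-2", b"200"), (b"rev-1", b"100")])
print(lookup(index_file, b"rev-1"), lookup(index_file, b"rev-7"))
```
&lt;/pre&gt;
The reader needs only seek, read and tell from its backend and holds one record in memory at a time; proving absence costs the same binary search as a hit.&lt;p&gt;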
sqlite, bdb already fail at this requirements list.&lt;p&gt;
snakesql, gadfly, buzhug and rbtree fail too.</description>
    </item>
    <item>
      <pubDate>Tue, 17 Jun 2008 01:04:26 GMT</pubDate>
      <title>17 Jun 2008</title>
      <link>http://www.advogato.org/person/robertc/diary.html?start=88</link>
      <guid>http://www.advogato.org/person/robertc/diary.html?start=88</guid>
      <description>&lt;p&gt;Launchpad, please stop mailing me mine own comments on bugs. I know what I said.&lt;p&gt;
kthxbye</description>
    </item>
    <item>
      <pubDate>Sat, 14 Jun 2008 10:06:24 GMT</pubDate>
      <title>14 Jun 2008</title>
      <link>http://www.advogato.org/person/robertc/diary.html?start=87</link>
      <guid>http://www.advogato.org/person/robertc/diary.html?start=87</guid>
      <description>Rethinking annotate: I was recently reminded of Bonsai for&#xD;
querying vcs history. GNOME runs a bonsai &lt;a href="http://svn.gnome.org/viewvc/yarrr/?view=queryform" &gt;instance&lt;/a&gt;.&#xD;
This got me thinking about 'bzr annotate', and more&#xD;
generally about the problem of figuring out code.&#xD;
&#xD;
&lt;p&gt; &lt;p&gt; It seems to me that 'bzr annotate' is, like all annotates&#xD;
I've seen, pretty poor at really understanding how things&#xD;
came to be - you have to annotate several versions, cross&#xD;
reference revision history and so on. 'bzr gannotate' is&#xD;
helpful, but still not awesome.&#xD;
&#xD;
&lt;p&gt; &lt;p&gt; I wondered whether searching might be a better metaphor for&#xD;
getting some sort of handle on what is going on. Of course,&#xD;
we don't have a fast enough search for bzr to make this&#xD;
plausible.&#xD;
&#xD;
&lt;p&gt; &lt;p&gt; So I wrote one: &lt;a href="https://launchpad.net/bzr-search" &gt;bzr-search&lt;/a&gt; in my&#xD;
hobby time (my work time is entirely devoted to landing&#xD;
shallow-branches for bzr, which will make a huge difference&#xD;
to pushing new branches to hosting sites like Launchpad).&#xD;
bzr-search is alpha quality at the moment (though there are&#xD;
no bugs that I'm aware of). It's mainly missing optimisation,&#xD;
features and capabilities that would be useful, like&#xD;
meaningful phrase searching/stemming/optional case&#xD;
insensitivity on individual searches.&#xD;
&#xD;
&lt;p&gt; &lt;p&gt; That said, I've tried it on some fairly big projects - like&#xD;
my copy of python here:&#xD;
&lt;pre&gt;&#xD;
time bzr search socket inet_pton&#xD;
(about 30 hits, first one up in 1 second)...&#xD;
real    0m2.957s&#xD;
user    0m2.768s&#xD;
sys     0m0.180s&#xD;
&lt;/pre&gt;&#xD;
&#xD;
&lt;p&gt; &lt;p&gt; The index run takes some time (as you might expect, though&#xD;
like I noted - it hasn't been optimised as such). Once&#xD;
indexed, a branch will be kept up to date automatically on&#xD;
push/pull/commit operations.&#xD;
&#xD;
&lt;p&gt; &lt;p&gt; I realise search is a long slope to get good results on, but&#xD;
hey - I'm not trying to compete with Google :). I wanted&#xD;
something that had the following key characteristics:&#xD;
 * Worked when offline&#xD;
 * Simple to use&#xD;
 * Easy to install&#xD;
&#xD;
&lt;p&gt; &lt;p&gt; Which I've achieved - I'm extremely happy with this plugin.&#xD;
&#xD;
&lt;p&gt; &lt;p&gt; What's really cool, though, is that other developers have&#xD;
picked it up and already integrated it into &lt;a href="https://launchpad.net/loggerhead" &gt;loggerhead&lt;/a&gt; and &lt;a href="http://launchpad.net/bzr-eclipse" &gt;bzr-eclipse&lt;/a&gt;. I&#xD;
don't have a screenshot for loggerhead yet, but here's an&#xD;
old &lt;a href="http://guille.beuno.com.ar/images/bzr-eclipse-search-fullscreen.png" &gt;one&lt;/a&gt;.&#xD;
This old one does not show the path of a hit, nor the&#xD;
content summaries, which current bzr-search versions create.</description>
    </item>
    <item>
      <pubDate>Tue, 10 Jun 2008 04:34:36 GMT</pubDate>
      <title>10 Jun 2008</title>
      <link>http://www.advogato.org/person/robertc/diary.html?start=86</link>
      <guid>http://www.advogato.org/person/robertc/diary.html?start=86</guid>
      <description>&lt;p&gt;Recently I read about a cool &lt;a href="https://bugzilla.novell.com/show_bug.cgi?id=390722" &gt;bugfix for gdb&lt;/a&gt; in the Novell bugtracker on planet.gnome.org. I ported the fix to the ubuntu &lt;a href="https://bugs.edge.launchpad.net/ubuntu/+source/gdb/+bug/111869" &gt;gdb package&lt;/a&gt;, and Martin Pitt promptly extended it to have an amd64 fix as well.&lt;p&gt;
I thought I would provide the enhanced patch back to the Novell bugtracker. This required creating a new &lt;a href="https://secure-www.novell.com/selfreg/jsp/createAccount.jsp" &gt;Novell login&lt;/a&gt; as my old CNE details are so far back I can't remember them at all.&lt;p&gt;
However, I came to a hard stop when I saw this at the bottom of the form:&lt;p&gt;
"By completing this form, I am giving Novell and/or Novell's partners permission to contact me regarding Novell products and services."&lt;p&gt;
No thank you, I don't want to be contacted. WTF.</description>
    </item>
    <item>
      <pubDate>Sun, 8 Jun 2008 12:27:55 GMT</pubDate>
      <title>8 Jun 2008</title>
      <link>http://www.advogato.org/person/robertc/diary.html?start=85</link>
      <guid>http://www.advogato.org/person/robertc/diary.html?start=85</guid>
      <description>&lt;p&gt;So, the last lazyweb question I asked had good results.&#xD;
Time to try again:&lt;p&gt;&#xD;
What's a good python-accessible,&#xD;
cross-platform-and-trivially-installable (Windows users),&#xD;
flexible (we have plain text, structured data, etc. and a&#xD;
back-end storage area which is only accessible via the bzr&#xD;
VFS in the general case), fast (upwards of 10^6 documents),&#xD;
text index system?&lt;p&gt;&#xD;
pylucene fails the trivially installable test (apt-cache&#xD;
search lucene -&amp;gt; no python bindings), and the bindings&#xD;
are reputed to be SWIG :(. xapian might be a candidate,&#xD;
though I have a suspicion that SWIG is there as well from&#xD;
the reading I have done so far, and we'll have to&#xD;
implement our own BackEndManager subclass back into python.&#xD;
That might be tricky - my experience with python bindings is&#xD;
folk tend to think of trivial consumers only, not of python&#xD;
providing core parts of the system :(.&lt;p&gt;&#xD;
So I'm hoping there is a Better Answer just lurking out there...&#xD;
&#xD;
&lt;p&gt; Updates: sphinx looks possible, but about the same as xapian&#xD;
- it will need a custom storage backend. google desktop is&#xD;
out (apart from anything else, there is no way to change the&#xD;
location documents are stored, nor any indication of a&#xD;
python api to control what is indexed).&#xD;
&#xD;
&lt;p&gt; It looks like I need to be considerably more clear :). I'm&#xD;
looking for something to index historical bzr content, such&#xD;
that indices can be reused in a broad manner (e.g. index a&#xD;
branch on your webserver), are specific to a&#xD;
branch/repository (so you don't get hits for e.g. the&#xD;
working tree of a branch), with a programmatic API (so that&#xD;
the bzr client can manage all of this), with no requirement&#xD;
for a daemon (low barrier to entry/non-admin users).</description>
    </item>
    <item>
      <pubDate>Wed, 4 Jun 2008 12:19:34 GMT</pubDate>
      <title>4 Jun 2008</title>
      <link>http://www.advogato.org/person/robertc/diary.html?start=84</link>
      <guid>http://www.advogato.org/person/robertc/diary.html?start=84</guid>
      <description>&lt;p&gt;So I've been playing with &lt;a href="http://www.mnemosyne-proj.org/" &gt;Mnemosyne&lt;/a&gt; recently, using it to help brush up on my woeful Latin vocabulary. I thought it would be a good idea to get some of that data out of my head and into Ubuntu (which has a &lt;a href="https://translations.edge.launchpad.net/ubuntu/hardy/+lang/la" &gt;Latin translation&lt;/a&gt;).&lt;p&gt;
Imagine my surprise when, after installing the Latin language pack (through the GUI), I could not log into Ubuntu in Latin?!&lt;p&gt;
It turns out that there is no Latin locale in Ubuntu, or indeed in glibc. This is kind of strange (there is an Esperanto locale). Remember that locales combine language and location - they describe how to format money, numbers, telephone details and so on. So clearly, I needed to add a Latin locale. I could add one for just me (e.g. la_AU), or I could add a generic one (helpfully using AU values) on the betting chance that at this point there are not enough folk wishing to log in in Latin (after all you can't!) for us to need one per country. And even more so, doing la_AU doesn't make a lot of sense - there isn't a pt_AU locale even though there are Portuguese speakers living in Australia. (The root issue here is that location and language are conflated. POSIX I hate thee). So, a quick crash course in locales, some copy and paste later, and there is a &lt;a href="https://bugs.launchpad.net/bugs/234105" &gt;Latin locale&lt;/a&gt;. &lt;p&gt;
Installing that on my system got me a Latin locale, but gdm still wouldn't let me select it. It turns out that gdm feels the urge to maintain its own list of what locales exist, and what to call them. I thought duplication in software was a bad idea, but perhaps I don't understand the problem space enough. Anyhow, time to &lt;a href="https://bugs.launchpad.net/bugs/234101" &gt;fix it&lt;/a&gt;.&lt;p&gt;
And because this is something other people may be interested in, and the patches are not yet in Ubuntu because upstream glibc may choose a different locale code (e.g. la_AU), I've finally had reason to activate my &lt;a href="https://edge.launchpad.net/~lifeless/+archive" &gt;ppa&lt;/a&gt; on launchpad, so there are now binary packages for hardy for anyone that wants to play with this!</description>
    </item>
    <item>
      <pubDate>Fri, 23 May 2008 06:07:11 GMT</pubDate>
      <title>23 May 2008</title>
      <link>http://www.advogato.org/person/robertc/diary.html?start=83</link>
      <guid>http://www.advogato.org/person/robertc/diary.html?start=83</guid>
      <description>&lt;p&gt;This week I've been at UDS in Prague, and looking at some&#xD;
possible ways to deploy bzr for packaging (which is a hot&#xD;
topic: developers don't want to change workflows without a&#xD;
concrete benefit, and definitely don't want to pay a cost&#xD;
for doing so - e.g. having to have all of history locally&#xD;
just to make a trivial change).&lt;p&gt;&#xD;
One of the discussions inspired a scalability test for bzr -&#xD;
not how we think we'd deploy bzr for Ubuntu developers, just&#xD;
a test to understand how it would scale *if* we did it this&#xD;
way.&lt;p&gt;&#xD;
&lt;a href="http://blog.liw.fi/" &gt;Lars Wirzenius&lt;/a&gt; has a habit&#xD;
of testing VCS systems' capabilities in various ways,&#xD;
including importing the Debian/Ubuntu source archive into&#xD;
them. He kindly ran a test using bzr, creating a single&#xD;
shared repository, with one branch in it per source package.&lt;p&gt;&#xD;
This took a few hours to generate (I'm not sure of the exact&#xD;
figure, we forgot to time it, but it was started in the&#xD;
afternoon and finished in the morning). The resulting&#xD;
repository has 21GB in its .bzr/repository/packs directory,&#xD;
and 500MB in its .bzr/repository/indices directory. There&#xD;
are 30 pack files, the largest of which is 16GB, and the&#xD;
smallest a few hundred kB. &lt;p&gt;&#xD;
In general VCS terms this repository has 16000 heads, 16000&#xD;
commits (because we didn't import deep archive history).&lt;p&gt;&#xD;
But what about performance? It's currently copying to a&#xD;
machine where I can do some serious benchmarks using this&#xD;
repository. I do have some quick and dirty figures though.&#xD;
To branch a single package (libyanfs-java) from its branch&#xD;
within the repository to a new standalone branch with cold&#xD;
cache took ~5 seconds. Branching again from the repository&#xD;
now that the needed data is in page cache took 0.6 seconds.&#xD;
Branching from the newly created branch to another new&#xD;
standalone branch took 0.3 seconds. &lt;p&gt;&#xD;
There is a clear slowdown occurring here. Including startup&#xD;
costs, the time to make a new branch is doubled by adding the&#xD;
branch to the repository. However as the repository is 16000&#xD;
times the size, the scaling factor (2/16000) is pretty darn&#xD;
good. I'm stoked at this result, as I think it demonstrates&#xD;
just what the underlying pack store is capable of. We are&#xD;
working on streamlining the upper layers of bzr to make&#xD;
better and better use of the underlying store. For instance,&#xD;
&lt;a href="http://jam-bazaar.blogspot.com/" &gt;John Meinel&lt;/a&gt;&#xD;
has just done this for 'bzr missing' and 'bzr uncommit'.&lt;p&gt;&#xD;
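The scaling arithmetic above can be sketched in a few lines of Python, using only the warm-cache timings quoted in this post (0.6 seconds from the shared repository, 0.3 seconds from the standalone branch). The variable names are illustrative, and treating the repository as 16000 times the size follows the post's own estimate.&lt;p&gt;
&lt;pre&gt;
```python
# Warm-cache timings quoted in the post, in milliseconds.
shared_repo_branch_ms = 600   # branching out of the 16000-branch repository
standalone_branch_ms = 300    # branching from the new standalone branch

# The shared repository holds 16000 branches versus one in the standalone case.
size_ratio = 16000

slowdown = shared_repo_branch_ms / standalone_branch_ms
scaling_factor = slowdown / size_ratio

print(f"slowdown: {slowdown:.0f}x for {size_ratio}x the data")
print(f"scaling factor: {scaling_factor:.6f}")
```
&lt;/pre&gt;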
Now I must go, time for breakfast!&lt;p&gt;&#xD;
Woo!</description>
    </item>
    <item>
      <pubDate>Sat, 8 Mar 2008 11:07:24 GMT</pubDate>
      <title>8 Mar 2008</title>
      <link>http://www.advogato.org/person/robertc/diary.html?start=82</link>
      <guid>http://www.advogato.org/person/robertc/diary.html?start=82</guid>
      <description>&lt;p&gt;Best breakfast place if you are in London: &lt;a href="http://www.roast-restaurant.com/" &gt;Roast&lt;/a&gt;&lt;p&gt;
Yum.&lt;p&gt;
(Also most expensive I suspect).</description>
    </item>
    <item>
      <pubDate>Tue, 26 Feb 2008 20:36:56 GMT</pubDate>
      <title>26 Feb 2008</title>
      <link>http://www.advogato.org/person/robertc/diary.html?start=81</link>
      <guid>http://www.advogato.org/person/robertc/diary.html?start=81</guid>
      <description>I'm very happy to announce that Canonical are hosting a &lt;a href="http://www.squid-cache.org/" &gt;Squid&lt;/a&gt; meetup in&#xD;
London this coming Saturday and Sunday the 1st and 2nd of&#xD;
March. Any developers (in the broad sense - folk doing&#xD;
coding/testing/documenting/community support) are very&#xD;
welcome to attend. As it is a weekend and a secured office&#xD;
building, you need to contact me to arrange to come - just&#xD;
rocking up won't work :). We'll be there all Saturday and&#xD;
Sunday through to mid-afternoon.&#xD;
&#xD;
&lt;p&gt; The Canonical London office is in Millbank Tower&#xD;
http://en.wikipedia.org/wiki/Millbank_Tower.&#xD;
&#xD;
&lt;p&gt; So if you want to come by please drop me a mail.&#xD;
&#xD;
&lt;p&gt; We'll be getting very technical very quickly I expect - for&#xD;
folk wanting a purely social meetup, I'm going to pick a&#xD;
reasonable place to meet for food and (optionally) alcohol&#xD;
on Saturday evening - I'll post details here mid-Friday.&#xD;
</description>
    </item>
  </channel>
</rss>
