<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Advogato blog for cuenca</title>
    <link>http://www.advogato.org/person/cuenca/</link>
    <description>Advogato blog for cuenca</description>
    <language>en-us</language>
    <generator>mod_virgule</generator>
    <pubDate>Fri, 10 Feb 2012 16:56:00 GMT</pubDate>
    <item>
      <pubDate>Tue, 11 Nov 2003 21:56:28 GMT</pubDate>
      <title>11 Nov 2003</title>
      <link>http://www.advogato.org/person/cuenca/diary.html?start=5</link>
      <guid>http://www.advogato.org/person/cuenca/diary.html?start=5</guid>
      <description>&lt;b&gt;Chema&lt;/b&gt;

&lt;p&gt; I will miss you, dude.

&lt;p&gt; I still remember the first time I met Chema, at the first GUADEC.  There were not a lot of people there, and the Spanish speakers were a little bit packed together.  So I became especially close to Rodrigo, alo, and Chema.

&lt;p&gt; I still remember when Chema and I were alone together in the underground in Paris.  Chema told me how exciting it was to meet people who shared a common passion.  He told me that he had not yet done anything in GNOME, but he wanted to help with gnome-print.  He was almost a Linux newbie.

&lt;p&gt; A Linux newbie, a future GNOME hacker, who just wanted to help with gnome-print, had just travelled half the planet to meet other GNOME hackers.  Man, *THAT* was passion.  He not only helped with gnome-print, but he became its maintainer, along with gedit, glade3, gst, ...

&lt;p&gt; As I write this, I can see Chema giving away Ximian t-shirts at a pub in Copenhagen, or frantically explaining his ideas about GST.

&lt;p&gt; I will miss you, dude.
</description>
    </item>
    <item>
      <pubDate>Fri, 14 Feb 2003 17:25:54 GMT</pubDate>
      <title>14 Feb 2003</title>
      <link>http://www.advogato.org/person/cuenca/diary.html?start=4</link>
      <guid>http://www.advogato.org/person/cuenca/diary.html?start=4</guid>
      <description>&lt;b&gt;Trees&lt;/b&gt;

&lt;p&gt; &lt;p&gt;I've been reading about David's &lt;a href="http://www.treedragon.com/ygg/blobs.htm" &gt;blobs&lt;/a&gt; in more depth these days.
I still don't fully understand them, but I think that I understand them well enough to start
comparing them to a piece table with a tree.

&lt;p&gt; &lt;p&gt;The two structures are nearly identical.  The only difference is in the leaves
of the b-tree.  From what I've read, it seems that a blob stores the text directly in the leaves, while a piece table
only stores a pointer to the real text.  This pointer is composed of the buffer (the read-only or the append-only one, see my previous diary entry), the offset
and the size of the piece of text.

&lt;p&gt; &lt;p&gt;This extra indirection has several advantages.

&lt;p&gt; &lt;p&gt;First, the number of leaves no longer depends on the size of the text, but on
the number of pieces in your piece table. In theory that depends only on the number
of changes that the piece table has undergone since the file was first read. In the
real world, people like to (over?) use the pieces to store the format of the text,
so the pieces have the additional restriction of being contiguous chunks of text
with the same format. But even with this additional restriction, there should be
fewer nodes in the tree.

&lt;p&gt; &lt;p&gt;Second, the text is never, ever, really deleted. I haven't found the delete algorithms
for blobs, so maybe that's also a feature of blobs, but I don't think so (my guess is
based only on the blob's data structure). That may seem at first like bloat.  After
all, why do you need to keep even the deleted text around? To be able to undo.
In fact, the piece table gives you unlimited undo as an almost trivial feature.
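
&lt;p&gt; To make the undo argument concrete, here is a minimal Python sketch.  It rests on assumptions of mine: the names (UndoablePieceTable, cut, history) are hypothetical, a plain list stands in for the tree, and undo is done by remembering the previous list of pieces, which is just one simple way to do it and not necessarily what AbiWord or blobs do.  The point is only that neither buffer ever loses text, so an old list of pieces always remains valid.

```python
def cut(pieces, pos):
    """Return (pieces before pos, pieces from pos on), splitting one if needed."""
    before, after, start = [], [], 0
    for buf, off, size in pieces:
        end = start + size
        if pos >= end:
            before.append((buf, off, size))
        elif start >= pos:
            after.append((buf, off, size))
        else:
            k = pos - start  # pos falls strictly inside this piece
            before.append((buf, off, k))
            after.append((buf, off + k, size - k))
        start = end
    return before, after


class UndoablePieceTable:
    def __init__(self, original):
        self.original = original  # read-only buffer (buffer 0)
        self.added = ""           # append-only buffer (buffer 1)
        self.pieces = [(0, 0, len(original))]
        self.history = []         # earlier piece lists, oldest first

    def text(self):
        buffers = (self.original, self.added)
        return "".join(buffers[b][o:o + s] for b, o, s in self.pieces)

    def insert(self, pos, text):
        self.history.append(self.pieces)
        before, after = cut(self.pieces, pos)
        new_piece = (1, len(self.added), len(text))
        self.added += text
        self.pieces = before + [new_piece] + after

    def delete(self, pos, length):
        self.history.append(self.pieces)
        before, rest = cut(self.pieces, pos)
        _, after = cut(rest, length)
        self.pieces = before + after  # no buffer is touched

    def undo(self):
        if self.history:
            self.pieces = self.history.pop()
```

&lt;p&gt; Keeping whole piece lists costs O(n) memory per edit, so a real implementation would more likely record only the operation and its position; the sketch just shows that deleted text stays reachable.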

&lt;p&gt; &lt;p&gt;The piece table also shares the strong points of blobs.  Copy-on-write, for instance,
can also be implemented more or less the same way as with blobs.

&lt;p&gt; &lt;p&gt;In short, I may be missing something, but I still prefer a piece table with a
self-balancing tree to a blob.
</description>
    </item>
    <item>
      <pubDate>Fri, 7 Feb 2003 14:37:25 GMT</pubDate>
      <title>7 Feb 2003</title>
      <link>http://www.advogato.org/person/cuenca/diary.html?start=3</link>
      <guid>http://www.advogato.org/person/cuenca/diary.html?start=3</guid>
      <description>&lt;strong&gt;Trees&lt;/strong&gt;

&lt;p&gt; &lt;p&gt;
	I've been studying trees lately.  My goal is to design a data structure
	and algorithms that make editing operations O(log n),
	where n is the number of previous changes.  In &lt;a href="http://www.advogato.org/proj/AbiWord/" &gt;AbiWord&lt;/a&gt; we have
	O(n) algorithms, so O(log n) should provide a good boost to
	performance.


&lt;p&gt; &lt;p&gt;
	I will present a somewhat simplified view of the current data structure.
	It is composed of 2 buffers.  One of them is read only (buffer "0"),
	and contains the original text of the file that you're editing.  The
	second one is append only (buffer "1"), and is empty at the beginning.
	We also have a linked list, whose nodes contain 3 fields: Buffer, Offset
	and Size.  Each node represents a piece of text.  At first, the linked
	list contains a single node, which has Buffer 0, Offset 0, and Size equal
	to the size of the read only buffer.


&lt;p&gt; &lt;p&gt;
	To explain the insert operation I will use an example.  Let's say that
	you have a document with "ABCDEFGH" and you want to insert the letter 'a'
	right after the 4th character of the document (that is, at offset 4).


&lt;p&gt; &lt;p&gt;
	Buffer 0: ABCDEFGH&lt;br&gt;
	Buffer 1: [empty]&lt;br&gt;
	Linked list: &amp;lt;0, 0, 8&amp;gt;


&lt;p&gt; &lt;p&gt;
	To do that, you append the letter 'a' to the end of Buffer 1, then
	you split the node in the linked list and insert a new node that points
	to your letter.  It ends up like this:


&lt;p&gt; &lt;p&gt;
	Buffer 0: ABCDEFGH&lt;br&gt;
	Buffer 1: a&lt;br&gt;
	Linked list: &amp;lt;0, 0, 4&amp;gt; =&amp;gt; &amp;lt;1, 0, 1&amp;gt; =&amp;gt; &amp;lt;0, 4, 4&amp;gt;


&lt;p&gt; &lt;p&gt;
	In general, insertion is: search for the right node in the linked list
	(an O(n) operation), and split the node.  Of course, there are some edge
	cases: when you insert at the beginning or at the end of a piece you
	don't need to split it.


&lt;p&gt; &lt;p&gt;
	The delete operation works in much the same way.  If you want to delete
	the 'B' in our example document, you don't have to touch any of the
	buffers, and you change the linked list to:


&lt;p&gt; &lt;p&gt;
	Linked list: &amp;lt;0, 0, 1&amp;gt; =&amp;gt; &amp;lt;0, 2, 2&amp;gt; =&amp;gt; &amp;lt;1, 0, 1&amp;gt; =&amp;gt; &amp;lt;0, 4, 4&amp;gt;

&lt;p&gt; &lt;p&gt;
	To get the document contents, you only have to walk the linked list, and
	get the characters that the nodes point to.
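
&lt;p&gt; The whole walkthrough above can be sketched in a few lines of Python.  This is only an illustration under my own assumptions, not the AbiWord code: the names Piece and PieceTable are hypothetical, and a plain Python list stands in for the linked list.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    buffer: int  # 0 = read-only original text, 1 = append-only buffer
    offset: int  # where the piece starts inside that buffer
    size: int    # number of characters in the piece


class PieceTable:
    def __init__(self, original):
        self.buffers = [original, ""]
        # A plain Python list stands in for the linked list of pieces.
        self.pieces = [Piece(0, 0, len(original))] if original else []

    def _split_at(self, pos):
        """Make sure a piece boundary exists at document position pos and
        return the index of the piece that starts there (O(n) search)."""
        start = 0
        for i, piece in enumerate(self.pieces):
            if pos == start:
                return i  # already on a boundary: no split needed
            end = start + piece.size
            if end > pos:
                cut = pos - start
                left = Piece(piece.buffer, piece.offset, cut)
                right = Piece(piece.buffer, piece.offset + cut, piece.size - cut)
                self.pieces[i:i + 1] = [left, right]
                return i + 1
            start = end
        if pos == start:
            return len(self.pieces)  # boundary at the very end
        raise IndexError("position outside the document")

    def insert(self, pos, text):
        index = self._split_at(pos)
        offset = len(self.buffers[1])
        self.buffers[1] += text  # buffer 1 only ever grows
        self.pieces.insert(index, Piece(1, offset, len(text)))

    def delete(self, pos, length):
        first = self._split_at(pos)
        last = self._split_at(pos + length)
        del self.pieces[first:last]  # the buffers are left untouched

    def text(self):
        return "".join(
            self.buffers[p.buffer][p.offset:p.offset + p.size]
            for p in self.pieces
        )
```

&lt;p&gt; Starting from PieceTable("ABCDEFGH"), calling insert(4, "a") and then delete(1, 1) goes through the same two piece lists as the example, and text() then returns "ACDaEFGH".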


&lt;p&gt; &lt;p&gt;
	I wanted to change the linked list to a balanced tree, to make lookup,
	insert and delete operations O(log n) instead of O(n).  So I did it, and
	I put the results at &lt;a href="http://e98cuenc.free.fr/wordprocessor/piecetable.html" &gt;http://e98cuenc.free.fr/wordprocessor/piecetable.html&lt;/a&gt;
	(along with a regression test suite).


&lt;p&gt; &lt;p&gt;
	You may be surprised that I chose a red-black tree instead of some simpler
	structure (such as a skip list) or some more efficient structure (such as a b-tree).
	It was just because I already had a red-black tree that I implemented a
	while ago, so I just took the first balanced tree that I had on my hard disk.
	It was enough to show my idea, but before putting it in production code
	I want to convert it to a b-tree.  The b-tree variant that I will try to
	implement is a fractal prefetching b+-tree
	(&lt;a href="http://www.pdl.cmu.edu/ftp/Database/fpbtree.pdf" &gt;http://www.pdl.cmu.edu/ftp/Database/fpbtree.pdf&lt;/a&gt;).


&lt;p&gt; &lt;p&gt;
	The interesting part of changing from a linked list to a balanced tree
	was how to go from a document position to a node in O(log n).  To do that
	I store the size of the children of each node in the node itself.  Of
	course, it was only interesting to me because I didn't yet know anything
	about Enfilade theory or David McCusker's blobs.
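
&lt;p&gt; Here is a rough Python sketch of that lookup, under my own simplifying assumptions: each node caches the total size of its whole subtree (one possible way of storing the children's sizes), the tree is built balanced once and never rebalanced, and the names Node, build and find are hypothetical.  It is not the red-black tree from the link above.

```python
class Node:
    def __init__(self, size, left=None, right=None):
        self.size = size    # length of this node's own piece
        self.left = left
        self.right = right
        # Total length of all the pieces stored in this subtree.
        self.total = size + subtree_total(left) + subtree_total(right)


def subtree_total(node):
    return node.total if node is not None else 0


def build(sizes):
    """Build a balanced tree from piece sizes given in document order."""
    if not sizes:
        return None
    mid = len(sizes) // 2
    return Node(sizes[mid], build(sizes[:mid]), build(sizes[mid + 1:]))


def find(root, pos):
    """Return (node, offset inside the node's piece) for document position pos."""
    node = root
    while node is not None:
        left_total = subtree_total(node.left)
        if left_total > pos:
            node = node.left  # pos lies in the left subtree
        elif left_total + node.size > pos:
            return node, pos - left_total  # pos lies in this piece
        else:
            pos -= left_total + node.size  # skip the left subtree and this piece
            node = node.right
    raise IndexError("position outside the document")
```

&lt;p&gt; Each step of find goes down one level, so the cost is proportional to the height of the tree, which is O(log n) as long as the tree stays balanced.  For example, with root = build([1, 2, 1, 4]), find(root, 5) returns the node of size 4 with offset 1.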


&lt;p&gt; &lt;p&gt;
	Using the &lt;a href="http://www.sunless-sea.net/wiki/General%20Enfilade%20Theory" &gt;enfilade terminology&lt;/a&gt; that &lt;a href="http://www.advogato.org/person/Raph/" &gt;Raph&lt;/a&gt; wrote about, the size of the nodes
	is a "wid" property.


&lt;p&gt; &lt;p&gt;
	&lt;a href="http://www.treedragon.com/ygg/blobs.htm" &gt;David McCusker's blobs&lt;/a&gt; seems to share more or less the same goals as this
	data structure.  Insertion, delete, and lookup are O(log n) operations.
	The structure is undo friendly, as you only have to store the operation
	(insert or delete) and at which point it happens to be able to undo it.
	David has obviously put a lot of work on blobs, and it seems to implement
	several optimization over the "piece table with a balanced tree" that
	I've implemented (as copy-on-write).  I will have to look deeper inside
	Blobs to see how is it implemented, as right now I've only had a quick
	look at them.


&lt;p&gt; &lt;p&gt;
	I guess that all that should look like baby steps, as I've just started
	to look into the subject, and the trees field is a very mature one...


&lt;p&gt; &lt;p&gt;
	Btw, &lt;a href="http://www.advogato.org/person/Raph/" &gt;Raph&lt;/a&gt;, your tree representation for Athshe is beautiful :-)


&lt;p&gt; &lt;p&gt;
&lt;strong&gt;Batch formatting&lt;/strong&gt;

&lt;p&gt; &lt;p&gt;
	On the question of a batch formatting tool vs. improving one of the current
	word processors, I'm somewhere in the middle between &lt;a href="http://www.advogato.org/person/Raph/" &gt;Raph&lt;/a&gt; and &lt;a href="http://www.advogato.org/person/cinamod/" &gt;cinamod&lt;/a&gt;, but more on the
	&lt;a href="http://www.advogato.org/person/cinamod/" &gt;cinamod&lt;/a&gt; side than on the &lt;a href="http://www.advogato.org/person/Raph/" &gt;Raph&lt;/a&gt; side :-)


&lt;p&gt; &lt;p&gt;
	I can see &lt;a href="http://www.advogato.org/person/Raph/" &gt;Raph&lt;/a&gt;'s point that a batch formatting tool is easier to do
	than a full word processor.  &lt;a href="http://www.advogato.org/person/Raph/" &gt;Raph&lt;/a&gt; says that you don't need to care about
	all the GUI, but you also don't need very complex data structures to
	hold your data.  Text is never going to be inserted or deleted, so these
	operations don't need to be fast, as they don't exist.  You don't need
	undo, either.


&lt;p&gt; &lt;p&gt;
	I don't think that it will be possible to get a perfect Word converter
	if its layout engine is not written with the goal of duplicating the MS Word
	one.  For instance, TeX does full paragraph justification, while Word
	does not, so if you put TeX in the middle of the chain you will never get
	an identical result.


&lt;p&gt; &lt;p&gt;
	Also, I think that a fundamental part of a batch converter will be a
	regression test suite, which it must have if it wants even a remote chance
	of producing perfect output.

&lt;p&gt; &lt;p&gt;
	But there is too much common code between a batch formatter and a word processor
	to just let them go down separate paths.  I guess that the only added
	complexity that an interactive word processor imposes on the
	batch formatting code comes from the above-mentioned data structures to
	hold your data, and implementing them raises the usefulness of such a
	program by an order of magnitude.  IMO the usefulness/complexity ratio is worth
	making the batch formatting tool "editing capable".

</description>
    </item>
    <item>
      <pubDate>Tue, 18 Jul 2000 23:14:44 GMT</pubDate>
      <title>18 Jul 2000</title>
      <link>http://www.advogato.org/person/cuenca/diary.html?start=2</link>
      <guid>http://www.advogato.org/person/cuenca/diary.html?start=2</guid>
      <description>The university is over, na na na na :)

&lt;p&gt; Well, the uni has been over (for this year) for 3 weeks 
now, but I've been busy looking for a job, getting back 
home, etc...

&lt;p&gt; And *NOW*, I have a Real Computer(TM).  A 100 MHz Pentium 
with 64 MB... better than my French computer (only 16 MB).

&lt;p&gt; Finally, I managed to learn a bit of Perl (it has been on 
my TODO list for ~two years).

&lt;p&gt; And, the hottest news!!  Now, I even have an Internet 
connection!!! (who said that Spain is not a developed 
country? :)

&lt;p&gt; Yesterday I started hacking on the AbiWord toolbars (they 
are too long, and they use only one toolbar per band...), 
translated some stuff, and played a bit more with Dia... 
Dude, that's a KICK ASS app!

&lt;p&gt; I've been playing with the idea of writing a parser that 
generates a UML Dia diagram from a bunch of C++ source files 
(doing all the classes by hand is too cumbersome).

&lt;p&gt; I miss the other GNOME guys, especially Chema and Rodrigo 
(and I want to meet acs!).  I'm looking forward to GUADEC 
2 or something...
</description>
    </item>
    <item>
      <pubDate>Thu, 25 May 2000 03:50:40 GMT</pubDate>
      <title>25 May 2000</title>
      <link>http://www.advogato.org/person/cuenca/diary.html?start=1</link>
      <guid>http://www.advogato.org/person/cuenca/diary.html?start=1</guid>
      <description>All the day wasted in a stupid program for BD2.  And *all
the day* is *all the day*.  24h/24h.  I'm tired and I want
to sleep.

&lt;p&gt; BD sucks.
</description>
    </item>
    <item>
      <pubDate>Sat, 6 May 2000 00:03:32 GMT</pubDate>
      <title>6 May 2000</title>
      <link>http://www.advogato.org/person/cuenca/diary.html?start=0</link>
      <guid>http://www.advogato.org/person/cuenca/diary.html?start=0</guid>
      <description>Well, it has been a long time since my last entry...

&lt;p&gt; In the last two weeks I've been doing my homework: I've
been working on hdoc (a C++ documentation extractor, a la
javadoc or headerdoc), on IT, GL2, BD2, etc...

&lt;p&gt; The university is *TOO* boring.

&lt;p&gt; I haven't touched a single line of AbiWord code in the
last two weeks, and that sucks too much.
</description>
    </item>
  </channel>
</rss>

