27 Feb 2004 hereticmessiah   » (Journeyer)

Part of my final year project is a version control system. It’s not the most important part, but it is important.

I searched about for delta compression algorithms, but all of them were either inappropriate or would take too long for me to implement (not to mention overkill for a college project!). So I came up with my own line-oriented one. Last night I wrote a quick implementation of the algorithm out on paper in python. I haven’t tested it out yet, so I’m not sure about its performance. The end product will be written in C++.

def GetDiff(oldLines, newLines):
    """Constructs a diff to get from newlines to oldlines."""

# Construct our special "hash" table. tblSize = len(newLines) * 2 + 1 tbl = tuple([[] for i in range(tblSize)]) for i in range(len(newLines)): pos = hash(newLines[i]) mod tblSize tbl[pos].append(i)

# Holds our diff information. result = []

# Find the common subsequences and differences. iOld = 0 iMax = 0 while iOld < len(oldLines): pos = hash(oldLines[iOld]) mod tblSize

# Start with no match nMax = 0 for iNew in tbl[pos]: nCur = 0 while nCur + iNew < len(newLines) and \ newLines[nCur + iNew] == oldLines[nCur + iOld]: nCur = nCur + 1 if nCur > nMax: nMax = nCur iMax = iNew

# Have we found a matching subsequence? if nMax == 0: # Nope. result.append(oldLines[iOld]) iOld = iOld + 1 else: # Woohoo! result.append((iMax, nMax)) iOld = iOld + nMax

return result

If anybody has any thoughts on it, I’d love to know.

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!