28 Feb 2004 hereticmessiah   » (Journeyer)

Am I an idiot or what! There were a few stupid errors in that diffing algorithm I threw up (give me a break: they don’t have python installed here in college). Here’s a corrected version. Again, any comments are welcome.

def GetDiff(oldLines, newLines):
    """Constructs a diff to get from newlines to oldlines."""
 
    # Construct our special "hash" table. Using primes for the table length
    # would be better, I know, but odd numbers are simpler.
    tblSize = len(newLines) * 2 + 1
    tbl     = tuple([] * tblSize)
    for iNew in range(len(newLines)):
        iBucket = hash(newLines[iNew]) % tblSize
        tbl[iBucket].append(iNew)
 
    # Holds our diff information.
    result = []
 
    # Find the common subsequences and differences.
    iOld = 0
    iMax = 0
    while iOld < len(oldLines):
        iBucket = hash(oldLines[iOld]) % tblSize
 
        # Start with no match
        nMax = 0
        for iNew in tbl[iBucket]:
            # if nMax > 0 and iNew < iMax + nMax:
            #     continue
            nCur = 0
            try:
                while newLines[iNew + nCur] == oldLines[iOld + nCur]:
                    nCur += 1
            if nCur > nMax:
                nMax = nCur
                iMax = iNew
                iEnd = iMax + nMax
                if iEnd == len(oldLines) or iEnd == len(newLines):
                    break
 
        # Have we found a matching subsequence?
        if nMax == 0:
            # Nope.
            result.append(oldLines[iOld])
            iOld += 1
        else:
            # Woohoo!
            result.append((iMax, nMax))
            iOld += nMax
 
     return result

There old version is in my last entry.

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!