When are two identical changes the same, and when aren't
they? There's a little bit of debate, started by Andrew
Cowie posting about unmixing the paint. Matt Palmer
followed up with a claim that a particular technique used
by Andrew is dangerous, and finally Andrew Bennetts made
the point that text conflicts are a small subset of merge
conflicts.
That said, one critical task for a version control system
is the merge command. Let's define merge at a human level as
"reproduce the changes made in branch A in my branch
B". There are a lot of taste choices that can be made
without breaking this definition. For instance, a merge that
combines all the individual changes into one, losing the
individual commit deltas, meets this definition. So does a
merge which requires all text conflicts to be resolved
during the merge command's execution, or one that does not
give a human a chance to review the merged tree before
recording it as a commit.
So if the goal of merge is to reproduce these other
changes, then we are essentially trying to infer what the
*change* was. For example, in an ideal world, merging a
branch whose change is "log all floating point values to a
6 digit scale" would know enough to catch all new log
messages added in my branch, regardless of language, actual
API used, and so on. But that is fantasy at the moment. The
best we can do today depends on how we capture the change.
For instance, Darcs allows some changes to be captured as
symbol-changing patches, and others as regular textual
diffs.
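To make the difference between those two ways of capturing a
change concrete, here is a minimal sketch. It is not how Darcs
is implemented; the function names and the log_float example
are made up for illustration. It only shows why a change
recorded as "replace this symbol" can reach lines that were
added later, while the same change recorded as a textual diff
cannot.

```python
import re


def apply_token_replace(lines, old, new):
    """Apply a change captured as 'rename symbol old to new'.

    Every whole-word occurrence is rewritten, including ones on
    lines that did not exist when the change was recorded.
    """
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    return [pattern.sub(lambda m: new, line) for line in lines]


def apply_textual_change(lines, old_line, new_line):
    """Apply a change captured as 'this exact line became that line'."""
    return [new_line if line == old_line else line for line in lines]


# My branch has the original call plus a log call I added myself.
mine = ["log_float(x)", "log_float(y)"]

as_symbol_patch = apply_token_replace(mine, "log_float", "log_float6")
as_textual_diff = apply_textual_change(mine, "log_float(x)", "log_float6(x)")
```

Here as_symbol_patch has both calls renamed, while
as_textual_diff leaves my new log_float(y) line untouched,
which is exactly the kind of miss the ideal-world merge would
avoid.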
So the question of whether arriving at the same result is a
problem can be rephrased as 'when is arriving at the same
result correct, and when is it incorrect?'
For instance, if I write a patch and put it up as plain
text on a website, then two people developing $foo download
it and apply it, they have duplicate changes, but it's clearly
correct that a merge between them should not error on this.
On the other hand, the example Andrew Bennetts quotes in
his post is a valid example of two people making the same
change, but where the line needs a further change during the
merge to remain correct.
Here's another example, though. Suppose I commit something
faulty to my branch, and you pull from me before I fix it.
Then, while I fix the bug, you also fix it, the same way.
That is another example of no-conflict being correct.
If it's possible for either answer, conflict or
do not conflict, to be correct, then what should a VCS
author do?
There are several choices here:
- Always conflict
- Never conflict
- Conflict based on a heuristic
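As a rough sketch of how these three choices differ, the
following merges a single region of text three ways. It is an
illustration under assumptions, not any real VCS's algorithm:
the Policy names, merge_region, and in particular the
"small identical changes are safe" heuristic with its
max_auto_lines threshold are all invented here. The policies
only disagree when both sides made the identical change.

```python
from enum import Enum


class Policy(Enum):
    ALWAYS_CONFLICT = "always"
    NEVER_CONFLICT = "never"
    HEURISTIC = "heuristic"


CONFLICT = object()  # sentinel: a human must resolve this region


def merge_region(base, ours, theirs, policy, max_auto_lines=3):
    """Three-way merge of one region, each side given as a list of lines."""
    if ours == base:
        return theirs  # only they changed it
    if theirs == base:
        return ours  # only we changed it
    if ours != theirs:
        return CONFLICT  # different edits: every policy conflicts
    # Both sides made the identical change; this is where policies differ.
    if policy is Policy.ALWAYS_CONFLICT:
        return CONFLICT
    if policy is Policy.NEVER_CONFLICT:
        return ours
    # Hypothetical heuristic: accept small identical changes, flag large ones.
    return ours if len(ours) <= max_auto_lines else CONFLICT


base = ["x = 1"]
fix = ["x = 2"]  # the same bug fix, made independently on both branches
never = merge_region(base, fix, fix, Policy.NEVER_CONFLICT)
always = merge_region(base, fix, fix, Policy.ALWAYS_CONFLICT)
```

With these inputs, never is the fixed line and always is the
CONFLICT sentinel, even though both branches agree on the
result.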
I think that our job is to assess the maximum harm that
choosing the wrong default can cause, and the likelihood of
that occurring, and then make a choice. Short of
fantasy, no merge is, in general, definitely good or bad:
your QA process (such as an automatic test suite) needs to
run regardless of the VCS's logic. The risk from a bad merge
is relatively low, because you should be testing, and if the
merge is wrong you can just not commit it, or roll it back.
So our job in merge is to make it as likely as possible that
your test suite will pass when you have done the merge,
without further human work. This is very different to trying
to always conflict whenever we cannot be 100% sure that the
text is what a human would have created. It's actually harder
to take this approach than to conflict: conflicting is
easy.