Sunday, April 5, 2009

Branching is easy, Merging is hard

Merging concurrent or overlapping changes to the same piece of source code is one of the basic operations needed for collaborative development, and most modern version control systems support it in some form.

Merging is required whenever changes from two different branches have to be reconciled. These branches may have been created explicitly, or implicitly by two users editing the same file concurrently, as in the example below.

In the example below, Alice and Bob are both making changes to the same file. After Alice has submitted a new version, Bob needs to merge the changes into his client view before submitting a new version.

Typically a merge is required when a branch is being closed but the current head version on the parent branch (R43 in the example) is no longer identical to the version from which the branch was created (R42), i.e. changes have happened in parallel on both the parent and its derived branch. In such cases a 3-way merge operation can be used to reconcile the two versions of a file, given their common ancestor.

A 3-way merge operation can most easily be thought of as taking a file diff between the common ancestor and each of the two versions under consideration and then "adding" both of those diffs to the common ancestor. If the changes are sufficiently non-overlapping, the merge can be done automatically. Otherwise a merge conflict occurs, which has to be reconciled manually, for example by creating a new version that combines the intent of both changes.
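
The idea of diffing each version against the common ancestor and combining the results can be sketched in a few lines of Python. This is only a minimal illustration, assuming a line-based merge built on the standard `difflib` module. The names `merge3` and `_unchanged` and the conflict marker format are made up for this example, and real version control tools use more refined algorithms.

```python
import difflib


def _unchanged(base, other):
    """Map base line index -> index in `other` for lines difflib sees as unchanged."""
    mapping = {}
    matcher = difflib.SequenceMatcher(None, base, other, autojunk=False)
    for i, j, size in matcher.get_matching_blocks():
        for k in range(size):
            mapping[i + k] = j + k
    return mapping


def merge3(base, ours, theirs):
    """Line-based 3-way merge sketch. Returns (merged_lines, conflict_count)."""
    in_ours = _unchanged(base, ours)
    in_theirs = _unchanged(base, theirs)
    # Sync points: base lines left untouched by both sides, plus an end sentinel.
    sync = [(i, in_ours[i], in_theirs[i])
            for i in range(len(base)) if i in in_ours and i in in_theirs]
    sync.append((len(base), len(ours), len(theirs)))

    merged, conflicts = [], 0
    b0 = o0 = t0 = 0
    for b1, o1, t1 in sync:
        base_chunk = base[b0:b1]
        our_chunk = ours[o0:o1]
        their_chunk = theirs[t0:t1]
        if our_chunk == their_chunk:
            merged += our_chunk        # no change, or both made the same change
        elif our_chunk == base_chunk:
            merged += their_chunk      # only "theirs" changed this region
        elif their_chunk == base_chunk:
            merged += our_chunk        # only "ours" changed this region
        else:
            conflicts += 1             # both changed it differently
            merged += ["<<<<<<< ours", *our_chunk,
                       "=======", *their_chunk, ">>>>>>> theirs"]
        if b1 < len(base):
            merged.append(base[b1])    # the shared, unchanged line itself
        b0, o0, t0 = b1 + 1, o1 + 1, t1 + 1
    return merged, conflicts


if __name__ == "__main__":
    base = ["a", "b", "c", "d", "e"]
    alice = ["a", "B", "c", "d", "e"]   # Alice changes the second line
    bob = ["a", "b", "c", "D", "e"]     # Bob changes the fourth line
    print(merge3(base, alice, bob))
```

In this sketch, changes that touch different regions of the file are combined silently, while two different edits to the same region are wrapped in conflict markers and counted so that a caller can decide whether manual resolution is needed.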

But even if a merge seems to present no conflict at the level of the 3-way merge algorithm, it could still be semantically wrong. For example, if one of the changes renames a function and all its invocations, and the other change adds a new reference to the old name, then even though these changes may not produce a merge conflict, the resulting code would still be wrong. After an automated merge, thorough review and testing of the resulting code is essential.
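
The rename scenario is easy to reproduce. The snippet below is a made-up illustration, not taken from any real project: `merged_source` stands for the result of a textually clean merge in which Alice renamed a hypothetical `calc()` to `compute_total()` while Bob added a new call to the old name.

```python
# Hypothetical result of a conflict-free textual merge.
merged_source = '''
def compute_total(items):        # Alice: renamed from calc()
    return sum(items)

def report(items):
    return compute_total(items)  # Alice: existing call site updated

def summary(items):
    return calc(items)           # Bob: new call, still using the old name
'''

namespace = {}
exec(merged_source, namespace)   # the merged file loads without complaint

report_result = namespace["report"]([1, 2, 3])

error = None
try:
    namespace["summary"]([1, 2, 3])
except NameError as exc:
    # The problem only surfaces when Bob's new code path is actually executed.
    error = str(exc)

if __name__ == "__main__":
    print(report_result)
    print(error)
```

The merged file loads fine and the old call path keeps working, so only a test or a review that exercises Bob's new function would catch the problem.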

I am typically not a big fan of overly visual or graphical development tools, with one exception: support for merging. I find it hard to follow what is going on during a merge, and seeing the different versions lined up side by side with the differences highlighted is clearly helpful.

My ideal tool should show the three input versions side by side: the common ancestor (base) and the two contributors being merged. The tool should automatically propose a merged solution if there is no conflict and should make it easy to navigate through the sections of the file that have changed. The resulting version should be shown in a fourth window and should be editable on the fly to resolve merge conflicts or fix errors introduced by the automated merge.

Surprisingly, there are only a small number of open-source tools available which support a graphical 3-way merge, on Linux in particular. As far as I know the list is about:
My favorite one, which I use every day, is kdiff3. Since it is a bit heavy and slow to start, I tend to rely on the auto-merge algorithm built into the version control tool unless it detects a conflict, in which case a script hands over only the conflicting files to kdiff3 for merging.
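
Such a hand-over script could look roughly like the sketch below. It is not the actual script, only an assumed outline: the function names are hypothetical, and it assumes that the version control tool has already written the base and both contributor versions of each conflicting file to disk. It relies on kdiff3 accepting three input files (the first being the base) and an output file given with `-o`.

```python
import shutil
import subprocess


def kdiff3_command(base, ours, theirs, output):
    """Build the kdiff3 argument list for a 3-way merge (first file is the base)."""
    return ["kdiff3", base, ours, theirs, "-o", output]


def resolve_conflicts(conflicted):
    """Open kdiff3 once per conflicting file.

    `conflicted` is a list of (base, ours, theirs, output) path tuples, assumed
    to have been collected from the version control tool by the calling script.
    Returns the output paths for which kdiff3 exited with status 0.
    """
    if shutil.which("kdiff3") is None:
        raise RuntimeError("kdiff3 not found on PATH")
    resolved = []
    for base, ours, theirs, output in conflicted:
        result = subprocess.run(kdiff3_command(base, ours, theirs, output))
        if result.returncode == 0:
            resolved.append(output)
    return resolved
```

Files that merge cleanly never reach this function, so the slow start of the graphical tool is only paid when a conflict actually needs human attention.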