Wednesday, April 15, 2009

Essential Startup Software Development Infrastructure - 2007 Edition

A while back, I did some research into how I would set up the basic software development infrastructure for a new startup project again at this point - with hindsight and the state of open-source development tools of ca. 2007. Again, the goal of the experiment is to spend little or no money, use only free and open-source software, and end up with a solution which could be set up from scratch in only a few days.

Here are the basic choices of the 2007 edition software development infrastructure (to be elaborated in future posts):
  • The entire development support infrastructure should again be able to run on a single machine. For the experiment, I used my home server running Sabayon Linux, a more user-friendly version of the source-based Gentoo Linux distribution. We assume that this machine is behind a firewall and cannot be accessed from the outside.
  • As the version control system, I chose Subversion (svn). Svn is mature, stable, well supported and largely accepted to be the natural replacement for CVS as the dominant open-source SCM. This is a deliberately conservative choice - since there is nothing as important and critical in the development support infrastructure as the SCM. Much of the ongoing innovation is in distributed SCM, but since I am most familiar with the centralized model, I know that svn will work and I assume there is no time for trialing and evaluation - so svn it is.
  • The overall frame of the infrastructure is provided by Trac. It wraps nicely around svn and provides a few important and well-integrated services out of the box, with little or no configuration:
    • Web based browser for Subversion repository, including change-sets.
    • Simple Wiki for organizing links and documentation
    • Bug/issue tracking ticketing system
    • Automated wiki style linkage between all subsystems of Trac referencing wiki pages, tickets and svn change-set numbers anywhere in pages, tickets or svn checkin comments.
    • Event time-line covering updates to wiki pages, issue tickets or svn checkins.
  • For the e-mail subsystem, we use the Postfix server, which is generally accepted as a more secure and administrator friendly replacement for the old sendmail. To implement a complete "closed-circuit" mail service, we use the Dovecot POP & IMAP server to provide access to the mail stored on the server to any e-mail client supporting these protocols.
  • To support mailing lists, we can simply use mail aliases provided by Postfix - in combination with MHonArc for managing the web archives.
  • From the original list, I am deliberately ignoring build and test automation. Not because it is not important, but because more than the others it depends on the particularities of the project, may require more significant setup and, from experience, most likely needs at least its own dedicated machine, if not a whole cluster of them. If possible, we would like a solution which integrates nicely with Trac - e.g. Bitten.
For the setup and configuration, I am trying to keep things as simple as possible: use defaults wherever possible and maybe even choose tools just because they seem simpler to set up and configure. Most likely we can assume that the initial setup would be done by a member of the development team with limited system administration experience, like myself, and that there might be no professional system administrator around to help.

Sunday, April 5, 2009

Branching is easy, Merging is hard

Merging concurrent or overlapping changes to the same piece of source-code is one of the basic operations to support collaborative development and is supported in some form by most modern version control systems.

Merging is required whenever changes from two different branches have to be reconciled. These branches could have been created explicitly, or implicitly by two users editing the same file concurrently.

In the example below, Alice and Bob are both making changes to the same file. After Alice has submitted a new version, Bob needs to merge the changes into his client view before submitting a new version.

Typically a merge is required when a branch is being closed, but the current head version on the parent branch (R43 in the example) is no longer identical to the version from which the branch was created (R42) - i.e. changes have happened in parallel on both the parent and its derived branch. In such cases a 3-way merge operation can be used to reconcile two versions of a file, given a common ancestor.

A 3-way merge operation can most easily be thought of as taking a file diff between the common ancestor and each of the two versions under consideration and then "adding" both diffs to the common ancestor. If the changes are sufficiently non-overlapping, the merge can be done automatically; otherwise a merge conflict occurs, which needs to be reconciled manually, for example by creating a new version which combines the joint intention of the other two.
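The diff-and-add idea can be sketched in a few lines of Python. This is only a minimal, line-based illustration built on the standard `difflib` module, not the algorithm any particular version control tool uses; the function name `merge3`, the `alice`/`bob` labels and the conflict-marker layout are assumptions made for the example.

```python
from difflib import SequenceMatcher


def _unchanged(base, other):
    """Map each base line index that survives unchanged to its index in other."""
    mapping = {}
    matcher = SequenceMatcher(None, base, other, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k] = block.b + k
    return mapping


def merge3(base, alice, bob):
    """Merge two descendants of base; return (merged_lines, conflict_count)."""
    in_alice = _unchanged(base, alice)
    in_bob = _unchanged(base, bob)

    # Sync points: base lines that both sides left untouched, plus end of file.
    sync = [(i, in_alice[i], in_bob[i])
            for i in range(len(base)) if i in in_alice and i in in_bob]
    sync.append((len(base), len(alice), len(bob)))

    merged, conflicts = [], 0
    po = pa = pb = 0  # start of the current chunk in base, alice, bob
    for o, a, b in sync:
        chunk_o, chunk_a, chunk_b = base[po:o], alice[pa:a], bob[pb:b]
        if chunk_a == chunk_b:        # same change on both sides (or none)
            merged += chunk_a
        elif chunk_a == chunk_o:      # only Bob changed this chunk
            merged += chunk_b
        elif chunk_b == chunk_o:      # only Alice changed this chunk
            merged += chunk_a
        else:                         # both changed it, in different ways
            conflicts += 1
            merged += ["<<<<<<< alice", *chunk_a,
                       "=======", *chunk_b, ">>>>>>> bob"]
        if o < len(base):
            merged.append(base[o])    # the untouched sync line itself
        po, pa, pb = o + 1, a + 1, b + 1
    return merged, conflicts


if __name__ == "__main__":
    base = ["a", "b", "c", "d", "e"]
    alice = ["a", "B", "c", "d", "e"]   # Alice edits the second line
    bob = ["a", "b", "c", "d", "E"]     # Bob edits the last line
    print(merge3(base, alice, bob))
```

With non-overlapping edits like the ones in the demo, both changes end up in the result and the conflict count stays at zero. If both sides rewrite the same line differently, the sketch emits a marked-up conflict region instead of guessing.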

But even if a merge seems to present no conflict at the level of the 3-way merge algorithm, it could still be semantically wrong. E.g. if one of the changes renames a function and all its invocations and the other change adds a new reference to the old name, then even though these changes may not produce a merge conflict, the resulting code would still be wrong. After an automated merge, thorough review and testing of the resulting code is essential.
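The rename scenario is easy to reproduce. In the following sketch all names (`compute`, `double`, `run_snippet`) are made up for illustration, and `MERGED` is assembled by hand to represent what a line-based merge would plausibly produce when Alice's renamed lines and Bob's appended line do not overlap.

```python
# Common ancestor: one function and one call site.
BASE = """\
def compute(x):
    return x * 2

first = compute(1)

total = first
"""

# Alice renames the function and every existing invocation.
ALICE = BASE.replace("compute", "double")

# Bob, working from the old version, adds a new call to the old name.
BOB = BASE + "second = compute(2)\n"

# Hand-assembled result of a textually clean merge: Alice's renamed
# lines plus Bob's appended line. No line was touched by both sides.
MERGED = ALICE + "second = compute(2)\n"


def run_snippet(source):
    """Execute source in a fresh namespace; return the NameError text, if any."""
    try:
        exec(source, {})
    except NameError as err:
        return str(err)
    return None


if __name__ == "__main__":
    for label, source in [("alice", ALICE), ("bob", BOB), ("merged", MERGED)]:
        print(label, "->", run_snippet(source) or "ok")
```

Each contributor's version runs fine on its own, while the merged text fails as soon as it reaches the stale reference to the old name. Only running or reviewing the merged code catches this; the merge tool itself sees nothing wrong.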

I am typically not a big fan of overly visual or graphical development tools, with one exception: support for merging. I find it hard to follow what is going on during a merge, and seeing the different versions lined up side by side with the differences highlighted is clearly helpful.

My ideal tool should show the three input versions side by side: the common ancestor/base, as well as the two contributors being merged. The tool should automatically propose a merged solution if there is no conflict and should make it easy to navigate through the sections of the file which have changed. The resulting version should be shown in a fourth window and should be editable on the fly to fix and resolve merge conflicts or errors introduced by the automated merge.

Surprisingly, there are only a small number of open-source tools available which support a graphical 3-way merge, on Linux in particular. As far as I know the list is about:
My favorite one, which I use every day, is kdiff3. Since it is a bit heavy and slow to start, I tend to rely on the auto-merge algorithm built into the version control tool unless it detects a conflict, in which case a script hands over the merging of only the conflicting files to kdiff3.
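A hand-over script of this kind could look roughly like the sketch below. It is not my actual script, just an illustration under a few assumptions: the version control tool leaves `<<<<<<<` markers in files it could not merge, the base and the two contributor versions are available as separate files, and kdiff3 is invoked as `kdiff3 base mine theirs -o output`. The `MergeInputs` class and the function names are made up for the example.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class MergeInputs:
    """Paths involved in merging one file (hypothetical layout)."""
    base: str     # common ancestor
    mine: str     # local version
    theirs: str   # incoming version
    merged: str   # result written by the version control tool's auto-merge


def has_conflict_markers(path):
    """Return True if the auto-merged file still contains conflict markers."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return any(line.startswith("<<<<<<<") for line in handle)


def kdiff3_command(inputs):
    """Build the kdiff3 call: base first, then both contributors, then output."""
    return ["kdiff3", inputs.base, inputs.mine, inputs.theirs,
            "-o", inputs.merged]


def hand_over_conflicts(candidates, launch=subprocess.call):
    """Start kdiff3 only for files whose auto-merge ended in a conflict."""
    handed_over = []
    for inputs in candidates:
        if has_conflict_markers(inputs.merged):
            launch(kdiff3_command(inputs))
            handed_over.append(inputs.merged)
    return handed_over
```

Cleanly merged files never touch kdiff3, so its startup cost is only paid when a human actually has to look at a conflict. The `launch` parameter exists so the logic can be tried out without a kdiff3 installation, e.g. by passing `print` instead of `subprocess.call`.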