Thursday, April 16, 2009

Essential Startup Software Development Infrastructure - 2007 Edition

A while back, I did some research into how I would now set up the basic software development infrastructure for a new startup project - with hindsight and the state of open-source development tools of ca. 2007. Again, the goal of the experiment is to spend little or no money, use only free and open-source software and end up with a solution that could be set up from scratch in only a few days.

Here are the basic choices of the 2007 edition software development infrastructure (to be elaborated in future posts):
  • The entire development support infrastructure should again be able to run on a single machine. For the experiment, I used my home server running Sabayon Linux, a more user-friendly version of the source-based Gentoo Linux distribution. We assume that this machine is behind a firewall and cannot be accessed from the outside.
  • As the version control system, I chose Subversion (svn). Svn is mature, stable, well supported and largely accepted to be the natural replacement for CVS as the dominant open-source SCM. This is a deliberately conservative choice - since there is nothing as important and critical in the development support infrastructure as the SCM. Much of the ongoing innovation is in distributed SCM, but since I am most familiar with the centralized model, I know that svn will work and I assume there is no time for trialing and evaluation - so svn it is.
  • The overall frame of the infrastructure is provided by Trac. It wraps nicely around svn and provides a few important and well-integrated services out of the box, with little or no configuration:
    • Web based browser for Subversion repository, including change-sets.
    • Simple Wiki for organizing links and documentation
    • Bug/issue tracking ticketing system
    • Automated wiki style linkage between all subsystems of Trac referencing wiki pages, tickets and svn change-set numbers anywhere in pages, tickets or svn checkin comments.
    • Event time-line covering updates to wiki pages, issue tickets or svn checkins.
  • For the e-mail subsystem, we use the Postfix server, which is generally accepted as a more secure and administrator friendly replacement for the old sendmail. To implement a complete "closed-circuit" mail service, we use the Dovecot POP & IMAP server to provide access to the mail stored on the server to any e-mail client supporting these protocols.
  • To support mailing lists, we can simply use mail aliases provided by Postfix - in combination with MHonArc for managing the web archives.
  • From the original list, I am deliberately ignoring build and test automation. Not because it is not important, but because it depends more than the others on the particularities of the project, may require more significant setup and, from experience, most likely needs at least its own dedicated machine, if not a whole cluster of them. If possible, we would like a solution which integrates nicely with Trac - e.g. Bitten.
For the setup and configuration, I am trying to keep things as simple as possible: use defaults wherever possible and maybe even choose tools just because they seem simpler to set up and configure. Most likely we can assume that the initial setup would be done by a member of the development team with limited system administration experience, like myself, and that there might be no professional system administrator around to help.

Sunday, April 5, 2009

Branching is easy, Merging is hard

Merging concurrent or overlapping changes to the same piece of source-code is one of the basic operations to support collaborative development and is supported in some form by most modern version control systems.

Merging is required if changes from two different branches have to be reconciled - these could be branches created explicitly or implicitly by two users editing the same file concurrently as in the example below.

In the example below, Alice and Bob are both making changes to the same file. After Alice has submitted a new version, Bob needs to merge the changes into his client view before submitting a new version.

Typically a merge is required if a branch is being closed, but the current head version (R43 in the example) on the parent branch is no longer identical to the version from which the branch was created (R42) - i.e. changes have happened in parallel on both the parent and its derived branch. In such cases a 3-way merge operation can be used to reconcile two versions of a file, given a common ancestor.

A 3-way merge operation could most easily be emulated by taking a file diff between the common ancestor and each of the versions under consideration and then "adding" both those diffs to the common ancestor. If the changes are sufficiently non-overlapping, then the merging might be done automatically; otherwise a merge conflict occurs which needs to be reconciled manually, maybe by creating a new version which combines the joint intention of the other two.
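The "diff both sides against the ancestor, then apply both diffs" idea can be sketched in a few lines of Python. This is only an illustration built on the standard `difflib` module, not the algorithm used by any particular version control tool. The names `changes`, `apply_hunks` and `merge3` are made up for this example, and the sketch is deliberately conservative: edits that overlap or merely touch the same spot in the ancestor are reported as a conflict.

```python
from difflib import SequenceMatcher


def changes(base, other):
    """Edits turning base into other, as (base_start, base_end, new_lines)."""
    matcher = SequenceMatcher(a=base, b=other, autojunk=False)
    return [(i1, i2, other[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def apply_hunks(base, hunks, start, end):
    """Rewrite base[start:end] using hunks that all lie inside that range."""
    out, pos = [], start
    for s, e, new in hunks:
        out.extend(base[pos:s])
        out.extend(new)
        pos = e
    out.extend(base[pos:end])
    return out


def merge3(base, ours, theirs):
    """Line-based 3-way merge; returns (merged_lines, number_of_conflicts)."""
    sides = [changes(base, ours), changes(base, theirs)]
    idx = [0, 0]
    merged, conflicts, pos = [], 0, 0
    while idx[0] < len(sides[0]) or idx[1] < len(sides[1]):
        # Start a region at the earliest pending edit from either side.
        start = min(sides[k][idx[k]][0]
                    for k in (0, 1) if idx[k] < len(sides[k]))
        end, taken, grew = start, [[], []], True
        # Grow the region while edits from either side overlap or touch it.
        while grew:
            grew = False
            for k in (0, 1):
                while idx[k] < len(sides[k]) and sides[k][idx[k]][0] <= end:
                    hunk = sides[k][idx[k]]
                    taken[k].append(hunk)
                    end = max(end, hunk[1])
                    idx[k] += 1
                    grew = True
        merged.extend(base[pos:start])           # unchanged lines before region
        mine = apply_hunks(base, taken[0], start, end)
        yours = apply_hunks(base, taken[1], start, end)
        if not taken[1] or mine == yours:
            merged.extend(mine)                  # only we changed it, or both agree
        elif not taken[0]:
            merged.extend(yours)                 # only they changed it
        else:
            conflicts += 1                       # both changed it differently
            merged.extend(["<<<<<<< ours", *mine, "=======",
                           *yours, ">>>>>>> theirs"])
        pos = end
    merged.extend(base[pos:])
    return merged, conflicts


if __name__ == "__main__":
    ancestor = ["a", "b", "c", "d", "e"]
    alice = ["A", "b", "c", "d", "e"]   # Alice edits the first line
    bob = ["a", "b", "c", "d", "E"]     # Bob edits the last line
    print(merge3(ancestor, alice, bob))
```

When Alice and Bob touch different parts of the file, both edits end up in the result and the conflict count is zero. When they change the same line in different ways, the region is wrapped in conflict markers and left for manual resolution.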

But even if a merge seems to present no conflict at the level of the 3-way merge algorithm, it could still be semantically wrong. E.g. if one of the changes renames a function and all its invocations and the other change adds a new reference to the old name, then even though these changes may not represent a merge conflict, the resulting code would still be wrong. After an automated merge, thorough review and testing of the resulting code is essential.
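A small, made-up illustration of such a semantically broken merge: the text in `MERGED_SOURCE` stands for the result of a conflict-free merge in which one change renamed `sum_items` to `compute_total` and the other added a new caller of the old name. All function names here are hypothetical and exist only for this example.

```python
# Hypothetical result of a textually clean merge of two changes.
MERGED_SOURCE = '''
def compute_total(items):      # change 1 renamed sum_items -> compute_total
    return sum(items)

def report(items):             # change 1 also updated this existing caller
    return "total: %d" % compute_total(items)

def average(items):            # change 2 added this, still using the old name
    return sum_items(items) / len(items)
'''


def check(source):
    """Load the merged code and exercise each entry point once."""
    namespace = {}
    exec(source, namespace)    # the merged file itself loads without error
    results = {"report": namespace["report"]([1, 2, 3])}
    try:
        namespace["average"]([1, 2, 3])
        results["average"] = "ok"
    except NameError:
        # The stale reference only shows up when the new code path runs.
        results["average"] = "NameError"
    return results


if __name__ == "__main__":
    print(check(MERGED_SOURCE))
```

The merged file loads and the old caller works, so nothing looks wrong until the newly added function is actually executed. This is why a test run after the merge is worth more than a clean merge report.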

I am typically not a big fan of overly visual or graphical development tools, one exception being support for merging. I find it hard to follow what is going on during a merge, and seeing the different versions lined up side by side, with the differences highlighted, is clearly helpful.

My ideal tool should show the three input versions side by side: the common ancestor/base, as well as the two contributors being merged. The tool should automatically propose a merged solution if there is no conflict and should make it easy to navigate through the sections of the file which have changed. The resulting version should be shown in a fourth window and should be editable on the fly to fix and resolve merge conflicts or errors introduced by the automated merge.

Surprisingly, there are only a small number of open-source tools available which support a graphical 3-way merge, on Linux in particular. As far as I know the list is about:
My favorite one, which I use every day, is kdiff3. Since it is a bit heavy and slow to start, I tend to rely on the auto-merge algorithm built into the version control tool unless it detects a conflict, in which case a script hands over the merging of only the conflicting files to kdiff3.
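A hand-over script of this kind might look roughly like the following sketch. It is not the script mentioned above. It assumes that the version control tool has already identified the conflicting files and can provide, for each one, the paths of the base version, the two contributors and the file to write the result to. The function names and the example paths are invented; the only kdiff3 feature relied on is its documented command-line form `kdiff3 base file1 file2 -o output`.

```python
import subprocess


def kdiff3_command(base, mine, theirs, output):
    """Build the kdiff3 invocation for one conflicting file."""
    return ["kdiff3", base, mine, theirs, "-o", output]


def resolve(conflicts, run=subprocess.call):
    """Open kdiff3 once per conflicting file.

    conflicts: iterable of (base, mine, theirs, output) path tuples,
    assumed to come from the version control tool's conflict report.
    Returns a dict mapping each output path to kdiff3's exit status.
    """
    statuses = {}
    for base, mine, theirs, output in conflicts:
        statuses[output] = run(kdiff3_command(base, mine, theirs, output))
    return statuses


if __name__ == "__main__":
    # Placeholder paths, printed rather than executed.
    print(kdiff3_command("foo.c.base", "foo.c.mine", "foo.c.theirs", "foo.c"))
```

Files that merged cleanly never reach this script, so the slow graphical tool is only started when there is something for a human to decide.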

Wednesday, April 1, 2009

Startups: Technology Execution Play

At the opposite end of the spectrum from the concept- and Zeitgeist-heavy startups of the web age is the kind of startup with neither a particularly great new idea nor a secret new technology, simply doing something which is really hard to do and which only very few people know how to do.

During the late 1990s, the Internet had been growing by leaps and bounds, requiring a doubling in capacity every couple of months for many types of networks, and networking gear was constantly running out of steam and needed to be upgraded with the next generation of higher-capacity equipment. So just building the next bigger, better gear sounded like a reasonable thing to do, except that it was easier said than done - especially at a breakneck pace of 18-24 month development cycles, barely ahead of Moore's law. Until then, building telecom and networking equipment had been a relatively specialized niche craft, practiced mostly in the R&D labs of a small number of companies, selling to a boring, utility-like industry. With only a small pool of people who knew how to build something like that, those crazy enough to try had a pretty good chance to succeed - if they could pull it off and execute. Many would succeed quite well (in terms of return on investment) without even having to build a real business to sell their product - they would be acquired by some established equipment manufacturer who desperately needed something like this, but whose internal R&D was years behind schedule - partly because their most experienced staff had run off to start companies - often getting bought back by the companies they had left.

This was the climate in which we started Xebeo Communications, to build the next generation packet switch for carrier networks. We had no experience or track-record in business, other than having worked on the development of similar systems before. Nevertheless we raised some double-digit millions in venture capital funding on our technology expertise alone (Ok, those were crazy times and people got a lot more money to go sell dogfood on the Internet...).

The idea was simple, but the execution required a large and highly skilled team with a broad range of expertise: VLSI chip design, electro-optical componentry, hardware systems and circuit design, high-speed signals and thermal flow simulations, mechanical engineering, embedded high-availability software and development of specialized communications protocol software. And all this had to be put together into a working system in less time than any reasonable estimate, while pushing the technology close to the edge of what was possible at the time. For example, the contract manufacturer asked to keep one of the circuit boards for display in their lobby - it had been the most complicated one they had ever built thus far...

It was basically "build it and they will come" - we actually DID build it, but they never came. The bottom had fallen out from underneath the tech market in ca. 2001, leaving tons of unused equipment around at fire-sale prices. Nobody needed to double any capacity anymore for quite some time. The company was acquired for cents on the dollar and, after some time of trying to find a niche for it, the project was eventually canceled - a white elephant from a bygone era whose time had never really come. [They had ultimately failed to see or take advantage of the real value of what they had acquired - not the product, already obsolete by then - but the team who could build it.]