Wednesday, December 30, 2009

Android Log Viewer

One of my latest favorite discoveries on the Android Market is aLogcat, a must-have for Android developers and power users who want to know more about what is going on on the device. It is named after the logcat command, which can be run in the debug shell, typically via the adb tool from the Android SDK - which requires the device to be connected to a host PC through a USB cable.
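For comparison, here is how the same log can be filtered from a host PC with the SDK tools; a minimal sketch, where MyAppTag is a placeholder for a real log tag:
# stream the log with timestamps, showing only warnings and above from any tag
adb logcat -v time '*:W'
# show debug output from one specific tag and silence everything else
adb logcat -v time MyAppTag:D '*:S'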

aLogcat displays a log console on the device itself, color-coded by level, with options to filter by level or arbitrary substrings. By default the console updates continuously as new messages appear in the log, but it can also be frozen to allow scrolling back through the log history without interference from screen updates. Since logs can also be sent via email, it subsumes the functionality of earlier log collector apps.

Now that the number of devices, configurations and versions of Android is exploding, it is less and less likely that a developer can reproduce a particular problem, since it may only occur in particular device configurations to which the developer has no access. Tools like aLogcat are often the only way developers can remotely diagnose a problem, with the help of a user who can reproduce it and is willing to invest some time in getting it resolved.

Sunday, December 6, 2009

Online Backup

Since we recently moved, my current backup system has become somewhat undone. I have not yet been able to reactivate my Linux home server, since neither its power input nor its TV signal output works in the new environment. But backing up to an aging piece of low-cost hardware running an obsolete version of an OS - a machine which I also happen to use for experimentation - does not leave the kind of warm fuzzy feeling one typically expects from a backup solution, so maybe it was time again to look around for another solution.

From a maintainability and reliability point of view, it would be better to store the backups in the cloud rather than on a single computer in the same room. On the other hand, sending the data out of the room opens up some serious privacy concerns.

The solutions for online backups on the Mac are still a bit limited. There are some portable solutions using Amazon's S3 cloud storage service or the rsync.net service with its open protocols, either of which could have served as an alternate target for my existing home-grown script. But since I primarily wanted to back up my media library (music, photos and videos), the storage cost added up to some real money very quickly.

In the end, I started using a commercial solution from CrashPlan, which has both a Mac client and a matching online storage service (a single datacenter located in the US). The basic client, which supports both Mac and Linux, is free (as in free beer) for personal use, and there is a free trial for the online storage, which otherwise has flat-rate pricing of about $50 per year.

The backup client runs continuously in the background and tries to be nice to both the CPU and the network, so that the computer remains usable even while a backup is going on. In addition to the online service, CrashPlan can also back up to attached hard-drives or in a peer-to-peer fashion to other computers running the same client.

So far the system has survived the baptism by fire of an initial online backup of my media library: about 10 days of continuous backup activity, including a reboot and several network disruptions, without a hitch.

Obviously the quality of a backup is measured by how reliably the data can be restored after a disaster, but judging from the experience with the initial backup, the solution seems solid enough to give it a try for a while.

Thursday, November 26, 2009

Customized Call Routing

I now have a cellphone plan where all outgoing calls are metered and very expensive, except for unlimited calls to 3 favorite numbers, which are included in the plan. For our international calling, we are already using a discount carrier activated through a dial prefix, which also offers a local land-line dial-in access number so it can be used from mobile phones. By declaring that access number as one of my favorites, I can make unlimited calls from my mobile phone at the substantially lower rate of the discount carrier. The only problem is that making calls through the indirection of a voice-prompt menu is very cumbersome!

This is where Phonecard Express comes in. This highly customizable application inserts itself almost transparently as a filter between the Android system dialer and the telephony subsystem. Whether a call is made from the dialer, the address book or any other intent, the call setup is intercepted and potentially routed through a calling card service. In addition to supporting multiple cards and their specific call setup sequences (access numbers, voice prompts, PIN, etc.), Phonecard Express also supports various policies which control whether a call is routed through a particular calling card account or dialed directly. For people who travel internationally and tend to store all numbers in the address book in the GSM style "+" notation, Phonecard Express can expand the number with a configured international call prefix.

In my setup, I have exception rules for my other two favorite numbers as well as the voicemail access number to use direct dialing. All other calls are automatically routed through the discount carrier account. The integration is so seamless and transparent that it is almost scary, and the only noticeable drawback over the standard call flow is the noticeably larger post-dial delay, partially from loading another application during the call setup flow and partially from having to dial an access number and key in the destination as DTMF digits at the voice prompt before the call is really initiated. On the other hand, this is a small price to pay for saving an order of magnitude in per-minute calling cost.

Phonecard Express is a great example of the flexibility of the Android platform, where 3rd party applications can integrate very deeply and partially replace default system functionality.

Thursday, November 5, 2009

Virtual Phone, part II

I started using the new Google Voice service as my standard US phone number. Since they currently only support international calling (at billed rates), but not international forwarding, the current setup is a bit suboptimal and roundabout: from Google Voice, calls are forwarded to my US based Skype online number, with my Skype account itself being forwarded to my current cellphone. The only catch is that the post-dial delay of this whole contraption is too long for me to pick up the call before it goes to voicemail, and since the voicemail delay is not configurable there isn't much I can do. I did manage to catch one call today, by leaping at the answer button on the first ring - it was a wrong number...

At least I get an email notification right away when somebody leaves a message. I could probably just as well turn off forwarding and use it as an email based voicemail-box.

Tuesday, August 18, 2009

Monitoring Network Usage

Some people seem to be using NetMeter to continuously monitor their network usage against a limited-volume data plan. Since we are now moving to Europe, where stingy data plans with only a few hundred MB included per month are the norm, I now also have a sudden interest in measuring my monthly data usage.

Since NetMeter is a trouble-shooting tool, oriented towards very detailed tracking of network usage over the last few minutes to see what is going on right now, it is not really suitable as a long-term bandwidth counter.

The ideal network usage counter should be low overhead and run continuously in the background - e.g. start itself at boot time. In its most basic form, it should show the current monthly total of cellular data usage. It would also be nice to enter the specs of the data plan to track against: how much is included per month, what the rate above that limit is and when the monthly limit resets. For the current month, the ideal application should always show how current data usage compares to the "budget", by constantly pro-rating the monthly limit to the part of the billing month which has already passed - e.g. 15 days into a 30-day cycle with 500 MB included, the pro-rated budget would be 250 MB. For people who travel a lot, tracking multiple SIM card accounts or roaming carriers with different rates would be nice to avoid nasty surprises at the end of the month.

Among the free applications on the market which offer bandwidth usage tracking, I have tried NetCounter and NetSentry. Both run continuously in the background and poll the Linux interface counters in the /proc pseudo-filesystem. Both are capable applications which track usage on both wifi and cellular data interfaces over a monthly period with a configurable starting date. At this point neither yet offers an ongoing comparison of actual usage against the pro-rated monthly limit, or any plotting of historical usage data from previous months for trend analysis.
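For the curious, the raw counters these apps poll can be inspected directly over the debug shell; interface names (e.g. rmnet0 for cellular data, tiwlan0 or eth0 for wifi) vary by device:
# dump the per-interface byte and packet counters
adb shell cat /proc/net/dev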

Wednesday, July 22, 2009

Collecting Android Logs

One of the problems with debugging Android apps in the field is that the system error messages are quite generic and meaningless - e.g. "Application not responding" or "application has stopped unexpectedly". All the interesting details, like the Java exception stack trace in the case of a crashing app, are logged to the system log. This log can be looked at by running the "logcat" command from the background debug shell, usually by connecting over USB via the "adb" command from the Android SDK.

But what to do if a problem occurs in the field, when you don't have a computer handy with a USB cable and the Android SDK installed? I recently came across a few apps on the market which address this issue by running logcat from within the application, capturing the output and bringing up an email message with the output to be sent to somebody - either yourself or the developer of the crashing app, who has been desperately asking to see the logs.
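As a rough sketch, the command such an app effectively executes (and whose output it captures) could look like the following; the -d flag makes logcat dump the current log buffer and exit instead of streaming forever, and AndroidRuntime is the tag under which uncaught Java exceptions are logged:
# dump the whole log buffer once, with timestamps
logcat -d -v time
# or restrict the dump to crashes and warnings
logcat -d -v time AndroidRuntime:E '*:W'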

The source of one of the Log Collector apps is available online, and I was obviously curious how this is done. I had always wanted to add a log viewer to NetMeter, since seeing the logs is essential to trouble-shooting anything going on with an Android device. Looking at the source code for logcat, it seemed challenging to re-implement something similar in Java as an Android application.

What I didn't know is that it seems to be possible to run native Unix commands from within an Android Java application - even on regular production devices, assuming the application task has the necessary permissions to execute the particular command. Simply running the logcat command seems indeed to be the easiest way to get the system log from within an application. If I can find some time, maybe I'll add the log viewer to NetMeter after all.

Thursday, July 16, 2009

Virtual Phone

Similar to the virtual postal address setup, we are looking to keep a virtual US phone number while being overseas. The reasons for this are to give our friends and family in the US a permanent local number to call, no matter where we are at a particular point in time. Having a US number for outbound calls can also be useful to call US based 1-800 numbers - e.g. to call a credit card company and yell at them to stop sending spam to our virtual mailbox...

The simplest way to do that would be to get VoIP phone service from a provider like Vonage and then simply pack up the VoIP adapter and US phone and install them wherever we are - using the proper power converters.

However, we are doing something slightly more convoluted. Even though Skype became famous for offering free computer-to-computer calling over a peer-to-peer network, they now also offer gateways to the plain old telephone service at a fraction of the cost of most other VoIP providers.

For $9 per month, we have unlimited calling to the US and Canada plus two phone numbers, one in the US and one in Switzerland. We can also set up forwarding to any number (free in the US, about 2c/min. anywhere else with the current subscriptions) so that any Skype calls, whether from a computer user or from a call to either of these numbers, will be forwarded to our designated home phone if we are not running Skype on a computer anywhere.

Linking checkins to tickets (SDI 07 Part VII)

There are two crucial pieces of information related to every change to the project's codebase: the what and the why.

Thanks to using svn for version control, all the details of what a change is are automatically recorded in the form of an atomic changeset which moves the codebase from one (hopefully) consistent state to another.

Recording the why requires a bit more work. In this simple workflow, we are using Trac tickets to document and track every single legitimate reason for making a change to the codebase - whether they are bugfixes or work items related to new features or enhancements. If every legitimate reason to change the code is represented by a ticket, then for each svn changeset there should be one or more tickets which show why this change was made. Code changes should be done in small incremental steps, but the reason for these changes may be a fairly large and complicated project requiring many small code changes. Tickets can also be used as threads for resolving and documenting design decisions.

In the Trac web interface any reference to other Trac "objects" like svn changesets, tickets or wiki entries is automatically rendered as a hyperlink. In order to link an svn changeset to its related tickets, all we need to do is to mention "#<tktnum>" in the commit message for this change. In order to automate the reverse linkage from the ticket to the changesets which refer to it, Trac provides an optional script which can be installed as an svn post-commit hook to post a message to the ticket log with the changeset ID.
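For example, a commit referencing a ticket might look like this (ticket number and message are made up):
svn commit -m "limit the log view to the last 24 hours, as discussed in #42"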

There are two scripts which can be downloaded from http://trac.edgewall.org/browser/branches/0.10-stable/contrib/ (or the branch matching whatever version of Trac is being used). The first one is trac-pre-commit-hook, which can be used to automatically enforce the policy that each svn commit message must contain a reference to an open Trac ticket number in the form of expressions like "closes #5", "fixes #34", "addresses #456" or "references #4".

The second one is trac-post-commit-hook, which, after the commit has succeeded, posts a message to the tickets referenced this way in the commit message and potentially changes the state of the ticket in the case of "closes" or "fixes" references.

In order to install one or both of these hooks we need to create the following /data/sdi07/svn/hooks/pre-commit script:
#!/bin/sh

REPOS="$1"
TXN="$2"
TRAC_ENV="/data/sdi07/trac"
LOG=`/usr/bin/svnlook log -t "$TXN" "$REPOS"`
/usr/bin/python /data/sdi07/svn/hooks/trac-pre-commit-hook \
"$TRAC_ENV" "$LOG" || exit 1
exit 0
and add the following to the already existing /data/sdi07/svn/hooks/post-commit script:
/usr/bin/python /data/sdi07/svn/hooks/trac-post-commit-hook \
-p /data/sdi07/trac/ -r "$REV" -s "sdi.kugelfish.com/sdi07/trac/"

Monday, July 13, 2009

Virtual Mail

We are about to move to Europe for a few years and during that time, we need to be able to maintain a virtual presence here in the US, to make going back and forth easier, to stay in touch with friends and family and to take advantage of the greatest consumer paradise on earth, where everything is available for a buck or two.

An important part of maintaining a presence is to have a US mailing address - e.g. it is required to have a US credit card and to have stuff shipped to you. For certain things a PO box would do, for others you need a real street address - e.g. to accept packages.

Traditional mail forwarding places have existed for a long time, basically private mailbox operators who also provide the service to periodically mail the content of the mailbox to a forwarding address.

Shipping overseas is really expensive and the idea of getting a box of junk mail sent to us every month or so does not seem too appealing. Also, if something is really important, it might also be urgent and can't wait until the next delivery.

Fortunately there are also alternatives which integrate paper mail more closely with e-mail. Most of the time, I don't need the physical piece of paper which is being sent in the envelope; just knowing what is in it, or knowing as soon as possible that it arrived, is often good enough.

There are a few services for expats and other nomadic users without a steady mailing address which offer online remote mail management. This typically involves scanning all incoming envelopes and sending email notifications when something new arrives. The user can then log into a web application - their virtual PO box - and decide what to do with the content: ship it to another forwarding address, shred it and throw it away or, in some cases, have it opened and the contents scanned and made available through the virtual PO box.

We ended up choosing Earth Class Mail because of their emphasis on scan & discard vs shipping. Yes, it is a bit scary to have random strangers open up your mail, and it is a bit hard to trust that a startup company in the current financial climate will not suddenly disappear or do some very shady things out of desperation.

Fortunately all my financial accounts support online interfaces as well and most of them are coming around to letting users opt out of any paper statements and transaction records - green is in right now! Unfortunately at least one of them insists on spamming me with silly offers, which I hope is something I can get them to stop, so that it might be easier to recognize the envelope with the new credit card or the new PIN code and selectively forward it, without having to open them.

And for a lot of other things, my life just isn't interesting enough to provide the workers in the mail processing facility with a lot of entertainment and for the rest I hope they would be too busy to engage in large scale identity theft.

After setting up the account, we had to send in quite a bit of paperwork - certified copies of passports and other IDs for the US postal service to deliver mail to a third party mail handling agent. After all this had been received and processed, we now have both a street address and a PO box address set up to choose from. PO boxes tend to draw less spam mail, but street addresses are necessary for all those situations where a PO box address is not accepted.

I sent a test letter to both addresses and was quite surprised how quickly the envelope pictures appeared online. After I requested open & scan, the images of the content appeared online a day or so later. Despite being located in the Pacific Northwest, they must have local processing on the east coast as well, since the letter mailed to the east-coast address appeared online a day or so ahead of the one sent to the west-coast address - and only about two days after I had mailed it in New York City. I was quite impressed with the speed.

We have not yet switched over any of our important mailings to the new virtual address(es), so I can't really say how well it works. I am a bit worried that once the address leaks out onto the commercial mailing lists, we will get so much spam mail that it eats up our monthly envelope-scanning quota. I guess we'll just have to become more aggressive about fighting spam.

Saturday, July 11, 2009

Manhattanhenge II

Today was the second occurrence of Manhattanhenge this year - the two dates fall equally before and after the summer solstice (June 21). I didn't take any pictures this time, largely because it was cloudy and rainy tonight.

Wednesday, July 8, 2009

Component Architecture, Android Style

Android has a nifty component framework, where each screen - called an "activity" - should be self-contained and can be called up by anybody through an event distribution mechanism called "intents". Activities can also register to handle arbitrary intents, which allows applications to delegate certain functionality without even knowing which app is going to provide it. Certain intents are pre-defined - e.g. the NoiseAlert app uses the ACTION_CALL system intent to delegate the making of a phone call to the dialer app (typically), or whoever else can handle it. Anybody can define new intents, but useful delegation of functionality between completely unrelated apps is usually a bit harder to pull off.

But here is an example of just that. A while ago, a user of the BistroMath tip calculator asked whether I could add the functionality of recording and tracking dining expenses over time. I can see that this would be very useful functionality. But I also think that mobile apps should follow the Unix philosophy, where each program does one thing well and interconnects with others to provide more complex functionality.

Basically I didn't want to build expense tracking into BistroMath, but wouldn't mind interfacing with a specialized application through the intent framework. So I was checking out expense tracking apps on the market (there are quite a few) and emailed the authors of a few which I liked and thought would work reasonably well for what the user had requested.

Within 24h the author of Funky Expenses had replied with a proposed intent interface and a new version of his app which implemented it. I just added a little code to BistroMath to trigger it and there we have a tip calculator with expense tracking capabilities.

For anybody who wants to support the same intent in their expense tracking application or to support calling it from tip calculators or other financial apps, here is an example of the caller interface:
Intent launchIntent = new Intent();
launchIntent.setAction("com.funkyandroid.action.NEW_TRANSACTION");
launchIntent.putExtra("com.funkyandroid.DATE", System.currentTimeMillis());
launchIntent.putExtra("com.funkyandroid.PAYEE", "Per Se");
launchIntent.putExtra("com.funkyandroid.CATEGORY", "dining");
launchIntent.putExtra("com.funkyandroid.AMOUNT", "1532.42");
try {
    startActivity(launchIntent);
} catch (ActivityNotFoundException e) {
    Toast.makeText(this, "No application found to handle expense reporting functionality", Toast.LENGTH_LONG).show();
}
Any of the extra attributes can be omitted and should then appear blank or as default values in the input mask of the expense tracking application which is called up. The intent is also documented at the OpenIntents intent registry.

Tuesday, July 7, 2009

Androlib

As the number of apps on the Android Market grows, finding interesting apps is becoming increasingly harder. Developers have a very limited opportunity to promote their apps - a name, an icon and 325 characters are all there is to tell the user what the app is about. No screen-shots, no release notes, no FAQ, no back-channel to respond to users' comments. And worst of all, there is still no web interface which puts whatever little information there is on the market online and available to search engines.

However useful smart phones are supposed to be, the speed, power and comfort of a full-size laptop or desktop computer are hard to beat when it comes to searching and browsing large amounts of data. The quality of the built-in search function in the Android Market is a bit spotty to say the least - why not let people use real power tools like an Internet search engine to find Android apps?

For a while there has been Cyrket to expose the content of the market online - presumably by reverse engineering the market's client-server protocol. Recently, Cyrket has gotten some competition in the form of AndroLib with a slicker, more polished appearance and some extras like the ability to upload screenshots.

Saturday, June 27, 2009

E-mail notifications for SVN & Trac (SDI 07 part VI)

Much of the email I get is actually not from users directly, but generated by some automated system in response to some event. We are all used to getting these kinds of emails - when we sign up on a new Internet site, buy tickets online or somebody throws a sheep at us on Facebook.

Since email is nearly ubiquitous and has many powerful tools for sorting, filtering, archiving or processing messages, it is a natural channel for delivering such automated notifications.

Since we set up an email distribution system in the last 2 episodes of our series on startup software development infrastructure, we can make use of it as well to distribute event notifications from the key systems in our software development workflow, which are the issue tracking and source code version control systems - Trac tickets and svn respectively in our case.

To enable email notification for every change to a Trac ticket which concerns us, we only need to edit the notification section in the Trac config file at /data/sdi07/trac/conf/trac.ini, based on the built-in documentation on the Trac wiki under TracNotification - basically something like:
[notification]
smtp_enabled = true
smtp_server = localhost
smtp_from = trac@sdi.kugelfish.com
smtp_default_domain = sdi.kugelfish.com
This assumes that there is an email account for each active trac/svn user in the system, but more about that in a later posting.

Since the evolution of the codebase is of great concern to anybody on the team, we would also like to send out a notification message every time a new change is submitted, since any change to the latest state in the repository could potentially affect what anybody is working on.

Subversion has a generic mechanism to trigger customized user actions at particular stages of its operations. If they exist, particular hook scripts located in the server repository hierarchy are executed - e.g. the post-commit hook each time after a new version has been committed.

Since commit notification emails are such a common feature, we can choose from a variety of existing solutions - the simplest one being commit-email.pl from the svn contrib collection of hook scripts. In order to activate the hook, copy the script to /data/sdi07/svn/hooks/commit-email.pl and create a new file /data/sdi07/svn/hooks/post-commit with the following content:
#!/bin/sh
REPOS="$1"
REV="$2"
/data/sdi07/svn/hooks/commit-email.pl ${REPOS} ${REV} --diff n -h localhost -s "[svn commit] " swdev
Both files should again be owned by the apache uid/gid and have execute permission. This very basic notification script will send out an e-mail with a diff for all changes submitted in a new repository revision, to the swdev mailing list which we previously created. More sophisticated scripts are available - e.g. svnnotify, which offers many options to customize formatting and distribution of checkin notification emails.
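The ownership and permission requirement mentioned above translates to something like the following, assuming the apache user and group used throughout this setup:
chown apache:apache /data/sdi07/svn/hooks/commit-email.pl /data/sdi07/svn/hooks/post-commit
chmod 755 /data/sdi07/svn/hooks/commit-email.pl /data/sdi07/svn/hooks/post-commit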

Thursday, June 18, 2009

Archived Mailing Lists (SDI 07 Part V)

In the previous episode of our series on startup software development infrastructure, we set up the basic email delivery system. The next step is to provide a solution for archived mailing lists.

Mailing lists have long been a backbone of the Internet community and there are mature solutions for managing very large-scale mailing lists. The classics are Listserv and Majordomo, where users manage their own subscriptions by sending emails with embedded commands; more recently Mailman has become very popular because of its web interface for managing mailing list settings for both users and administrators. Mailman even has a built-in web-based email archive.

Since we are targeting our solution at very small teams, we can get away with an even simpler approach. We expect our teams to have only a handful of members, with people joining and leaving being very rare events. In this case, we can simply use the mail alias functionality built into the email server. To maintain the web-based archive, we use MHonArc, an email-to-HTML converter, as a member of each of the lists.

Assuming our main software team mailing list will simply be called swdev, we create the following three alias targets in a file /data/sdi07/lists/list-aliases:
swdev-archive: "|/usr/bin/mhonarc -add -outdir /data/sdi07/lists/swdev/ -title 'swdev mailing list archive'"
swdev : swdev-noarc, swdev-archive
swdev-noarc : user1, user2, user3
The first target will archive a copy of each email sent to it using the mhonarc archiver to a particular directory in our project subtree. The second one is the actual archived mailing list, which consists of the archiver and the non-archived version of the same list, which in turn contains all the human users.

Creating both the regular and noarc version of each list is not required but it can be convenient for sending social and administrative email, which may not need to be archived.
mkdir /data/sdi07/lists/swdev
chown -R apache:apache /data/sdi07/
postalias /data/sdi07/lists/list-aliases
These commands prepare the new mailing list for activation. Postfix will run the delivery command with the user/group ID of the owner of the alias file it is defined in. Generating the archive as the apache user ensures that it can be displayed later on by the webserver, which runs under this user ID.

To start delivery, we need to add our new alias file to the alias configuration in /etc/postfix/main.cf:
alias_maps = hash:/etc/aliases, hash:/data/sdi07/lists/list-aliases
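For the change to take effect, the running Postfix instance needs to re-read its configuration, e.g.:
# re-read main.cf without a full restart
postfix reload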


And in order to make the mailing list archive visible on the website, we need to add the following section to the apache configuration file /etc/apache2/httpd.conf:
Alias /sdi07/lists/swdev "/data/sdi07/lists/swdev"
<Directory "/data/sdi07/lists/swdev">
AllowOverride None
Options -Indexes
</Directory>
After we send the first message to the "swdev" mailing list, we can see the archive presented at /sdi07/lists/swdev/maillist.html on the local web server.

Tuesday, June 16, 2009

Local Email System (SDI 07 Part IV)

After setting up Subversion and Trac for managing source code, the next episode in our series on startup software development infrastructure is about email. Email or some other form of archived group communication is essential for a team to remain in sync on any of the important details of the project.

Using email for remote collaboration is about as old as the Internet itself and benefits from well established habits and usage patterns. Most open source projects use mailing lists as their main or sole channel of communication, which means that most open-source software development tools are well integrated with an email-centric work flow.

Systematically conducting all important technical discussions on archived mailing lists can bridge gaps in both space and time. Email can reach team members who are not here right now - traveling or in a remote location - as well as future team members, who can read up on old discussion threads to figure out why things were done a certain way.

These days, when public Internet mail is dominated by spam, viruses and other malware, running an email service has become a major challenge and is best left to professional system administrators. For now, we can punt on the various security issues by running a private "closed circuit" mail service only within and for the software development project. Most users today use email clients which can connect to remote mail services using mailbox access protocols like POP or IMAP and can manage multiple disjoint mailboxes and email service accounts quite easily.

As the main engine or Mail Transfer Agent (MTA) of our private email system, we are using Postfix. Postfix is a modern replacement for sendmail - the grand-daddy of MTAs - and is meant to be faster, more secure and easier to administer than sendmail. It is clearly overkill for what we need, but since simple configurations are simple, we might as well use it and lay the foundation for growing into a full-fledged mail service later on if needed.

In order to configure the closed-circuit mail delivery system, we need to make only a few changes in the default postfix configuration file /etc/postfix/main.cf:

# Configure local domain info
myhostname = sdi.kugelfish.com
mydomain = kugelfish.com

# Use domain instead of host name as origin
myorigin=$mydomain

# Accept email for those destinations
mydestination=$myhostname, localhost.$mydomain, localhost, $mydomain

# Reject email to all other destinations with an error
relay_transport = error:External delivery disabled
default_transport = error:External delivery disabled
After these configuration changes, we can activate the mail delivery service as follows:
/etc/init.d/postfix start
rc-update add postfix default
In order to let users access their email, we need to run the necessary mailbox remote access protocols, for which we use Dovecot, which supports both POP and IMAP in one package. All we need to change from the defaults in /etc/dovecot/dovecot.conf is to enable all possible protocol options - POP and IMAP in both plain and SSL versions:
protocols = imap imaps pop3 pop3s
before we can start the service as follows:
/etc/init.d/dovecot start
rc-update add dovecot default

With this configuration we have a basic local mail service for all Unix user accounts which are configured on the development infrastructure server. This is somewhat inconsistent with the services accessed through the http front-end (e.g. svn and trac), which use the htpasswd file for user authentication. We will discuss at a later stage how to unify the user account information for all these services.

Saturday, June 6, 2009

Code Review App as a Service

I just noticed that Rietveld, one of the open-source code-review applications discussed in an earlier post, is now available as a hosted virtual private application to any organization which uses the Google Apps service.

Since I use Google Apps for the @kugelfish.com domain, I tried out the new service by adding it to my domain from the link at this page. As for all virtualized apps, the administrator needs to create a new CNAME in the domain's DNS records under which the new service will be mapped - e.g. reviews.kugelfish.com.
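Once the DNS change has propagated, the new record can be verified from any shell; a quick check using the hostname from the example above:
# should print the CNAME target given in the Google Apps setup instructions
dig +short reviews.kugelfish.com CNAME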

While there are many good arguments for an organization to use a hosted solution like Google Apps for providing their public email service, I am a bit more skeptical about outsourcing critical development infrastructure. Especially as in this case the SLA explicitly says that there is no SLA, since these are experimental Labs applications - a step down even from the usual Google Beta label. But for organizations where code reviews are not yet a fully integrated and critical part of the development workflow, this is an intriguingly quick way to get started and hopefully become hooked on the practice of code reviews.


Saturday, May 30, 2009

Manhattanhenge

Today is a geeky holiday: the particular celestial alignment called Manhattanhenge where the sun sets exactly in line with the Manhattan street grid, which is about 30 degrees off from perfect east-west orientation.

This picture, which is taken from 5th Avenue, shows the sun setting at the end of 22nd Street. Not exactly at street level and somewhat before the astronomical time of sunset, since the mathematical horizon is obstructed by the coastline of New Jersey in the background.

Some more pictures on flickr from 23rd st and 22nd st.

Poor-man's Testing Cluster

One of the biggest problems in embedded software development is how to most effectively test while the final target hardware is either not built yet or is too rare, unwieldy or expensive to give every software developer unlimited and unrestricted access to.

Typical approaches include software simulation of the target on some general purpose computer or to find some suitable stand-in for the final hardware platform.

Simulation by running the target OS on another CPU architecture (e.g. a PowerPC target on an Intel based PC) is inaccurate with respect to some of the important differences between CPU architectures, like endianness and memory alignment. Simulation which includes the target processor architecture can be very slow, unless the speed difference between the target platform (e.g. a low-powered mobile device) and the host platform (e.g. a standard desktop PC or server) is large enough to make the experience usable for developers - as is the case for the Android emulator included in the SDK.

The often simpler and more reliable alternative is to find a platform which is close to the final target or deliberately choose the target CPU complex to be close to some standardized platform. Vendors of embedded CPUs often tend to sell functional evaluation boards with reference designs for their chipsets - but those tend to be expensive because of the small volume of manufacturing.

For our project at Xebeo Communications, we deliberately chose to use an x86 based controller architecture - even though this did make life harder for the hardware and manufacturing/logistics teams, since for various reasons, building a small volume x86 platform is not as easy as with other CPU vendors & architectures which explicitly support the embedded market.

But using a controller which is close to a standard PC allowed us to run a stripped-down FreeBSD kernel as the target OS and preliminarily build and test most of the software on any standard PC, like our Linux based desktops. By using an OS abstraction layer, we were able to run and test on most Unix-like operating systems, to verify that the application software would be portable to other architectures if need be.

In order to become even more realistic, we started building low-cost eval boards using the cheapest commodity PC parts we could find from discount online stores like TigerDirect:
  • low-end PC motherboard with matching low-end Intel CPU (celeron or similar)
  • cheapest matching memory configuration as needed
  • cheapest PC power supply
  • CPU heatsink with fan
  • PCI Ethernet card with same chipset as our design
  • compact flash to IDE adapter
  • small compact flash card for network bootloader
  • Rubber feet and screws fitting motherboard mounting holes
  • Serial cable for console access
The resulting systems could be built for under $200 around 2002 and probably even more cheaply today. They can be seen above, stacked on a storage rack. These particular systems were used to run the various automated nightly and continuous builds.

At this price, every software developer could have one next to their desktop and network boot it from a tftp & nfs service exported from that desktop. By changing a boot option, the board could be used to run the target emulation mode or a simple workstation mode to run the automated test-suite, thus reducing the CPU load on the workstation.

Building and testing a large code-base is very resource consuming in CPU, memory and disk-I/O bandwidth and can bring even a relatively powerful workstation to the point of being barely usable for anything else while the build is running. Having a large number of low-cost CPU blades around to run builds and tests can be a significant booster of developer productivity.

Friday, May 22, 2009

Submarine Mode

I understand that one of Google's ulterior motives with Android is to promote a mobile experience where the user is always connected to the Internet, and the G1 is pretty much built around that "always on" networking paradigm - including the special flat-rate data plans from T-Mobile.

On the other hand, data services are not universally cheap yet everywhere in the world and it would be nice to give the user more control over the mobile data usage. Both current commercial Android phones (HTC G1/Dream and G2/Magic) have two types of radio for data usage:
  • GPRS/EDGE/3G cellular data connection
  • IEEE 802.11 WiFi wireless LAN interface
Since the Wifi interface is faster and was not exactly invented with power saving mobile devices in mind, it is presumably more power hungry than the cellular interface.

In the implementation on the G1, Android gives precedence to the Wifi connection if enabled and available when the phone is active - i.e. when the screen is on. Once the screen goes off, the wifi interface is shut down after a few seconds until the screen is turned on again. There is an obscure option in the expert wifi settings (Wi-Fi Settings->(menu key) Advanced-> Wi-Fi sleep policy) to change that default behavior.

If background synchronization for gmail, calendar and contacts is enabled, the phone will periodically (about every 5min according to NetMeter) partially wake up and go online on the cellular data network - even if it is sitting in the middle of a well covered Wi-fi network.

While it is possible to administratively turn off the Wifi radio, it is currently NOT possible to turn off the cellular data connection while leaving the Wifi interface running. The only option is the "airplane mode", which disables all radio interfaces - including the ability to make or receive phone calls.

What seems to be missing is a configuration switch to turn off cellular data only, leaving on the phone service, SMS and the Wifi interface. There is an option to disable data when roaming, where the biggest cost might occur - but I would prefer not to be roaming at all and use a local, maybe pre-paid SIM card instead. (My home operator, who gets paid $1.50 per minute in roaming charges on every phone-call, would probably disagree.) So what I do when traveling is to use a prepaid SIM card for phone calls and administratively deprovision the data service capabilities (by calling the operator to have it disabled) and/or by making sure there are no APN settings configured for this operator. This way I can leave background synchronization and Wifi enabled and still receive email updates while in Wifi coverage - even though this synchronization only happens when I turn on the phone's screen.

However, sometimes it would be nice to be able to splurge and pay the 50c per kbps for doing a quick search or lookup of something on the Internet, even using a prepaid or otherwise expensive data plan - but without the phone taking advantage of the opportunity to sync all my email or download a new OTA update, completely draining my prepaid card before I can stop it.

What I would like is an additional setting - a "submarine mode", where the cellular data interface is only used in an extremely minimal and controlled way - like the radio transmitter on a submarine trying not to be located. While in this mode, the use of the cellular data service should be reliably cut off, but phone service and Wifi should continue to work. In addition, I should be allowed to very temporarily bring up the cellular data interface and grant access only to the current foreground application (e.g. using iptables with UID matching) - ideally with a data usage counter right in the notification bar, so I can see what is going on.
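A rough sketch of the kind of firewall rules such a mode could be built on, assuming a rooted device where the cellular interface is rmnet0 and the foreground application runs as UID 10059 (both placeholders):
# block all outbound traffic on the cellular data interface by default
iptables -A OUTPUT -o rmnet0 -j DROP
# temporarily exempt the current foreground application, matched by its UID
iptables -I OUTPUT -o rmnet0 -m owner --uid-owner 10059 -j ACCEPT
# revoke the exemption again when done
iptables -D OUTPUT -o rmnet0 -m owner --uid-owner 10059 -j ACCEPT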

I don't know enough about how cellular data services work to know if something like this could be easily implemented, but I doubt that a feature to let users pinch pennies on phone usage would be very high on any operator's priority list right now - and as long as operators drive the requirements with cellphone manufacturers, it is their priorities which get implemented.

Saturday, May 9, 2009

Subversion & Trac (SDI 07 Part III)

In this episode of the series on creating a minimal software development infrastructure, we are dealing with the centerpiece of the solution: setting up Subversion as the version control system and Trac to provide a unified and integrated system for bug/issue tracking, collaborative document editing (wiki) and source control repository browsing, as well as a platform for further integration and extension. All the necessary packages were already installed on the server as part of the previous episode.

Assuming that our infrastructure server has a payload partition mounted under /data and our fictitious project is called "sdi07", we are setting up the disk space for our project as follows:
mkdir /data/sdi07
mkdir /data/sdi07/svn
mkdir /data/sdi07/trac
To properly initialize the databases for svn and trac, we need to run the following commands,
svnadmin create /data/sdi07/svn
trac-admin /data/sdi07/trac initenv
and answer a few basic questions for trac-admin. In particular we need to specify the name of the project as it will appear on the Trac main page and the path to the subversion repository ( /data/sdi07/svn as specified above). Since we are using subversion as the version-control system and sqlite as the back-end storage for trac, we stay with the default choices for all the remaining questions.

After initialization, any configuration choices can be changed by editing /data/sdi07/trac/conf/trac.ini or by running trac-admin /data/sdi07/trac/ with one of the supported commands.

Both Subversion and Trac have their own specific server implementations, but both also support access through an Apache web server. We choose to go the Apache way to unify front-end setup and user account management.

Since all operations originated from the Apache front-end are executed under the permissions of the low-privilege apache/apache Unix user, we sign over the entire project space to that user first:
chown -R apache:apache /data/sdi07
Since we have installed subversion with the apache2 use flag, the necessary modules and config files to support subversion access through Apache have been installed. In order to map subversion access under the URL http://localhost/sdi07/svn/, we add/modify the following to /etc/apache2/modules.d/47_mod_dav_svn.conf:
<Location /sdi07/svn>
DAV svn
SVNPath /data/sdi07/svn
</Location>
The simplest way to configure Trac within the Apache configuration is by using mod_python. To map the Trac instance we just configured under the URL http://localhost/sdi07/trac/, add the following section into the module specific conditional configuration in the file /etc/apache2/modules.d/16_mod_python.conf:
<Location /sdi07/trac>
SetHandler mod_python
PythonHandler trac.web.modpython_frontend
PythonOption TracEnv /data/sdi07/trac
PythonOption TracUriRoot /sdi07/trac
</Location>
Both Subversion and Trac require a user identity - at least for any write operations. If the http accesses are authenticated, the user identity will be passed by Apache to the corresponding Subversion or Trac backends.

Among the many options for user authentication with Apache, the simplest to set up is basic authentication with a local htpasswd file. In order to require authentication for all access to our entire project webspace under http://localhost/sdi07/, we add the following section to /etc/apache2/httpd.conf:
<Location /sdi07>
AuthType Basic
AuthName "SDI07"
AuthUserFile /data/sdi07/htpasswd
Require valid-user
</Location>
In order to create access for a new user, execute the following command:
htpasswd2 /data/sdi07/htpasswd <username>
While this approach to user account management is very simple, it is admittedly not very flexible; we will discuss some alternative approaches to user account management later on. Note that the empty htpasswd file has to be created beforehand, with apache:apache ownership so that the web server can read it.
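A minimal sketch of that step (the mode is just a suggestion):
touch /data/sdi07/htpasswd
chown apache:apache /data/sdi07/htpasswd
chmod 640 /data/sdi07/htpasswd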

In order to finalize the configuration and start up the Apache web front-end, we need to activate the required optional Apache modules in /etc/conf.d/apache2:
APACHE2_OPTS="$APACHE2_OPTS -D PYTHON -D SVN -D DAV -D DAV_FS"

and then finally try to start the newly configured front-end with
/etc/init.d/apache2 configtest
/etc/init.d/apache2 start
while monitoring /var/log/messages and /var/log/apache2/error_log for any errors. Once any potential configuration issues are fixed and Apache is starting up properly, we can add it to the default runtime configuration with
rc-update add apache2 default
After Apache is running properly, we should see a default wiki homepage for our project at http://localhost/sdi07/trac and we should be able to create the basic recommended source-tree layout for a new Subversion project as follows:
svn checkout http://localhost/sdi07/svn my_workspace
cd my_workspace
svn mkdir trunk
svn mkdir branches
svn mkdir tags
svn commit -m "create initial directory structure"
Typical causes for errors at this stage are file ownership issues - i.e. not all necessary files being accessible to the apache user under which the Apache web server is running - or a typo in one of the configuration files.

Since this setup is based on the state of the Gentoo world of 2007 (from the Sabayon Linux 3.3b live mini-CD), some details have certainly changed with newer versions and need to be adjusted. The versions of the key packages used here are as follows:
dev-util/subversion-1.4.3-r1
dev-python/mod_python-3.3.1
net-www/apache-2.2.4-r1
www-apps/trac-0.10.4

Wednesday, May 6, 2009

Platform Setup (SDI 07 Part II)

In the last episode of this series, we have decided to use Gentoo Linux on a skimpy tabletop server as the platform for the software development infrastructure for our fictitious new project.

True to the hardcore image of Gentoo, the installation process for the Gentoo bootstrap binary distribution is a bit spartan (as of 2007). Fortunately Sabayon Linux provides a Gentoo derived distribution with a live CD to check out hardware compatibility and a simple installation process targeted at desktop end-users. OK, a desktop distribution is probably not optimal for a server, but I needed to get the base system up and running as painlessly as possible (which it did).

Once the minimal base system is configured to connect to the local network and is ready for remote login by an admin user, we can start with the setup of the service infrastructure. Based on the list of the services we want to set up, this is the shopping list of additional packages, which we need to download from the portage repository and build locally:

echo "www-apps/trac cgi fastcgi sqlite enscript silvercity" >> /etc/portage/package.use
echo "dev-util/subversion apache2" >> /etc/portage/package.use
echo "net-mail/dovecot pop3d" >> /etc/portage/package.use
emerge -v apache2
emerge -v subversion
emerge -v trac
emerge -v mod_python
emerge --unmerge ssmtp
emerge -v postfix
emerge -v mhonarc
emerge -v dovecot


Before we start configuring any of the services, here are a few deliberate assumptions and choices we have made for this setup:
  • This server will run all the services required to support collaborative software development, but not any general purpose IT functions which are needed for any kind of team or work environment (networking, email, file & print sharing, Internet access, account administration, etc.)
  • This server will run behind a firewall on a gated, private network. The security on the server is geared towards keeping visitors from casually snooping around or anybody accidentally destroying something rather than keeping out any seriously determined cyber-criminals.
  • Thanks to the cyber-criminals mentioned above and other assorted scum hanging out on the Internet, running a public e-mail server has become an unpleasant hassle which might not be worth doing for a small organization; using a hosted mail service might prove very attractive instead. Rather than trying to route email traffic from and to the software development server, we might as well run a completely isolated email system on that server. Members of the software team would have to use two distinct email accounts to communicate within the project team and externally. Most modern email clients can easily manage multiple accounts accessed remotely over protocols like POP or IMAP. If there is already e-mail service provided on the local network, we can easily relay all messages there instead.
  • All the services which are part of the software development infrastructure require a single, consistent set of accounts to uniquely identify each user - less to prevent unauthorized access than to track all user interactions which are part of the project's evolution and audit trail (checkins, ticket updates, changes to wiki pages, email messages sent to the project mailing lists, etc.). We need a simple way to provide unified account management, and probably need to do that separately from whatever is done for the general IT infrastructure.

Tuesday, May 5, 2009

Cupcake is out of the Oven

The new version of the Android platform - 1.5 "Cupcake" - is now being shipped with the new HTC Magic phone from Vodafone and is also already available for some versions of the HTC Dream/G1. Since an OS update in the field is always a scary business, T-Mobile is likely going to take it slow in upgrading all of the reportedly over a million G1 phones sold.

Cupcake seems to be a relatively minor major release - a few significant new features (on-screen keyboard, video), some UI face lift and some improvement behind the scenes (battery life, performance).

For my own use, there are two features which have made the upgrade to Cupcake a big deal for me.

The touch-screen virtual keyboard is the big one. I have never been a big fan of the G1's hybrid touch-screen plus keyboard design and the virtual keyboard is more than good enough for me. In fact it is a lot better, since for small text input, the work flow from touch-screen navigation to text entry and back is a lot quicker and smoother than before. Since the upgrade I have not used the physical keyboard any more and would be more than happy to lose it...

My second favorite new feature is the support for bulk operations in the gmail app. Like in the online interface, there is now a row of check boxes in the message list, and if you start checking them, a set of operations like bulk-delete or bulk-archive becomes available. I do get a lot of email on my account and this kind of rapid triage is important for me.

I don't particularly care about being able to record video on my cellphone or have home-screen widgets, so many of the other features are lost on me. Since I don't have two phones to compare side by side, I am not even sure anymore what has really changed.

Battery life is probably better but still somewhat of an issue; since I have wifi on all day and sync to a busy gmail account, I can't really complain. I get through a day on one charge, which is pathetic for a regular feature phone, but not bad for a portable computer.

Most of the apps I use are still working - including the ones I wrote myself. I had to do a small update for BistroMath to fix some issue with how the keyboard and landscape mode was detected (with Cupcake, the keyboard is always there...). I was mildly surprised, that NetMeter still works without a problem, since it uses non-standard APIs by going directly to the /proc filesystem of the underlying linux kernel for much of the information.

Sunday, May 3, 2009

Choosing the Operating System (SDI 07 part I)

For the first part of the discussion on how to set up a minimal software development infrastructure for a startup project, using only open-source software, we are looking at the lowest layer in the technology stack - hardware and operating system.

The first obvious reason for choosing an operating system for this development support server would be familiarity. If there is a particular OS or distribution the administrator is most familiar and comfortable with - this should probably be the most significant argument for choosing it.

At the time of this experiment, I had not had much hands-on experience with any particular OS for the last few years, so the choice would be based on what I could most likely set up most easily without much of a learning curve and where I could get help most easily when I ran into problems.

The most obvious choice for an open-source operating system at this point is Linux, which runs pretty much on anything with a CPU - including almost any commodity PC hardware. For server platforms, my other preferred open-source operating system has typically been FreeBSD - which doesn't try to be anything else but a rock-solid server platform - but is a lot more picky when it comes to hardware and software support.

Even though not a very typical server hardware platform, the machine used for this experiment was going to be my mini tabletop server from AOpen. Linux would probably be my best bet to install and run on this type of hardware without too much trouble.

After choosing Linux, the next question is which distribution?

An ambitious software project might have a development life-cycle on the order of 12 to 36 months, which is a very long time in the life of a typical Linux distribution. We would like to assume that key systems like version control etc. can be set up at the beginning of the project and will not need to be touched or upgraded again during the most crucial initial development phase. If we need to do any upgrades down the line, we most likely would want these upgrades to be as minimal as possible. From past experience (admittedly mostly with RPM), the package management of most Linux distributions breaks down when trying to do point upgrades on a several year old system which has not been kept up to date - sometimes just because packages are not archived for that long on a distribution's website.

All of the major Linux distributions use some form of package management system for installing and upgrading optional software packages and for keeping track of the dependencies between packages. The most popular package management formats are RPM and DEB which are both based on distributing and installing binary packages. The odd one out among the top Linux distributions is Gentoo Linux, whose package management system, portage, is based on locally compiled source packages.

I am intrigued by the Gentoo portage package management system not for the usually claimed benefits like greater speed or better optimization, but for its potential to reduce non-essential dependencies - and dependencies are often considered the root of all evil in software package management...

Most open-source software packages are themselves extremely portable. They often build from source not only on any Linux distribution but also on most other Unix and Unix-like systems, sometimes even including MS Windows. One of the secrets behind this flexibility is the GNU autotools, which allow a package to probe and discover the existing system configuration and to configure its build to account for the current environment.

While most open-source software packages have essential dependencies without which they cannot work, there are many optional dependencies which may be disabled if not needed. Once a package is built for a particular environment, much of that environment becomes an accidental or spurious dependency of the resulting binary, which needs to be satisfied if we distribute a binary package.

This is just a hunch, but I would think that a source-based package manager like portage should be able to get away with far fewer dependencies among packages than even the best binary-based ones. A non-scientific sample comparison for Subversion between the Gentoo portage repository and the most popular Debian package system seems to support that intuition: the portage package has 4-5 mandatory direct dependencies and a few optional ones, which can be enabled or disabled at build time, while the Debian package is broken up into three different ones (subversion, libsvn1 and subversion-tools) with a dozen or more direct dependencies, not including some of the optional features from the portage package.

To further test this hypothesis, I installed and upgraded a few packages on my now roughly two-year-old, out-of-date Gentoo Linux system without much of a problem - none of the typical issues like packages no longer being available, packages incompatible with the system, or a cascade of upgrades threatening to break other existing packages unless they are upgraded as well.

On the other hand, since I did not run the control experiment with a leading binary distribution, who knows whether it might have worked out just as easily there.

Wednesday, April 15, 2009

Essential Startup Software Development Infrastructure - 2007 Edition

A while back, I did some research into how I would go about setting up the basic software development infrastructure for a new startup project again - with hindsight and the state of open-source development tools of ca. 2007. Again, the goal of the experiment is to spend little or no money, use only free and open-source software, and end up with a solution which could be set up from scratch in only a few days.

Here are the basic choices of the 2007 edition software development infrastructure (to be elaborated in future posts):
  • The entire development support infrastructure should again be able to run on a single machine. For the experiment, I used my home server running Sabayon Linux, a more user-friendly derivative of the Gentoo source-based Linux distribution. We assume that this machine is behind a firewall and cannot be accessed from the outside.
  • As the version control system, I chose Subversion (svn). Svn is mature, stable, well supported and largely accepted to be the natural replacement for CVS as the dominant open-source SCM. This is a deliberately conservative choice - since there is nothing as important and critical in the development support infrastructure as the SCM. Much of the ongoing innovation is in distributed SCM, but since I am most familiar with the centralized model, I know that svn will work and I assume there is no time for trialing and evaluation - so svn it is.
  • The overall frame of the infrastructure is provided by Trac. It wraps nicely around svn and provides a few important and well-integrated services out of the box, with little or no configuration:
    • Web based browser for Subversion repository, including change-sets.
    • Simple Wiki for organizing links and documentation
    • Bug/issue tracking ticketing system
    • Automated wiki style linkage between all subsystems of Trac referencing wiki pages, tickets and svn change-set numbers anywhere in pages, tickets or svn checkin comments.
    • Event time-line covering updates to wiki pages, issue tickets or svn checkins.
  • For the e-mail subsystem, we use the Postfix server, which is generally accepted as a more secure and administrator friendly replacement for the old sendmail. To implement a complete "closed-circuit" mail service, we use the Dovecot POP & IMAP server to provide access to the mail stored on the server to any e-mail client supporting these protocols.
  • To support mailing lists, we can simply use mail aliases provided by Postfix - in combination with MHonArc for managing the web archives.
  • From the original list, I am deliberately ignoring build and test automation. Not because it is not important, but because it depends more than the others on the particularities of the project, may require some more significant setup and, from experience, most likely needs at least its own dedicated machine, if not a whole cluster of them. If possible, we would like a solution which integrates nicely with Trac - e.g. Bitten.
For the setup and configuration, I am trying to keep things as simple as possible: use defaults wherever possible and maybe even choose tools just because they seem simpler to set up and configure. Most likely we can assume that the initial setup would be done by somebody who is a member of the development team with limited system administration experience, like myself, and that there might be no professional system administrator around to help.
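
As a concrete illustration of how little is involved, the heart of the setup boils down to two commands - creating the Subversion repository and initializing the Trac environment around it. Here is a minimal sketch with hypothetical paths (trac-admin's initenv step will still ask a few questions interactively):

#!/usr/bin/env python
"""Minimal setup sketch: create the svn repository and a Trac environment.

The paths are hypothetical placeholders; trac-admin's initenv step will still
prompt interactively for the project name, database and repository details.
"""
import subprocess

SVN_REPO = "/var/svn/project"     # where the Subversion repository will live
TRAC_ENV = "/var/trac/project"    # where the Trac environment will live

# Create an empty Subversion repository.
subprocess.check_call(["svnadmin", "create", SVN_REPO])

# Initialize the Trac environment around it.
subprocess.check_call(["trac-admin", TRAC_ENV, "initenv"])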

Sunday, April 5, 2009

Branching is easy, Merging is hard

Merging concurrent or overlapping changes to the same piece of source-code is one of the basic operations to support collaborative development and is supported in some form by most modern version control systems.

Merging is required if changes from two different branches have to be reconciled - these could be branches created explicitly or implicitly by two users editing the same file concurrently as in the example below.

In the example below, Alice and Bob are both making changes to the same file. After Alice has submitted a new version, Bob needs to merge the changes into his client view before submitting a new version.

Typically a merge is required when a branch is being closed, but the current head version on the parent branch (R43 in the example) is no longer identical to the version from which the branch was created (R42) - i.e. changes have happened in parallel on both the parent and its derived branch. In such cases a 3-way merge operation can be used to reconcile two versions of a file, given a common ancestor.

A 3-way merge operation can most easily be emulated by taking a file diff between the common ancestor and each of the two versions under consideration and then "adding" both of those diffs to the common ancestor. If the changes are sufficiently non-overlapping, the merge might be done automatically; otherwise a merge conflict occurs, which needs to be reconciled manually, possibly by creating a new version which combines the joint intention of the other two.
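
To make that emulation concrete, here is a small toy sketch in Python of a line-based 3-way merge built on difflib - my own illustration, not how any particular version control system implements it. It applies the hunks from both sides to the common ancestor and emits conflict markers wherever hunks from both sides touch the same region, which is deliberately more conservative than a real diff3:

import difflib


def hunks(base, derived):
    """Diff hunks turning `base` into `derived`, as
    (base_start, base_end, replacement_lines) tuples."""
    matcher = difflib.SequenceMatcher(None, base, derived)
    return [(i1, i2, derived[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


def patch(base, side_hunks, start, end):
    """Return base[start:end] with one side's hunks applied."""
    out, pos = [], start
    for s, e, replacement in side_hunks:
        out.extend(base[pos:s])
        out.extend(replacement)
        pos = e
    out.extend(base[pos:end])
    return out


def merge3(base, ours, theirs):
    """Toy 3-way merge over lists of newline-terminated lines.

    Hunks from `ours` and `theirs` relative to the common ancestor `base`
    are applied in base order; wherever hunks from both sides touch the
    same stretch of the base, a conflict block is emitted instead.
    """
    tagged = sorted([(h, "ours") for h in hunks(base, ours)] +
                    [(h, "theirs") for h in hunks(base, theirs)],
                    key=lambda t: (t[0][0], t[0][1]))
    merged, pos, i = [], 0, 0
    while i < len(tagged):
        (start, end, replacement), _side = tagged[i]
        merged.extend(base[pos:start])   # unchanged lines before this hunk
        group, group_end = [tagged[i]], end
        # Pull in every later hunk that touches the same stretch of the base.
        while i + 1 < len(tagged) and tagged[i + 1][0][0] <= group_end:
            i += 1
            group.append(tagged[i])
            group_end = max(group_end, tagged[i][0][1])
        if len(group) == 1:
            merged.extend(replacement)   # only one side changed this region
        else:                            # both sides changed it: conflict
            merged.append("<<<<<<< ours\n")
            merged.extend(patch(base, [h for h, w in group if w == "ours"],
                                start, group_end))
            merged.append("=======\n")
            merged.extend(patch(base, [h for h, w in group if w == "theirs"],
                                start, group_end))
            merged.append(">>>>>>> theirs\n")
        pos = group_end
        i += 1
    merged.extend(base[pos:])
    return merged

A real implementation (GNU diff3, or the merge machinery inside svn or git) is of course much more careful about how it groups and reports overlapping changes, but the basic mechanics are the same.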

But even if a merge seems to present no conflict at the level of the 3-way merge algorithm, it can still be semantically wrong. E.g. if one of the changes renames a function and all its invocations and the other change adds a new reference to the old name, then even though these changes may not represent a merge conflict, the resulting code would still be wrong. After an automated merge, thorough review and testing of the resulting code is essential.

I am typically not a big fan of overly visual or graphical development tools, one exception being support for merging. I find it hard to follow what is going on during a merge, and seeing the different versions lined up side by side, with the differences highlighted, is clearly helpful.

My ideal tool shows the three input versions side by side: the common ancestor/base as well as the two contributors being merged. The tool should automatically propose a merged solution where there is no conflict and should make it easy to navigate through the sections of the file which have changed. The resulting version should be shown in a fourth window and should be editable on the fly to fix and resolve merge conflicts or errors introduced by the automated merge.

Surprisingly, there are only a small number of open-source tools available which support a graphical 3-way merge, on Linux in particular - as far as I know, the list is quite short.
My favorite one, which I use every day, is kdiff3. Since it is a bit heavy and slow to start, I tend to rely on the auto-merge algorithm built into the version control tool unless it detects a conflict, in which case a script hands over the merging of only the conflicting files to kdiff3.
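
The handoff script is nothing fancy; the sketch below shows the general idea (the argument order is an assumption - check what your version control system actually passes to its merge-tool hook):

#!/usr/bin/env python
"""Sketch of a merge-tool wrapper handing a conflicting file to kdiff3.

The (base, ours, theirs, output) argument order is an assumption - check
what your version control system actually passes to its merge-tool hook.
"""
import subprocess
import sys


def merge(base, ours, theirs, output):
    # --auto keeps kdiff3 from opening a window when it can merge cleanly;
    # the GUI only appears when manual conflict resolution is needed.
    return subprocess.call(["kdiff3", "--auto", base, ours, theirs,
                            "-o", output])


if __name__ == "__main__":
    sys.exit(merge(*sys.argv[1:5]))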

Tuesday, March 31, 2009

Startups: Technology Execution Play

At the opposite end of the spectrum from the concept- and Zeitgeist-heavy startups of the web age is the kind of startup with neither a particularly great new idea nor a secret new technology - one simply doing something which is really hard to do and which only very few people know how to do.

During the late 1990s, the Internet had been growing by leaps and bounds, requiring a doubling in capacity every couple of months for many types of networks; networking gear was constantly running out of steam and needed to be upgraded with the next generation of higher-capacity equipment. So just building the next bigger, better gear sounded like a reasonable thing to do, except that it was easier said than done - especially at a breakneck pace of 18-24 month development cycles, barely ahead of Moore's law. Until recently, building telecom and networking equipment had been a relatively specialized niche craft, practiced mostly in the R&D labs of a small number of companies selling to a boring, utility-like industry. With a small pool of people who knew how to build something like that, those crazy enough to try had a pretty good chance to succeed - if they could pull it off and execute. Many would succeed quite well (in terms of return on investment) without even having to build a real business to sell their product - they would be acquired by some established equipment manufacturer who desperately needed something like this, but whose internal R&D was years behind schedule - partly because their most experienced staff had run off to start companies, often getting bought back by the companies they had left.

This was the climate in which we started Xebeo Communications, to build the next generation packet switch for carrier networks. We had no experience or track-record in business, other than having worked on the development of similar systems before. Nevertheless we raised some double-digit millions in venture capital funding on our technology expertise alone (Ok, those were crazy times and people got a lot more money to go sell dogfood on the Internet...).

The idea was simple, but the execution required a large and highly skilled team with a broad range of expertise: VLSI chip design, electro-optical componentry, hardware systems and circuit design, high-speed signal and thermal-flow simulations, mechanical engineering, embedded high-availability software and the development of specialized communications protocol software. And all of this had to be put together into a working system in less time than any reasonable estimate, while pushing the technology close to the edge of what was possible at the time. For example, the contract manufacturer asked to keep one of the circuit boards for display in their lobby - it was the most complicated one they had built thus far...

It was basically "build it and they will come" - we actually DID build it, but they never came. The bottom had fallen out from underneath the tech market around 2001, leaving tons of unused equipment around at fire-sale prices. Nobody needed to double any capacity anymore for quite some time. The company was acquired for cents on the dollar and, after some time spent trying to find a niche for it, the project was eventually canceled - a white elephant from a bygone era whose time had never really come. [They had ultimately failed to see or take advantage of the real value of what they had acquired - not the product, already obsolete by then, but the team who could build it.]

Sunday, March 29, 2009

On the Value of Tools

Maybe to a fault, I tend to think that tools play a big role in the success of software development projects. The benefits can largely be summarized under the following categories:
  • leverage or force multiplication
  • positive reinforcement or behavioral modification
The first one is the primary reason for using tools ever since early hominids started to pick up rocks or sticks and use them as tools. They allow us to go beyond the immediate capacity of our hands or our brains. Even though, according to hacker folklore, Real Programmers need nothing but
cat > a.out
to write code, the days of writing programs in raw binary form by flipping switches or by punching cards are over. High-level languages and interactive programming - i.e. using a computer workstation to write, compile and test programs in quick iterations - have brought such a leap in programmer productivity that without them we could hardly manage the complexity of some of the software systems we are working on today.

The second one might be more subtle and harder to explain. Software development beyond a certain scale and complexity requires discipline and, most likely, collaboration. There are rules we all know we should follow, but sometimes laziness or expedience gets the better of us. Good tools should prevent us from cheating, reduce the temptation to cut corners by making it easier to follow the rules than not to, or mercilessly expose us if we do break them. For example, only part of the reason for having an automated build system is to let everybody know when the build is broken, to avoid wasting time working off a broken baseline; the other part is to shame people who do break the build, so that it happens less frequently.

The value of tools which provide leverage and increase our individual productivity is easy to see. The value of tools which encourage us to play by the rules may be equally important, but depends on what we value as the right thing to do, both as individuals and as a team. Their effectiveness depends on how well in tune they are with the processes and software development culture of a particular team.

Tuesday, March 17, 2009

Essential Startup Software Development Infrastructure - 2000 Edition

When we started a company in the early days of 2000, I spent some time setting up what would become our minimal IT infrastructure and software development environment (that's how I ended up with UID 500...). Since we did not have any money (yet), it had to be free/open-source software, and since we did not have any time for evaluation or in-depth research, we tried to go with what seemed to be the most obvious, conservative or mainstream choice at the time for each piece of the solution.

Initially our entire server infrastructure was based on a single Linux box from Penguin Computing, since that was about all we could afford with an empty bank account. In the hope that there would soon be more machines to come, it was running NIS and NFS servers for centralized network-wide logins, DHCP and DNS (bind) servers for IP network configuration, an HTTP server (Apache) as the intranet homepage, and SMTP (sendmail), POP and IMAP servers for basic email service. Many of these initial choices were undone again once we had a real professional Unix sysadmin.

On top of that we built the initial infrastructure to support the software development team. From day one, we wanted the team to work a certain way: put working code at the center of attention; always move the system in small increments from one working state to a new working state; only what is integrated into the central repository really exists; make changing things as easy and risk-free as possible - etc. The common development infrastructure should support this way of working and make it easy to follow these principles.

The key pieces of this initial infrastructure were:
  • Email including archived mailing lists
  • Version control system
  • Document sharing
  • Build and Test automation
  • Issue tracking
Email is probably the most essential tool for supporting team collaboration, not just in software development. Archived mailing lists provide an instant and effortless audit-trail of any discussion as it unfolds, and email is also a very convenient way to distribute automated notifications. For our first mailing lists, we simply used the built-in alias functionality of the mail delivery system itself (sendmail), with MHonArc as the web-based mail archive tool. All the setup was manual, but that was acceptable since we expected the team to change very slowly - reaching about 20 members at the peak.

At the time, the only serious open-source contender for software version control was CVS. The version control system is the vault where the crown jewels are kept, and it is the most mission-critical piece of infrastructure. As soon as we had some money in the bank, we replaced CVS with Perforce, since we were familiar and comfortable with its model of operation (same advantages as CVS, but it keeps meta state on the server, commits atomic sets of changes, etc.). We added a web-based repository browser and notification email support, sending out a mail for each submitted change with a link to that particular change in the web-based repository browser. The source-code repository was meant to be the most openly public part of the infrastructure, and nobody should be able to sneak in a change unseen.

Our document sharing system was very simple. Since we already had version control as the central piece of our workflow, we simply used the version control system to stage our entire intranet website. To add or update a document, check in the new version and, if necessary, hand-edit the HTML link on the page where it should appear. This sounds crude, but we were all programmers after all and editing some HTML did not particularly bother us. The website provided easy access to the current version of any document, and the version control system backing it provided all the history necessary.

The build and test automation was essentially home-grown (loosely inspired by DejaGnu). At its core was a Python script called runtest, which parsed a hierarchy of test definition files within the source tree and ran any test executable specified there. Test cases had to generate output containing PASS or FAIL, and each occurrence of such a keyword counted as a test case. For the official automated build, runtest logged its results to a MySQL database, but the same script could also be used interactively by anybody on the team to make sure tests always worked or to troubleshoot breakages. The automated master build itself was simply a script which ran in a loop, doing a checkout from the source control system and, if there was any change, running a clean build (using a combination of gmake and jam) and executing runtest on the full test suite. As a framework, this was extremely flexible. Tests could be written in any language as long as they could write PASS or FAIL to the console and exit cleanly at the end. For example, we ended up with a very powerful but rather unwieldy network simulation framework written in bash for running high-level integration tests, which could easily be run as part of the runtest suite.
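
The original script is long gone, but the core idea was roughly the following - a from-memory sketch, with the definition file format and the flags heavily simplified (a single flat file instead of a hierarchy, no database logging):

#!/usr/bin/env python
"""From-memory sketch of the runtest idea (not the original script): run
each test command listed in a flat test definition file and count every
PASS/FAIL keyword printed to its output as one test case."""
import re
import subprocess
import sys

RESULT = re.compile(r"\b(PASS|FAIL)\b")


def run_one(command):
    """Run a single test command and tally the PASS/FAIL keywords it prints."""
    proc = subprocess.run(command, shell=True, capture_output=True, text=True)
    counts = {"PASS": 0, "FAIL": 0}
    for match in RESULT.finditer(proc.stdout):
        counts[match.group(1)] += 1
    return counts


def main(definition_file):
    total = {"PASS": 0, "FAIL": 0}
    with open(definition_file) as definitions:
        for line in definitions:
            line = line.strip()
            if not line or line.startswith("#"):
                continue          # skip blank lines and comments
            counts = run_one(line)
            total["PASS"] += counts["PASS"]
            total["FAIL"] += counts["FAIL"]
            print("%s: %d passed, %d failed"
                  % (line, counts["PASS"], counts["FAIL"]))
    return 1 if total["FAIL"] else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))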

The issue tracking system was not part of the initial setup but followed soon thereafter, with the conversion from CVS to Perforce. We were using Bugzilla (probably again the only viable free choice at the time) with a set of patches to integrate it closely with Perforce, automatically enforcing that each checkin into the source control repository had to be linked to a ticket in the issue tracking system. This provided a very rudimentary workflow and scheduling system for keeping track of work items and for linking source changes to the reason why they were being made.
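
I don't remember the details of those patches, but the enforcement part could look roughly like the following change-submit trigger sketch - the ticket pattern and the use of `p4 change -o` are assumptions on my part, not the implementation we actually had:

#!/usr/bin/env python
"""Sketch of a Perforce change-submit trigger enforcing a ticket reference.

This is a guess at what such an integration could look like, not the patches
we used: the trigger is assumed to receive the changelist number and to
reject the submit unless the description mentions something like "bug 1234".
"""
import re
import subprocess
import sys

TICKET = re.compile(r"\bbug\s*#?\s*\d+\b", re.IGNORECASE)


def main(changelist):
    # The change spec contains the Description field with the checkin comment.
    spec = subprocess.run(["p4", "change", "-o", changelist],
                          capture_output=True, text=True).stdout
    if TICKET.search(spec):
        return 0   # accept the submit
    # Perforce shows the trigger's output to the user when it rejects a submit.
    print("Submit rejected: the description must reference a ticket, "
          "e.g. 'bug 1234'.")
    return 1       # non-zero exit makes Perforce refuse the submit


if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))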

Sunday, March 8, 2009

FIRST robotics competition

I was volunteering today at a robot competition for high-school age kids, organized by FIRST, a non-profit that promotes interest in science and engineering among students. They organize a series of robotics tournaments in which teams of middle-school or high-school age students have to build a robot in 6 weeks to compete in a particular challenge. The teams work with adult mentors, who are typically real-life engineers or scientists.

I was impressed by the quality of the work the students brought to today's NY regional competition at the Javits convention center. Most of the robots were highly functional and held up well through multiple rounds of competition.

With the disappearance of the industrial middle class in the US, education has become the single biggest factor in economic success (other than simply being born rich). The service economy consists of gold-collar jobs at one end of the spectrum, which typically require advanced college degrees, and McJobs at the other end, with very few possibilities to work your way up from one to the other.

Kids at this age may not yet fully understand how crucial education has become for their future lives, but a lack of interest and engagement at this age is very hard to correct later. In the NY area, some of the teams participating in these robotics competitions come from schools with very low graduation rates, but some of the long-time mentors claim that the graduation rate among the members of the robotics teams is significantly higher (by many tens of percentage points) than the school average. Maybe there is a selection bias - i.e. kids who would participate in such a nerdy activity would have a higher chance of graduating anyway. But maybe getting to interact seriously with people from a technical profession gives some kids an idea that there are ways out of poverty other than aspiring to become a gangster, drug-dealer, rap-star or professional athlete (even if this path is unglamorous and petit-bourgeois...).

But if there is even a small chance that exposing kids to a career in technology makes it an option they will seriously consider, then this seems like a pretty good use of our time.

Friday, March 6, 2009

SMS Remote Control for Android Apps

I wanted to add a remote control feature to the NoiseAlert application for Android, where menu options could be triggered remotely by sending an SMS to the phone. SMS messages should be delivered to the application only while it is running, and the commands should be executed by the foreground activity.

Instead of registering a BroadcastReceiver globally in the AndroidManifest.xml file, the following object can dynamically register and unregister itself to receive all SMS during the time it is active. All incoming SMS are passed to the object's onReceive method, encoded as an Intent in slightly obscure ways:
import android.content.BroadcastReceiver;
import android.content.Context;
import android.content.Intent;
import android.content.IntentFilter;
import android.os.Bundle;
import android.telephony.gsm.SmsMessage; // on Android 1.6+ this moved to android.telephony.SmsMessage

public class SmsRemote extends BroadcastReceiver {
    private boolean mActive = false;
    private Context mContext;

    @Override
    public void onReceive(Context context, Intent intent) {
        Bundle bundle = intent.getExtras();
        if (bundle == null) return;

        // Each raw PDU in the intent extras is one SMS message.
        Object pdus[] = (Object[]) bundle.get("pdus");
        for (int n = 0; n < pdus.length; n++) {
            SmsMessage message = SmsMessage.createFromPdu((byte[]) pdus[n]);

            String msg = message.getDisplayMessageBody();
            /* check if the text of the SMS matches a remote control command
             * and trigger the appropriate action.
             */
        }
    }

    // Called by the foreground activity (e.g. from onResume) to start listening.
    public void register(Context context) {
        mContext = context;
        if (mActive) return;
        IntentFilter smsFilter = new IntentFilter("android.provider.Telephony.SMS_RECEIVED");
        context.registerReceiver(this, smsFilter);
        mActive = true;
    }

    // Called by the foreground activity (e.g. from onPause) to stop listening.
    public void deregister() {
        if (!mActive) return;
        mContext.unregisterReceiver(this);
        mActive = false;
    }
}

The context is provided by the foreground Activity, which can also provide a callback to execute the commands that are to be triggered by the SMS. Permission to intercept incoming SMS still needs to be requested in the AndroidManifest.xml file:
<uses-permission android:name="android.permission.RECEIVE_SMS" />

Wednesday, March 4, 2009

Source-Code Samples in Blogger

Blogger makes it a bit hard to include properly formatted source-code snippets in postings, as it does not have a mode for entering raw pre-formatted text which should not be molested by any of the further processing and rendering.

You can always use the raw HTML edit mode, but then all the HTML- and XML-isms have to be escaped before pasting in the code sample. Fortunately there is a convenient online service at formatmysourcecode.blogspot.com which does just that. Here is an example of how the resulting output looks:

#include <stdio.h>

int main() {
    printf("hello, world\n");
    return 0;
}
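
If you would rather not depend on an online service, the escaping itself is easy to do locally - for example with a few lines of Python (just a sketch, not what I actually use):

# Escape a source file for pasting into Blogger's raw HTML editor,
# wrapping it in a <pre> block so the formatting is preserved.
import html
import sys

with open(sys.argv[1]) as source:
    print("<pre>" + html.escape(source.read()) + "</pre>")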

Monday, March 2, 2009

The other Benefit of Open-Source

Software development must be one of the fields where the gap between best practices and average practices is the widest. A poll in 2001 showed that only about two thirds of software development teams were using version control and only about one third used some kind of bug tracking system. C'mon, people - how many high-rise window cleaning crews work without a safety harness?

Open-source projects with many collaborators distributed throughout the world generally need to adopt solid collaborative development practices and often build themselves the tools to support collaboration at such a large scale.

With the popularity of open-source software, an increasing number of people in the technical community have been exposed to the ways these projects operate and to the tools they use. Today it is a lot harder to find fresh college grads who would not find it completely natural to use version control - after all, this is how you get the pre-release version of "insert-your-favorite-open-source-project-here". At the same time, they are naturally familiar with the idea of a release and with the fact that large, complex software systems don't just come together by themselves.

When I was in college, I don't think the words version control were ever mentioned. We learned to program in obscure and irrelevant languages - which is not necessarily a bad thing, since it helps to build a meta-level understanding of programming languages. I guess it was just assumed that those of us who chose a career in software development would learn our trade on the job, once we got out into the industry. On the other hand, since not all industrial software projects are that well run, bad habits are propagated as much and as quickly as good ones. Since successful teams tend to stick together a lot longer than the ones which fail, maybe the bad habits spread even faster.

My first exposure to industrial software development was at a very reputable technology company - the kind of company where you would expect the most effective software development practices to be a given. It turns out they weren't, and each project had to figure it out for itself - not uncommon in large companies. Our project ended up a classic death march: an overly ambitious schedule, a team of fresh bodies assembled too quickly (hundreds of people towards the end), no particular method to the madness; builds started to take hours, and dinners and weekends at work became routine. Many of us were young and made up in energy and enthusiasm what we lacked in experience - the few experienced software developers had either left in disgust, as nobody listened to them, or kept a low profile, knowing well enough that they could not do much to influence the inevitable course of events. Yes, we had somehow heard that using version control (the company had even invented a few of them) was apparently useful - and yes, documentation too - but we didn't have much of a clue about how to put it all together.

When it had all come to an end, some of us refused to accept that this should really be the best possible way to develop software. If our current environment could not teach us how to do it better, we had to look elsewhere for inspiration. We couldn't see how other companies were doing things better, but there certainly were a few open-source projects which seemed to be building software systems of comparable scale and complexity a lot more smoothly.

Open-source projects provide a unique insight into some very large and long-running software development efforts - some of them, like the Linux kernel, have gone on for well over a decade and have produced millions of lines of working code. Most commercial software development projects are a lot smaller than that, but by adopting tools and practices which work for mega-projects like these, we can be reasonably certain that they will not run out of steam during the lifetime of our project. Furthermore, open-source can become a shared frame of reference on practical software development issues for professionals across different organizations - hopefully helping to raise the standards of how software development is practiced throughout the industry.