Sunday, January 22, 2012

GWT - An Experience Report

As noted before, I am not a big fan of JavaScript as a language for complex web application projects. Recently I got the chance to get some first-hand, comparative experience with GWT (Google Web Toolkit) as part of an application re-write/upgrade.

The original system was a web-app built web 1.5 style in Java, on top of the OpenSymphony WebWork framework, combined with an XML-based template engine and Guice for dependency injection on the server side. On the client side, there was a growing amount of JavaScript code for each page, using the Closure JavaScript compiler and library. The app is reasonably non-trivial, resulting in about 40k lines of client-side Java code after the rewrite.

For a project of this nature and complexity, I am very positively surprised and impressed with GWT. For the base architecture of the new client, we basically followed some of the best-practice advice for large-scale GWT applications from the Google I/O talks in 2009 and 2011 or in this document: use MVP to isolate UI code from business logic for testability, use Gin/Guice DI and the event bus for dependency management, use UiBinder to push as much of the HTML & CSS stuff into templates as possible and use the new Activities & Places framework for history management, navigation and the basic layout structure of the application. For the rest, we tried to stay as close as possible to the most naive plain vanilla implementation (e.g. standard GWT-RPC services and views constructed bottom-up from widgets). So far this first-cut implementation has held up well enough without need for re-writes and optimization, which is quite impressive for a first use of a reasonably complex new technology.
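To make the MVP split a bit more concrete, here is a minimal sketch in plain Java. All names in it (`GreetingView`, `GreetingPresenter`, `RecordingView`) are made up for illustration and are neither GWT classes nor code from our application. In a real GWT client the view implementation would wrap widgets and a UiBinder template; the point is only that the presenter can be exercised in a plain JVM test without any browser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: a greeting screen split MVP-style.
// None of these names come from GWT or from the original application.
class MvpSketch {

    /** The view is a passive interface; a real one would wrap GWT widgets. */
    interface GreetingView {
        void setPresenter(Presenter presenter);
        void showGreeting(String text);
        void showError(String message);

        /** Callbacks through which the view forwards user input. */
        interface Presenter {
            void onNameSubmitted(String name);
        }
    }

    /** The presenter holds the logic and never touches widgets or the DOM. */
    static class GreetingPresenter implements GreetingView.Presenter {
        private final GreetingView view;

        GreetingPresenter(GreetingView view) {
            this.view = view;
            view.setPresenter(this);
        }

        @Override
        public void onNameSubmitted(String name) {
            if (name == null || name.trim().isEmpty()) {
                view.showError("Please enter a name");
            } else {
                view.showGreeting("Hello, " + name.trim() + "!");
            }
        }
    }

    /** Test double that records calls, standing in for the widget-based view. */
    static class RecordingView implements GreetingView {
        final List<String> calls = new ArrayList<>();
        Presenter presenter;

        @Override public void setPresenter(Presenter presenter) { this.presenter = presenter; }
        @Override public void showGreeting(String text) { calls.add("greeting:" + text); }
        @Override public void showError(String message) { calls.add("error:" + message); }
    }

    public static void main(String[] args) {
        RecordingView view = new RecordingView();
        new GreetingPresenter(view);
        // Simulate what the real view would do on a button click.
        view.presenter.onNameSubmitted("  Ada ");
        view.presenter.onNameSubmitted("");
        System.out.println(view.calls);
    }
}
```

Because the presenter only sees the `GreetingView` interface, the recording view can be swapped for the real widget-based one without touching the logic.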

What we wanted to get out of a migration to GWT was the ability to use the same language, tools and software engineering techniques for both client and server, as well as the ability to share as much of the actual code as possible between client and server. The second part turns out to be the much harder one...

For somebody who generally likes working in the Java dev ecosystem, working with GWT is quite pleasant. Many of the tools and techniques carry over effortlessly and, at least when using the emulated development mode, the high-level abstractions rarely break down. HTML & CSS are still largely browser-magic, but can at least be mostly contained to the leaf nodes of the UI object tree - typically in the form of widgets. Because there is still a lot of functionality missing in native GWT libraries and a lot of custom JavaScript that may need to be integrated, JSNI (the JavaScript Native Interface of GWT) is likely to be needed more often than comparable low-level breakout mechanisms in more complete and dominant development environments. Probably the biggest complaint when developing large-ish applications in GWT is the speed (or rather lack thereof) of the GWT compiler and the somewhat sluggish execution of the emulated development mode. In all fairness, when using development mode, a recompilation is often not required to make changes effective, just a reload of the application host-page URL.

The ability to use the same language, tools and software engineering techniques on both client and server is already a huge benefit in a large project, but sharing actual code would be even better. To enable that, GWT attempts to provide support for a large part of the Java standard platform runtime and library, within the limits of a compiled, non-JVM framework (e.g. no reflection) and the limitations of the browser environment (e.g. no multithreading). Besides having to avoid these features in our own code, things also start to get tricky when using libraries which are not available in source form or which themselves use features that are not supported in the GWT environment. There are ways of providing custom GWT emulations for JRE classes and custom serializers, the same mechanism GWT uses internally for implementing some standard library functionality, but that approach is a bit too low-level for everyday use in application development projects.

The most common use-case for sharing typically centers around using the same set of classes in the client-side model, the RPC interface and the server-side data-model, including interfaces to databases and other backend services. Besides raw data definition, some behavior needed both on client & server should likely also be sharable.

In order for a class to be usable in GWT-RPC, it must implement either the standard Java Serializable interface (with some caveats) or the GWT IsSerializable marker interface. One of the major annoyances for people who like immutable data classes is that there is no serialization for final fields.
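As a rough sketch of what this means in practice, a data class meant for GWT-RPC ends up looking something like the following. `CustomerDto` and its fields are hypothetical and not taken from our code base. The class uses only `java.io.Serializable`, so it compiles on the server side as well; the non-final fields and the zero-argument constructor are there because GWT-RPC skips final fields and needs a way to create an empty instance before filling it in.

```java
import java.io.Serializable;

// Hypothetical data class shared between client and server.
class CustomerDto implements Serializable {
    private static final long serialVersionUID = 1L;

    // Deliberately not final: GWT-RPC does not transfer final fields,
    // so "immutability" can only be approximated by omitting setters.
    private String name;
    private int openOrders;

    // GWT-RPC needs a zero-argument constructor (any visibility is fine).
    @SuppressWarnings("unused")
    private CustomerDto() {
    }

    CustomerDto(String name, int openOrders) {
        this.name = name;
        this.openOrders = openOrders;
    }

    String getName() {
        return name;
    }

    int getOpenOrders() {
        return openOrders;
    }

    public static void main(String[] args) {
        CustomerDto dto = new CustomerDto("Acme", 3);
        System.out.println(dto.getName() + " has " + dto.getOpenOrders() + " open orders");
    }
}
```

Keeping the fields private and offering no setters is about as close to an immutable value class as this approach gets.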

Without making use of partial emulation (leaving the contentious functionality unimplemented in the emulated version), classes which are entangled with some unsupported server-side framework (e.g. inherit from or provide serialization/deserialization to a persistence framework) may need to be heavily refactored to split the framework dependencies out.

Somewhat unrelated to the technology used, the move towards a single-page "thick client" web-app suddenly makes keeping track of per-client session state trivial, without the database and caches necessary in a server-side LAMP web-app, since the browser is a single-threaded, single-user environment and the lifecycle of the session state is naturally tied to the lifetime of the client application in the user's browser.

The biggest weakness of GWT is that it does not easily scale up from 0. It is a complex and heavy technology which requires a lot of upfront planning and architecture. As with many UI frameworks, a reasonably complete minimal "hello world" app would probably be a few hundred lines of setup and boiler-plate. If the job is to attach a few bits of client-side customization to an otherwise classic HTML web-page, then GWT is clearly not the right choice.

Based on this experience, I find GWT quite an ideal choice for massive and complex "thick client" apps, especially when backed by a Java server. That assumes a reasonably "unsexy" enterprise application development environment with a focus on complex functionality and business logic and without need for extreme optimization, extreme customization or exploiting the latest browser tricks, basically anywhere people would not otherwise consider using C or assembler...

Wednesday, January 18, 2012

Why Time is hard

At least since the "Y2K problem" entered the public consciousness around the turn of the millennium, nobody doubts that correctly representing time in computer systems is somehow hard. While today hopefully nobody is trying to save a few bytes by representing years in 2 digits, the state of time computations in many programming environments is still over-simplistic, to say the least.

Having (almost) gotten caught by surprise by last year's changes to civil time in Russia, this article is an attempt to understand what it takes to handle time somewhat correctly for business-related computer applications.

There are 2 common uses of time in computer systems:
  1. A monotonically increasing measure representing a global reference clock of some sort, which can be used to determine an absolute ordering of all events in the system as well as their relative duration.
  2. A representation of civil time as it is used by communities of people living in some particular place to go about their daily lives, and the conversion between multiple such references.
While 1. is an interesting technological problem, 2. is typically the focus when computer programs are used to solve some practical everyday problem, which is probably the case for the majority of people writing software today.

Modern time measurement is based on the SI second, which since 1967 is defined as the duration of 9 192 631 770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the cesium 133 atom, at sea level, at rest and at a temperature of 0 K. Before that, the common definitions of time were based on astronomical observations.

For the sake of some sanity in a globalized world, politicians at some point in the 19th century agreed on a single coordinated time reference system which is based on an approximation of average solar time at the Greenwich Observatory near London, setting the location of Greenwich as the global prime meridian and Greenwich Mean Time (GMT) as the global reference time. Today's global time reference is called Coordinated Universal Time (UTC) and is piece-wise identical, up to a whole number of seconds, to International Atomic Time (TAI), which is derived from the average of some 200 atomic clocks world-wide. The difference between TAI and UTC comes from the occasional insertion of leap seconds in order to keep UTC in line with UT1, an idealized model of astronomical time at the prime meridian.

Starting from the prime meridian at Greenwich, the world is then partitioned into 24 reference timezones, each at a 1h increment from the previous one, defining a standard local time up to about 30min different from the local solar time. These timezones are typically named "UTC +/- x" or sometimes "GMT +/- x". Most standard timezones or combinations of timezones have more or less obvious names and abbreviations. E.g. EST stands for Eastern Standard Time and refers to UTC - 5, the timezone used roughly in winter along the US eastern seaboard, but the exact list of where it applies is frankly a bit confusing. Also, timezone abbreviations are not unique and don't follow a logical pattern. E.g. BST stands for both British Summer Time (UTC + 1) and Bangladesh Standard Time (UTC + 6).

And if that were not yet complicated enough, politicians proceeded to make a complete hash of things by choosing, for a variety of political entities (countries, states, provinces, towns, who knows what...), which timezone each should belong to - sometimes coinciding with the reference timezone the entity is located in, sometimes not. To make things even worse, they proceeded to invent a thing which is called "summer time" in some places and "daylight saving time" in others, which is typically done by shifting the local time back and forth by 1h at seemingly random times in the spring and fall. This ritual is supposed to have some benefits, but most likely just drives people and livestock mad...

Another problem with daylight saving time is that it breaks the intuitive assumption of time being, at some reasonable level, continuous and monotonically increasing. In all those parts of the world which observe some form of DST, this assumption is broken twice a year, when the clocks are moved forward or backward at some particular point in time. This means that some specific representations of local time are invalid, as they do not exist, i.e. do not correspond to any valid point in time expressed in UTC or any other global reference time. E.g. 2012-03-25T02:30 does not exist as a valid local time in Zürich, as it is being skipped during the wintertime to summertime switch. Similarly, 2011-10-30T02:30 in local time for the same region is ambiguous, as it corresponds to 2 different points in time due to the clocks being set back by 1 hour.
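The two Zürich cases can be checked with a few lines of Java. This is only a sketch and assumes the `java.time` API of Java 8 or later, which did not exist when this was written; the class and method names (`DstEdgeCases`, `validOffsetCount`, `resolve`) are made up for the example. It asks the zone rules how many UTC offsets are valid for a given local time: none for the skipped time, one normally, two for the repeated time.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical helper class illustrating DST gaps and overlaps.
class DstEdgeCases {
    static final ZoneId ZURICH = ZoneId.of("Europe/Zurich");

    /** Number of valid UTC offsets for a local time: 0 = gap, 1 = normal, 2 = overlap. */
    static int validOffsetCount(LocalDateTime local) {
        return ZURICH.getRules().getValidOffsets(local).size();
    }

    /** Lets java.time pick an instant; in a gap it moves the time later by the gap length. */
    static ZonedDateTime resolve(LocalDateTime local) {
        return ZonedDateTime.of(local, ZURICH);
    }

    public static void main(String[] args) {
        LocalDateTime skipped = LocalDateTime.of(2012, 3, 25, 2, 30);
        LocalDateTime repeated = LocalDateTime.of(2011, 10, 30, 2, 30);

        System.out.println("offsets for skipped time:  " + validOffsetCount(skipped));
        System.out.println("offsets for repeated time: " + validOffsetCount(repeated));

        // The non-existent local time is silently shifted out of the gap.
        System.out.println("skipped resolves to: " + resolve(skipped));

        // For the ambiguous local time, both candidate instants can be requested explicitly.
        ZonedDateTime first = resolve(repeated).withEarlierOffsetAtOverlap();
        ZonedDateTime second = first.withLaterOffsetAtOverlap();
        System.out.println("first occurrence:  " + first);
        System.out.println("second occurrence: " + second);
        System.out.println("apart by: " + Duration.between(first, second));
    }
}
```

Note that the library quietly resolves both cases instead of failing, so code that cares about the difference has to ask for the valid offsets explicitly.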

And since politicians need to keep busy, they sometimes change any of those rules arbitrarily whenever they feel like it, causing frantic rounds of software updates and all kinds of malfunctions in software which deals with the representation of local time.

While obtaining a decent enough approximation of UTC is not a big challenge anymore for most computer systems (e.g. through GPS or NTP), figuring out what time a clock should show on the wall of any arbitrary place in the world is still a hard problem, thanks to our politicians.

In the absence of an authoritative standard of all timezone definitions, each software package which does local time comparisons and computations needs to somehow be changed and updated when any timezone related rule changes anywhere in the world - for some arbitrary definitions of "any"...

It seems that most popular time handling libraries which are sophisticated enough to handle these issues anywhere near correctly rely on a group of volunteers who maintain the open-source tz database and associated tools, now also called the IANA Time Zone Database. It uses a particular definition of timezone (from the tz project page):

"Each location in the database represents a national region where all clocks keeping local time have agreed since 1970. Locations are identified by continent or ocean and then by the name of the location, which is typically the largest city within the region. For example, America/New_York represents most of the US eastern time zone; America/Phoenix represents most of Arizona, which uses mountain time without daylight saving time (DST); America/Detroit represents most of Michigan, which uses eastern time but with different DST rules in 1975; and other entries represent smaller regions like Starke County, Indiana, which switched from central to eastern time in 1991 and switched back in 2006."
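These region names are what application code typically works with. As a small sketch, again assuming Java 8 or later (`java.time` ships with its own copy of the tz database) and using a made-up helper `offsetOn`, the following looks up the UTC offset in effect for a region on a given date. The comments show what I would expect from a reasonably current tz database: the Moscow lookups reflect last year's change in Russia, and the Phoenix and New York lookups show two regions that differ only by DST rules.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper class; only the region names come from the tz database.
class TzDatabaseLookup {

    /** UTC offset in effect at local noon on the given date in the given tz region. */
    static ZoneOffset offsetOn(String tzId, LocalDate date) {
        ZoneId zone = ZoneId.of(tzId);
        return date.atTime(12, 0).atZone(zone).getOffset();
    }

    public static void main(String[] args) {
        // Same region, same time of year, different rules one year apart.
        System.out.println(offsetOn("Europe/Moscow", LocalDate.of(2011, 1, 15))); // +03:00
        System.out.println(offsetOn("Europe/Moscow", LocalDate.of(2012, 1, 15))); // +04:00

        // Mountain time without DST vs. eastern time with DST, both in July.
        System.out.println(offsetOn("America/Phoenix", LocalDate.of(2012, 7, 1)));  // -07:00
        System.out.println(offsetOn("America/New_York", LocalDate.of(2012, 7, 1))); // -04:00
    }
}
```

The results depend entirely on the version of the timezone data the runtime ships with, which is exactly why that data needs to be kept up to date.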

Judging from the traffic on the mailing list, there seems to be some change somewhere in the world every few months, which we can either choose to ignore or which may require an upgrade of the timezone definition database.