Sunday, January 22, 2012

GWT - An Experience Report

As noted before, I am not a big fan of JavaScript as a language for complex web application projects. Recently I got the chance to gain some first-hand, comparative experience with GWT (Google Web Toolkit) as part of an application rewrite/upgrade.

The original system was a web app built "web 1.5" style in Java on top of the OpenSymphony WebWork framework, combined with an XML-based template engine and Guice for dependency injection on the server side. On the client side, there was a growing amount of JavaScript code for each page, using the Closure JavaScript compiler and library. The app is reasonably non-trivial, resulting in about 40k lines of client-side Java code after the rewrite.

For a project of this nature and complexity, I am very positively surprised and impressed by GWT. For the base architecture of the new client, we basically followed the best-practice advice for large-scale GWT applications from the Google I/O talks in 2009 and 2011 or in this document: use MVP to isolate UI code from business logic for testability, use Gin/Guice DI and the event bus for dependency management, use UiBinder to push as much of the HTML & CSS as possible into templates, and use the new Activities & Places framework for history management, navigation and the basic layout structure of the application. For the rest, we tried to stay as close as possible to the most naive, plain vanilla implementation (e.g. standard GWT-RPC services and views constructed bottom-up from widgets). So far this first-cut implementation has held up well without the need for rewrites or optimization, which is quite impressive for a first use of a reasonably complex new technology.
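To illustrate what "isolate UI code from business logic for testability" buys, here is a minimal, GWT-free sketch of the MVP split. All names (`GreetingView`, `GreetingPresenter`, `FakeView`) are hypothetical: the presenter depends only on a view interface, so its logic can be unit-tested in a plain JVM with a fake view. In a real GWT app the view would typically be a UiBinder widget and the presenter an Activity wired up with Gin; none of that is shown here.

```java
import java.util.ArrayList;
import java.util.List;

class MvpDemo {
    public static void main(String[] args) {
        FakeView view = new FakeView();
        GreetingPresenter presenter = new GreetingPresenter(view);
        presenter.onNameSubmitted("  Ada ");
        presenter.onNameSubmitted("");
        System.out.println(view.shown); // [greeting:Hello, Ada!, error:Please enter a name]
    }
}

// Hypothetical view contract: the presenter only sees this interface,
// never concrete widgets, so it has no dependency on browser code.
interface GreetingView {
    void showGreeting(String text);
    void showError(String message);
}

// Hypothetical presenter holding the (trivial) business logic.
class GreetingPresenter {
    private final GreetingView view;

    GreetingPresenter(GreetingView view) {
        this.view = view;
    }

    // Called by the view when the user submits a name.
    void onNameSubmitted(String name) {
        if (name == null || name.trim().isEmpty()) {
            view.showError("Please enter a name");
        } else {
            view.showGreeting("Hello, " + name.trim() + "!");
        }
    }
}

// Test double that records what the presenter asked the view to display.
class FakeView implements GreetingView {
    final List<String> shown = new ArrayList<>();

    @Override
    public void showGreeting(String text) {
        shown.add("greeting:" + text);
    }

    @Override
    public void showError(String message) {
        shown.add("error:" + message);
    }
}
```

The design choice is that the presenter never touches widgets directly; swapping the real view for `FakeView` is all a test needs.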

What we wanted to get out of a migration to GWT was the ability to use the same language, tools and software engineering techniques for both client and server, as well as the ability to share as much of the actual code as possible between client and server. The second part turns out to be the much harder one...

For somebody who generally likes working in the Java development ecosystem, working with GWT is quite pleasant. Many of the tools and techniques carry over effortlessly and, at least when using the emulated development mode, the high-level abstractions rarely break down. HTML & CSS are still largely browser magic, but can at least be mostly contained in the leaf nodes of the UI object tree - typically in the form of widgets. Because a lot of functionality is still missing from the native GWT libraries and there is a lot of potential custom JavaScript to integrate with, JSNI (the JavaScript Native Interface) is likely to be used more often than comparable low-level escape hatches would be in more complete and dominant development environments. Probably the biggest complaint when developing large-ish applications in GWT is the speed (or rather lack thereof) of the GWT compiler and the somewhat sluggish execution of the emulated development mode. In all fairness, in development mode a recompilation is often not required to make changes effective, just a reload of the application's host page URL.

The ability to use the same language, tools and software engineering techniques on both client and server is already a huge benefit in a large project, but sharing actual code would be even better. To enable that, GWT attempts to support a large part of the Java standard platform runtime and library, within the limits of a compiled, non-JVM framework (e.g. no reflection) and of the browser environment (e.g. no multithreading). Beyond avoiding these features in one's own code, it also gets tricky when using libraries which are not available in source form or which themselves use features not supported in the GWT environment. There are ways of providing custom GWT emulations for JRE classes and custom serializers, the mechanism GWT uses internally to implement some standard library functionality, but that approach is a bit too low-level for everyday use in application development projects.

The most common use-case for sharing typically centers around using the same set of classes in the client-side model, the RPC interface and the server-side data-model, including interfaces to databases and other backend services. Besides raw data definition, some behavior needed both on client & server should likely also be sharable.

In order for a class to be usable in GWT-RPC, it must implement either the standard Java Serializable interface (with some caveats) or the GWT IsSerializable marker interface. One of the major annoyances for people who like immutable data classes is that final fields are not serialized.
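A common workaround for the final-field limitation, sketched here with a hypothetical `CustomerDto` class: keep the fields private and non-final, give the serializer a no-argument constructor (GWT-RPC accepts one with any access modifier), and expose only getters so the class stays effectively immutable to callers. The sketch uses `java.io.Serializable` so it compiles without the GWT jars; with GWT on the classpath, the marker interface `com.google.gwt.user.client.rpc.IsSerializable` could be used instead.

```java
import java.io.Serializable;

class DtoDemo {
    public static void main(String[] args) {
        CustomerDto c = new CustomerDto(42L, "Ada Lovelace");
        System.out.println(c.getId() + " " + c.getName());
    }
}

// Hypothetical shared data class intended for use in a GWT-RPC interface.
class CustomerDto implements Serializable {
    private static final long serialVersionUID = 1L;

    // Deliberately not final: GWT-RPC does not serialize final fields.
    private long id;
    private String name;

    // No-arg constructor for the RPC serializer; private so that
    // application code cannot create half-initialized instances.
    @SuppressWarnings("unused")
    private CustomerDto() {
    }

    public CustomerDto(long id, String name) {
        this.id = id;
        this.name = name;
    }

    public long getId() {
        return id;
    }

    public String getName() {
        return name;
    }
    // No setters: effectively immutable once constructed.
}
```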

Short of using partial emulation (leaving the contentious functionality unimplemented in the emulated version), some classes which are entangled with an unsupported server-side framework (e.g. they inherit from, or provide serialization/deserialization for, a persistence framework) may need to be heavily refactored to split out the framework dependencies.
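One way such a split can look, sketched with hypothetical classes: the shared `Order` class carries only plain data that GWT can compile, while a server-only `OrderRecord` and `OrderMapper` hold everything tied to the persistence framework. `OrderRecord` is merely a stand-in for whatever entity type a real framework would require; no actual persistence API is used here.

```java
import java.io.Serializable;

class SplitDemo {
    public static void main(String[] args) {
        OrderRecord row = new OrderRecord();
        row.orderId = 7L;
        row.amountCents = 1999L;
        Order shared = OrderMapper.toShared(row);
        System.out.println(shared.getId() + " " + shared.getAmountCents()); // 7 1999
    }
}

// Shared (client + server): plain data, no framework imports.
class Order implements Serializable {
    private static final long serialVersionUID = 1L;

    private long id;
    private long amountCents;

    // For the RPC serializer.
    private Order() {
    }

    Order(long id, long amountCents) {
        this.id = id;
        this.amountCents = amountCents;
    }

    long getId() { return id; }
    long getAmountCents() { return amountCents; }
}

// Server only: stand-in for a persistence-framework entity. In a real
// project this class would carry the framework's annotations or base
// class, which the GWT compiler could not handle.
class OrderRecord {
    long orderId;
    long amountCents;
}

// Server only: the conversion lives outside the shared class, so the
// shared class never depends on the persistence framework.
final class OrderMapper {
    private OrderMapper() {
    }

    static Order toShared(OrderRecord r) {
        return new Order(r.orderId, r.amountCents);
    }

    static OrderRecord toRecord(Order o) {
        OrderRecord r = new OrderRecord();
        r.orderId = o.getId();
        r.amountCents = o.getAmountCents();
        return r;
    }
}
```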

Somewhat independent of the technology used, the move towards a single-page "thick client" web app suddenly makes keeping track of per-client session state trivial, without the database and caches necessary in a server-side LAMP web app: the browser is a single-threaded, single-user environment, and the lifecycle of the session state is naturally tied to the lifetime of the client application in the user's browser.

The biggest weakness of GWT is that it does not easily scale up from zero. It is a complex and heavy technology which requires a lot of upfront planning and architecture. As with many UI frameworks, a reasonably complete minimal "hello world" app would probably be a few hundred lines of setup and boilerplate. If the job is to attach a few bits of client-side customization to an otherwise classic HTML web page, GWT is clearly not the right choice.

Based on this experience, I find GWT quite an ideal choice for massive and complex "thick client" apps, especially when backed by a Java server. This assumes a reasonably "unsexy" enterprise application development environment with a focus on complex functionality and business logic, and without the need for extreme optimization, extreme customization or exploiting the latest browser tricks - basically anywhere people would not otherwise consider using C or assembler...

Wednesday, January 18, 2012

Why Time is hard

At least since the "Y2K problem" entered the public consciousness around the turn of the last century, nobody doubts that correctly representing time in computer systems is somehow hard. While today hopefully nobody is trying to save a few bytes by representing years with 2 digits, the state of time computations in many programming environments is still over-simplistic, to say the least.

Having (almost) been caught by surprise by last year's changes to civil time in Russia, I wrote this article as an attempt to understand what it takes to handle time somewhat correctly in business-related computer applications.

There are 2 common uses of time in computer systems:
  1. A monotonically increasing measure representing a global reference clock of some sort, which can be used to determine an absolute ordering of all events in the system as well as their relative durations.
  2. A representation of civil time as it is used by communities of people living in some particular place to go about their daily lives, and conversion between multiple such references.
While 1. is an interesting technological problem, 2. is typically the focus when computer programs are used to solve practical everyday problems, which is probably the case for the majority of people writing software today.
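In Java, the two uses map onto different APIs. The sketch below uses `java.time` (Java 8+, which postdates this post); the class and method names are hypothetical. `System.nanoTime()` is only meant for measuring elapsed durations and is not related to wall-clock time, while `Instant` and `ZonedDateTime` handle UTC and civil time.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.concurrent.TimeUnit;

class TwoKindsOfTime {
    public static void main(String[] args) {
        System.out.println("elapsed ms: " + measureMillis(() -> sleepQuietly(50)));
        System.out.println("Zurich now: " + civilTimeIn("Europe/Zurich"));
    }

    // Use 1: durations from the JVM's elapsed-time clock, which is not
    // tied to the wall clock or to any calendar.
    static long measureMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Use 2: civil time, i.e. the current UTC instant as it would be
    // shown on a wall clock in a particular place.
    static ZonedDateTime civilTimeIn(String zoneId) {
        return Instant.now().atZone(ZoneId.of(zoneId));
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Mixing the two (e.g. computing durations by subtracting wall-clock readings) is a classic source of bugs when the wall clock is adjusted.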

Modern time measurement is based on the SI second, which since 1967 has been defined as the duration of 9 192 631 770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the cesium 133 atom, at rest, at sea level and at a temperature of 0 K. Before that, the common definitions of time were based on astronomical observations.

For the sake of some sanity in a globalized world, politicians at some point in the 19th century agreed on a single coordinated time reference system based on an approximation of mean solar time at the Greenwich Observatory near London, making Greenwich the global prime meridian and Greenwich Mean Time (GMT) the global reference time. Today's global time reference is called Coordinated Universal Time (UTC). It runs at the same rate as International Atomic Time (TAI), which is taken as the average of some 200 atomic clocks worldwide, but differs from it by a whole number of seconds. That difference comes from the occasional insertion of leap seconds in order to keep UTC in line with UT1, an idealized model of astronomical time at the prime meridian.
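As a worked example (the figures come from the IERS leap-second history, not from this post): UTC started in January 1972 exactly 10 s behind TAI, and 24 leap seconds were inserted from then through the end of 2008, so at the time of writing

$$
\mathrm{TAI} - \mathrm{UTC} = \underbrace{10\,\mathrm{s}}_{\text{offset in 1972}} + \underbrace{24\,\mathrm{s}}_{\text{leap seconds, 1972 to 2008}} = 34\,\mathrm{s}
$$

Each additional leap second increases the difference by one second. GPS time has no leap seconds and is a constant 19 s behind TAI, so GPS time is currently $34 - 19 = 15$ s ahead of UTC.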

Starting from the prime meridian at Greenwich, the world is partitioned into 24 reference timezones, each 1h apart from its neighbor, defining a standard local time that differs by up to about 30 min from local solar time. These timezones are typically named "UTC +/- x" or sometimes "GMT +/- x". Most standard timezones or combinations of timezones have more or less obvious names and abbreviations. E.g. EST stands for Eastern Standard Time and refers to UTC - 5, the timezone used roughly in winter along the US eastern seaboard, but the exact list of where it applies is frankly a bit confusing. Also, timezone abbreviations are not unique and don't follow a logical pattern. E.g. BST stands for both British Summer Time (UTC + 1) and Bangladesh Standard Time (UTC + 6).
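A small `java.time` sketch (Java 8+; the class name and the chosen instant are illustrative) of why fixed offsets and full tz identifiers are safer than abbreviations: the same UTC instant rendered at UTC-5, the offset EST stands for, and in the two zones whose local time would both be labeled "BST" in July.

```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

class OffsetsVsAbbreviations {
    static final Instant NOON_UTC = Instant.parse("2012-07-01T12:00:00Z");

    public static void main(String[] args) {
        System.out.println("UTC-5:         " + localTime(ZoneOffset.ofHours(-5)));     // 07:00
        System.out.println("Europe/London: " + localTime(ZoneId.of("Europe/London"))); // 13:00
        System.out.println("Asia/Dhaka:    " + localTime(ZoneId.of("Asia/Dhaka")));    // 18:00
    }

    // Wall-clock time of the fixed instant in the given zone or fixed offset
    // (ZoneOffset is itself a ZoneId).
    static LocalTime localTime(ZoneId zone) {
        return NOON_UTC.atZone(zone).toLocalTime();
    }
}
```

Storing or exchanging the identifier ("Europe/London") or a numeric offset avoids having to guess which "BST" was meant.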

And as if that were not complicated enough, politicians proceeded to make a complete hash of things by choosing, for a variety of political entities (countries, states, provinces, towns, who knows what...), which timezone each should belong to - sometimes coinciding with the reference timezone the entity was located in, sometimes not. To make things even worse, they proceeded to invent a thing called "summer time" in some places and "daylight saving time" in others, which is typically implemented by shifting the local time back and forth by 1h at seemingly random times in spring and fall. This ritual is supposed to have some benefits, but most likely just drives people and livestock mad...

Another problem with daylight saving time is that it breaks the intuitive assumption that time is, at some reasonable level, continuous and monotonically increasing. In all those parts of the world which observe some form of DST, this assumption is broken twice a year, when the clocks are moved forward in spring and back in fall. This means that some representations of local time are invalid because they do not exist, i.e. they do not correspond to any valid point in time expressed in UTC or any other global reference time. E.g. 2012-03-25T02:30 does not exist as a valid local time in Zürich, as it is skipped during the switch from winter time to summer time. Similarly, 2011-10-30T02:30 local time in the same region is ambiguous, as it corresponds to 2 different points in time due to the clocks being set back by 1 hour.
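The Zürich examples can be checked with `java.time` (Java 8+, newer than this post); the class and method names below are hypothetical. `ZoneRules.getValidOffsets` returns no offsets for a local time in a gap and two offsets for one in an overlap. How a library resolves such times is a policy choice; the comments describe `java.time`'s default behavior.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.List;

class ZurichDst {
    static final ZoneId ZURICH = ZoneId.of("Europe/Zurich");

    public static void main(String[] args) {
        LocalDateTime skipped = LocalDateTime.parse("2012-03-25T02:30");
        LocalDateTime repeated = LocalDateTime.parse("2011-10-30T02:30");

        System.out.println(validOffsets(skipped));  // gap: empty list
        System.out.println(validOffsets(repeated)); // overlap: two offsets

        // Default resolution: a time in the gap is shifted later by the
        // length of the gap (here to 03:30); a time in the overlap takes the
        // earlier offset unless the later one is requested explicitly.
        System.out.println(ZonedDateTime.of(skipped, ZURICH));
        System.out.println(ZonedDateTime.of(repeated, ZURICH).withLaterOffsetAtOverlap());
    }

    // All UTC offsets under which this local date-time exists in Zürich.
    static List<ZoneOffset> validOffsets(LocalDateTime local) {
        return ZURICH.getRules().getValidOffsets(local);
    }
}
```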

And since politicians need to keep busy, they sometimes change any of these rules arbitrarily, whenever they feel like it, causing frantic software update activity and all kinds of malfunctions in software which deals with the representation of local time.
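The Russian change mentioned earlier shows what such a rule change looks like in the data. In the tz database, Moscow stayed on UTC+4 all year after March 2011 instead of returning to UTC+3 for the winter. The sketch below (class name hypothetical, using `java.time`) compares mid-January offsets a year apart. A runtime whose bundled timezone data predates the change would report UTC+3 for both dates, which is exactly the kind of malfunction described above.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

class MoscowRuleChange {
    static final ZoneId MOSCOW = ZoneId.of("Europe/Moscow");

    public static void main(String[] args) {
        System.out.println("Jan 2011: " + offsetAt("2011-01-15T12:00:00Z")); // +03:00
        System.out.println("Jan 2012: " + offsetAt("2012-01-15T12:00:00Z")); // +04:00
    }

    // UTC offset in effect in Moscow at the given instant, according to
    // the timezone data bundled with the running JVM.
    static ZoneOffset offsetAt(String isoInstant) {
        return MOSCOW.getRules().getOffset(Instant.parse(isoInstant));
    }
}
```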

While obtaining an approximation of UTC that is good enough for most purposes is no longer a big challenge for most computer systems (e.g. through GPS or NTP), figuring out what time a clock should show on the wall at an arbitrary place in the world is still a hard problem, thanks to our politicians.

In the absence of an authoritative standard for all timezone definitions, each software package which does local time comparisons and computations needs to somehow be changed and updated whenever any timezone-related rule changes anywhere in the world - for some arbitrary definition of "any"...

Probably a majority of the popular time handling libraries which are sophisticated enough to handle these issues anywhere near correctly rely on a group of volunteers who maintain the open-source tz database and associated tools, now also called the IANA Time Zone Database. It uses a particular definition of timezone (from the tz project page):

"Each location in the database represents a national region where all clocks keeping local time have agreed since 1970. Locations are identified by continent or ocean and then by the name of the location, which is typically the largest city within the region. For example, America/New_York represents most of the US eastern time zone; America/Phoenix represents most of Arizona, which uses mountain time without daylight saving time (DST); America/Detroit represents most of Michigan, which uses eastern time but with different DST rules in 1975; and other entries represent smaller regions like Starke County, Indiana, which switched from central to eastern time in 1991 and switched back in 2006."

Judging from the traffic on the mailing list, there seems to be a change somewhere in the world every few months, which we can either choose to ignore or which may require an upgrade of the timezone definition database.
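Libraries built on the tz database expose the location identifiers from the quoted definition directly. As a sketch with `java.time` (class name and dates are illustrative): `America/Phoenix` stays on UTC-7 all year, while `America/New_York` moves from UTC-5 to UTC-4 in summer. The results depend on the tz data version bundled with the JVM, which is why that data has to be kept up to date.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneRules;

class TzLocations {
    static final Instant JAN = Instant.parse("2012-01-15T12:00:00Z");
    static final Instant JUL = Instant.parse("2012-07-15T12:00:00Z");

    public static void main(String[] args) {
        for (String id : new String[] {"America/New_York", "America/Phoenix"}) {
            ZoneRules rules = ZoneId.of(id).getRules();
            System.out.println(id + ": January " + rules.getOffset(JAN)
                    + ", July " + rules.getOffset(JUL)
                    + ", DST in July: " + rules.isDaylightSavings(JUL));
        }
    }

    // True if the given tz location observes daylight saving time in mid-July 2012.
    static boolean observesDstInJuly(String zoneId) {
        return ZoneId.of(zoneId).getRules().isDaylightSavings(JUL);
    }
}
```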

Friday, December 30, 2011

Android 4.0 (Ice Cream Sandwich) on Galaxy Nexus

I have been very happy with my Nexus One for over two years now, but the news that the Nexus One will not be upgradable to the new 4.0 version of Android put my loyalty somewhat in question. Even though I still prefer the physical look and feel of the Nexus One over the Galaxy Nexus, the larger, crisper display and generally much faster hardware make a switch tempting.

There is nothing in Android 4.0 which is really a must-have for me. The new UI design may be crisper, but it is no quantum leap in usability. One new feature I like is the tracking and control of network usage per application - a feature I had been missing since day one.

With its size, the Galaxy Nexus is practically a "phablet", a phone/tablet hybrid. It is starting to get borderline for me to balance it in one hand and still reach all the controls on the screen with the opposable thumb. On the other hand, I find myself reading more news articles than before, using some of the newspaper apps which have started to appear for Android as well (unfortunately, the Economist app does not seem to support 4.0 yet).

Wednesday, February 16, 2011

Android 2.3 - Gingerbread?

Maybe it is a sign of maturity for the Android platform that, for the first time, I don't find at least one highly anticipated new feature in a new Android release or new phone. When comparing the supposedly new flagship phone, the Nexus S with Android 2.3, with my current Nexus One with Android 2.2, I actually prefer my Nexus One.

Gingerbread looks much like a maintenance release, providing a lot of internal cleanup and a very subtle UI face-lift. Since I don't play any games, I didn't notice any of the speed improvements supposedly coming from using native hardware graphics acceleration. The only new major feature - support for NFC tags - is of no use to me, since there are no NFC tags around where I live. Comparing the new Nexus S to the Nexus One, there is not as much of a wow factor either. Performance is about comparable, battery life supposedly a bit worse even, display, physical design and overall build quality mostly a matter of preference. There is clearly not as much of a leap as there was from a G1/G2 to the Nexus One. And given the reports about stability problems with the Nexus S, maybe even somewhat of a step backwards.

It has now been a few months since the release of Gingerbread and there has been no OTA update for the Nexus One yet. Not that I miss it, given its lack of must-have features, but it reduces the credibility of the Nexus line as the always bleeding-edge Android phone.

But maybe the Android team is too busy working on the new and highly anticipated tablet-optimized Android 3.0 Honeycomb, which should be out any day now. After all, tablets are where the new frontier is for Android these days anyway.

Tuesday, November 30, 2010

Mutability Considered (Somewhat) Harmful

We recently had a coffee-room discussion on the futility of trying to introduce new relevant programming languages. Computer scientists seem to invent new programming languages on a daily basis - it's an important contribution to conceptual research in computer science and having at least one language to one's name seems to be important for bragging rights in certain circles.

However, for a programming language to become practically relevant is extremely rare. We can make the somewhat flippant educated guess that since the dawn of commercial computer use in the 1960s, there have been about 5 major commercially successful languages: FORTRAN, COBOL, C, C++ and Java - roughly 1-2 per decade.

Certainly, there have been many other vendor- and/or domain-specific languages over the years, or others with significant popularity, just not enough to make it into the all-time A-list. There are multiple surveys which try to measure which programming languages are currently the most popular; the graph below is an example of one of them.


But if a new significant general-purpose language were to emerge, the one thing it would have to get right is concurrency - maybe by largely avoiding it. In the 15 years since Java was introduced, computers have increasingly evolved towards distributed systems. Today's top supercomputers are basically massive clusters of customized PC servers. Even desktop computers typically have multi-core CPUs with complex distributed caches and memory hierarchies, which work increasingly hard at pretending that they are still a von Neumann machine with a single, flat and consistent memory space. Maybe it is time to give up this convenient illusion and find a new model for the more complex and certainly more dynamic reality. E.g. this talk by Rich Hickey might provide some food for thought on what a practical programming model outside the von Neumann box might look like.

Concurrent programming using shared state and mutual exclusion has been around for a long time, but it is tricky, error prone and until recently has been the domain of a small number of programmers working on operating systems, databases and other high-performance computing systems. Also many people may find that even on moderately multi-core machines, lock contention is starting to cause serious performance issues which are hard to diagnose and even harder to fix.

Maybe the most successful way of dealing with concurrency is to avoid it. E.g. immutable value objects and pure functions are one way to reduce the impact of concurrency. But since most real-life computer applications involve dealing with things that evolve, the key might be to come up with a programming model where mutability is pushed into the framework, much as garbage collection moved resource management from being a concern of the application to being (largely) the problem of the programming runtime environment.
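A sketch of the immutable value-object style in Java, using a hypothetical `Money` class: the class and all its fields are final, and every "modification" returns a new instance, so instances can be shared between threads without any locking.

```java
import java.util.Objects;

class ValueObjectDemo {
    public static void main(String[] args) {
        Money price = new Money("CHF", 1000);
        Money withTax = price.plusPercent(8);
        // The original value is unchanged; the "update" produced a new object.
        System.out.println(price.cents() + " -> " + withTax.cents()); // 1000 -> 1080
    }
}

// Hypothetical immutable value object: final class, final fields, no setters.
final class Money {
    private final String currency;
    private final long cents;

    Money(String currency, long cents) {
        this.currency = Objects.requireNonNull(currency);
        this.cents = cents;
    }

    // Pure function: depends only on its inputs and returns a new value.
    Money plus(Money other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException("currency mismatch: " + currency + " vs " + other.currency);
        }
        return new Money(currency, cents + other.cents);
    }

    // Integer arithmetic, rounding toward zero (a simplification).
    Money plusPercent(int percent) {
        return new Money(currency, cents + cents * percent / 100);
    }

    long cents() { return cents; }
    String currency() { return currency; }

    // Value semantics: equal if currency and amount are equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Money)) return false;
        Money m = (Money) o;
        return cents == m.cents && currency.equals(m.currency);
    }

    @Override
    public int hashCode() {
        return Objects.hash(currency, cents);
    }
}
```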

There has been a fair amount of recent research in lock-free concurrency control, which avoids locking shared state in favor of atomic operations, data replication and versioning. There is obviously a cost associated with increased data replication, but maybe in a world of high-N cores and complex distributed memory hierarchies, the trade-off could still be a winning proposition.
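A minimal sketch of the lock-free idea, using a hypothetical `Atom` wrapper loosely modeled on Clojure's atoms: the state is an immutable snapshot held in an `AtomicReference`, and updates are pure functions applied in a compare-and-set retry loop. Readers never block and writers never hold a lock; the price is copying the snapshot on every update, the replication cost mentioned in the paragraph above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

class AtomDemo {
    public static void main(String[] args) throws InterruptedException {
        Atom<List<Integer>> log = new Atom<>(Collections.<Integer>emptyList());
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < 4; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < 250; i++) {
                    log.swap(old -> append(old, id));
                }
            });
            threads.add(worker);
            worker.start();
        }
        for (Thread worker : threads) {
            worker.join();
        }
        // No update is lost even though no lock was ever taken.
        System.out.println(log.deref().size()); // 1000
    }

    // Pure function: copies the old snapshot instead of mutating it.
    static List<Integer> append(List<Integer> old, int value) {
        List<Integer> copy = new ArrayList<>(old);
        copy.add(value);
        return Collections.unmodifiableList(copy);
    }
}

// Hypothetical atom: holds an immutable value, updated without locks.
final class Atom<T> {
    private final AtomicReference<T> ref;

    Atom(T initial) {
        ref = new AtomicReference<>(initial);
    }

    T deref() {
        return ref.get();
    }

    // Apply f to the current value; if another thread replaced the value in
    // the meantime, compareAndSet fails and the update is retried.
    T swap(UnaryOperator<T> f) {
        while (true) {
            T current = ref.get();
            T next = f.apply(current);
            if (ref.compareAndSet(current, next)) {
                return next;
            }
        }
    }
}
```

Because `f` may run more than once under contention, it must be free of side effects, which is another reason this model pairs naturally with pure functions.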

And maybe functional programming languages, which have existed outside the mainstream for decades might finally have their day in the sun.

Friday, October 22, 2010

Privacy: the Transatlantic Divide

Despite what the likes of Mark Zuckerberg may say, there are still some people who strongly care about privacy. This seems to be more so the further east you go from Silicon Valley. And even in the Valley, an increasing number of people are becoming aware of this, even though they may not understand or appreciate the alternative viewpoints.

Whenever I am asked for an opinion on the European "obsession" with privacy or for an explanation on why the Germans seem so incredibly hung-up on privacy, my standard answer goes about as follows:

Yes, there are indeed differences in the approach and attitude to privacy, in particular between the US and non-Anglo-Saxon continental Europe, but in substance the differences may be smaller than the commonalities. And yet we humans seem to be particularly good at picking out small (cultural) differences and getting disproportionately stressed out over them. In robotics, when a humanoid model is close but just doesn't feel quite right, this effect is called the uncanny valley. It is also very hard to really understand an alternative viewpoint which derives from an experience that is not one's own.

For somebody with a US perspective, maybe the following analogy with freedom of speech is worth considering. Practically all major democracies have a strong constitutional commitment to freedom of speech (often called freedom of expression outside the US). And yet when it comes to trading off freedom of speech vs. other fundamental rights, the US typically strikes the balance strongly in favor of freedom of speech, while Europeans typically favor the kinds of human rights which protect the individual from harm. Americans overall feel very strongly about freedom of speech and are willing to make sacrifices in other areas to support it.

Since the current debate about privacy rights on the Internet is also very much about the conflict between freedom of expression and the protection of the individual, this difference in tradition and priorities might also explain a good part of the difference in approach and attitude on both sides of the Atlantic. While in Europe privacy and data protection are considered a human right per se, the US rather sees them as a consumer protection issue, mostly concerned with material damages.

Another explanation for the difference in attitude towards data protection in particular is that many Europeans have a recent memory of where the abuse of information can lead. The use and abuse of information played an important role for both fascist and socialist totalitarian regimes through much of the 20th century in Europe. As Vaclav Havel describes in "The Power of the Powerless", the essential source of power for a post-totalitarian system rests in its ability to control information in order to create a collective distortion of reality ("living a lie").

And finally, anybody who wants to better understand the fears of excessive data collection and abuse of personal information should watch the excellent German film "The Lives of Others", winner of the 2007 Academy Award for Best Foreign Language Film.

Tuesday, August 10, 2010

I @#$%&* JavaScript!

I am by no means an expert web developer or particularly familiar with JavaScript, which might influence my distaste for it. My experience is mostly from making small changes in moderately complex existing applications.

JavaScript was originally intended as a small domain-specific glue language to add some client-side behavior to otherwise largely server-side web applications: do some client-side input validation, dynamically modify the page based on user input, etc. But with the growing popularity of AJAX-style web applications, much of the client-side JavaScript code has grown into monstrosities - mostly because of a lack of inherent support for modularity and encapsulation.

One of the most important properties a language environment should have in order to scale to large projects is support for divide and conquer. There should be a way for one programmer to build upon the work of others without having to understand the implementation details of these building blocks, which might be called libraries, modules, packages, interfaces, objects, widgets, components, etc. depending on the language. This should also include the ability to debug in the context and at the level of abstraction in which the code is being written.

For example, if an error occurs as the result of an API call, the error should be reported in terms of the API and the parameters passed through it, and not just by a location in somebody else's low-level library code where supposedly something has gone wrong - most likely because I made a mistake in an API call.
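The principle is language-independent. Here is a small Java sketch with a hypothetical `Pager` API: arguments are checked at the public entry point, so a caller's mistake is reported in terms of the call that was made. Without the checks, a negative page index would instead surface as an `IndexOutOfBoundsException` from inside `List.subList`, i.e. from somebody else's low-level code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class ApiBoundaryDemo {
    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(Pager.page(items, 1, 2)); // [c, d]
        try {
            Pager.page(items, -1, 2);
        } catch (IllegalArgumentException e) {
            // Reported in terms of the API call, not an internal index error.
            System.out.println(e.getMessage());
        }
    }
}

// Hypothetical API that validates its inputs up front and names the
// offending parameter in the error message.
final class Pager {
    private Pager() {
    }

    // Returns the given page (0-based) of items as a view of the list.
    static <T> List<T> page(List<T> items, int pageIndex, int pageSize) {
        Objects.requireNonNull(items, "page(): items must not be null");
        if (pageIndex < 0) {
            throw new IllegalArgumentException("page(): pageIndex must be >= 0 but was " + pageIndex);
        }
        if (pageSize <= 0) {
            throw new IllegalArgumentException("page(): pageSize must be > 0 but was " + pageSize);
        }
        // long arithmetic avoids int overflow for very large arguments.
        int from = (int) Math.min((long) pageIndex * pageSize, items.size());
        int to = (int) Math.min((long) from + pageSize, items.size());
        return items.subList(from, to);
    }
}
```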

My experience with complex JavaScript libraries and frameworks (in particular the Closure library) is that the benefits one would expect to gain from using high-level libraries and components are greatly reduced by having to debug and understand so much of the provided low-level code - just because the language system does not provide the necessary isolation. If the inclination is to write everything from scratch, so that at least I understand the code which I have to debug in my application, then the language environment has failed from a scaling point of view.

In this case, the blame lies not only with JavaScript as a language but as much with the ecosystem it lives in. De facto, JavaScript runs in a set of target environments, which are typically the most popular browsers (IE, Firefox, Safari, Chrome, etc.). The debugging support of these browsers is still severely lacking - even though things have gotten a lot better with tools like the Firebug extension for Firefox or the JavaScript console built into WebKit. Part of the problem also comes from the fact that the typical JavaScript application is never self-contained, but interacts with and depends on the DOM representation of a page in the browser, and depends heavily on the quirks with which a particular browser renders the page and interprets the CSS attributes. The trickiest part of using a 3rd-party JavaScript UI widget is often getting the right CSS definitions loaded in the right order in the page where the JavaScript code is being executed.

Since rich AJAX-style web applications are compelling and powerful to use, there will have to be a way out of this mess. Either JavaScript and its browser-based ecosystem will need to grow the features needed to support large-scale software development, or a higher-level language abstraction will be layered over it to make programmers more productive, or a whole new web programming model will be created.

For an example of the second approach, GWT is an interesting step in the right direction. GWT is a compiler-based approach, where an AJAX web application is written in Java and then compiled into JavaScript as the target for execution in the browser - relegating JavaScript to a kind of assembly or virtual machine language. The app can be partially debugged natively in Java. However, because of browser quirks, much debugging is still required in the browser, where the abstraction breaks down again, as it did for high-level languages before the existence of source-level debuggers: write code in a high-level language and debug the generated machine instructions.

Another advantage of this high-level compiled language approach is that the execution engine in the browsers is now only used by code generated by compilers and could be more easily optimized or even replaced by something new altogether simply by close collaboration between whoever builds the compilers and the JavaScript engines in the various leading browser platforms.

However there is another unfortunate issue with the GWT approach: while it can be scaled up pretty well to very complex AJAX web applications, it cannot be scaled down as easily to the simple tasks JavaScript was originally designed to do. This would leave a world where for simple things one would still use hand-written JavaScript and for complex applications a compiled AJAX framework like GWT. With an obvious discontinuity when a once simple application grows into a complex one and unfortunately, this is a pretty common case in real life.

Among the languages I commonly use, Python has the best properties of scalability from a software development point of view. It seems to be easily approachable by novice programmers and is now commonly used to teach a gentle introduction to programming for non-technical users. It is sufficiently high-level and low on framework overhead to do simple things simply and easily, and yet has just enough rigor and structure to scale to some pretty amazingly large-scale projects. It is an embeddable language, and while its standard distribution is quite a bit bigger than JavaScript, it would not be impossible to embed it into a browser together with a library to manipulate the DOM. In fact, some efforts to do that seem to exist. While Python was nowhere near as mature and proven in 1995 as it is today, one is left to wonder what the state of web programming would be if Netscape had chosen to embed Python into its browser instead of JavaScript...