Wednesday, January 18, 2012

Why Time is hard

At least since the "Y2K problem" entered the public consciousness around the turn of the last century, nobody doubts that correctly representing time in computer systems is somehow hard. While today nobody is hopefully trying to save a few bytes by representing years in 2 digits, the state of time computations in many programming environment is still over-simplistic to say the least.

Having (almost) gotten caught by surprise by last years changes to civil time in Russia, this article is an attempt to understand, what it takes to handle time somewhat correctly for business related computer applications.

There are 2 common uses of time in computer systems:
  1. a monotonically increasing measure representing a global reference clock of some sorts and which can be used to determine an absolute ordering of all events in the system as well as their relative duration.
  2. Representation of civil time as it is used by communities of people living in some particular place to go about their daily lives and converting between multiple such references.
While 1. is an interesting technological problem, 2. is typically the focus, when computers programs are used to solve some practical everyday problem, which is probably the case for the majority of people writing software today.

Modern time measurement is based on the SI second which since 1967 is defined a the duration of 9 192 631 770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the cesium 133 atom. At sea level, at rest and at a temperature of 0K. Before that the common definitions of time used to be either defined based on astronomical observations.

For the sake of some sanity in a globalized world, politicians at some point in the 19th century agreed on a single coordinated time reference system which is based on an approximation of average solar time at the Greenwich Observatory near London, setting location of Greenwich as the global prime meridian and the Greenwich Mean Time (GMT) as the global reference time.Toady's global time reference is called Coordinated Universal Time (UTC) and is piece-wise identical to the International Atomic Time (TAI), derived from  the average of some 200 atomic clocks world-wide. The difference between TAI and UTC comes from the occasional insertion of a leap-seconds in order to keep UTC in line with UT1, an idealized model of astronomical time at the prime meridian.

Starting from the prime meridian at Greenwich, the world is then partitioned into 24 reference time-zones, each at 1h increment from the previous one, defining a standard local time up to about 30min different from the local solar time. These timezones are typically names "UTC +/- x" or sometimes "GMT +/- x". Most standard timezones or combinations of timezones have more or less obvious names and abbreviations. E.g. EST stands for Eastern Standard Time and refers to UTC - 5,  the timezone used roughly in winter along the US eastern seaboard, but the exact list of applicability is frankly a bit confusing. Also timezone abbreviations are not unique and don't follow a logical pattern. E.g. BST stands for both British Summer Time (UTC + 1) and Bangladesh Standard Time (UTC + 6).

And if that were not yet complicated enough politician proceeded to make a complete hash of things by choosing for a variety of political entities (countries, states, provinces, towns, who knows what...), which timezone it should belong to - sometimes coinciding with the reference timezone this entity was located in, sometimes not. To make things even worse, they proceeded to invent a thing which is called "summer time" in some places "daylight savings time" in others, which is typically done by shifting the local time back and forth by 1h at random times in the spring and fall. This ritual is supposed to have some benefits, but most likely just drives people and livestock mad...

Another problem with daylight savings time is that it breaks the intuitive assumption of time being at some reasonable level continuous and monotonically increasing. However in all those parts of the world which observe some form of DST, this assumption is broken twice a year when the clocks are moved forward resp. backwards at some particular points in time. This means that some specific representations of local time are invalid, as they do not exist, i.e. not correspond to any valid point in time expressed in UTC or any other global reference time. E.g. 2012-03-25T02:30 does not exist as a valid local time in Z├╝rich, as it is being skipped during the wintertime to summertime switch. Similarly 2011-10-30T02:30 in local time for the same region is ambiguous, as it corresponds to 2 different points in time due to the clocks being set back by 1 hour.

And since politicians need to keep busy, they sometimes change any of those rules arbitrarily whenever they feel like, causing frantic activities of software updates and all kinds of malfunctions in software which is dealing with representation of local time.

While obtaining a decent enough approximation of UTC is not a big challenge anymore for most computer systems (e.g. through GPS or NTP), figuring out what time a clock should show on the wall of any arbitrary place in the world is still a hard problem, thanks to our politicians.

In the absence of an authoritative standard of all timezone definitions, each software package which does local time comparisons and computations needs to somehow be changed and updated when any timezone related rule changes anywhere in the world - for some arbitrary definitions of "any"...

It seems that most popular time handling libraries, which are sophisticated enough to handle these issues anywhere near correctly, rely on a group of volunteers, which maintain the open-source tz database and associated tools, now also called the IANA Time Zone Database. It uses a particular definition of timezone (from tz project page):

"Each location in the database represents a national region where all clocks keeping local time have agreed since 1970. Locations are identified by continent or ocean and then by the name of the location, which is typically the largest city within the region. For example, America/New_York represents most of the US eastern time zone; America/Phoenix represents most of Arizona, which uses mountain time without daylight saving time (DST); America/Detroit represents most of Michigan, which uses eastern time but with different DST rules in 1975; and other entries represent smaller regions like Starke County, Indiana, which switched from central to eastern time in 1991 and switched back in 2006."

Judging from the traffic on the mailing list, there seem to be some change somewhere in the world every few months, which we can either choose to ignore or which may require an upgrade of the timezone definition database.