Tuesday, November 30, 2010

Mutability Considered (Somewhat) Harmful

We recently had a coffee-room discussion on the futility of trying to introduce new relevant programming languages. Computer scientists seem to invent new programming languages on a daily basis - it's an important contribution to conceptual research in computer science and having at least one language to one's name seems to be important for bragging rights in certain circles.

However for a programming language to become practically relevant is extremely rare. We can make the somewhat flippant educated guess, that since the dawn of commercial computer use in the 1960ies, there have been about 5 major commercially successful languages: FORTRAN, COBOL, C, C++ and Java - roughly about 1-2 per decade.

Certainly, there have been many other vendor and/or domain specific languages over the years or others with a significant popularity, just not enough to make it into the all-time A-list. There are multiple surveys which try to measure which are currently the most popular programming languages, the graph below is an example of one of them.

But if a new significant general-purpose language were to emerge, the one thing it would have to get right is concurrency - maybe by largely avoiding it. In the 15 years since Java was introduced, computers have increasingly evolved towards distributed systems. Today's top super-computers are basically massive clusters of customized PC servers. Even desktop computers have typically multi-core CPUs with complex distributed caches and memory hierarchies, which work increasingly harder at pretending that they are still a von Neumann machine with a single, flat and consistent memory space. Maybe it is time to give up this convenient illusion and find a new model for the more complex and certainly more dynamic reality. E.g. this talk by Rich Hickey might provide some food for thought on how a practically programming model outside the von Neumann box might look like.

Concurrent programming using shared state and mutual exclusion has been around for a long time, but it is tricky, error prone and until recently has been the domain of a small number of programmers working on operating systems, databases and other high-performance computing systems. Also many people may find that even on moderately multi-core machines, lock contention is starting to cause serious performance issues which are hard to diagnose and even harder to fix.

Maybe the most successful way of dealing with concurrency is to avoid it. E.g. immutable value-objects and pure functions are one way to reduce the impact of concurrency. But since most real life computer applications involve dealing with things that evolve, the key might be to come up with a programming model where mutability is pushed into the framework, much how garbage collection moved resource management from being a concern of the application to being (largely) the problem of the programming run-time environment.

There has been a fair amount of recent research in lock-free concurrency control, which avoids shared state in favor of atomic operations and data replication and versioning. There is obviously a cost associated with increased data replication, but maybe in a world of high-N cores and complex distributed memory hierarchies, the trade-off could still be a winning proposition.

And maybe functional programming languages, which have existed outside the mainstream for decades might finally have their day in the sun.