Saturday, May 25, 2013

Raspberry Pi Internet Access Monitor

If your Internet access is down and you are not watching, is it really down? For most people the answer is most likely - who cares!

Since our Internet connection had recently been down a few times when I did happen to notice it, I was getting curious about how frequently this was actually happening.

Besides, it might be interesting to get a long-term record of the stability and "quality" of our ISP and maybe even compare it with the results from users of competing ISPs.

What does it mean for Internet access to be working? Is it enough to check that the link between our home router and the ISP's access router is up and working (a DOCSIS cable plant in my case)? Or should we include end-to-end, application-layer scenarios like the ability to get my email or my files from some place "in the cloud"?

But what exactly represents "the cloud" or "the Internet"? In reality, they are massive distributed systems at a global scale, consisting of millions of components, points of failure and recovery.

For the purpose of these measurements, we need to choose a few relevant and representative destinations as stand-ins for the whole Internet. Since many people seem to think that Google, Facebook, Amazon or other top-tier web properties ARE the Internet, we may as well use them as a reasonable proxy for it. Depending on the mix of Internet or "cloud" services we use on a daily basis, it should be easy to come up with a short-list of destinations and services we particularly care about. We also need to be conscious of the load these measurements put on the services and whether their owners would be likely to object. Choosing very popular services which already carry very high traffic loads helps to mitigate the additional impact of the probes. Running these measurements should have less impact on any part of the Internet infrastructure than leaving a few browser tabs with AJAX apps open overnight...

Without access to the ISP's low-level network and systems monitoring, the most basic way to judge Internet connectivity is to periodically probe whether some destination or service is currently reachable. There are quite a few open-source network and service monitoring tools available, but for our goal of long-term automated connectivity testing, the most natural choice might be SmokePing, an open-source tool by the author of MRTG and RRDtool that is very popular among IP network admins.

SmokePing gets its name from plotting latency, jitter (variation in latency) and packet loss in a single graph, drawing jitter as a smoky cloud around the line of median round-trip time, as in the example below:



A low-cost, low-power Raspberry Pi, which can be left running headless attached to the Internet gateway, would seem like an ideal platform for such monitoring & measurements. And fortunately, SmokePing already comes pre-packaged with all its dependencies (Perl, Apache etc.) for Raspbian, so installing it is as easy as:

sudo apt-get install smokeping

After that, we can edit the config files in /etc/smokeping/config.d/ to determine what kind of probing to run and how to display the results in the web interface. All the configuration settings are documented here.
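
On a default Raspbian install, the web front-end is the packaged Apache CGI script, so assuming the packaging defaults have not been changed, the graphs should show up under a URL along the lines of:

http://<address of the Raspberry Pi>/cgi-bin/smokeping.cgi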

The SmokePing prober did not start properly in the default configuration on my system because of a missing reference to sendmail. Removing the sendmail line from /etc/smokeping/config.d/pathnames resolved the problem. The Raspberry Pi has more than enough compute power to run the SmokePing prober for a small configuration as outlined below. Running the web front-end, which generates the graphs from the RRD time-series database, can be a bit taxing on one's patience, as rendering each page takes about 10 to 15 seconds at 100% CPU load.
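
After removing (or commenting out) the sendmail entry, the pathnames file should look roughly like this; the remaining entries shown here are the typical Debian packaging defaults and may differ between versions:

#sendmail = /usr/sbin/sendmail
imgcache = /var/cache/smokeping/images
imgurl = ../smokeping/images
datadir = /var/lib/smokeping
piddir = /var/run/smokeping
smokemail = /etc/smokeping/smokemail
tmail = /etc/smokeping/tmail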

Measurement Setup

The most basic, low-level and least intrusive way to do connectivity probing in IP networks is the ICMP echo protocol, implemented in the kernel as part of the IP stack and exposed by the ping network diagnostic utility. As targets for the ping probes, we choose the front-door addresses of some major Internet companies: www.google.com, www.facebook.com and www.yahoo.com. All of these are logical addresses which map to a long list of heavily replicated and geographically dispersed physical machines, assigned by a DNS load-balancer based on availability and proximity. These 3 companies and sites carry a good part of the Internet traffic and are unlikely to go down, especially not all 3 at the same time. Should the pings to all 3 destinations suddenly start failing, the outage would most likely be in our Internet access connection or the access network of our ISP. SmokePing is configured to send 10 probes to each destination every 5 min to collect packet delay and loss information.
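
The same targets can be sanity-checked by hand with the fping binary that SmokePing's FPing probe uses, e.g. sending 10 echo requests per target and printing only the per-target loss and latency summary:

fping -q -c 10 www.google.com www.facebook.com www.yahoo.com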



In order to locate the actual server to probe, ping relies on the domain name system (DNS), itself a highly replicated and distributed infrastructure at the very core of the Internet. In order to isolate IP connectivity from name service issues, we set up a secondary set of probes using DNS queries sent directly to the physical IP address of the primary name server of our ISP and to the well-known 8.8.8.8 address of the Google public DNS service (itself heavily replicated using BGP anycast routing). Since this is starting to hit more complex services in application space, we reduce the polling rate to 5 probes each every 15 minutes.
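
A quick manual equivalent of this probe is to query the two resolvers directly, for example with dig (substituting the actual address of the ISP's name server in the second command):

dig @8.8.8.8 www.google.com A
dig @<Your ISPs primary NS IP address> www.google.com A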



Clearly, latency, latency variance (jitter) and packet loss rates are an important part of network performance, but they do not give the full picture. Ideally it would be nice to also measure the available bandwidth, which is most representative of perceived network "speed". However, doing so requires expensive, heavy-load probes which try to saturate the network path in order to estimate its capacity limit. Doing that regularly in an automated long-term test would seem a bit frivolous and wasteful.

As a small compromise for qualifying how the network performs for common "cloud" services, we use an HTTP probe which downloads a copy of the photo above from a public folder in Dropbox, a cloud-based file storage service, and from the static user content server of the Google+ photo service.
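
For a quick manual check of the same kind, curl can time a single download of the test image used in the Dropbox probe below (any small, static, publicly reachable file would work just as well):

curl -s -o /dev/null -w "%{time_total}s total, %{size_download} bytes\n" http://dl.dropboxusercontent.com/u/12770892/benchmark/raspberrypi.jpg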


Here is the core of the setup for these experiments in /etc/smokeping/config.d/Probes and /etc/smokeping/config.d/Targets respectively:
*** Probes ***

+ FPing
binary = /usr/bin/fping
step = 300
pings = 10

+ EchoPingDNS
binary = /usr/bin/echoping
step = 900
pings = 5

+ EchoPingHttp
binary = /usr/bin/echoping
step = 900
pings = 3


*** Targets ***

probe = FPing

menu = Top
title = Raspberry Pi Internet Access Monitor
remark = Latency to a few select sites and services in the Internet.

+ Internet
menu = Internet
title = Internet Access (Ping)

++ Google
title = Google
menu = Google
host = www.google.com

++ FB
title = Facebook
menu = Facebook
host = www.facebook.com

++ Yahoo
title = Yahoo
menu = Yahoo
host = www.yahoo.com

+ DNS
menu = DNS
title = Name Servers

++ gdns
title = Google public DNS
menu = Google public DNS
probe = EchoPingDNS
dns_request = www.google.com
host = 8.8.8.8

++ cablecom
title = Cablecom DNS
menu = Cablecom
probe = EchoPingDNS
dns_request = www.google.com
host = <Your ISPs primary NS IP address>

+ Cloud
menu = Cloud
title = Cloud Services

++ dropbox
title = Dropbox
menu = Dropbox
probe = EchoPingHttp
host = dl.dropboxusercontent.com
port = 80 
url = /u/12770892/benchmark/raspberrypi.jpg

++ gusercontent
title = Google+ Photo
menu = Google 
probe = EchoPingHttp
host = lh4.googleusercontent.com
port = 80
url = /UB5Y5yJKtj51bs2asd8kJGjOxwigev7JPQz3g9tw1C0=w614-h801-no
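
After changing the Probes or Targets files, the SmokePing daemon has to be restarted to pick up the new configuration; on Raspbian this should be something like:

sudo service smokeping restart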

Monday, May 13, 2013

Back to Broadcast

About 3 years ago, I speculated in this post that "user-curated content" would become the next logical step after the "user-generated content" wave unleashed by the interactivity of Web 2.0. By and large, I have been wrong.

What has happened instead is an accelerating professionalization of online content creation and a return towards the traditional broadcast model with a pronounced split between few creators who produce stuff and the many consumers who consume it.

True, there are still myriads of users engaged in some form of content creation, but increasingly only a few creators matter. True, the cost of creating and distributing digital content has been lowered far below any level that represents a serious barrier to entry, but more than ever it takes a serious amount of luck, perseverance and highly professionalized marketing to stand out from the crowd.

Maybe it is a sign of maturing for any new medium that a period of frantic and chaotic experimentation is followed by consolidation and professionalization, even though in this new medium there is hardly an inherent, natural monopoly and the only scarce resource is the attention of the audience.

It is quite telling that G+, the most recent and most contemporary of the major social networking platforms is based on an asymmetric relationship model of follower rather than friend and makes a clear distinction between profiles (for users) and pages (for corporate entities and brands).

Some of the most symmetric and egalitarian platforms like Facebook or YouTube, which date from the mid noughties, are now well into a process of retooling themselves into a place where increasingly the masses can in some form follow, subscribe to or endorse a relatively small number of online celebrities.

While the original notion of Facebook's "friend" implies a relatively small number of peer-to-peer relationships, today many Facebook "stars" have millions of "friends", most of whom they would probably not recognize in the street. Many of those are not even people, but brands, which seem hardly capable of feelings such as friendship.

YouTube started out as a video sharing website, where every user could also be an uploader and the key purpose seemed to be sharing and exchanging amateur videos. Today, YouTube more clearly distinguishes between partners, who create content, and viewers, who consume it.

Yes, it is still possible for ordinary Joes or Janes to be friends with each other on Facebook or for any YouTube user to upload a video, but this is no longer the most relevant pattern of use.

Part of the reason for this change of focus may be that it turned out to be very hard to make money with user generated content. While it may be fun for a while to see a few random people's home videos or read about what they are having for lunch, in the long run most of us prefer some level of professionalism in content and production values for our entertainment. The platform operators also face increasing pressure to make money, and the easiest way to do this is to cater to well-funded entities who want to be your friend and want you to consume their content.

But while it's basically back to broadcast, back to being a fan, follower or viewer, the entities who are winning most of our attention are not necessarily the same old traditional household names.

Some blogs, like The Huffington Post, have become veritable new-media powerhouses, and some musicians, like Justin Bieber or Psy, have managed to leverage their presence on YouTube into an A-list international career. And even at a less high-profile level, many talented musicians, photographers, writers or journalists, endowed with a certain knack for self-promotion, have managed to build themselves a good career as new-media entrepreneurs or social media personalities.

With the digital media revolution, there is an exciting new world evolving before our eyes, but in some ways, it seems to turn out like the old one quite a bit more than I once had thought...

Thursday, February 28, 2013

The limits of Virtuality

The management edict to ban telecommuting at Yahoo has stirred up quite a controversy. It seems a bit ironic for an Internet company to admit that telecommuting isn't working for them. Kind of like an oil company saying that the industrial revolution was all a big mistake and we should go back to animal & slave power.

But on second thought, maybe things are not as black and white. I have spent quite a few years working on systems to make communication and collaboration easier and more frictionless, often collaborating with people 6-12 timezones away. And my social network is equally spread out across the world. One would think that I should have figured out remote interaction by now.

The reality is that despite high-speed networks, cloud-based collaboration tools and high-quality video-conferencing, remote collaboration is still surprisingly hard and (co-)location still matters. Without any obvious reason, the workers of the post-industrial knowledge economy seem to cluster physically even more strongly than when the natural location of steel, coal, farmland, natural harbors or waterways dictated the location of industries: computer engineers are disproportionately in Silicon Valley, musicians in Nashville, advertising agencies, fashion designers or investment bankers in New York, and film and TV people in Los Angeles. If creative/knowledge workers could work from anywhere in the world, why don't we?

For one, timezones are a killer. We still tend to be awake during the natural day-time wherever in the world we are physically located, even if that means that the people at the other end of the world we are trying to collaborate with are asleep. Trying to find a few hours of overlap may require sacrifices and unnatural behavior, like working during the night.

It is true that networked communication and collaboration tools have become amazingly powerful and effective over the last few decades, especially with the explosive development of the Internet. But even with these sophisticated tools, we can only collaborate well once we know and trust each other and know what we want to accomplish together. Establishing a close, trusting relationship remotely is still rather challenging.

Even video-conferences, the most immersive and sensory form of remote interaction, are typically scheduled, limited-time interactions that are purposeful and problem-oriented. The participants in a video-conference tend to be there with a mission and an agenda to solve some stated problem at hand, and the mode of interaction is often formal if not a bit confrontational. Such video meetings may be a good setting to resolve a small, tactical problem, but not for getting to know each other, building trust or coming up with a broad common vision.

Geographically split teams inevitably form cliques, due to the uneven flow of information and the very different intensity of the daily interactions that are possible locally and remotely. It takes a conscious and non-trivial effort to keep "the other side" in the loop on decisions that have been made locally in informal discussions, and inevitably we sometimes tend to underestimate the contributions of the remote participants. The only way this can be avoided is if no two people in a team are physically collocated, but depending on the nature of the project, this could come at a tremendous loss of efficiency. The sense of under-appreciation is probably worst for single remote workers attached to an otherwise centralized team.

The irony of modern telecommunication is that while it allows us to work (somewhat haphazardly) across great distances, it has not reduced the need to travel and meet each other in person - maybe it has even increased that need.

While I use email, IM, video conferencing and all kinds of distributed collaborative tools daily to work with people across continents, I still have to travel several times per year to meet some of them in person. I don't typically travel to get work done, but to build and maintain personal relationships, exchange ideas and establish a common culture on the basis of which we can then again get some stuff done for a while. For the sake of productivity it also helps to structure and partition projects in a way that minimizes the need for tight coordination across locations and allows teams in each location to work as much as possible as an isolated unit.

At least for some types of innovative and creative projects, the best ideas and inspirations often come from completely unplanned and informal interactions. From chance encounters, hanging out, having a drink and chatting about god and the world. It almost seems that the effectiveness of a creative cluster can be measured by its ability to efficiently spread gossip. A campus buzzing with bright and energetic students is one of the reasons why top-class universities still offer a great advantage, even in a day when courses could just as well be taken online.

The key to better remote collaboration would not be even more effective communication and collaboration tools, but a virtual water-cooler to emulate the daily intra-office gossip. Social networks and their frictionless sharing of seemingly irrelevant information may be a step in this direction. But having our centers of life far apart in different continents, cultures and languages also makes it harder to build a social rapport, as we might not have much in common to gossip about.

From my work experience on large, complex projects, I value the flexibility of being theoretically able to work from anywhere, even though I hardly ever do, as well as working for an organization which measures performance based on results and not on physical presence behind a desk from 9-5. But I also feel that I benefit a lot from the daily opportunity of direct human interaction within the immediate project team, as well as from less regular but often stimulating chance encounters across a campus with lots of interesting people. I am lucky to have a very short commute (15min on foot), which helps with flexibility and is a big plus in terms of quality of life. I don't think I would want to work from home entirely, but a mixed-use, small-scale neighborhood, which offers close proximity between work and home and plenty of opportunities for chance encounters in between, seems to be an optimal compromise.