Sunday, May 15, 2016

Give me a ping, Vasili!

The Linux command line has a rich set of powerful tools. Today we are looking at some examples of commands which allow us to troubleshoot networking issues.

Let’s picture a situation where our network connection to the Internet has been working fine yesterday, but suddenly seems to be down or at least has become very unreliable. We also know that the IP address of our firewall/router connecting to the Internet as 192.168.0.1 and that for an arbitrary point out in the Internet, for example 8.8.8.8, the access address of the Google public DNS service.

One of the first tools an experienced network administrator might reach for in such a situation is the ping utility. It allows to test the connectivity and measure delay and packet loss to any host in the network. First things first, let’s see if we can still reach the router on our local network:

pi@raspberrypi ~ $ ping -c3 192.168.0.1
PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.
64 bytes from 192.168.0.1: icmp_req=1 ttl=64 time=0.854 ms
64 bytes from 192.168.0.1: icmp_req=2 ttl=64 time=0.782 ms
64 bytes from 192.168.0.1: icmp_req=3 ttl=64 time=0.778 ms

--- 192.168.0.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 0.778/0.804/0.854/0.047 ms

The ping command works by sending probe packets to the destination and waits for them to be sent back. For that it uses the special echo request and echo reply messages defined by the Internet Control Message Protocol (ICMP). A communication protocol is a well specified convention on how two systems can communicate with each other. In the case of ICMP, this convention is documented as an Internet standard in RFC792.

 Since ICMP echo request/reply processing is typically part of the lowest level of networking support in each properly implemented Internet host, it is a very reliably way to determine if a host can be reached over the network.

The ping utility is nearly as old as the Internet itself and is named after the eery sharp, metallic sound of an acoustic sonar probe, we might be familiar with from submarine movies.

From the output above we can see that our Internet gateway still exists on the network and that we can fairly reliably reach in less than a millisecond round-trip time. Out of 3 probe packets we sent, 3 responses were received and the fluctuation in the response time is quite low.

Instead of using ping -c 3 192.168.0.1 , we could also just use ping 192.168.0.1 in which case the program runs forever until interrupted by the user, sending a request every second. This lets us observe the state of network connectivity over time, for example as we wiggle network cables or plug and unplug devices.

As we can see from man ping, there are many more options which can be specified. Some particularly interesting ones are:
  • -c count : only send <count> probes and then stop
  • -i interval : send a probe approximately every <interval> seconds (default 1 second)
  • -s size : send probe packets of size <packetsize> (default 64 bytes)
  • -n : don’t try to translate numeric IP addresses into hostnames
In another example

pi@raspberrypi ~ $ ping -c 3 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.

--- 8.8.8.8 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2000ms

shows that the host with IP address 8.8.8.8 is currently not reachable right now from our network. Since we can still reach our router, but not an arbitrary address outside, we now suspect that our local area network might be working fine, but that there might be a problem with the connection of our router to the Internet.

Besides cases with either 100% success and 100% failure, there can be situations with irregular packet loss, which might be caused by flaky network cables, loose connectors, unstable or overloaded gateways in the network. Even without or with low packet loss, high delays or high variation of delay might degrade the performance of higher level protocols like http or ssh.

Like any proper Internet host, the Linux kernel in our own Raspberry Pi contains a responder for ICMP echo requests and we can effectively test our own networking stack and how fast the our kernel can process small packets:

pi@raspberrypi ~ $ ping -c 5  localhost
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_req=1 ttl=64 time=0.153 ms
64 bytes from localhost (127.0.0.1): icmp_req=2 ttl=64 time=0.155 ms
64 bytes from localhost (127.0.0.1): icmp_req=3 ttl=64 time=0.163 ms
64 bytes from localhost (127.0.0.1): icmp_req=4 ttl=64 time=0.146 ms
64 bytes from localhost (127.0.0.1): icmp_req=5 ttl=64 time=0.201 ms

--- localhost ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 0.146/0.163/0.201/0.024 ms

While ping is a quick and easy way to determine if we can reach a destination or not, traceroute allows us to find out more about each hop our traffic is taking towards the destination. For example, in the situation above, we can use traceroute to see where exactly the traffic stops going through and where the problem might be:

pi@raspberrypi ~ $ traceroute 8.8.8.8
traceroute to 8.8.8.9 (8.8.8.8), 30 hops max, 60 byte packets
 1  192.168.1.1 (192.168.1.1)  0.885 ms  0.800 ms  0.838 ms
 2  * * *
 3  46.140.4.109 (46.140.4.109)  35.432 ms  35.328 ms  35.082 ms
 4  84.116.200.241 (84.116.200.241)  34.854 ms  34.642 ms  34.429 ms
 5  ch-zrh01b-ra1-ae-1.aorta.net (84.116.134.142)  33.977 ms  33.460 ms  33.244 ms
 6  * * *
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *

Which appears to be a problem a few hops away from our Internet connection itself. And indeed, a few moments later, the service is restored and we can now again successfully reach the destination:

pi@raspberrypi ~ $ traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
 1  192.168.1.1 (192.168.1.1)  0.965 ms  0.906 ms  1.072 ms
 2  * * *
 3  46.140.4.109 (46.140.4.109)  11.581 ms  17.977 ms  17.604 ms
 4  84.116.200.241 (84.116.200.241)  20.844 ms  20.765 ms  20.367 ms
 5  ch-zrh01b-ra1-ae-1.aorta.net (84.116.134.142)  15.715 ms  15.642 ms  15.240 ms
 6  74.125.49.101 (74.125.49.101)  20.088 ms  13.548 ms 72.14.212.210 (72.14.212.210)  13.250 ms
 7  66.249.94.52 (66.249.94.52)  18.371 ms 72.14.232.120 (72.14.232.120)  21.770 ms 66.249.94.52 (66.249.94.52)  20.267 ms
 8  209.85.251.248 (209.85.251.248)  20.250 ms 209.85.251.180 (209.85.251.180)  20.315 ms 209.85.251.248 (209.85.251.248)  19.785 ms
 9  209.85.254.118 (209.85.254.118)  20.204 ms 209.85.254.112 (209.85.254.112)  31.628 ms 209.85.254.116 (209.85.254.116)  19.167 ms
10  * * *
11  google-public-dns-a.google.com (8.8.8.8)  19.840 ms  19.633 ms  19.524 ms

The traceroute command shows addresses and hostnames of all the router nodes which packets are going through from our Raspberry Pi to the destination. For that, traceroute takes advantage of another ICMP feature, the time-to-live expiry message. All packets which are sent through the Internet have a limit set of how many times they can be passed on by routers and which is decremented at each hop to prevent packets going around forever if they can’t find their destination. When a pack is discarded in the network, the router sends out an ICMP message to alert the sender that the packet has been discarded.

In order to discover a network path, traceroute sends out a series of packets (default 3) with a time-to-live limited to 1, just to see who will send back an ICMP time exceeded message and then repeats this process with increasing time-to-live values until it reaches the destination.

There are many more useful commands to look at the state of the network or test its performance, but the main  advantage of tools like ping and traceroute is  that they work directly with support deep in the operating system kernel. This can sometimes mean that a computer is still responding to ping requests, even if it otherwise appears to be completely stuck or has no networking applications running.

When some network application like ssh or web browsing are not working properly, tools like ping and traceroute are great to figure out whether there is a low-level networking problem or the problem is maybe with the application itself.

In a future episode, we will take a closer look at layers of networking support in the Linux kernel and some tools to look at them. Until then, can you find out from where we can know, what is the address of our local Internet gateway? Hint: have a look at the netstat command.


A similar version of this article appeared in The MagPi Magazine issue 21.