Sunday, May 15, 2016

A Host by any other Name

In the previous two episodes about IP networking, we have seen a lot about raw addresses and port-numbers, because that is how the networking stack operates internally. But this is not how we interact with the Internet in real life. Except for trouble-shooting, we don’t typically use raw addresses and IDs but rather names. For example, instead of http://173.194.113.115:80, we would enter http://www.google.com.

In the earliest days of the Internet, people kept a list of name-to-IP-address mappings on each computer connected to the network, similar to everyone having their own copy of a phone book. Remnants of this file still exist today on Linux in /etc/hosts, where it holds some special local default addresses.

pi@raspberrypi ~ $ cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback

127.0.1.1 raspberrypi

Beyond that, it is hardly used for name management, except on the smallest networks with just a few hosts and static IP addresses.
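To make the file format concrete, here is a minimal Python sketch of how such a file could be read into a lookup table. The function name parse_hosts and the sample text are illustrative assumptions, and the real resolver library does more than this (for example, it also handles the order of lookup sources).

```python
def parse_hosts(text):
    """Map every hostname and alias in hosts-file text to its list of addresses."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        # First field is the address, all remaining fields are names for it.
        address, *names = line.split()
        for name in names:
            table.setdefault(name, []).append(address)
    return table


# Sample content copied from the /etc/hosts listing above.
SAMPLE = """\
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback

127.0.1.1 raspberrypi
"""

hosts = parse_hosts(SAMPLE)
print(hosts["localhost"])    # the IPv4 and the IPv6 loopback address
print(hosts["raspberrypi"])
```

A name can map to several addresses, which is why each entry in the table is a list.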

Domain Name System (DNS)

As the early Internet grew rapidly, maintaining and distributing this static list of addresses to all hosts became too cumbersome, and around 1984 it was replaced with a more automated system, the Domain Name System (DNS).

There are two Linux tools commonly used to test and troubleshoot DNS issues: host and dig. They are similar in many ways: host often has terser, more to-the-point output, while dig provides more options and its output is closer to the internal DNS data format. For this article we will generally use host whenever possible, even though it is said that real network administrators prefer dig.

pi@raspberrypi ~ $ host www.themagpi.com
www.themagpi.com has address 74.208.151.6
www.themagpi.com has IPv6 address 2607:f1c0:1000:3016:ca5a:fd42:5e1e:9032
www.themagpi.com mail is handled by 10 mx00.1and1.com.
www.themagpi.com mail is handled by 10 mx01.1and1.com.

DNS is essentially a hierarchical and distributed database for names, addresses and a bunch of other resources on the Internet. The DNS consists of a potentially replicated tree of authoritative name-servers, each of which is responsible for a particular sub-domain or sub-organization of the network. Fully qualified DNS hostnames reflect that hierarchy by chaining a list of sub-names separated by dots. For example, www.themagpi.com represents a host called “www” owned by an organization with the sub-domain “themagpi” within the top-level domain “com”, which was initially created for US commercial use.
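The way a dotted name encodes the hierarchy can be sketched in a few lines of Python. The helper name zone_chain is a hypothetical one chosen for this illustration. Note that the chain lists enclosing domain names only, and not every name in it is necessarily administered as a separate zone.

```python
def zone_chain(hostname):
    """Return the enclosing domains of a hostname, from the root downwards."""
    labels = hostname.rstrip(".").split(".")
    chain = ["."]  # the root of the DNS tree is written as a single dot
    # Walk the labels from right (top-level domain) to left (host).
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain


print(zone_chain("www.themagpi.com"))
print(zone_chain("enlightenment.chch.ox.ac.uk"))
```

Reading the name from right to left gives the order in which the hierarchy is descended.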

pi@raspberrypi ~ $ dig any +nostats themagpi.com

; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> any +nostats themagpi.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7925
;; flags: qr rd ra; QUERY: 1, ANSWER: 7, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;themagpi.com. IN ANY

;; ANSWER SECTION:
themagpi.com. 85673 IN MX 10 mx01.1and1.com.
themagpi.com. 85673 IN SOA ns51.1and1.com. hostmaster.1and1.com. 2014022701 28800 7200 604800 86400
themagpi.com. 85673 IN MX 10 mx00.1and1.com.
themagpi.com. 85673 IN NS ns52.1and1.com.
themagpi.com. 85673 IN NS ns51.1and1.com.
themagpi.com. 85673 IN A 74.208.151.6
themagpi.com. 85673 IN AAAA 2607:f1c0:1000:3016:ca5a:fd42:5e1e:9032

This example shows a few common DNS resource record types for hosts and sub-domains: IPv4 address (A), IPv6 address (AAAA), authoritative name-server (NS), designated mail exchange (MX) and zone master information (SOA).
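Each line of the answer section has the same shape: name, time-to-live in seconds, class, record type and type-specific data. As a rough sketch, assuming that whitespace-separated layout, a few of the lines above can be grouped by record type in Python. The names Record and parse_answer are made up for this example.

```python
from collections import namedtuple

Record = namedtuple("Record", "name ttl rclass rtype data")


def parse_answer(text):
    """Parse dig-style answer lines into Record tuples, skipping ';' comment lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        # The data field may itself contain spaces (e.g. MX priority and host).
        name, ttl, rclass, rtype, data = line.split(None, 4)
        records.append(Record(name, int(ttl), rclass, rtype, data))
    return records


# A subset of the answer section shown above.
ANSWER = """\
themagpi.com. 85673 IN MX 10 mx01.1and1.com.
themagpi.com. 85673 IN NS ns52.1and1.com.
themagpi.com. 85673 IN A 74.208.151.6
themagpi.com. 85673 IN AAAA 2607:f1c0:1000:3016:ca5a:fd42:5e1e:9032
"""

by_type = {}
for rec in parse_answer(ANSWER):
    by_type.setdefault(rec.rtype, []).append(rec.data)

print(by_type["A"])
print(by_type["MX"])
```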

A more complicated sub-domain hierarchy is that of a host aptly named enlightenment at Christ Church, a constituent college of the University of Oxford, which is part of the British academic and research network under the .uk top-level domain:

pi@raspberrypi ~ $ host enlightenment.chch.ox.ac.uk
enlightenment.chch.ox.ac.uk has address 129.67.123.166
enlightenment.chch.ox.ac.uk mail is handled by 9 oxmail.ox.ac.uk.

At the root of the DNS hierarchy is a set of currently 13 root nameservers, which contain information about all the top-level domains in the Internet. The authoritative master for this data is currently operated by the Internet Corporation for Assigned Names and Numbers (ICANN).

In order to look up any hostname in the DNS, a client only needs to know the address of one or more of the root servers to start the resolution. The query starts at one of the root servers, which returns the addresses of the name-servers that are in turn the authoritative source of information about the next sub-domain in the name, and so on until one is reached which finally knows the address of the host we are looking for. In the case of enlightenment.chch.ox.ac.uk we need to ask 4 different servers until we finally reach the one which knows the address (SOA stands for start of authority, the identity of a new authoritative zone):

pi@raspberrypi ~ $ host -t SOA  .
. has SOA record a.root-servers.net. nstld.verisign-grs.com. 2014030701 1800 900 604800 86400
pi@raspberrypi ~ $ host -t SOA  uk
uk has SOA record ns1.nic.uk. hostmaster.nic.uk. 1394217217 7200 900 2419200 172800
pi@raspberrypi ~ $ host -t SOA  ac.uk
ac.uk has SOA record ns0.ja.net. operations.ja.net. 2014030760 28800 7200 3600000 14400
pi@raspberrypi ~ $ host -t SOA  ox.ac.uk
ox.ac.uk has SOA record nighthawk.dns.ox.ac.uk. hostmaster.ox.ac.uk. 2014030772 3600 1800 1209600 900
pi@raspberrypi ~ $ host -t SOA chch.ox.ac.uk
chch.ox.ac.uk has no SOA record

The dig command has a +trace option which allows us to find all the authoritative nameservers in the resolution path:

pi@raspberrypi ~ $ dig +trace www.themagpi.com

; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> +trace www.themagpi.com
;; global options: +cmd
. 3599979 IN NS j.root-servers.net.
. 3599979 IN NS b.root-servers.net.
. 3599979 IN NS m.root-servers.net.
. 3599979 IN NS e.root-servers.net.
. 3599979 IN NS g.root-servers.net.
. 3599979 IN NS h.root-servers.net.
. 3599979 IN NS c.root-servers.net.
. 3599979 IN NS i.root-servers.net.
. 3599979 IN NS l.root-servers.net.
. 3599979 IN NS k.root-servers.net.
. 3599979 IN NS a.root-servers.net.
. 3599979 IN NS d.root-servers.net.
. 3599979 IN NS f.root-servers.net.
;; Received 241 bytes from 62.2.17.60#53(62.2.17.60) in 238 ms

com. 172800 IN NS i.gtld-servers.net.
com. 172800 IN NS j.gtld-servers.net.
com. 172800 IN NS d.gtld-servers.net.
com. 172800 IN NS h.gtld-servers.net.
com. 172800 IN NS f.gtld-servers.net.
com. 172800 IN NS e.gtld-servers.net.
com. 172800 IN NS b.gtld-servers.net.
com. 172800 IN NS a.gtld-servers.net.
com. 172800 IN NS l.gtld-servers.net.
com. 172800 IN NS g.gtld-servers.net.
com. 172800 IN NS k.gtld-servers.net.
com. 172800 IN NS m.gtld-servers.net.
com. 172800 IN NS c.gtld-servers.net.
;; Received 494 bytes from 192.112.36.4#53(192.112.36.4) in 279 ms

themagpi.com. 172800 IN NS ns51.1and1.com.
themagpi.com. 172800 IN NS ns52.1and1.com.
;; Received 110 bytes from 192.5.6.30#53(192.5.6.30) in 198 ms

www.themagpi.com. 86400 IN A 74.208.151.6
;; Received 50 bytes from 217.160.81.164#53(217.160.81.164) in 37 ms
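The referral chain in this trace can be modelled as a small loop. The following Python sketch is a toy model, not a real resolver: the tables REFERRALS and ANSWERS and the server labels "root", "gtld" and "1and1" are assumptions standing in for the real name-servers, with data taken from the trace above.

```python
# Toy delegation data modelled on the trace above; real servers hold far more.
REFERRALS = {
    "root": {"com.": "gtld"},
    "gtld": {"themagpi.com.": "1and1"},
}
ANSWERS = {
    "1and1": {"www.themagpi.com.": "74.208.151.6"},
}


def resolve(name, server="root"):
    """Follow referrals from the root until a server answers; return (address, path)."""
    path = [server]
    while True:
        if name in ANSWERS.get(server, {}):
            return ANSWERS[server][name], path
        # Look for a delegated zone that is a suffix of the name we want.
        for zone, next_server in REFERRALS.get(server, {}).items():
            if name == zone or name.endswith("." + zone):
                server = next_server
                path.append(server)
                break
        else:
            raise LookupError(f"{server} has no answer or referral for {name}")


address, path = resolve("www.themagpi.com.")
print(address, "via", " -> ".join(path))
```

Each pass through the loop corresponds to one of the "Received ... bytes from" lines in the dig output.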

DNS resolution itself happens over UDP or TCP (port 53) and, as we can imagine from the previous article, this would require quite a bit of work and many messages sent all around the Internet, just to find out the IP address of the host we actually want to connect to.
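To give an idea of what travels over port 53, here is a minimal Python sketch that encodes a single-question query for an A record in the standard DNS wire format. The function name build_query and the query ID 0x1234 are arbitrary choices for this illustration, and the sketch only builds the bytes without sending them.

```python
import struct


def build_query(hostname, query_id=0x1234, qtype=1):
    """Encode a single-question DNS query (qtype 1 = A record, class IN)."""
    # Header: id, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in hostname.rstrip(".").split("."):
        encoded = label.encode("ascii")
        qname += bytes([len(encoded)]) + encoded  # each label is length-prefixed
    qname += b"\x00"  # a zero-length label marks the root and ends the name
    question = qname + struct.pack("!HH", qtype, 1)  # class 1 = IN (Internet)
    return header + question


packet = build_query("www.themagpi.com")
print(len(packet), "bytes:", packet.hex())
```

Such a packet could then be sent in a UDP datagram to port 53 of a name-server, which would reply in the same format with the answer records appended.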

Fortunately this isn’t usually as complicated and expensive in real life. There are plenty of non-authoritative, caching & recursive-resolution name-servers deployed all around the edge of the Internet, which will do the work for us and remember the result for some time in case somebody asks again.
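The "remember the result for some time" part is governed by the time-to-live (TTL) value attached to each record. A caching name-server can be imagined roughly like the following Python sketch. The class DnsCache is a hypothetical illustration, not how any particular server is implemented, and it takes an injectable clock so the expiry can be demonstrated without waiting.

```python
import time


class DnsCache:
    """Remember answers until their time-to-live (TTL) has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (address, expiry time)

    def put(self, name, address, ttl):
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires = entry
        if self._clock() >= expires:
            del self._entries[name]  # stale: forget it and force a fresh lookup
            return None
        return address


# Demonstration with a fake clock that we advance by hand.
now = [0.0]
cache = DnsCache(clock=lambda: now[0])
cache.put("www.themagpi.com.", "74.208.151.6", ttl=86400)

now[0] = 3600    # one hour later the answer is still cached
print(cache.get("www.themagpi.com."))

now[0] = 90000   # after more than a day the entry has expired
print(cache.get("www.themagpi.com."))
```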

Most networking applications on Linux are linked to a standard library which contains the name resolver client. This resolver will usually start by looking in the good old /etc/hosts file for a name and otherwise continue by asking the name-servers listed in /etc/resolv.conf.
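The lookup order described here can be sketched in Python as follows. This is a simplified model under stated assumptions: the sample resolv.conf content is made up from addresses appearing elsewhere in this article, fake_query is a hypothetical stand-in for a real DNS query, and the real resolver library has more configuration options than shown.

```python
def parse_resolv_conf(text):
    """Return the name-server addresses listed in resolv.conf-style text, in order."""
    servers = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def lookup(name, hosts, nameservers, query):
    """Consult the hosts table first, then each name-server in the listed order."""
    if name in hosts:
        return hosts[name]
    for server in nameservers:
        address = query(server, name)
        if address is not None:  # None stands for a server that did not respond
            return address
    return None


# Hypothetical sample content for illustration.
RESOLV_CONF = """\
domain mydomain.net
nameserver 62.2.17.60
nameserver 62.2.24.162
"""
servers = parse_resolv_conf(RESOLV_CONF)
local_hosts = {"localhost": "127.0.0.1"}


def fake_query(server, name):
    # Stand-in for a real DNS query: here only the second server responds.
    table = {("62.2.24.162", "www.themagpi.com"): "74.208.151.6"}
    return table.get((server, name))


print(servers)
print(lookup("localhost", local_hosts, servers, fake_query))
print(lookup("www.themagpi.com", local_hosts, servers, fake_query))
```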

As we can imagine, a slow or flaky name-server can severely degrade the performance of our Internet experience. We can have a look at the time it takes to resolve certain names, and compare query times from different name-servers, e.g. our default name-server vs. a Google public DNS name-server reachable at 8.8.8.8:

pi@raspberrypi ~ $ dig  +stats +noquestion +nocomment www.themagpi.com
; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> +stats +noquestion +nocomment www.themagpi.com
;; global options: +cmd
www.themagpi.com. 80817 IN A 74.208.151.6
;; Query time: 36 msec
;; SERVER: 62.2.17.60#53(62.2.17.60)
;; WHEN: Sat Mar  8 22:08:11 2014
;; MSG SIZE  rcvd: 50

pi@raspberrypi ~ $ dig  @8.8.8.8 +stats +noquestion +nocomment www.themagpi.com
; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> @8.8.8.8 +stats +noquestion +nocomment www.themagpi.com
; (1 server found)
;; global options: +cmd
www.themagpi.com. 20103 IN A 74.208.151.6
;; Query time: 27 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Sat Mar  8 22:08:51 2014
;; MSG SIZE  rcvd: 50

Dynamic Host Configuration Protocol (DHCP)

As we have seen so far, in order to properly use the Internet, we need an IP address for our local Ethernet interface, we need to know the IP address of the gateway to the Internet on our LAN and we need to know the IP address of at least one name-server willing to provide name resolution.

Most of us who are using a Raspberry Pi with a standard Raspbian image have not configured all these by ourselves and probably didn’t even know what they are before we started poking around. The system which is commonly used to provide the essential configuration to hosts on a local network is called Dynamic Host Configuration Protocol (DHCP). The Ethernet interface in the standard Raspbian distribution is configured to run dhclient, a DHCP client implementation for Linux.

Whenever a host is newly connected to a network, it sends out calls for help on a well-defined Ethernet broadcast address. If there is a DHCP server listening on the same network, it will respond with the necessary information about how this new host should configure its core network settings. These settings, in particular the address assignment, are only valid for a certain period of time and then need to be renewed, potentially resulting in a different configuration. In DHCP-speak this is called a “lease”:

pi@raspberrypi ~ $ cat /var/lib/dhcp/dhclient.eth0.leases 
lease {
  interface "eth0";
  fixed-address 192.168.1.136;
  option subnet-mask 255.255.255.0;
  option routers 192.168.1.1;
  option dhcp-lease-time 86400;
  option dhcp-message-type 5;
  option domain-name-servers 62.2.17.60,62.2.24.162;
  option dhcp-server-identifier 192.168.1.1;
  option domain-name "mydomain.net";
  renew 6 2014/03/08 01:00:42;
  rebind 6 2014/03/08 10:35:34;
  expire 6 2014/03/08 13:35:34;
}
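The three timestamps at the end of the lease are related to the lease time. As a small worked example in Python, assuming the lease started exactly dhcp-lease-time seconds before the expire time and using the protocol's default renewal (T1, 50% of the lease) and rebinding (T2, 87.5% of the lease) marks, we can reconstruct the timers. The function name lease_timers is made up for this sketch.

```python
from datetime import datetime, timedelta


def lease_timers(expire, lease_seconds):
    """Derive the lease start plus the default renewal (T1) and rebinding (T2) times."""
    # Assumption: the lease began exactly lease_seconds before it expires.
    start = expire - timedelta(seconds=lease_seconds)
    t1 = start + timedelta(seconds=lease_seconds * 0.5)    # start trying to renew
    t2 = start + timedelta(seconds=lease_seconds * 0.875)  # ask any server to rebind
    return start, t1, t2


# Values taken from the lease shown above.
expire = datetime(2014, 3, 8, 13, 35, 34)
start, t1, t2 = lease_timers(expire, 86400)

print("lease start:", start)
print("renew (T1): ", t1)
print("rebind (T2):", t2)
```

The computed rebind time agrees with the rebind line in the lease. The renew time recorded in the lease is somewhat earlier than the plain 50% mark, presumably because the client adds some randomization so that not all hosts renew at the same moment.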

Using DHCP, a network administrator can configure an entire network through a central server instead of having to configure each host as it is connected to the network. Similar to host and domain names, IP addresses are managed in a distributed and hierarchical fashion, where certain network operators are assigned certain blocks of addresses, which they in turn hand out in smaller blocks to the administrators of sub-networks. Since each address must only exist once in the public Internet, address allocation requires a lot of careful planning, and protocols like DHCP help administrators to more easily manage addresses at the host level.

Running a local name-server

We have seen that for a typical home network, using the default name-server of the Internet access provider can easily add tens to hundreds of milliseconds of additional latency to each connection setup.

There are many choices of DNS servers on Linux but probably the best choice for a local cache or a small local network would be dnsmasq. It is very easy to administer, has a small resource usage and can also act as a DHCP server, which makes it an easy integrated network administration tool for small networks, like a home network with just a few hosts and an Internet connection.

Configuring dnsmasq as a simple local caching name-server is as simple as installing it with sudo apt-get install dnsmasq and testing it:

pi@raspberrypi ~ $ dig  @localhost +stats +noquestion +nocomment www.themagpi.com
; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> @localhost +stats +noquestion +nocomment www.themagpi.com
; (2 servers found)
;; global options: +cmd
www.themagpi.com. 82234 IN A 74.208.151.6
;; Query time: 8 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sat Mar  8 23:16:48 2014
;; MSG SIZE  rcvd: 50

And we get sub-10ms query times for cached addresses. In its default configuration, dnsmasq forwards all requests it has not yet cached to the default name-servers configured in /etc/resolv.conf, which in our case are set by the DHCP client. We can now make the local DNS cache the new default for the local resolver by adding the line prepend domain-name-servers 127.0.0.1 to the dhclient config file in /etc/dhcp/dhclient.conf. This will put our local server in first and default position in /etc/resolv.conf, and dnsmasq is smart enough to ignore itself as a forwarder in order not to create an infinite forwarding loop.

Conclusion

As we have seen, name resolution at Internet scale requires complex machinery which kicks into action each time we type a URL into the browser navigation bar. The Domain Name System is a critical and sometimes political part of the Internet infrastructure. Although invisible to the user, a slow or flaky DNS server can severely degrade the performance we experience on the Internet. Sometimes it is not a download itself that is slow, but resolving the name of the server before the download can even start. Relying on the DNS infrastructure also requires a great deal of trust, as compromised DNS servers could easily redirect traffic to a completely different server.