The Internet layer determines the logical path that packets will traverse through a local network and the Internet as a whole. The link layer defines the protocols that control how the bits are transmitted across an underlying physical technology. For example, the network shown previously in Figure 5.5.3 places an emphasis on routing a packet through a series of routers in an AS. These routers may be part of distinct networks that use different underlying technologies. For instance, each router may be operated and administered by separate departments that are part of the same organization; some links may consist of wireless connections, while other logical links are created by a chain of switches—router-like devices that are connected by cables. The link layer, then, focuses on the task of forwarding packets across point-to-point connections between routers and end-point devices.
The distinction between routing and forwarding, like the distinction between routers and switches, is subtle and easily misunderstood. In essence, the classification of routing/routers is used to describe the communication between heterogeneous networks that may rely on different types of communication technologies. One network might employ a packet switching technology (e.g., Ethernet) that uses a structured message format that allows any device to send and receive data at any time. A router might connect that network to one that uses circuit switching (e.g., FDDI or token ring), in which two devices communicate directly over a dedicated channel; other devices may be connected to the network, but they have to wait until it is their turn to control the transmission channel. In contrast, the classification of forwarding/switches refers to communication within a homogeneous network with a single underlying technology. A switch does not receive a message from one technology (Ethernet) and forward it using another (FDDI). Switches only serve as the links between hosts in a single network.
The switching that occurs at the link layer creates another possible source of packet loss that TCP’s reliability is intended to address. When a packet arrives, there is a processing delay associated with the work to compute checksums, determine the higher-level protocol, and so on. Queueing delays occur while the packet is waiting to be processed or transmitted. Transmission delays are imposed by the work to encode the data into light signals or radio waves. Since the light signals and radio waves must travel across physical space, the packet also experiences propagation delays. These delays occur at every switching link in the network, accumulating to create increasingly greater round-trip times. Consequently, link-layer protocols strive to balance correctness (ensuring limited re-transmissions) with efficient processing in order to avoid causing packet losses.
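As a rough illustration of how these four delays add up at a single hop, the following sketch sums them. The formulas for transmission delay (frame size divided by link rate) and propagation delay (distance divided by signal speed) are common approximations rather than something defined in this section. The function name `hop_delay`, its parameters, and every numeric value are assumptions chosen for illustration.

```python
def hop_delay(frame_bits, link_bps, distance_m,
              signal_mps=2e8, processing_s=0.0, queueing_s=0.0):
    """Estimate the delay (in seconds) a frame experiences at one hop.

    All parameters are hypothetical inputs; signal_mps defaults to an
    assumed signal speed of roughly two-thirds the speed of light.
    """
    transmission_s = frame_bits / link_bps   # time to place the bits on the link
    propagation_s = distance_m / signal_mps  # time for the signal to cross the link
    return processing_s + queueing_s + transmission_s + propagation_s


# A 1500-octet frame (12000 bits) on an assumed 1 Gbps link that is 200 m long.
one_hop = hop_delay(frame_bits=12000, link_bps=1e9, distance_m=200)

# Delays accumulate across every hop on the path; here, five identical hops.
five_hops = sum(hop_delay(12000, 1e9, 200) for _ in range(5))
```

Because every hop contributes its own delays, the total grows with the number of links on the path, which is why the round-trip time increases as more switching links are involved.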
Given the ubiquity of the technology, many readers probably associate the term Ethernet with the cable. In actuality, Ethernet is a collection of standards defined and maintained by the IEEE 802.3 working group. That is, Ethernet is not a stand-alone protocol specified in an RFC; Ethernet involves several protocols that are co-designed with the physical cable technology that they use. These physical technologies can range from twisted-pair copper wires to fiber-optic cables made of glass or plastic.
Regardless of the type of the physical medium used, all Ethernet frames maintain the same basic structure, shown in Table 5.12. Note the use of the term octet rather than byte. Although modern systems have generally settled on the use of the term byte to denote eight bits, this connotation was not always true; some technologies used byte to refer to a basic addressable unit of memory, which was not necessarily eight bits in size. An octet, however, must be exactly eight bits. [1]
8 octets | 6 octets | 6 octets | 2 octets | varies | 4 octets |
---|---|---|---|---|---|
Preamble | Destination | Source | Type | Payload | CRC |
Table 5.12: Structure of an Ethernet frame
The preamble of an Ethernet frame consists of seven octets of 10101010 followed by a single octet of 10101011. The purpose of the preamble is to declare that a device intends to send a frame and to synchronize the other devices to listen as receivers. The destination and source addresses are 48-bit (6-octet) media access control (MAC) addresses. [2] Unlike IP addresses, MAC addresses are persistently associated with a hardware device and do not imply anything about the device’s logical location in the network. MAC addresses are determined by the device manufacturer and are stored in either firmware or hard-wired storage. The type field of the Ethernet frame determines how the rest of the frame is interpreted; in the common Ethernet II framing, it names the protocol carried in the payload (e.g., 0800 for IPv4). The payload contains the Internet-layer data (e.g., an IP packet); the maximum size varies based on the version of Ethernet, but most have a maximum transmission unit (MTU) size of approximately 1500 octets. Finally, the frame ends with the frame check sequence (FCS), which is a 32-bit cyclic redundancy check (CRC) calculation that provides a more robust error detection mechanism than checksums. As one example of the difference, CRC values can detect when the order of the octets has been changed, while checksums cannot.
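The difference between a checksum and a CRC can be seen in a small experiment. The sketch below assumes a 16-bit ones' complement checksum in the style used by IP and TCP (the helper `internet_checksum` is written here for illustration) and uses Python's `zlib.crc32` as the CRC. The four data octets are arbitrary, and the reordering swaps two 16-bit words (pairs of octets).

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement sum of 16-bit words (illustrative helper)."""
    if len(data) % 2:
        data += b"\x00"                              # pad to a whole number of words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]        # add the next 16-bit word
        total = (total & 0xFFFF) + (total >> 16)     # fold any carry back in
    return ~total & 0xFFFF


original = bytes.fromhex("12345678")
reordered = bytes.fromhex("56781234")   # the same two words in the opposite order

# Addition is commutative, so the checksum cannot see the reordering.
same_checksum = internet_checksum(original) == internet_checksum(reordered)

# The CRC depends on the position of every bit, so the values differ.
same_crc = zlib.crc32(original) == zlib.crc32(reordered)
```

Here `same_checksum` is true while `same_crc` is false: the additive checksum treats the two messages as identical, but the CRC distinguishes them.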
The MTU size implies that a lot of network traffic requires multiple frames. Consider an HTTP request to load a GIF containing an Internet meme showing a short video of cats (people on the Internet love cat videos!). Image files tend to be multiple MB in size. If a single GIF is 3 MB in size, that image alone would require 2098 blocks of 1500 bytes. However, each frame must also carry the TCP/UDP and IP headers, so some of the 1500 bytes is already accounted for. Allowing 20 bytes for a minimal TCP header and 40 bytes for the IP header (the size of the fixed IPv6 header; a minimal IPv4 header is 20 bytes), the image would now require 2185 frames. This fragmentation strains the reliability service of TCP, as every one of these frames must be successfully transmitted (and retransmitted, if necessary) before a timeout occurs. If frames repeatedly fail to arrive on time, the TCP client (i.e., the web browser) eventually gives up on the transfer and provides the user with a (generally unhelpful) error message that the connection timed out. This is one reason that OSPF, RIP, and BGP prioritize finding the shortest, most efficient path possible.
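The frame counts above can be reproduced with a short calculation. This sketch assumes that 3 MB means 3 × 2^20 bytes and that the header overhead is the 60 bytes used in the text (20 for TCP plus 40 for IP); the function name `frames_needed` is hypothetical.

```python
MTU = 1500                      # octets available in each Ethernet payload
FILE_SIZE = 3 * 1024 * 1024     # 3 MB, interpreted as 3 * 2**20 bytes (assumption)


def frames_needed(total_bytes, mtu=MTU, header_overhead=0):
    """Number of frames required when each carries (mtu - header_overhead) data bytes."""
    usable = mtu - header_overhead
    return -(-total_bytes // usable)    # integer ceiling division


without_headers = frames_needed(FILE_SIZE)                     # full 1500-byte blocks
with_headers = frames_needed(FILE_SIZE, header_overhead=60)    # 1440 data bytes per frame
```

With these assumptions, `without_headers` is 2098 and `with_headers` is 2185, matching the figures in the text. A smaller header overhead leaves more room for data in each frame and lowers the count.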
Example 5.6.1
To illustrate the structure of an Ethernet frame, the following header extends the IPv4 datagram from Example 5.5.1 (which extends the TCP segment from Example 5.3.2).
Preamble | Destination | Source | Type | Payload | FCS |
---|---|---|---|---|---|
aaaaaaaaaaaaaaab | f0def12cc22b | f45c89bd332d | 0800 | ... | 64713722 |
The destination field is the MAC address f0-de-f1-2c-c2-2b, and the source field is the address f4-5c-89-bd-33-2d. These identifiers are persistently associated with the networking hardware components. The type 0800 indicates that this frame is using Ethernet II, the most common style of Ethernet framing, and that its payload is an IPv4 packet. Finally, the FCS is the 32-bit CRC calculation over the frame (excluding the preamble and the FCS itself).
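To make the layout concrete, the following sketch assembles and parses a frame using the destination, source, and type values from the table in Example 5.6.1. It is an illustration rather than a complete implementation: the preamble is omitted (network hardware normally generates it), the payload is a placeholder rather than the real IPv4 datagram, and the helper names are hypothetical. Appending the `zlib.crc32` result in little-endian order is the usual software convention for the FCS, but treat that detail as an assumption here.

```python
import struct
import zlib

DST = bytes.fromhex("f0def12cc22b")      # destination MAC from the example table
SRC = bytes.fromhex("f45c89bd332d")      # source MAC from the example table
TYPE_IPV4 = 0x0800                       # type field from the example table
PAYLOAD = b"placeholder IPv4 datagram"   # hypothetical stand-in for the real payload


def build_frame(dst, src, ethertype, payload):
    """Concatenate header, payload, and FCS (the preamble is not included)."""
    header = dst + src + struct.pack("!H", ethertype)   # 6 + 6 + 2 = 14 octets
    body = header + payload
    fcs = struct.pack("<I", zlib.crc32(body))           # 32-bit CRC over header + payload
    return body + fcs


def parse_header(frame):
    """Split the first 14 octets into destination, source, and type."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst.hex("-"), src.hex("-"), ethertype


def fcs_is_valid(frame):
    """Recompute the CRC over everything except the trailing 4-octet FCS."""
    (stored,) = struct.unpack("<I", frame[-4:])
    return zlib.crc32(frame[:-4]) == stored


frame = build_frame(DST, SRC, TYPE_IPV4, PAYLOAD)
fields = parse_header(frame)
```

Flipping any bit of `frame` before calling `fcs_is_valid` causes the check to fail, which is how a receiver detects a corrupted frame.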
The figure below illustrates the complete structure of the Ethernet frame by combining this example with Example 5.3.2 and Example 5.5.1. The frame begins with the Ethernet header. The Ethernet payload combines the IPv4 header, TCP header, and HTTP header. (As a GET request, the HTTP message body is empty and only the header is sent.) At the same time, the IPv4 payload consists of the TCP and HTTP headers, whereas the HTTP header is the payload of the TCP segment.
Anatomy of a complete Ethernet frame with IPv4, TCP, and HTTP data
The previous discussion of Ethernet introduced a new form of addressing to locate hosts within a network. Figure 5.6.3 shows a simple Ethernet segment with two end devices and a router; each of these three hosts has both a MAC address and an IP address. MAC addresses do not have any logical relationship to the network topology itself, while IP addresses are logical identifiers that are not tied to the hardware. As such, routers need some way to translate an IP address into a MAC address. Without such a mapping, routers would not be able to encapsulate the IP packet in an Ethernet frame for the intended host device.
The Address Resolution Protocol (ARP) is a simple protocol for establishing this mapping, as defined in RFC 826. Assume that the two end host devices in Figure 5.6.3 need to communicate, with the 192.168.1.2 host sending data to 192.168.1.3. The sender broadcasts an Ethernet frame containing an ARP query to the reserved MAC address ff-ff-ff-ff-ff-ff. All nodes receive the query, but only the intended recipient, 192.168.1.3, replies. At that point, the 192.168.1.2 host stores this mapping in a local cache for a period of time. After doing this, 192.168.1.2 can use the appropriate destination MAC address to transmit the IP packets as needed.
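The exchange described above can be modeled with a small simulation. This is a sketch of the query, reply, and cache steps only; it does not construct real ARP packets, and cache expiration is omitted. The class and function names, the router's IP address, and all three MAC addresses are hypothetical values chosen for illustration.

```python
BROADCAST = "ff-ff-ff-ff-ff-ff"   # reserved MAC address that every node receives


class Host:
    def __init__(self, ip, mac):
        self.ip = ip
        self.mac = mac
        self.arp_cache = {}       # maps IP address -> MAC address

    def handle_arp_query(self, target_ip):
        """Reply with this host's MAC only if the query asks for its IP."""
        return self.mac if target_ip == self.ip else None


def broadcast_query(hosts, sender, target_ip):
    """Deliver the query (addressed to BROADCAST) to every other host."""
    for host in hosts:
        if host is sender:
            continue
        reply = host.handle_arp_query(target_ip)
        if reply is not None:
            return reply
    return None                   # no host claimed the address


def resolve(hosts, sender, target_ip):
    """Return the MAC for target_ip, consulting the sender's cache first."""
    if target_ip in sender.arp_cache:
        return sender.arp_cache[target_ip]
    mac = broadcast_query(hosts, sender, target_ip)
    if mac is not None:
        sender.arp_cache[target_ip] = mac   # remember the mapping for later packets
    return mac


# Hypothetical addresses for the three devices on the segment.
router = Host("192.168.1.1", "02-00-00-00-00-01")
host_a = Host("192.168.1.2", "02-00-00-00-00-02")
host_b = Host("192.168.1.3", "02-00-00-00-00-03")
segment = [router, host_a, host_b]

mac_of_b = resolve(segment, host_a, "192.168.1.3")
```

After the call, `host_a.arp_cache` holds the mapping for 192.168.1.3, so later packets to that address need no further broadcast. Note that `resolve` accepts whatever reply arrives first, with no check that the responder actually owns the requested IP address.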
Note
ARP is an insecure protocol that assumes all connected devices behave correctly. In an ARP cache poisoning attack, an adversary that has access to a network can respond to ARP queries with its own MAC address. The protocol defines no authentication mechanism to confirm that the response is correct. This weakness is often acceptable if networks are secured so that only authorized devices can be used. However, in public settings, such as a free café Wi-Fi network, the assumption of trust can break down, allowing devices to intercept messages intended for others.
To summarize the Internet model up to this point, the application layer uses transport-layer protocols to create a process-to-process logical communication channel. The transport layer encapsulates this information in a host-to-host link using Internet-layer routing between potentially heterogeneous networks. The link layer then provides the mechanism for point-to-point data transmission in a homogeneous network using the same underlying physical technology. These layers of abstraction leave one question remaining: How do the bits actually get transmitted from one device to another?
Figure 5.6.5 illustrates the basic principles involved in the physical data transmission. Fundamentally, all of the physical networking technologies are transmitting either light or radio signals, both of which can be modeled as an oscillating waveform. The default signal with no encoded information is called a carrier signal. This signal can be modulated to encode information by manipulating one of three characteristics of waves: frequency, amplitude, or phase. The frequency refers to the number of oscillations in a given time, as illustrated by how many times the wave oscillates between a maximum and minimum value. The amplitude denotes the height of the wave. The phase refers to the timing of when the wave begins and ends, illustrated by the alignment of the maximum and minimum values.
In phase shift keying (PSK), the carrier wave operates at a fixed frequency, but its phase is manipulated by adjusting the sine and cosine components of the signal. The precise calculations depend on the particular scheme being used, but these techniques generally all map the measurement values to points on the complex number plane. In binary PSK (BPSK), there are two possible points to indicate the values 0 or 1. Other schemes use more points to map multiple bits. For instance, each measurement in quadrature PSK (QPSK) maps to one of four points to encode two bits; 8-PSK uses eight points to encode three bits.
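The mapping from bits to points on the complex plane can be sketched as follows. This example assumes the points are evenly spaced around the unit circle and assigns bit patterns to them in plain binary order; real schemes often use a different assignment (such as Gray coding) and a different rotation. The function names are hypothetical, and the input length is assumed to be a multiple of the number of bits per symbol.

```python
import cmath
import math


def psk_constellation(bits_per_symbol):
    """Return 2**bits_per_symbol points evenly spaced on the unit circle."""
    m = 2 ** bits_per_symbol
    return [cmath.exp(2j * math.pi * k / m) for k in range(m)]


def psk_modulate(bits, bits_per_symbol):
    """Map each group of bits to one constellation point (one symbol)."""
    points = psk_constellation(bits_per_symbol)
    symbols = []
    for i in range(0, len(bits), bits_per_symbol):
        group = bits[i:i + bits_per_symbol]
        index = int("".join(str(b) for b in group), 2)   # plain binary mapping (assumption)
        symbols.append(points[index])
    return symbols


def psk_demodulate(symbols, bits_per_symbol):
    """Recover bits by choosing the constellation point nearest each symbol."""
    points = psk_constellation(bits_per_symbol)
    bits = []
    for s in symbols:
        index = min(range(len(points)), key=lambda k: abs(s - points[k]))
        bits.extend(int(c) for c in format(index, f"0{bits_per_symbol}b"))
    return bits


message = [1, 0, 1, 1, 0, 0]
bpsk_symbols = psk_modulate(message, 1)   # six symbols, one bit each
qpsk_symbols = psk_modulate(message, 2)   # three symbols, two bits each
psk8_symbols = psk_modulate(message, 3)   # two symbols, three bits each
```

The same six bits require fewer symbols as the number of points grows, which is the appeal of the denser schemes. The trade-off is that the points sit closer together, so a noisy measurement is more likely to be matched to the wrong point.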
In frequency modulation (FM), the frequency is changed to be either faster or slower than the carrier wave. If this technique were used on sound waves that were in a range audible to humans, FM would correspond to making the pitch higher or lower than the default range. Amplitude modulation (AM) keeps the frequency the same as the carrier wave but increases or decreases the magnitude of the difference between the maximum and minimum values. In the audible range, this would correspond to making the sound louder or softer.
FM and AM have long been used for analog signal transmission. Readers may associate these terms with radio stations, and for good reason: Radio stations with AM channels use amplitude modulation to encode sound in the range of 540 kHz to 1600 kHz (kHz = 1,000 cycles per second). FM radio stations use frequency modulation to encode sound between 88 MHz and 108 MHz (MHz = 1,000,000 cycles per second). So, a radio station that advertises itself as FM 101.1 is sending its audio signal by changing the frequency to be slightly above and slightly below 101.1 MHz. Note, though, that FM and AM are not restricted to analog radio signals. All three techniques are used to modulate digital signals, as well.
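As a simplified illustration of applying AM and FM to digital data, the sketch below produces waveform samples for a sequence of bits. Every number in it is an assumption chosen for readability: a 1000 Hz carrier, 8000 samples per second, eight samples per bit, half amplitude for a 0 bit under AM, and frequencies of 0.5 and 1.5 times the carrier under FM. The function name `keyed_waveform` is hypothetical.

```python
import math


def keyed_waveform(bits, scheme, carrier_hz=1000.0,
                   sample_rate=8000, samples_per_bit=8):
    """Return waveform samples that encode bits by amplitude or frequency."""
    if scheme not in ("AM", "FM"):
        raise ValueError("scheme must be 'AM' or 'FM'")
    samples = []
    for n, bit in enumerate(bits):
        if scheme == "AM":
            amplitude = 1.0 if bit else 0.5        # louder for 1, softer for 0
            freq = carrier_hz                      # frequency stays at the carrier
        else:
            amplitude = 1.0                        # amplitude stays fixed
            freq = carrier_hz * (1.5 if bit else 0.5)   # faster for 1, slower for 0
        for k in range(samples_per_bit):
            t = (n * samples_per_bit + k) / sample_rate
            samples.append(amplitude * math.sin(2 * math.pi * freq * t))
    return samples


am_wave = keyed_waveform([1, 0, 1], "AM")
fm_wave = keyed_waveform([1, 0, 1], "FM")
```

Under AM, the samples for a 0 bit never exceed half the peak of those for a 1 bit, so a receiver could distinguish them by measuring the height of the wave. Under FM, the peaks are the same height and the receiver instead measures how quickly the wave oscillates.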
Consequently, when we say that Ethernet is sending the preamble of a frame by changing the transmitted bit from 1 to 0, it means that the network device is using one of these techniques to manipulate the signal it is transmitting. The receiver performs the corresponding demodulation to restore the carrier wave and record the transmitted bit. By coordinating this signal transmission, the link layer can transmit a packet from one network device to another. These links can then be chained together to establish network routing, leading to higher levels of communication protocols.
[1] The bit ordering in Ethernet can be confusing to interpret. Ethernet writes the most significant octet first, but the least significant bit within the octet is written first. For example, the hexadecimal value 0xa7 would be written as 1110 0101 rather than 1010 0111. We ignore this detail in Example 5.6.1 for simplicity.
[2] There is no relationship between a MAC address as described here and the cryptographic notion of MAC (message authentication code).