WARNING: Before starting on this project, you
should be familiar with the Testing Procedures
and the Project Submission Procedures, as these
are different from those used in CS 261.
Background information
DNS is a distributed database system that is used to translate human-readable
domain names (e.g., www.jmu.edu) into IP addresses. The protocol
is defined in RFC 1034 and the
key data structures are defined in
RFC 1035.
In its simplest form, DNS is a stateless request-response protocol. A DNS
client sends a query to a server. This query contains one or more
questions that indicate the domain name under consideration and
the type of DNS record being retrieved. The server will then try to match
this information with the requested record. If found, the server's response
will include one or more answers to the query.
The textbook provides an overview of the structure of queries, responses,
and resource records, with more details in
RFC 1035. The following
example illustrates a query for the IPv4 address for jmu.edu:
12 34 01 00 00 01 00 00 00 00 00 00 03 6a 6d 75 03 65 64 75 00 00 01 00 01
The first twelve bytes are the header and can be interpreted as follows:
1234   XID=0x1234        random identifier
0100   OPCODE=SQUERY     message is a request
0001   QDCOUNT=1         1 question is asked
0000   ANCOUNT=0         0 answers provided
0000   NSCOUNT=0         0 authoritative records provided
0000   ARCOUNT=0         0 additional information records provided
The remaining thirteen bytes are the question asked in this
query. In DNS, domain names are not written in the standard dotted notation.
Instead, one byte is used to indicate the length of the next portion of the
address. So 03 6a 6d 75 is the "jmu" portion
followed by 03 65 64 75 ("edu"). The next byte is
the null byte (00) to indicate the end of the address.
The final four bytes indicate the QTYPE is 00 01
(A record, which indicates IPv4) and the QCLASS is
00 01 (IN, which indicates "Internet"). In this
project, all records will use the IN value for the
QCLASS value, but you will support different QTYPE
records.
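The length-prefixed encoding described above can be produced with a short helper. The sketch below assumes you manage your own output buffer; encode_name is an illustrative name, not part of the starter code.

```c
#include <stddef.h>
#include <string.h>

/* Sketch: convert a dotted name like "jmu.edu" into DNS wire format
   (length-prefixed labels followed by a null byte). Returns the number
   of bytes written, or 0 on error. */
size_t encode_name (const char *name, unsigned char *buf, size_t buflen)
{
  size_t out = 0;
  while (*name != '\0')
    {
      const char *dot = strchr (name, '.');
      size_t len = (dot != NULL) ? (size_t) (dot - name) : strlen (name);
      if (len == 0 || len > 63 || out + len + 2 > buflen)
        return 0;                       /* labels are limited to 63 bytes */
      buf[out++] = (unsigned char) len; /* length byte */
      memcpy (buf + out, name, len);    /* label characters */
      out += len;
      name += len;
      if (*name == '.')
        name++;
    }
  if (out >= buflen)
    return 0;
  buf[out++] = 0x00;                    /* terminating null byte */
  return out;
}
```

Note that an empty input encodes to the single null byte, which is exactly the root name used in the BASIC request later in this page.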
The corresponding response would be:
12 34 81 80 00 01 00 01 00 00 00 00 03 6a 6d 75 03 65 64 75 00 00 01 00 01 c0 0c 00 01 00 01 00 00 03 84 04 86 7e 7e 63
The first 25 bytes of this response are an exact copy of the request with
two differences. Bytes three and four (81 80) set additional
flags (RESPONSE and RA) to indicate that it is a
response and recursive lookups are available. Bytes seven and eight indicate
that one answer is provided.
The answer starts with the bytes c0 0c. The
structure of A records begins with the domain name. However, DNS
compresses the responses by avoiding repetition. The leading bits of
c0 mark a compression pointer; the pointer repeats the bytes
starting at offset 0c (byte 12). (See the encoding of
jmu.edu described above.)
The remaining bytes indicate the QTYPE is 00 01
(QTYPE=A), the QCLASS is 00 01
(QCLASS=IN), the TTL is 900 (0x384) seconds, the
size of the data (RDLENGTH) is 4 bytes and the data result
(RDATA) is 86 7e 7e 63 (134.126.126.99).
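Multi-byte fields in the packet (type, class, TTL, RDLENGTH) are big-endian on the wire, so they should be read byte by byte rather than cast directly. A minimal sketch, with helper names of our own choosing:

```c
#include <stdint.h>

/* Sketch: read big-endian integers from a DNS packet. Wire-format
   values are big-endian regardless of the host byte order, so build
   them one byte at a time. */
uint16_t read_u16 (const unsigned char *p)
{
  return (uint16_t) ((p[0] << 8) | p[1]);
}

uint32_t read_u32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
       | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}
```

For instance, applying read_u32 to the TTL bytes 00 00 03 84 yields 900.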
Project directory structure
The project directory structure is mostly described in the
Testing Documentation.
You will be modifying the files in p1-sh/src.
In this project, you will be implementing a DNS client and server.
The client (digduke) is driven by code in
src/client.c and the server (dukens) is driven by
src/server.c. Because so much functionality is common to
both, there are additional files (e.g., src/dns.c)
that you should use for this shared code.
Implementation requirements
Your first task is to build a minimal client that sends and processes DNS
data using a socket to a provided server. You will extend this implementation
with incrementally more features of DNS. In the later stages, you will switch
roles to implement the server using a provided client. Later, you will
incorporate multithreading into the client and server for concurrent processing.
Getting started: BASIC requirements
Your first task is to build a hard-coded request to a provided DNS server
and interpret the results. You will use a hard-coded XID value of
1 and an empty domain name. That is, your request will consist of the
following bytes:
00 01 01 00 00 01 00 00 00 00 00 00 00 00 01 00 01
The response that comes back will be for one of the root servers. The
server will randomly select one to use for the reply, and you will need to
report the results based on the data received. For the structure of the
output, see the files in p2-dns/tests/expected.
NOTE: DNS utilities like
dig have a convention of appending a dot on the end of an
interpreted domain name. As such, you should make sure to indicate that the
domain name is a.root-servers.net. (with the dot) rather than
a.root-servers.net (without).
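The request bytes shown above can be assembled directly into a buffer before being sent. The sketch below only builds the buffer; build_basic_request is an illustrative name, not part of the starter code.

```c
#include <stddef.h>
#include <string.h>

/* Sketch: build the hard-coded BASIC request shown above. The 12-byte
   header (XID=1, RD flag, QDCOUNT=1) is followed by the empty (root)
   domain name and a QTYPE/QCLASS of 1. Returns the request length. */
size_t build_basic_request (unsigned char *buf)
{
  static const unsigned char request[] = {
    0x00, 0x01,   /* XID = 1 */
    0x01, 0x00,   /* flags: RD (recursion desired) */
    0x00, 0x01,   /* QDCOUNT = 1 */
    0x00, 0x00,   /* ANCOUNT = 0 */
    0x00, 0x00,   /* NSCOUNT = 0 */
    0x00, 0x00,   /* ARCOUNT = 0 */
    0x00,         /* empty (root) domain name */
    0x00, 0x01,   /* QTYPE = A */
    0x00, 0x01    /* QCLASS = IN */
  };
  memcpy (buf, request, sizeof request);
  return sizeof request;
}
```

The resulting 17 bytes would then be sent over a UDP socket with sendto() and the reply read back with recvfrom().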
Testing your client
Throughout this project, you are building network applications that interact
with pre-built components. Due to the nature of network-based communication, you
should NOT rely on make test for testing your
code. Specifically, doing so would not let you distinguish between
your client failing to send the data, your client sending invalid data, and
your client failing to retrieve the response.
Instead, you will need to use two terminal windows: start the server
manually, then run your client separately to send the request. In the
p2-dns/tests directory, running ./dukens -s 10
will start the server and wait up to 10 seconds for a request. You can adjust
the wait time with a different -s argument. You can then run
your client as ./digduke in the p2-dns directory.
(In later phases, you will start your ./dukens server in
p2-dns then run the provided ./digduke client
from p2-dns/tests.)
The window running the server will provide helpful information about your
code's functionality. When the server receives a packet, it will display
the bytes it received. The server will also try to interpret the query's
question and display the bytes it is sending in response.
MIN requirements: DNS record types
Once you have basic network communication working, you will add support
for sending requests and receiving responses for A records,
both with and without compression. The format of your output will be similar
to that used by dig. For example, the response described in the
background section of this page would be formatted as follows:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4660
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;jmu.edu. IN A
;; ANSWER SECTION:
jmu.edu. IN A 900. 134.126.126.99
Much of this output can be compared with the example described above.
The status and flags fields warrant more explanation.
Despite their separate labels in this output, they both derive from the
third and fourth bytes of the response header. In the example above, these
bytes were 81 80. These indicate the qr (query
response), rd (recursion desired), and ra (recursion
available) bits are set. You will also need to detect whether the
aa (authoritative answer) bit is set (resulting in 8580).
The last hex digit is used to indicate the status. A value
of 0 (NOERROR) indicates success. A value of
2 (SERVFAIL) indicates that the server failed to process
the query (e.g., because of a bad domain name).
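Assuming the two flag bytes have been combined into a single 16-bit value (0x8180 in the example), the individual bits and the status can be masked out as follows. The accessor names are our own, not from the starter code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch: extract flag bits and the RCODE (status) from the 16-bit
   value formed by bytes three and four of the header. */
bool flag_qr (uint16_t flags) { return (flags >> 15) & 1; } /* response */
bool flag_aa (uint16_t flags) { return (flags >> 10) & 1; } /* authoritative */
bool flag_rd (uint16_t flags) { return (flags >> 8) & 1; }  /* recursion desired */
bool flag_ra (uint16_t flags) { return (flags >> 7) & 1; }  /* recursion available */
uint8_t rcode (uint16_t flags) { return flags & 0x0f; }     /* status digit */
```

With this layout, 0x8180 has qr, rd, and ra set with a status of 0, while 0x8580 additionally has aa set.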
Your next step will be to support sending and processing requests for
other types of DNS records. Specifically, you'll add support for IPv6
addresses (AAAA records), SMTP mail exchange servers
(MX records), and DNS name servers (NS records).
You will also need to handle a few circumstances that are not immediately
obvious. First, some of the responses will use compression techniques more
than once. Second, you will need to support MX records that have
empty domain names (used to indicate that there is no such mail server).
Finally, you'll need to handle the case where a record is sought but the
response contains no answers. Note that, in this last case, the query is
considered to have a NOERROR status; it is just that the
number of answers is 0.
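Handling compression that appears more than once comes down to following each pointer inside a single decoding loop. Below is a sketch under the assumption that the whole packet is available in memory; decode_name is an illustrative helper, not part of the starter code.

```c
#include <stddef.h>
#include <string.h>

/* Sketch: expand a (possibly compressed) domain name into dotted form,
   following 0xc0 compression pointers, which may occur repeatedly.
   packet is the full response; offset is where the name begins. Writes
   a dot-terminated name (the dig convention) into out and returns its
   length, or 0 on error. */
size_t decode_name (const unsigned char *packet, size_t offset,
                    char *out, size_t outlen)
{
  size_t pos = offset;
  size_t written = 0;
  int hops = 0;
  while (packet[pos] != 0x00)
    {
      if ((packet[pos] & 0xc0) == 0xc0)
        {
          /* Compression pointer: low 14 bits give the new offset. */
          if (++hops > 16)
            return 0;                  /* guard against pointer loops */
          pos = (size_t) ((packet[pos] & 0x3f) << 8) | packet[pos + 1];
          continue;
        }
      size_t len = packet[pos++];
      if (written + len + 2 > outlen)
        return 0;
      memcpy (out + written, packet + pos, len);
      written += len;
      out[written++] = '.';            /* dot after every label */
      pos += len;
    }
  if (written == 0)
    {
      if (outlen < 2)
        return 0;
      out[written++] = '.';            /* the root name prints as "." */
    }
  out[written] = '\0';
  return written;
}
```

Applied to the example response above, decoding the name at offset 25 (the c0 0c pointer) and decoding at offset 12 both produce "jmu.edu." with the trailing dot.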
The provided tests/dukens server
can be run in a stand-alone mode to help you debug. You can pass a domain
name and record type to this program as an argument and it will show you what
the request packet should look like, what answers were found in the database,
and what the response would be in hex format. If you also pass the
-t flag to this command, it will list all of the records in the
database that we are using. For instance, try running the query
./tests/dukens -t google.com A.
INTER requirements: More complex records and a basic server
In this third phase, you'll add support for a few more types of DNS
records, including canonical names (CNAME) and
start-of-authority (SOA) records. In
contrast to the previous record types, these rely on more than just the
ANSWER. Rather, they will also rely on the ADDITIONAL
and AUTHORITY fields to convey other information. Your client
will need to examine the NSCOUNT and ARCOUNT
fields to determine if they are used.
For CNAME results, the record itself indicates the canonical
name (e.g., stu.cs.jmu.edu for the domain name stu).
In some circumstances, the DNS server also contains record entries for this
canonical name. These records may include the IPv4 address. These records
may be returned in either the ADDITIONAL field or as more records
in the ANSWER field. You will need to add support for additional
error conditions, such as NXDOMAIN status fields that
indicate an error, or SOA records returned for IPv6 queries that
do not have answers.
The last portion of this phase will be to implement a basic DNS server.
The code distribution contains a binary file that we will use as our database.
The records in this file are structured based on the key DNS components.
Your first task for this portion is to read this file in and to build an
in-memory table that you can use to look up the records. As an example,
consider this portion of the file:
$ hexdump -C tests/mappings.bin | tail -n 13 | head -n 2
00000780 00 00 03 6a 6d 75 03 65 64 75 00 01 00 84 03 00 |...jmu.edu......|
00000790 00 04 00 86 7e 7e 63 09 6c 6f 63 61 6c 68 6f 73 |....~~c.localhos|
Because we are just pulling a portion from the middle of the file, this
output starts in the middle of a record. The record that we are considering
is the bytes 03 6a 6d...7e 7e 63. The structure of all records
is as follows (using this record as an example):
- Domain name (variable length)
03 6a 6d 75 03 65 64 75 00 = jmu.edu
- Record type (2 bytes)
01 00 = type 1 (A record)
- Record TTL (4 bytes)
84 03 00 00 = 0x384 = 900 seconds
- RDATA length (2 bytes)
04 00 = 4 bytes
- RDATA (variable length)
86 7e 7e 63 = 134.126.126.99
For multi-byte fields (type, TTL, and length), the value is in
little-endian format, so you'll have to convert that as needed. Recall that
you can run the provided tests/dukens with queries to check that
your interpretation of the data is correct.
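Converting those little-endian fields is the mirror image of the big-endian reads needed for the wire format. A minimal sketch, with helper names of our own:

```c
#include <stdint.h>

/* Sketch: read the little-endian multi-byte fields used in
   mappings.bin. Unlike the wire format, the database stores type, TTL,
   and RDATA length least-significant byte first. */
uint16_t read_le16 (const unsigned char *p)
{
  return (uint16_t) (p[0] | (p[1] << 8));
}

uint32_t read_le32 (const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
       | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}
```

Using the example record, the type bytes 01 00 read as 1 (an A record) and the TTL bytes 84 03 00 00 read as 900.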
Once you've built the table, your task is to implement a basic server
responding to the first types of queries you built in your client. That is,
if the XID is 1, you'll build the response like the
ping test case. If XID is 2, you'll provide a basic
response for an A record without compression. Then add support for A records
with compression.
ADV requirements: Additional features and iterative lookups
The advanced features for this project start with supporting PTR
records. PTR record lookups require two special considerations. First,
the IPv4 address that is being considered must be converted to an
.in-addr.arpa domain name. In doing so, the address must be
reversed to reflect the hierarchical nature of IPv4. For example, the
address 1.2.3.4 would be converted to
4.3.2.1.in-addr.arpa. Second, PTR results may be
accompanied by CNAME records to indicate which result should be
considered definitive. The ptr test cases focus on support for
PTR records in your client.
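The address reversal can be done with ordinary string formatting. The sketch below parses the dotted form and emits the reversed .in-addr.arpa name; make_ptr_name is an illustrative helper, not part of the starter code.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Sketch: convert a dotted IPv4 address into the reversed
   .in-addr.arpa name used for PTR lookups. Returns 0 on success,
   -1 on error. */
int make_ptr_name (const char *ipv4, char *out, size_t outlen)
{
  unsigned a, b, c, d;
  if (sscanf (ipv4, "%u.%u.%u.%u", &a, &b, &c, &d) != 4
      || a > 255 || b > 255 || c > 255 || d > 255)
    return -1;
  /* The octets appear in reverse order, most specific first. */
  int n = snprintf (out, outlen, "%u.%u.%u.%u.in-addr.arpa", d, c, b, a);
  return (n > 0 && (size_t) n < outlen) ? 0 : -1;
}
```

For example, converting 1.2.3.4 produces 4.3.2.1.in-addr.arpa, matching the example in the text.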
The sadv test cases require expanding your server's lookup
functioning to add support for SOA, PTR,
NS, and AAAA records. In these test cases, you will
be modifying your server to function similarly to the provided dukens
implementation. That is, if your server is passed a domain name and record
type, it will not actually run as a server. Instead, it will look up the
record in the database and print out the fake request packet, the list of
answers, and the response block.
The last two tests involve modifying the client to simulate an iterative
DNS lookup. Specifically, assuming your client is looking up the IPv4 address
for stu.cs.jmu.edu, your client would need to do the following:
- Send a request to the server to find a root (.) NS server.
The server will reply with a randomly selected root server.
- Determine the name of the root server returned. Send a second request
to get that root server's IPv4 address.
- Repeat the previous two steps to get a top-level domain NS
server for the specified hostname, along with that server's IPv4 address.
In this case, you could look for the NS and A
records for the edu TLD.
- Send a request for the authoritative name server, which would be the
NS record for jmu.edu.
- Now get the IPv4 address for the hostname.
Note that you will still be contacting the same tests/dukens
on localhost (stu) for all requests, because we are
simulating iterative lookups. In a real implementation, your request
for the edu NS record (step 3) would be sent to the
IP address of the root server you got in step 2. Similarly, the request for
JMU's NS record would be sent to the IP address for the
edu TLD.
For simplicity, we will only be working with the edu TLD.
However, we would note that your code would work for any other TLD, so long
as the mappings.bin database had records for those servers.
WARNING: It may be tempting to
hard-code the requests to use the same root, TLD, and authoritative name
servers (e.g., a.root-servers.net, a.edu-servers.net,
and it-ns1-19.jmu.edu). Because the server randomly chooses which
servers to return in its response, this approach would succeed 1 out of every
18 tries. However, when this is tested, it will be run multiple times to
verify that you are responding to the actual results returned.
Finally, your last task is to make your iterative client multithreaded.
That is, you will be given multiple hostnames to do an iterative lookup on.
In doing so, you will create a separate thread for each hostname. Your
implementation must show that it is truly multithreaded by handling the
introduction of random delays on the server side. For instance, assume your
code is doing lookups for both stu.cs.jmu.edu and
www.jmu.edu. A sampling of the order of output might be:
stu thread: send request for root NS
www thread: send request for root NS
www thread: receive root NS
www thread: send request for root A
stu thread: receive root NS
www thread: receive root A
www thread: send request for edu NS
stu thread: send request for root A
...
Running the code again might produce a different order of these messages.
For instance, message 5 (stu thread receiving the NS
record) might be received as the second message before the www
thread even starts.
Your code must use a lock to ensure that the messages produced do not
get improperly interleaved. That is, you must make sure that your client
writes all of the output for step 3 before it writes any of the output for
step 5. It is unacceptable for the lines of output for these two steps to
be mixed.
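One workable pattern, sketched below with pthreads, is a single global mutex held around each complete step's output. The names here are illustrative, not from the starter code.

```c
#include <pthread.h>
#include <stdio.h>

/* Sketch: serialize multi-line output with one global mutex so that
   lines from different lookup threads cannot interleave. */
static pthread_mutex_t output_lock = PTHREAD_MUTEX_INITIALIZER;

void report_step (const char *thread_name, const char *message)
{
  pthread_mutex_lock (&output_lock);
  /* Everything printed while the lock is held appears contiguously,
     even if another thread reports at the same time. */
  printf ("%s thread: %s\n", thread_name, message);
  fflush (stdout);
  pthread_mutex_unlock (&output_lock);
}
```

If a step produces several lines, print all of them inside one lock/unlock pair rather than locking per line; that is what guarantees the step's output stays together.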