Project 1: Mini-ELF checker

This project serves as an introduction to C programming and writing standard Linux command-line programs, and the main goal is to reinforce the concept that information = bits + context.

In this project you will open a binary file and verify that it is a valid Mini-ELF file based on its header. You must also print its contents based on contextual information, first as simply a series of bytes and then according to the file's format. The file is formatted in a simplified version of the Executable and Linkable Format (ELF) used to store relocatable object files in Linux. The format is simplified, but it is still a very real executable format, and in later projects you will load these programs into memory and "run" them.

For generic project instructions, refer to the project guide. Please read both that guide and this description carefully and refer to them while you are working on the project. These documents have been refined extensively over several years and are very complete; most questions you will have are answered already if you read carefully.

Here is the path to the starter tarball file on stu:

/cs/students/cs261/f20/p1-check.tar.gz

This project is based on a project originally written by Dr. Michael Kirkpatrick.

Mini-ELF format

The details of the Mini-ELF format are described by comments in elf.h. You will need to read the documentation in that file to proceed. For this project, you need only focus on reading the first 16 bytes of the file, which is the ELF header. This header provides information about how to interpret the rest of the file, which you will need to do in the next project. In this project, you should read the header into an in-memory data structure (of type elf_hdr_t, provided for you in elf.h) and print the values accordingly.

Here is a portion of the documentation from elf.h regarding the Mini-ELF format:

   Mini-ELF file format (byte 0 = first byte of the file)
   +----------------------------------------------+
   | header (elf_hdr_t) - 16 bytes                |
   +----------------------------------------------+
   | program headers (elf_phdr_t) - 20 bytes each |
   +----------------------------------------------+
   | program segments - variable length of bytes  |
   +----------------------------------------------+
   | symbol table - each entry is 4 bytes each    |
   +----------------------------------------------+
   | string table - variable length of strings    |
   +----------------------------------------------+

   ELF header structure:
   +----------------------------------------------------------------------------+
   |  0   1  |  2   3  |  4   5  |  6   7  |  8   9  | 10  11  | 12  13  14  15 |
   | version | entry   | phdr    | numphdr | symtab  | strtab  | magic number   |
   +----------------------------------------------------------------------------+

   Sample ELF header (all entries in hex, format is little endian):
   +----------------------------------------------------------------------------+
   |  01  00 |  00  01 |  10  00 |  02  00 |  58  00 |  70  00 | 45  4c  46  00 |
   | version | entry   | phdr    | numphdr | symtab  | strtab  | magic number   |
   +----------------------------------------------------------------------------+

   version = 0x0001     entry = 0x0100      phdr = 0x0010     numphdr = 0x0002
   symtab = 0x0058      strtab = 0x0070     magic = "ELF\0"

   Interpretation:

   This file was created under version 1 of this format. When the program is
   loaded into memory, the instruction at address 0x100 (256) will be executed
   first. The first program header (which indicates segments in this file)
   starts at offset 0x10 (16) into the file, and there are 2 program headers
   total. The symbol table starts at offset 0x58 (88) into this file, and the
   string table starts at offset 0x70 (112). The magic number is the string
   "ELF\0", stored in the elf_hdr_t format as a 4-byte integer 0x00464c45
   (4607045) and is used for checking the validity of the header.

We have included several example Mini-ELF files in the tests/inputs subfolder of the project distribution; you can use these to test your program from the command line. You can examine these files from the command line using the hd (hex dump) command:

$ hd tests/inputs/simple.o
00000000  01 00 00 01 10 00 02 00  58 00 70 00 45 4c 46 00  |........X.p.ELF.|
00000010  38 00 00 00 15 00 00 00  00 01 00 00 01 00 05 00  |8...............|
00000020  ef be ad de 4d 00 00 00  0b 00 00 00 00 02 00 00  |....M...........|
00000030  00 00 06 00 ef be ad de  30 f3 0f 00 00 00 20 31  |........0..... 1|
00000040  40 13 fd ff ff ff 60 31  70 08 01 00 00 aa bb cc  |@.....`1p.......|
00000050  dd 00 00 00 dd 00 00 00  07 00 08 01 76 00 00 02  |............v...|
00000060  7b 00 02 02 7e 00 04 02  81 00 05 02 85 00 07 02  |{...~...........|
00000070  8a 00 09 02 8d 00 6c 6f  6f 70 00 41 42 00 43 44  |......loop.AB.CD|
00000080  00 7a 69 70 00 7a 65 72  6f 00 64 64 00 76 61 72  |.zip.zero.dd.var|
00000090  32 00                                             |2.|

If you look carefully at the first line of output, you'll see the same sequence of bytes as in the elf.h excerpt above. Your task in this project is to parse command line parameters, open a provided Mini-ELF file, read in the first 16 bytes into an elf_hdr_t struct, and print the bytes and individual fields in the formats described below. For this project, you will just ignore the rest of the file beyond the first 16 bytes (we will begin using that information in the next project).

Unit Requirements

Here are the required functions that you must implement in p1-check.c. We will use unit tests to exercise this portion of your submission.

For this project there is only one required function:

  • bool read_header (FILE *file, elf_hdr_t *hdr);

    Read a Mini-ELF header from the file into the space pointed to by hdr. This function should do error checking to make sure the header is valid (i.e., the file is large enough and the header has the proper magic number). Return true if and only if a valid Mini-ELF header was read.

WARNING: To ensure your solution's compatibility with future projects, you should NOT print anything to standard output in this function. To verify that this function is working, complete the integration requirements for the "-H" output or use gdb to examine the contents of the struct after calling this function to make sure they match what you expect.

See the C Function reference (specifically the "File I/O" section) for a list of standard library functions that may be useful in implementing this functionality (in particular, you will need the fopen, fread, and fclose functions).

Recommended Functions

We recommend writing the following functions as helpers for your main routine, but they are not required and thus are not covered by unit tests. You may add these to either main.c or p1-check.c, but we recommend the latter. In future projects, we will provide compiled implementations of these functions in p1-check.o.

  • bool parse_command_line_p1 (int argc, char **argv, bool *print_header, char **filename);

    Using the command-line options passed in argv, set the boolean pointed to by print_header to true if the -H option is passed. Set filename to point to the file name string. Return true if and only if valid arguments are passed. If the -h option is passed, the function should print the help message and still return true; your program should then exit immediately after the function returns (however, you should exit in main(), not in this function!).
  • void dump_header (elf_hdr_t hdr);

    Print the Mini-ELF header passed in hdr according to the specification described below.

WARNING: To ensure your solution's compatibility with future projects, you should NOT use the above function names if you are not adhering to the above specifications. In particular, the command-line parsing routine should ONLY handle command line parsing; all of the actual file manipulation should happen in your main().

Integration Requirements

In addition, you must implement main() in main.c such that your program behaves as described below. The main goal of this part of the semester-long project is to open a Mini-ELF file, load its header into memory and check it for validity. We will use integration tests to exercise this portion of your submission. We recommend that you implement and use the recommended functions described above to make your main() function cleaner.

For this project, your program must parse command-line parameters according to the following interface:

Usage: ./y86 <option(s)> mini-elf-file
 Options are:
  -h      Display usage
  -H      Show the Mini-ELF header

It is strongly recommended that you use the getopt() library function to parse the command-line parameters. We have provided a function usage_p1 that prints the usage text above.

For this project, there are only two possible flags ("-h" and "-H"; note that it is also valid to pass neither flag). The first flag ("-h") is the standard "help" option that is customary in Linux programs. If that flag is passed, your program should print the help text (use the provided usage_p1 function!) and exit without doing anything else. Otherwise, the program should open the given file and extract its Mini-ELF header. The second flag ("-H" -- note that capitalization matters here as much as it does in the C language itself!) should cause your program to print the Mini-ELF header in the format described below.

IMPORTANT: You should open the file, load the header, and check it for validity regardless of whether the "-H" flag is specified! Later projects will build on this one and specifically we will re-use the Mini-ELF header in P2. To clarify, this means that the Mini-ELF file name should NOT be a command-line argument for the "-H" flag; it should be handled as a separate positional argument after any flags.

If the -H option is passed, you must first print out the first 16 bytes of the file in a little-endian format similar to that produced by the hd utility. This will require you to re-interpret the elf_hdr_t struct as individual bytes. We recommend doing this by using a type cast to obtain a pointer to the beginning of the struct, where the type of the pointer is such that it allows you to extract individual bytes using pointer arithmetic or the indexing operation (i.e., use the pointer as if it were an array of byte-sized elements).

After the raw bytes, you should print the contents of the Mini-ELF header as it's meant to be interpreted in a well-formatted series of output lines that is best demonstrated by example. The following sample output shows the expected output for some of the provided Mini-ELF test files (in the tests/inputs subfolder):

$ ./y86 -H tests/inputs/simple.o
01 00 00 01 10 00 02 00  58 00 70 00 45 4c 46 00
Mini-ELF version 1
Entry point 0x100
There are 2 program headers, starting at offset 16 (0x10)
There is a symbol table starting at offset 88 (0x58)
There is a string table starting at offset 112 (0x70)

$ ./y86 -H tests/inputs/stripped.o
01 00 00 01 10 00 02 00  00 00 00 00 45 4c 46 00
Mini-ELF version 1
Entry point 0x100
There are 2 program headers, starting at offset 16 (0x10)
There is no symbol table present
There is no string table present

The first line is a simple little-endian hex dump of the Mini-ELF header data (i.e., the first 16 bytes of the file). The rest of the output contains contextual data from the Mini-ELF header, such as the version number, the entry point address, and information about the program headers and symbol/string tables. See elf.h for help understanding the information required and how to access it from the ELF header struct. If a table address is zero, that indicates that the table is not present in the Mini-ELF file and you should print the appropriate message as shown above.

Your output must match the expected output exactly. Note there's a double space separating the first eight bytes from the second eight bytes on the same row. This format improves visibility a bit, especially in the next project where you will be printing out much larger segments of hex data. Note also that there is no trailing whitespace after the bytes.

Error checking

Robust software proactively checks for potential errors and responds appropriately when an error occurs. Failure to build robust software leads to security breaches, lost sales, and other problems; this failure is not acceptable. Our grading procedures will try to break your code. The following list is a sample (not complete) of the types of errors we may test:

  • Passing NULL values in pointer parameters
  • Passing names of files that do not exist or have permission restrictions that prevent reading
  • Passing invalid command-line options
  • Not passing a file name
  • Passing the name of a file that is too small
  • Passing a file that contains an invalid header

If the given file cannot be opened or contains an invalid header, your program should print the error message "Failed to read file" with a newline and exit with the EXIT_FAILURE code defined in stdlib.h.

The above list is not necessarily exhaustive, and you should think carefully about what sort of errors might occur so that you can prevent them or perform additional error checking as needed. In particular, we will also use valgrind to detect memory leaks. Failure to respond appropriately will result in grade reductions.

Hints

  • Read and ask questions early. If there is any part of this document or the project files that you do not understand, you should ask about it on Piazza as soon as possible.
  • Start work early. You will be unable to finish this project if you leave it to the last few days before it is due. Set up your project work folder as soon as possible.
  • Follow test-driven design. Before you write a single line of code, create test cases based on the output specification described above. Do NOT write any code without having a test case ready to test your code. For this course, we are providing a comprehensive suite of tests but you should also create your own, or at least become comfortable running our tests outside the testsuite to facilitate debugging.
  • Don't over-complicate. The reference solution for this project is around 175 lines of code in main.c and p1-check.c combined. If your solution grows to more than 250 or 300 lines, you may wish to re-evaluate your approach; it's likely that you are over-complicating.
  • Use version control. Learn Git and keep your code in a repository (e.g., a private GitHub repository). It will save you much time and anguish if you ever accidentally delete something.
  • Learn to use a debugger. Debuggers (e.g., gdb) allow you to "poke around" while your program is running to figure out where your mental model of the program differs from reality. This will make fixing problems much easier.
  • Build features iteratively. Pay attention to the order of requirements and add features one at a time. For this project, you could start by hard-coding a file name, reading the first 16 bytes of that file, and printing the hex values. Then get the file name from the command line. Once you've got that working, then you should look at the ELF header stuff. Until you can read a binary file, you shouldn't even be thinking about that.

    If you write 100 lines of code without compiling or testing, you're doing it wrong; you will quickly become overwhelmed and frustrated with the number of bugs and compiler errors you will encounter. The key to success is to write a small amount of code, compile it, then test it. If after implementing the new functionality you fail a test case that you were passing previously, you should fix the regression before moving on to a new part of the project.

Grading

The following requirements are necessary to earn a grade of 'A' for this project:

  1. Read the 16-byte Mini-ELF header and print it in hex format
  2. Print the ELF header in the specified structured format
  3. Accept all valid command-line options
  4. Handle error checking (described above) appropriately
  5. Reject invalid command-line arguments

Completing steps 1, 2, and 3 is required to earn a grade of 'B', while completing only step 1 will yield a maximum grade of 'C'. Note that these are the maximum grades you can earn. Inadequate error checking, failure to adhere to the coding standards, or deviations from the submit procedures will result in grade reductions as described in the project guide.

Failure to submit code that compiles on stu.cs.jmu.edu may result in an automatic grade of 0.

Submission

You must submit your files from stu. To submit, run the following command from your project directory:

/cs/students/cs261/f20/submit.sh p1

Please see the project guide for general project help and grading policies. Please refer to the coding standards for coding practice guidelines.