Project 1: Mini-ELF checker

This project serves as an introduction to C programming and to writing standard Linux command-line programs. Its main goal is to reinforce the concept that information = bits + context.

In this project you will open a binary file and verify that it is a valid Mini-ELF file based on its header. You must also print its contents based on contextual information, first as simply a series of bytes and then according to the file's format. The file is formatted in a simplified version of the Executable and Linkable Format (ELF) used to store relocatable object files in Linux. The format is simplified, but it is still a very real executable format, and in later projects you will load these programs into memory and "run" them.

For generic project instructions, refer to the project guide.

Here is the path to the starter tarball file on stu:

/cs/students/cs261/f17/common/p1-check.tar.gz

This project is based on a project originally written by Dr. Michael Kirkpatrick.

Mini-ELF format

The details of the Mini-ELF format are described by comments in elf.h. You will need to read the documentation in that file to proceed. For this project, you need only focus on reading the first 16 bytes of the file, which make up the ELF header. This header provides information about how to interpret the rest of the file, which you will need to do in the next project. In this project, you should read the header into an in-memory data structure (of type elf_hdr_t, provided for you in elf.h) and print the values accordingly.

Here is a portion of the documentation from elf.h regarding the Mini-ELF format:

   Mini-ELF file format (byte 0 = first byte of the file)
   +----------------------------------------------+
   | header (elf_hdr_t) - 16 bytes                |
   +----------------------------------------------+
   | program headers (elf_phdr_t) - 20 bytes each |
   +----------------------------------------------+
   | program segments - variable length of bytes  |
   +----------------------------------------------+
   | symbol table - each entry is 4 bytes each    |
   +----------------------------------------------+
   | string table - variable length of strings    |
   +----------------------------------------------+

   ELF header structure:
   +----------------------------------------------------------------------------+
   |  0   1  |  2   3  |  4   5  |  6   7  |  8   9  | 10  11  | 12  13  14  15 |
   | version | entry   | phdr    | numphdr | symtab  | strtab  | magic number   |
   +----------------------------------------------------------------------------+

   Sample ELF header (all entries in hex, format is little endian):
   +----------------------------------------------------------------------------+
   |  01  00 |  00  01 |  10  00 |  05  00 |  ac  00 |  c2  00 | 45  4c  46  00 |
   | version | entry   | phdr    | numphdr | symtab  | strtab  | magic number   |
   +----------------------------------------------------------------------------+

   version = 0x0001     entry = 0x0100      phdr = 0x0010     numphdr = 0x0005
   symtab = 0x00ac      strtab = 0x00c2     magic = "ELF\0"

   Interpretation:
   This file was created under version 1 of this format. When the program is
   loaded into memory, the instruction at address 0x100 (256) will be executed
   first. The first program header (which indicates segments in this file)
   starts at offset 0x10 (16) into the file, and there are 5 program headers
   total. The symbol table starts at offset 0xac (172) into this file, and the
   string table starts at offset 0xc2 (194). The magic number is the string
   "ELF\0" and is for error checking.

We have included several example Mini-ELF files in the tests/inputs subfolder of the project distribution; you can use these to test your program from the command line.
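
As a minimal sketch of the "bits + context" idea behind the sample header documented above, the following C fragment decodes 16 raw bytes into named fields. The type sketch_hdr_t and the functions le16 and decode_header are hypothetical names invented for this illustration; the real elf_hdr_t in elf.h may name or lay out its fields differently.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for elf_hdr_t; the real definition is in elf.h
 * and its field names may differ. */
typedef struct {
    uint16_t version;
    uint16_t entry;
    uint16_t phdr;
    uint16_t numphdr;
    uint16_t symtab;
    uint16_t strtab;
    uint8_t  magic[4];
} sketch_hdr_t;

/* Combine two bytes stored least-significant-first into one 16-bit value. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode a 16-byte buffer field by field, independent of host byte order. */
void decode_header(const uint8_t raw[16], sketch_hdr_t *out)
{
    out->version = le16(raw + 0);
    out->entry   = le16(raw + 2);
    out->phdr    = le16(raw + 4);
    out->numphdr = le16(raw + 6);
    out->symtab  = le16(raw + 8);
    out->strtab  = le16(raw + 10);
    memcpy(out->magic, raw + 12, 4);
}
```

Applied to the sample header bytes (01 00 00 01 10 00 05 00 ac 00 c2 00 45 4c 46 00), this yields version 0x0001, entry 0x0100, phdr 0x0010, numphdr 0x0005, symtab 0x00ac, and strtab 0x00c2, matching the interpretation given in elf.h.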

Unit Requirements

Here are the required functions that you must implement in p1-check.c. We will use unit tests to exercise this portion of your submission.

For this project there is only one required function:

  • bool read_header (FILE *file, elf_hdr_t *hdr);

    Read a Mini-ELF header from the file into the space pointed to by hdr. This function should do error checking to make sure the header is valid (e.g., the file is large enough and the header has the proper magic number). Return true if and only if a valid Mini-ELF header was read.

WARNING: To ensure your solution's compatibility with future projects, you should NOT print anything to standard output in these functions.
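
The checks described for read_header might be sketched as follows. This is an illustration under stated assumptions, not the reference solution: sketch_hdr_t and read_header_sketch are hypothetical names, the real elf_hdr_t in elf.h may differ, and reading the bytes directly into the struct assumes a little-endian host and a struct with no padding.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for elf_hdr_t (assumed to be 16 bytes, unpadded). */
typedef struct {
    uint16_t version;
    uint16_t entry;
    uint16_t phdr;
    uint16_t numphdr;
    uint16_t symtab;
    uint16_t strtab;
    uint8_t  magic[4];
} sketch_hdr_t;

/* Return true only if a complete header with the right magic was read.
 * Nothing is printed here; reporting is left to the caller. */
bool read_header_sketch(FILE *file, sketch_hdr_t *hdr)
{
    /* Reject NULL pointers before touching them. */
    if (file == NULL || hdr == NULL)
        return false;

    /* fread returns the number of complete items read, so a file
     * shorter than one header yields 0 here. */
    if (fread(hdr, sizeof(*hdr), 1, file) != 1)
        return false;

    /* The magic number is the four bytes 'E', 'L', 'F', '\0'. */
    return memcmp(hdr->magic, "ELF", 4) == 0;
}
```

Requesting one item of sizeof(*hdr) bytes, rather than 16 items of one byte, makes the "file is too small" case a simple comparison against 1.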

Recommended Functions

We recommend writing the following functions as helpers for your main routine, but they are not required and thus are not covered by unit tests. You may add these to either main.c or p1-check.c, but we recommend the latter. In future projects, we will provide compiled implementations of these functions in p1-check.o.

  • bool parse_command_line_p1 (int argc, char **argv, bool *header, char **file);

    Using the command-line options passed in argv, set the boolean pointed to by header to true if the -H option is passed. Set file to point to the file name string. Return true if and only if valid arguments are passed. If the -h option is passed, you should also return true, but the function should also print the help message, and your program should exit immediately after the function returns (however, you should exit in main(), not in this function!).
  • void dump_header (elf_hdr_t hdr);

    Print the Mini-ELF header passed in hdr according to the specification described below.

Integration Requirements

In addition, you must implement main() in main.c such that your program behaves as described below. The main goal of this part of the semester-long project is to open a Mini-ELF file, load its header into memory and check it for validity. We will use integration tests to exercise this portion of your submission. We recommend that you implement and use the recommended functions described above to make your main() function cleaner.

For this project, your program must parse command-line parameters according to the following interface:

Usage: y86 <option(s)> mini-elf-file
 Options are:
  -h      Display usage
  -H      Show the Mini-ELF header

It is strongly recommended that you use the getopt() library function to parse the command-line parameters. We have provided a function usage_p1 that prints the usage text above.

For this project, there are only two possible switches ("-h" and "-H"). The first switch ("-h") is the standard "help" option that is customary in Linux programs. If that switch is passed, your program should print the help text (use the provided usage_p1 function!) and exit without doing anything else. Otherwise, the program should open the given file and extract its Mini-ELF header. The second switch ("-H" -- note that capitalization matters here as much as it does in the C language itself!) should cause your program to print the Mini-ELF header in the format described below.

IMPORTANT: You should open the file, load the header, and check it for validity regardless of whether the -H switch is specified! Later projects will build on this one and specifically we will re-use the Mini-ELF header in P2.

If the -H option is passed, you must first print out the first 16 bytes of the file in a little-endian format similar to that produced by the hexdump utility. Then, you must re-interpret those 16 bytes as a Mini-ELF header, printing its contents in a well-formatted structure that is best demonstrated by example. The following sample output shows the expected output for some of the provided Mini-ELF test files (in the tests/inputs subfolder):

$ ./y86 -H tests/inputs/simple.o
00000000  01 00 00 01 10 00 02 00  58 00 70 00 45 4c 46 00
Mini-ELF version 1
Entry point 0x100
There are 2 program headers, starting at offset 16 (0x10)
There is a symbol table starting at offset 88 (0x58)
There is a string table starting at offset 112 (0x70)

$ ./y86 -H tests/inputs/stripped.o
00000000  01 00 00 01 10 00 02 00  00 00 00 00 45 4c 46 00
Mini-ELF version 1
Entry point 0x100
There are 2 program headers, starting at offset 16 (0x10)
There is no symbol table present
There is no string table present

The first line is a simple little-endian hex dump of the Mini-ELF header data (i.e., the first 16 bytes of the file). The rest of the output contains contextual data from the Mini-ELF header, such as the version number, the entry point address, and information about the program headers and symbol/string tables. See elf.h for help understanding the information required and how to access it from the ELF header struct. If a table address is zero, that indicates that the table is not present in the Mini-ELF file and you should print the appropriate message as shown above.
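
The structured part of the output, including the zero-offset rule for missing tables, could be produced along these lines. This is a sketch only: sketch_hdr_t and dump_header_sketch are hypothetical names, the field names are assumptions about elf_hdr_t, and the extra FILE * parameter (not part of the recommended dump_header signature) exists only so the output can be redirected; passing stdout gives the behavior described above.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for elf_hdr_t; see elf.h for the real one. */
typedef struct {
    uint16_t version;
    uint16_t entry;
    uint16_t phdr;
    uint16_t numphdr;
    uint16_t symtab;
    uint16_t strtab;
    uint8_t  magic[4];
} sketch_hdr_t;

/* Print the interpreted header fields; offsets appear in decimal and hex. */
void dump_header_sketch(FILE *out, sketch_hdr_t hdr)
{
    fprintf(out, "Mini-ELF version %u\n", (unsigned)hdr.version);
    fprintf(out, "Entry point 0x%x\n", (unsigned)hdr.entry);
    fprintf(out, "There are %u program headers, starting at offset %u (0x%x)\n",
            (unsigned)hdr.numphdr, (unsigned)hdr.phdr, (unsigned)hdr.phdr);

    /* An offset of zero means the table is absent from the file. */
    if (hdr.symtab != 0)
        fprintf(out, "There is a symbol table starting at offset %u (0x%x)\n",
                (unsigned)hdr.symtab, (unsigned)hdr.symtab);
    else
        fprintf(out, "There is no symbol table present\n");

    if (hdr.strtab != 0)
        fprintf(out, "There is a string table starting at offset %u (0x%x)\n",
                (unsigned)hdr.strtab, (unsigned)hdr.strtab);
    else
        fprintf(out, "There is no string table present\n");
}
```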

Your output must match the expected output exactly. Note there's a double space after the file offset (which will always be "00000000" for this project because you're only printing the first 16 bytes), and then another double space separating the first eight bytes from the second eight bytes on the same row. This format improves visibility a bit, especially in the next project where you will be printing out much larger segments of hex data.
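
The spacing rules just described can be sketched as a small helper. The name hex_line_sketch and its parameters are hypothetical, and the FILE * argument is only there so the output can be redirected; passing stdout prints the line as in the samples.

```c
#include <stdint.h>
#include <stdio.h>

/* Print one 16-byte row: an 8-digit hex offset, two spaces, eight bytes,
 * two spaces, then the remaining eight bytes. */
void hex_line_sketch(FILE *out, unsigned offset, const uint8_t bytes[16])
{
    /* The offset is followed by one space; each byte adds its own
     * leading space, which produces the double space after the offset. */
    fprintf(out, "%08x ", offset);

    for (int i = 0; i < 16; i++) {
        if (i == 8)
            fputc(' ', out);   /* extra gap between the two groups of eight */
        fprintf(out, " %02x", (unsigned)bytes[i]);
    }
    fputc('\n', out);
}
```

Having each byte print its own leading space avoids a trailing space at the end of the row, which would otherwise cause a mismatch against the expected output.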

Error checking

Robust software proactively checks for potential errors and responds appropriately when an error occurs. Failure to build robust software leads to security breaches, lost sales, and other problems; this failure is not acceptable. Our grading procedures will try to break your code. The following list is a sample (not complete) of the types of errors we may test:

  • Passing NULL values in pointer parameters
  • Passing names of files that do not exist or have permission restrictions that prevent reading
  • Passing invalid command-line options
  • Not passing a file name
  • Passing the name of a file that is too small
  • Passing a file that contains an invalid header

If the given file cannot be opened or contains an invalid header, your program should print the error message "Failed to read file" with a newline and exit with the EXIT_FAILURE code defined in stdlib.h.

The above list is not necessarily exhaustive, and you should think carefully about what sort of errors might occur so that you can prevent them or perform additional error checking as needed. In particular, we will also use valgrind to detect memory leaks. Failure to respond appropriately will result in grade reductions.
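
As a rough illustration of the failure path, the sketch below opens a file, checks its first 16 bytes, and releases the file handle on every path so that valgrind has nothing to report. The name check_file_sketch is hypothetical, and printing the message to standard output is an assumption, since the text above does not name the stream. In a real solution the validity check would go through read_header rather than being repeated inline.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return EXIT_SUCCESS if path names a readable file whose first 16 bytes
 * end with the magic number "ELF\0"; otherwise print the error message
 * and return EXIT_FAILURE. */
int check_file_sketch(const char *path)
{
    unsigned char raw[16];
    int status = EXIT_FAILURE;

    /* fopen fails for missing files and for files without read permission. */
    FILE *f = (path != NULL) ? fopen(path, "rb") : NULL;
    if (f != NULL) {
        if (fread(raw, sizeof(raw), 1, f) == 1 &&
            memcmp(raw + 12, "ELF", 4) == 0)
            status = EXIT_SUCCESS;
        fclose(f);   /* close on success and failure alike */
    }

    if (status == EXIT_FAILURE)
        printf("Failed to read file\n");
    return status;
}
```

A main() built on this idea would simply return the status, so every error case funnels through a single message and a single exit code.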

Hints

  • Read and ask questions early. If there is any part of this document or the project files that you do not understand, you should ask about it on Piazza as soon as possible.
  • Start work early. You will be unable to finish this project if you leave it to the last few days before it is due. Set up your project work folder as soon as possible.
  • Follow test-driven design. Before you write a single line of code, create test cases based on the output specification described above. Do NOT write any code without having a test case ready to test your code.
  • Don't over-complicate. The reference solution for this project is around 175 lines of code in main.c and p1-check.c combined. If your solution grows to more than 250 or 300 lines, you may wish to re-evaluate your approach; it's likely that you are over-complicating.
  • Use version control. Learn Git or Mercurial and keep your code in a repository. It will save you much time and anguish if you ever accidentally delete something.
  • Learn to use a debugger. Debuggers allow you to "poke around" while your program is running to figure out where your mental model of the program differs from reality. This will make fixing problems much easier.
  • Build features iteratively. Pay attention to the order of requirements and add features one at a time. For this project, you could start by hard-coding a file name, reading the first 16 bytes of that file, and printing the hex values. Then get the file name from the command line. Once you've got that working, then you should look at the ELF header stuff. Until you can read a binary file, you shouldn't even be thinking about that.

    If you write 100 lines of code without compiling, you're doing it wrong; you will quickly become overwhelmed and frustrated with the number of bugs and compiler errors you will encounter. The key to success is to write a small amount of code, compile it, then test it. If after implementing the new functionality you fail a test case that you were passing previously, you should fix the regression before moving on to a new part of the project.

Grading

The following requirements are necessary to earn a grade of 'A' for this project:

  1. Read the 16-byte Mini-ELF header and print it in hex format
  2. Print the ELF header in the specified structured format
  3. Accept all valid command-line options
  4. Handle error checking (described above) appropriately
  5. Reject invalid command-line arguments

Completing steps 1, 2, and 3 is required to earn a grade of 'B', while completing only step 1 will yield a maximum grade of 'C'. Note that these are the maximum grades you can earn. Inadequate error checking, failure to adhere to the coding standards, or deviations from the submit procedures will result in grade reductions as described in the project guide.

Failure to submit code that compiles on stu.cs.jmu.edu will result in an automatic grade of 0.

Submission

Due: Fri, Sep 22 at 23:59:59 ET (midnight)

Please see the project guide for general project help and grading policies. Please refer to the coding standards for coding practice guidelines.