Project 2: Mini-ELF loader
This project is based on a project originally written by Dr. Michael Kirkpatrick.
This project serves as an expansion of the Mini-ELF utility from the first project, and the goal is to reinforce the relationship between a compiled executable and the standard Linux/C memory model.
In this project, you will continue reading data from the Mini-ELF file by first loading the program headers and then loading segments of the file into a large virtual memory array.
For generic project instructions, refer to the project guide.
Here is the path to the starter tarball file on stu:
/cs/students/cs261/f16/src/p2-load.tar.gz
The project distribution contains a compiled solution to Project 1 (i.e., p1-check.o). If your Project 1 submission passed all of the tests for that project, I recommend that you just delete this file and copy your p1-check.c into the folder, modifying the makefile appropriately (i.e., move p1-check.o from OBJS to MODS). This will give you better debug information by compiling with your solution rather than my precompiled (and stripped) solution. If you're not happy with your P1 solution, just ignore this.
UPDATE 9/27: There is an updated version of p1-check.o that fixes a subtle bug I discovered in the output routine. I have updated the main distribution file, and I've also put a copy of that individual file in the same directory if you've already started and just want to copy that file in.
TESTS UPDATED 10/6: The test suite is subject to change. Make sure you use the most recent version. If you want to download just the most recent tests, I will keep the following location on stu up-to-date so you can always pull the most recent tests from here:
/cs/students/cs261/f16/src/p2-tests
Command-line parsing
For this project, your program must parse command-line parameters according to the following interface:
    Usage: y86 <option(s)> mini-elf-file
    Options are:
      -h      Display usage
      -H      Show the Mini-ELF header
      -a      Show all with brief memory
      -f      Show all with full memory
      -s      Show the program headers
      -m      Show the memory contents (brief)
      -M      Show the memory contents (full)
The -h and -H options behave identically to their counterparts in Project 1. This project adds the -s, -m, -M, -a, and -f options, described below. Flags can be combined in many ways:
    $ y86 -H -s file.o
    $ y86 -sH -m file.o
    $ y86 -m -s file.o
The -a flag is equivalent to -H -s -m and the -f flag is equivalent to -H -s -M.
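The parsing rules above (clustered flags such as -sH, -a and -f as shorthands, -m and -M mutually exclusive, exactly one file name) can be sketched by scanning argv by hand; getopt(3) is a common alternative. This is a minimal sketch, not the reference solution: usage_p2 is a hypothetical helper, the usage spacing is an assumption, and treating -h as "print usage and return false" is also an assumption, so match whatever your Project 1 parser does for -h.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: prints the usage text from the interface above.
 * The exact spacing is an assumption; match the expected test output. */
static void usage_p2(void)
{
    puts("Usage: y86 <option(s)> mini-elf-file");
    puts("Options are:");
    puts("  -h      Display usage");
    puts("  -H      Show the Mini-ELF header");
    puts("  -a      Show all with brief memory");
    puts("  -f      Show all with full memory");
    puts("  -s      Show the program headers");
    puts("  -m      Show the memory contents (brief)");
    puts("  -M      Show the memory contents (full)");
}

bool parse_command_line_p2(int argc, char **argv, bool *header,
                           bool *segments, bool *membrief, bool *memfull,
                           char **file)
{
    if (argv == NULL || header == NULL || segments == NULL ||
        membrief == NULL || memfull == NULL || file == NULL) {
        return false;
    }

    *header = *segments = *membrief = *memfull = false;
    *file = NULL;

    for (int i = 1; i < argc; i++) {
        char *arg = argv[i];
        if (arg[0] == '-' && arg[1] != '\0') {
            /* A cluster such as "-sH" sets several flags at once. */
            for (char *c = arg + 1; *c != '\0'; c++) {
                switch (*c) {
                case 'H': *header = true; break;
                case 's': *segments = true; break;
                case 'm': *membrief = true; break;
                case 'M': *memfull = true; break;
                case 'a': *header = *segments = *membrief = true; break;
                case 'f': *header = *segments = *memfull = true; break;
                case 'h':
                    usage_p2();
                    return false;   /* assumption: see lead-in */
                default:
                    return false;   /* unknown option */
                }
            }
        } else if (*file == NULL) {
            *file = arg;            /* first (and only) file name */
        } else {
            return false;           /* more than one file name */
        }
    }

    /* -m and -M are mutually exclusive, and a file name is required. */
    return !(*membrief && *memfull) && *file != NULL;
}
```

Because -a expands to -H -s -m, a command line such as `y86 -a -M file.o` is rejected by the same mutual-exclusion check as `-m -M`.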
Part 1: Mini-ELF and program headers
As in the previous project, you will be working with a binary file using the Mini-ELF structure. In the previous project, you read the first 16 bytes as a header. In this project, you will read in the program headers based on the information from that header. Specifically, the hdr.e_num_phdr field tells you how many program headers there are, and the hdr.e_phdr_start field tells you the byte offset into the file where the program headers start. You will need to seek to this file location (fseek() may be useful here) and read each header into an elf_phdr_t struct.
The purpose of the program headers is to provide information on how to map the contents of the file to memory when running the program. Each header will have information that includes an offset, a virtual address, and a size (see the expanded elf.h for more information). The offset refers to the byte location in the file where that data exists. The virtual address is where the data will be loaded into your simulated memory space (see below). The size is the number of bytes in the segment.
The actual loading of program headers should be implemented in the read_phdr function. Note that this function reads in a single program header but Mini-ELF files may have multiple program headers. Therefore, you will need to create an array of them in your main function and call read_phdr multiple times. The dump_phdrs function should print the full array of program headers.
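A minimal sketch of read_phdr and of the loop main needs, using fseek() and fread(). The elf_phdr_t below is only a stand-in so the example compiles on its own: the real definition, its field names, and whatever makes a header "valid" come from the expanded elf.h, so every field name here is an assumption. The sketch also assumes the headers are stored back to back, each sizeof(elf_phdr_t) bytes long, and that the struct layout matches the bytes in the file. read_all_phdrs is a hypothetical helper, not one of the required functions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for elf_phdr_t from elf.h; field names/widths are assumptions. */
typedef struct {
    uint32_t p_offset;   /* byte offset of the segment in the file */
    uint32_t p_filesz;   /* segment size in bytes */
    uint32_t p_vaddr;    /* load address in Y86 memory */
    uint16_t p_type;     /* segment type (CODE, DATA, ...) */
    uint16_t p_flag;     /* permission bits */
} elf_phdr_t;

bool read_phdr(FILE *file, uint16_t offset, elf_phdr_t *phdr)
{
    if (file == NULL || phdr == NULL) {
        return false;
    }
    /* Jump to the header's byte offset, then read exactly one struct. */
    if (fseek(file, offset, SEEK_SET) != 0) {
        return false;
    }
    if (fread(phdr, sizeof(elf_phdr_t), 1, file) != 1) {
        return false;   /* short read: the file is too small */
    }
    /* Add the header validity checks that elf.h calls for here. */
    return true;
}

/* Hypothetical helper: read numphdrs consecutive headers starting at
 * byte offset start into the caller-provided array phdrs. */
bool read_all_phdrs(FILE *file, uint16_t start, uint16_t numphdrs,
                    elf_phdr_t phdrs[])
{
    for (uint16_t i = 0; i < numphdrs; i++) {
        size_t offset = start + (size_t)i * sizeof(elf_phdr_t);
        /* read_phdr takes a 16-bit offset, so refuse anything larger. */
        if (offset > UINT16_MAX ||
            !read_phdr(file, (uint16_t)offset, &phdrs[i])) {
            return false;
        }
    }
    return true;
}
```

In main, one option is to allocate hdr.e_num_phdr entries with calloc(), pass hdr.e_phdr_start and hdr.e_num_phdr to the loop, and free the array on every exit path so valgrind stays quiet.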
Part 2: Y86 virtual memory
In the remainder of these projects, we will be simulating a 64-bit architecture called Y86. However, we will not simulate a full 64-bit address space because that much space would be excessive for our purposes. Rather, we will use a memory space whose size is determined by a constant called MEMSIZE. This constant is defined in y86.h to have a value of 0x1000, which in decimal is 4096 (so 4KB).
In your main module, you should simulate this memory by allocating space on the heap for virtual memory (simulating real memory--it's basically an array of bytes). We have provided the memory_t type (declared in y86.h) that we recommend using to store a pointer to your memory block. Because its type is a uint8_t pointer, you can use standard array accessing (e.g., mem[8] to access the byte at address 8).
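As a small illustration of the allocation step, calloc() is a natural fit because it zero-fills the block, which gives the "any data not loaded should be zero" behavior that -M relies on. MEMSIZE and memory_t are repeated here only so the sketch compiles on its own; in the project they come from y86.h and should not be redefined. alloc_memory is a hypothetical name.

```c
#include <stdint.h>
#include <stdlib.h>

/* Repeated from the description of y86.h above, for a standalone sketch. */
#define MEMSIZE 0x1000
typedef uint8_t *memory_t;

/* Hypothetical helper: allocate MEMSIZE zeroed bytes of Y86 memory.
 * Returns NULL if the allocation fails; the caller must free() it. */
memory_t alloc_memory(void)
{
    return calloc(MEMSIZE, sizeof(uint8_t));
}
```

Usage: `memory_t mem = alloc_memory();` followed by ordinary indexing such as `mem[8]`, and a single `free(mem)` before main returns, including on error paths.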
After you have read in the program headers as described above, you will need to use the information in those headers to read code and data from the file into your virtual memory array. The actual loading should be implemented in the load_segment function. The corresponding output function is dump_memory, which should print the bytes in a specified region of the virtual memory.
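A minimal sketch of load_segment under stated assumptions: the elf_phdr_t fields are a stand-in for the real elf.h definition, and MEMSIZE and memory_t are repeated from the description of y86.h. It refuses segments that would run past the end of Y86 memory, skips the read entirely for zero-byte segments, and otherwise copies the segment's bytes straight to memory[p_vaddr].

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MEMSIZE 0x1000          /* value stated for y86.h */
typedef uint8_t *memory_t;      /* alias stated for y86.h */

/* Stand-in for elf_phdr_t from elf.h; field names are assumptions. */
typedef struct {
    uint32_t p_offset;
    uint32_t p_filesz;
    uint32_t p_vaddr;
    uint16_t p_type;
    uint16_t p_flag;
} elf_phdr_t;

bool load_segment(FILE *file, memory_t memory, elf_phdr_t phdr)
{
    if (file == NULL || memory == NULL) {
        return false;
    }
    /* The whole segment must fit inside the simulated address space. */
    if ((uint64_t)phdr.p_vaddr + phdr.p_filesz > MEMSIZE) {
        return false;
    }
    /* Zero-byte segments are legal: there is nothing to read. */
    if (phdr.p_filesz == 0) {
        return true;
    }
    if (fseek(file, (long)phdr.p_offset, SEEK_SET) != 0) {
        return false;
    }
    /* Copy the bytes directly into memory at the virtual address. */
    return fread(&memory[phdr.p_vaddr], 1, phdr.p_filesz, file)
           == phdr.p_filesz;
}
```

A short read (the header claims more bytes than the file holds) makes the function return false, which main can turn into the "Failed to read file" error.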
Output specification
As in Project 1, the -H flag should cause your program to print the header:
    $ ./y86 -H tests/inputs/simple.o
    00000000 01 00 00 01 10 00 02 00  58 00 70 00 45 4c 46 00
    Mini-ELF version 1
    Entry point 0x100
    There are 2 program headers, starting at offset 16 (0x10)
    There is a symbol table starting at offset 88 (0x58)
    There is a string table starting at offset 112 (0x70)
The header output above specifies that there are 2 program headers. Each program header provides information about how to copy data from the file into memory segments. The -s flag should cause your program to print the segment information:
    $ ./y86 -s tests/inputs/simple.o
    Segment Offset VirtAddr FileSize Type Flag
    00 0x0038 0x0100 0x0015 CODE R X
    01 0x004d 0x0200 0x000b DATA RW
In this example, the first program header indicates that you should read 0x15 bytes (i.e., 21 bytes) from the file starting at offset 0x38 into the Y86 virtual memory at address 0x100. The program header also tells you this segment is CODE and has the flags R and X (read and execute permissions). The second segment starts at offset 0x4d in the file, is 0xb bytes (i.e., 11 bytes) in size, and should be loaded into address 0x200. This segment is DATA and has RW (read and write) permissions.
Note that there is an extra space after "RW" for the second segment, indicating that that segment is NOT executable. Make sure your output includes this space.
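The trailing space falls out naturally if the Flag column is always three characters wide, with a space standing in for each missing permission. A minimal sketch, assuming ELF-style permission bits (R = 4, W = 2, X = 1); the FLAG_* names and values are assumptions, so check the real constants in elf.h. format_flags is a hypothetical helper for dump_phdrs.

```c
#include <stdint.h>

/* Assumed ELF-style permission bits; confirm the real values in elf.h. */
#define FLAG_X 0x1
#define FLAG_W 0x2
#define FLAG_R 0x4

/* Fill out with a fixed-width, three-character flag field such as "R X"
 * or "RW ". Each missing permission becomes a space, which is why a
 * non-executable segment ends with a trailing space. */
void format_flags(uint16_t flag, char out[4])
{
    out[0] = (flag & FLAG_R) ? 'R' : ' ';
    out[1] = (flag & FLAG_W) ? 'W' : ' ';
    out[2] = (flag & FLAG_X) ? 'X' : ' ';
    out[3] = '\0';
}
```

In the example output, the Offset, VirtAddr, and FileSize columns appear as four-digit, zero-padded hex with a 0x prefix, which printf produces with "0x%04x".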
The -m option should print the address ranges that contain data loaded from the file (i.e., the segments). In this output, the first column is the memory address of the first byte. After that, you should print the contents of memory in hex format as follows:
    $ ./y86 -m tests/inputs/simple.o
    Contents of memory from 0100 to 0115:
    0100 30 f3 0f 00 00 00 20 31  40 13 fd ff ff ff 60 31
    0110 70 08 01 00 00
    Contents of memory from 0200 to 020b:
    0200 aa bb cc dd 00 00 00 dd  00 00 00
As in the first project, note the address at the beginning (now a 16-bit address, in hex) as well as the extra space between sets of eight bytes. Your output must match this format exactly.
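A minimal sketch of dump_memory that prints the "Contents of memory" line and then rows of up to 16 bytes covering addresses start through end - 1. The row layout (four-digit hex address, each byte as a space plus two hex digits, one extra space before the ninth byte) is an assumption inferred from the example above; diff your output against the expected test output before relying on it. format_row and ROW_BYTES are hypothetical names, and memory_t is repeated from y86.h only so the sketch compiles on its own.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint8_t *memory_t;   /* same alias as described for y86.h */

#define ROW_BYTES 16

/* Hypothetical helper: format the row that begins at addr, covering at
 * most ROW_BYTES bytes and never reaching end. */
static void format_row(char *buf, size_t size, memory_t memory,
                       uint16_t addr, uint16_t end)
{
    size_t len = (size_t)snprintf(buf, size, "%04x", (unsigned)addr);
    for (int i = 0; i < ROW_BYTES && addr + i < end && len < size; i++) {
        /* One extra space separates the two groups of eight bytes. */
        len += (size_t)snprintf(buf + len, size - len, "%s%02x",
                                i == 8 ? "  " : " ",
                                (unsigned)memory[addr + i]);
    }
}

void dump_memory(memory_t memory, uint16_t start, uint16_t end)
{
    char row[64];   /* 4 + 16 * 3 + 1 extra space + NUL fits easily */

    if (memory == NULL || start > end) {
        return;
    }
    printf("Contents of memory from %04x to %04x:\n",
           (unsigned)start, (unsigned)end);
    for (uint32_t addr = start; addr < end; addr += ROW_BYTES) {
        format_row(row, sizeof row, memory, (uint16_t)addr, end);
        puts(row);
    }
}
```

With this shape, -M reduces to dump_memory(memory, 0, MEMSIZE), and -m to one call per segment with start set to the segment's virtual address and end set to that address plus the segment size (0x100 and 0x115 for the first segment of simple.o).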
The -M flag (again, note the capital) should cause your program to print the entire 4KB contents of the Y86 memory, regardless of whether it has data loaded from the file. Any data not loaded should be zero. Example output (note the omission of some lines for brevity):
    $ ./y86 -M tests/inputs/simple.o
    Contents of memory from 0000 to 1000:
    0000 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
    0010 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
    0020 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
    ... [some lines omitted]
    00f0 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
    0100 30 f3 0f 00 00 00 20 31  40 13 fd ff ff ff 60 31
    0110 70 08 01 00 00 00 00 00  00 00 00 00 00 00 00 00
    0120 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
    ... [some lines omitted]
    01f0 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
    0200 aa bb cc dd 00 00 00 dd  00 00 00 00 00 00 00 00
    0210 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
    ... [some lines omitted]
When flags are combined, you should always print the header first (if requested), followed by the segments (if requested), then the contents of memory. For instance, the following example shows the output for the -a flag (which, as noted above, is equivalent to -H -s -m):
    $ ./y86 -a tests/inputs/simple.o
    00000000 01 00 00 01 10 00 02 00  58 00 70 00 45 4c 46 00
    Mini-ELF version 1
    Entry point 0x100
    There are 2 program headers, starting at offset 16 (0x10)
    There is a symbol table starting at offset 88 (0x58)
    There is a string table starting at offset 112 (0x70)
    Segment Offset VirtAddr FileSize Type Flag
    00 0x0038 0x0100 0x0015 CODE R X
    01 0x004d 0x0200 0x000b DATA RW
    Contents of memory from 0100 to 0115:
    0100 30 f3 0f 00 00 00 20 31  40 13 fd ff ff ff 60 31
    0110 70 08 01 00 00
    Contents of memory from 0200 to 020b:
    0200 aa bb cc dd 00 00 00 dd  00 00 00
Requirements
Here are the required functions that you must implement in p2-load.c. We will use unit tests to exercise this portion of your submission.
- bool parse_command_line_p2 (int argc, char **argv, bool *header, bool *segments, bool *membrief, bool *memfull, char **file);
  As in Project 1, set the boolean pointed to by the parameter header to true if the -H option is passed. Additionally, set the boolean pointed to by segments to true if -s is passed. Set membrief or memfull to true depending on whether -m or -M is passed; passing both is invalid. If the command line is invalid, return false; if it is valid, return true. The user must also pass exactly one file name, which should be returned through file, as in Project 1.
- bool read_phdr (FILE *file, uint16_t offset, elf_phdr_t *phdr);
  Read a Mini-ELF program header from file into the space pointed to by phdr, starting at byte offset into the file. If the reading fails or it is not a valid program header, return false.
- void dump_phdrs (uint16_t numphdrs, elf_phdr_t phdr[]);
  Print the Mini-ELF program headers passed in phdr. There will be numphdrs of them.
- bool load_segment (FILE *file, memory_t memory, elf_phdr_t phdr);
  Read data from the file into the address space memory based on the program header phdr. Note that memory_t is an alias for uint8_t*. As such, you can use memory[i] to access the element stored at memory address i. Note also that there could be zero-byte segments; for such segments there is no need to read anything from the file.
- void dump_memory (memory_t memory, uint16_t start, uint16_t end);
  Print the contents of memory starting at address start and ending just before address end. For instance, if start = 5 and end = 8, then you will print the bytes at addresses 5, 6, and 7.
In addition, you must implement main() in main.c such that your program behaves as described above. We will use integration tests to exercise this portion of your submission. Make sure you use the functions from p2-load.o in main.c--do not re-invent the wheel!
Error checking
Robust software proactively checks for potential errors and responds appropriately when an error occurs. Failure to build robust software leads to security breaches, lost sales, and other problems; this failure is not acceptable. Our grading procedures will try to break your code. The following list is a sample (not complete) of the types of errors we may test:
- Passing NULL values in pointer parameters
- Passing names of files that do not exist or have permission restrictions that prevent reading
- Passing invalid command-line options
- Not passing a file name
- Passing the name of a file that is too small
- Passing a file that contains an invalid header
- Passing a file with program headers that do not match the rest of the file
- Passing a file with invalid program headers
If the given file cannot be opened or contains invalid Mini-ELF data, your program should print the error message "Failed to read file" with a newline and exit with the EXIT_FAILURE code defined in stdlib.h.
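One way to catch "program headers that do not match the rest of the file" is to check each header against the file's size and against MEMSIZE before loading anything. A minimal sketch: phdr_fits is a hypothetical helper, elf_phdr_t is again a stand-in whose field names are assumptions (use the real one from elf.h), and MEMSIZE is copied from the value stated for y86.h.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MEMSIZE 0x1000   /* value stated for y86.h */

/* Stand-in for elf_phdr_t from elf.h; field names are assumptions. */
typedef struct {
    uint32_t p_offset;
    uint32_t p_filesz;
    uint32_t p_vaddr;
    uint16_t p_type;
    uint16_t p_flag;
} elf_phdr_t;

/* Hypothetical check: does the segment lie entirely inside the file and
 * entirely inside Y86 memory? 64-bit sums avoid overflow on bad input. */
bool phdr_fits(FILE *file, const elf_phdr_t *phdr)
{
    if (file == NULL || phdr == NULL) {
        return false;
    }
    if (fseek(file, 0, SEEK_END) != 0) {
        return false;
    }
    long size = ftell(file);   /* file length in bytes, or -1 on error */
    if (size < 0) {
        return false;
    }
    uint64_t file_end = (uint64_t)phdr->p_offset + phdr->p_filesz;
    uint64_t mem_end = (uint64_t)phdr->p_vaddr + phdr->p_filesz;
    return file_end <= (uint64_t)size && mem_end <= MEMSIZE;
}
```

When any check like this fails, main can print the required message with puts("Failed to read file") (puts appends the newline), free the memory block and any header array, close the file so valgrind reports no leaks, and return EXIT_FAILURE.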
The above list is not necessarily exhaustive, and you should think carefully about what sort of errors might occur so that you can prevent them or perform additional error checking as needed. In particular, we will also use valgrind to detect memory leaks. Failure to respond appropriately will result in grade reductions.
Hints
- Work incrementally. First, update your command-line parsing routine from P1 and verify that it works. Then implement read_phdr and dump_phdrs and verify that they work. Finally, use the information that you now have from the program headers to implement load_segment and dump_memory.
Grading
The following requirements are necessary to earn a grade of 'A' for this project:
1. Read the program headers from the file
2. Print the segment information from the program headers
3. Load the segment contents into a dynamically allocated memory array
4. Print the full memory contents if passed -M
5. Print the brief memory contents if passed -m
6. Accept all valid command-line options
7. Handle error checking appropriately
8. Reject invalid command-line arguments
Completing steps 1-6 is required to earn a grade of 'B', while completing only steps 1-4 will yield a maximum grade of 'C'. Note that these are the maximum grades you can earn. Inadequate error checking, failure to adhere to the coding standards, or deviations from the submit procedures will result in grade reductions as described in the project guide.
Failure to submit code that compiles on stu.cs.jmu.edu will result in an automatic grade of 0. In particular, you SHOULD NOT modify the included header files; your code must compile without errors or warnings against the original headers.
Caution: No test suite is fully exhaustive, and the test suite distributed with this program is no exception. Be aware that we may test your code with new test cases created after the submission deadline. These test cases will not substantively change the project specification, but they may exercise your program more thoroughly than the current test suite does. You should treat the given test suite as providing your maximum possible base grade, and you should always anticipate how your program may fail to adhere to the project spec in ways that the initial test suite does not test.
Submission
Due: Fri, Oct 7 at 23:59:59 ET (midnight)
Please see the project guide for general project help and grading policies. Please refer to the coding standards for coding practice guidelines.