CS 261

Project 4: Mini-ELF interpreter

This project is based on a project originally written by Dr. Michael Kirkpatrick.

This project serves as an expansion of the Mini-ELF utility from the first three projects, and the goal is to reinforce assembly code and CPU architecture concepts, focusing on the execution of machine code instructions.

In the previous project, you retrieved Y86 instructions from a virtual memory image and decoded them to produce the assembly language code. In this project, you will simulate their execution by implementing an interpreter for the Y86 architecture.

For generic project instructions, refer to the project guide.

Here is the path to the starter tarball file on stu:

/cs/students/cs261/f16/src/p4-interp.tar.gz

The project distribution contains a compiled solution to projects 1-3. As for the other projects, we recommend using your solutions instead if they are complete.

TESTS UPDATED 11/16: The test suite is subject to change. Make sure you use the most recent version. If you want to download just the most recent tests, I will keep the following location on stu up-to-date so you can always pull the most recent tests from here:

/cs/students/cs261/f16/src/p4-tests

Command-line parsing

As in project 3, you will expand your program to recognize two new command-line options: "-e" to execute code and "-E" to execute code in debug/trace mode. All functionality present in the first three projects should still be present in this project.

Part 1: Y86 instruction cycle

All modern CPU architectures, including Y86, are considered to be examples of von Neumann architecture. This means that all code and data for a program get loaded into memory, and instructions are executed in a sequence of discrete substeps. In particular, executing a single Y86 instruction means performing the following steps in exactly this order:

Fetch - Retrieve the instruction's opcode by looking at the memory address specified in the program counter (PC) or instruction pointer (IP). Determine the type of instruction the opcode specifies. If the instruction encoding requires additional bytes of memory, retrieve these as well.
Decode - Retrieve values from the register file that will be used for arithmetic or logic calculations.
Execute - Perform any required arithmetic or logic calculation for this particular instruction. Note that this calculation may include determining the memory address to access.
Memory - For instructions that move data between the CPU registers and memory, transfer data as needed.
Write back - Update the values of registers as needed.
Update PC - Update the PC register to point to the next instruction for execution.

The disassembler project involved (partially) implementing the fetch and update PC stages in order to disassemble Y86 executable files. In this project, you will implement the other five stages. In particular, the update PC stage will be different for this project as you will need to support instructions for branching logic and function calls; consequently, the new PC value will be determined by the other stages of computation. However, the fetch stage will remain the same, and you should just call fetch() in this project rather than re-implementing it.

Your implementation must adhere to the formal semantics of the instructions as defined in the Y86-64 ISA reference and described in CS:APP section 4.3. It is crucial that you read and understand these semantics, because this project will require you to translate them into concrete C code. For convenience, we split the five CPU stages in this project into two routines: one for decode and execute, and one for memory, write back, and update PC. These two routines will communicate via several parameters as in the formal semantics, including cond, valA, and valE.

Note that we are using the Mini-ELF object format for these projects, and thus execution should begin at the entry point specified in the ELF header rather than at address zero (as in the textbook).

In the regular execute mode ("-e"), your interpreter should print the following as output, according the format demonstrated in the example below:

The address of the entry point.
Any write to memory, including the address and the new value.
The final state of the CPU, including all registers, condition flags, and the status code.

Here is a simple Y86 program:

.pos 0 code
    jmp _start

.pos 0x100 code
_start:
    irmovq _stack, %rsp
    call main
    halt

main:
    irmovq $0x2, %rax
    irmovq $0x3, %rcx
    addq %rcx, %rax
    ret

.pos 0xf00 stack
_stack:

Here is the expected interpreter output for the above program:

Beginning execution at 0x0100
Memory write to 0x0ef8: 0x113
Y86 CPU state:
  %rip: 0000000000000000   flags: Z0 S0 O0     HLT
  %rax: 0000000000000005    %rcx: 0000000000000003
  %rdx: 0000000000000000    %rbx: 0000000000000000
  %rsp: 0000000000000f00    %rbp: 0000000000000000
  %rsi: 0000000000000000    %rdi: 0000000000000000
Total execution count: 7

Observe that there has been a value written to memory (the return address that is pushed onto the stack when main is called) and several registers (rax, rcx, and rsp) have been changed to non-zero values. With respect to format, note that the memory addresses are only padded to four hex digits (with a "0x" prefix) while the register values are padded to their full width of 16 hex digits (without a prefix). Memory write values are not padded.

Here are some observations/clarifications regarding the Y86 semantics that you should implement in this project:

The values of registers change only if an instruction requires it. The values from previous instructions are kept unless overwritten.
An instruction can leave the CPU can be in one of 4 states: AOK (normal operation), HLT (halting), ADR (address exception), and INS (instruction exception). In all states except AOK and HLT, the PC should be set to -1 and you should stop processing after printing the CPU state. For HLT, the PC should be set to 0.
The Z (zero), S (sign), and O (overflow) bits should be set for any move (including stack push/pop) or arithmetic operation based on the result. Note that overflow can only happen with arithmetic, not moves. Halt will clear these bits (set them to 0). For other instructions (including conditional moves that don't actually happen), these bits should be untouched.

Part 2: Debug/tracing output

In addition to the regular execution mode that only prints information about the entry point, memory writes, and the final state, you must implement a "debug" or "trace" mode that prints the disassembly of every instruction that executes. This mode should also dump the full CPU state after every instruction. This mode is crucial to debugging Y86 programs.

Here is example output from the debug mode ("-E") when run on the simple program given above:

Beginning execution at 0x0100
Y86 CPU state:
  %rip: 0000000000000100   flags: Z0 S0 O0     AOK
  %rax: 0000000000000000    %rcx: 0000000000000000
  %rdx: 0000000000000000    %rbx: 0000000000000000
  %rsp: 0000000000000000    %rbp: 0000000000000000
  %rsi: 0000000000000000    %rdi: 0000000000000000

Executing: irmovq 0xf00, %rsp
Y86 CPU state:
  %rip: 000000000000010a   flags: Z0 S0 O0     AOK
  %rax: 0000000000000000    %rcx: 0000000000000000
  %rdx: 0000000000000000    %rbx: 0000000000000000
  %rsp: 0000000000000f00    %rbp: 0000000000000000
  %rsi: 0000000000000000    %rdi: 0000000000000000

Executing: call 0x114
Memory write to 0x0ef8: 0x113
Y86 CPU state:
  %rip: 0000000000000114   flags: Z0 S0 O0     AOK
  %rax: 0000000000000000    %rcx: 0000000000000000
  %rdx: 0000000000000000    %rbx: 0000000000000000
  %rsp: 0000000000000ef8    %rbp: 0000000000000000
  %rsi: 0000000000000000    %rdi: 0000000000000000

Executing: irmovq 0x2, %rax
Y86 CPU state:
  %rip: 000000000000011e   flags: Z0 S0 O0     AOK
  %rax: 0000000000000002    %rcx: 0000000000000000
  %rdx: 0000000000000000    %rbx: 0000000000000000
  %rsp: 0000000000000ef8    %rbp: 0000000000000000
  %rsi: 0000000000000000    %rdi: 0000000000000000

Executing: irmovq 0x3, %rcx
Y86 CPU state:
  %rip: 0000000000000128   flags: Z0 S0 O0     AOK
  %rax: 0000000000000002    %rcx: 0000000000000003
  %rdx: 0000000000000000    %rbx: 0000000000000000
  %rsp: 0000000000000ef8    %rbp: 0000000000000000
  %rsi: 0000000000000000    %rdi: 0000000000000000

Executing: add %rcx, %rax
Y86 CPU state:
  %rip: 000000000000012a   flags: Z0 S0 O0     AOK
  %rax: 0000000000000005    %rcx: 0000000000000003
  %rdx: 0000000000000000    %rbx: 0000000000000000
  %rsp: 0000000000000ef8    %rbp: 0000000000000000
  %rsi: 0000000000000000    %rdi: 0000000000000000

Executing: ret
Y86 CPU state:
  %rip: 0000000000000113   flags: Z0 S0 O0     AOK
  %rax: 0000000000000005    %rcx: 0000000000000003
  %rdx: 0000000000000000    %rbx: 0000000000000000
  %rsp: 0000000000000f00    %rbp: 0000000000000000
  %rsi: 0000000000000000    %rdi: 0000000000000000

Executing: halt
Y86 CPU state:
  %rip: 0000000000000000   flags: Z0 S0 O0     HLT
  %rax: 0000000000000005    %rcx: 0000000000000003
  %rdx: 0000000000000000    %rbx: 0000000000000000
  %rsp: 0000000000000f00    %rbp: 0000000000000000
  %rsi: 0000000000000000    %rdi: 0000000000000000
Total execution count: 7

Note that you should call disassemble() from the previous project to print the disassembly, rather than re-writing it for this project.

Requirements

Here are the required functions that you must implement in p4-interp.c. We will use unit tests to exercise this portion of your submission.

bool parse_command_line_p4 (int argc, char **argv, bool *header, bool *segments, bool *membrief, bool *memfull, bool *disas_code, bool *disas_data, bool *exec_normal, bool *exec_debug, char **file);

As in Projects 1-3, set the booleans pointed to by the parameters as appropriate based on the command-line parameters. The two new parameters exec_normaland exec_debug correspond to the "-e" and "-E" flags, respectively.
void cpu_state (y86_t cpu);

Print out the contents of the CPU according the format described above.
y86_register_t decode_execute (y86_t *cpu, bool *cond, y86_inst_t inst, y86_register_t *valA);

Perform the decode and execute stages of the instruction cycle. Depending on the instruction, the cpu parameter will have its ZF and SF flags updated (if known), along with the CPU status (in case of error). The cond variable will be updated for conditional moves and jumps. The valA will be updated as specified in the semantics, and the return value is the valE value as specified in the semantics.
void memory_wb_pc (y86_t *cpu, memory_t memory, bool cond, y86_inst_t inst, y86_register_t valE, y86_register_t valA);

Perform the memory, write-back, and update PC stages. The CPU registers or memory could be modified depending on the instruction executed. The CPU PC must be updated accordingly.

In addition, you must implement the main six-stage von Neumann cycle as a loop in main() such that your program behaves as described above. Your program should start at the entry point and continue execution until the CPU status code is not AOK.

You are free (and encouraged!) to define multiple helper functions, as well. While you are free to call them anything you want, good coding practices recommend you provide meaningful names and clear purposes. Regardless, the functions above must execute correctly according to specification.

Error checking

Robust software proactively checks for potential errors and responds appropriately when an error occurs. Failure to build robust software leads to security breaches, lost sales, and other problems; this failure is not acceptable. Our grading procedures will try to break your code. The following list is a sample (not complete) of the types of errors we may test:

Passing NULL values in pointer parameters.
Passing a file that contains invalid opcodes.
Passing a file with addresses that would extend beyond the maximum MEMSIZE value.
Passing a file with program headers that have invalid flags or types.
Passing invalid instructions to the required functions.

In the case of invalid parameters or an invalid instruction, decode_execute should set the CPU status code to INS and the PC to -1. In the case of an invalid memory address, memory_wb_pc should set the CPU status code to ADRand the PC to -1.

The above list is not necessarily exhaustive, and you should think carefully about what sort of errors might occur so that you can prevent them or perform additional error checking as needed. In particular, we will also use valgrind to detect memory leaks. Failure to respond appropriately will result in grade reductions.

Hints

Work incrementally. As always, start small and gradually add more functionality. First, implement the handling for HLT and NOP instructions, which are trivial. Then work your way up to all C-level instructions, and then to B-level, and finally the full instruction set.
Make your own test cases by writing Y86 programs. This way you completely understand both the program and its expected behavior. Use test-driven development: never write code for which you do not have a test to determine if the code is working. Check your results against the canonical solution as well as the textbook's CPU simulator, both of which are provided in the distribution folder on stu.
Make helper functions. As with the last project, much of your code will look similar (large switch statements with similar bodies). You will make development and debugging significantly easier if you can write a few good helper methods (e.g., for accessing registers and memory).
For hints on condition code handling, see CS:APP section 3.6.1-3.6.3 and Figure 3.15. Addition and subtraction are signed in Y86, and the bitwise operations do not overflow. A subtraction overflows if the subtrahend is positive and the result is greater than the minuend, or if the subtrahend is negative and the result is less than the minuend.

Grading

The following requirements are necessary to earn a grade of 'A' for this project:

Correctly execute halt, nop, irmovq, and OPq instructions
Correctly execute rmmovq, mrmovq, pushq, and popq instructions
Correctly execute cmovXX, jXX, call, and ret instructions
Accept all valid Y86 Mini-ELF files
Reject all invalid Y86 Mini-ELF files

Completing steps 1-2 are required to earn a grade of 'B' while completing only step 1 will yield a maximum grade of 'C'. You may receive a 'D' if you do not pass all 'C' level tests but there is some evidence of a good-faith effort. Note that these are the maximum grades you can earn. Inadequate error checking, failure to adhere to the coding standards, or deviations from the submit procedures will result in grade reductions as described in the project guide.

Failure to submit code that compiles on stu.cs.jmu.edu will result in an automatic grade of 0. In particular, you SHOULD NOT modify the included header files; your code must compile without errors or warnings against the original headers.

Caution: No test suite is fully exhaustive, and the test suite distributed with this program is no exception. Be aware that we may test your code with new test cases created after the submission deadline. These test cases will not substantively change the project specification, but they may exercise your program more thoroughly than the current test suite does. You should treat the given test suite as providing your maximum possible base grade, and you should always anticipate how your program may fail to adhere to the project spec in ways that the initial test suite does not test.

Submission

Due: Fri, Nov 18 at 23:59:59 ET (midnight) (Soft deadline: Fri, Nov 11 for C and B level tests)

Please see the project guide for general project help and grading policies. Please refer to the coding standards for coding practice guidelines.

Computer Systems I - Fall 2016