CS 261

Project 4: Y86 interpreter

This project is based on a project originally written by Dr. Michael Kirkpatrick.

This project serves as an expansion of the Mini-ELF utility from the first three projects, and the goal is to reinforce assembly code and CPU architecture concepts, focusing on the execution of machine code instructions.

In the previous project, you retrieved Y86 instructions from a virtual memory image and decoded them to produce the assembly language code. In this project, you will simulate their execution by implementing an interpreter for the Y86 architecture.

For generic project instructions, refer to the project guide.

Here is the path to the starter tarball file on stu:

/cs/students/cs261/f22/p4-interp.tar.gz

The project distribution contains a compiled solution to projects 1-3. As for project 3, we recommend using your solutions instead if they are complete, but again we recommend testing without them before you submit just to make sure that you haven't made any incorrect assumptions that were not exposed by the earlier test suites.

Y86 instruction cycle

All modern CPU architectures, including Y86, are considered to be examples of von Neumann architecture. This means that all code and data for a program get loaded into memory, and instructions are executed in a sequence of discrete substeps. In particular, executing a single Y86 instruction means performing the following steps in exactly this order:

Fetch - Retrieve the instruction's opcode by looking at the memory address specified in the program counter (PC) or instruction pointer (IP). Determine the type of instruction the opcode specifies. If the instruction encoding requires additional bytes of memory, retrieve these as well.
Decode - Retrieve values from the register file that will be used for arithmetic or logic calculations.
Execute - Perform any required arithmetic or logic calculation for this particular instruction. Note that this calculation may include determining the memory address to access.
Memory - For instructions that move data between the CPU registers and memory, transfer data as needed.
Write back - Update the values of registers as needed.
Update PC - Update the PC register to point to the next instruction for execution.

The disassembler project involved (partially) implementing the fetch and update PC stages in order to disassemble Y86 executable files. In this project, you will implement the other five stages. In particular, the update PC stage will be different for this project as you will need to support instructions for branching logic and function calls; consequently, the new PC value will be determined by the other stages of computation. However, the fetch stage will remain the same, and you should just call fetch() in this project rather than re-implementing it.

Your implementation must adhere to the formal semantics of the instructions as defined in the Y86-64 ISA reference and described in CS:APP section 4.3. It is crucial that you read and understand these semantics, because this project will require you to translate them into concrete C code. For convenience, we split the five CPU stages in this project into two routines: one for decode and execute, and one for memory, write back, and update PC. These two routines will communicate via several parameters as in the formal semantics, including cnd, valA, and valE.

Note that we are using the Mini-ELF object format for these projects, and thus execution should begin at the entry point specified in the ELF header rather than at address zero (as in the textbook).

Here is an overview diagram from the slides:

Unit requirements

Here are the required functions that you must implement in p4-interp.c. We will use unit tests to exercise this portion of your submission.

y86_reg_t decode_execute (y86_t *cpu, y86_inst_t inst, bool *cnd, y86_reg_t *valA);

Perform the decode and execute stages of the instruction cycle. Depending on the instruction, the function may update the CPU status in case of error. The cnd variable will be updated for conditional moves and jumps. The valA will be updated as specified in the semantics, and the return value is the valE value as specified in the semantics.
void memory_wb_pc (y86_t *cpu, y86_inst_t inst, byte_t *memory, bool cnd, y86_reg_t valA, y86_reg_t valE);

Perform the memory, write-back, and update PC stages. The CPU registers or memory could be modified depending on the instruction executed. The CPU PC must be updated accordingly.

You are free (and encouraged!) to define multiple helper functions as well. While you are free to call them anything you want, good coding practices recommend you provide meaningful names and clear purposes. Regardless, the functions above must execute correctly according to specification.

WARNING: You should NOT print anything to standard output in decode_execute, and the only printing from memory_wb_pc should be to implement iotrap functionality).

Recommended functions

We recommend writing the following functions as helpers for your main routine, but they are not required and thus are not covered by unit tests. You may add these to either main.c or p4-interp.c, but we recommend the latter.

bool parse_command_line_p4 (int argc, char **argv, bool *header, bool *segments, bool *membrief, bool *memfull, bool *disas_code, bool *disas_data, bool *exec_normal, bool *exec_trace, char **filename);

As in Projects 1-3, set the booleans pointed to by the parameters as appropriate based on the command-line parameters. The two new parameters exec_normaland exec_trace correspond to the "-e" and "-E" flags respectively, which should be mutually exclusive in the same way that "-m" and "-M" were in P2.
void dump_cpu_state (y86_t cpu);

Print out the contents of the CPU according the format described below.

Integration requirements

As in project 3, you will expand your program to recognize two new command-line options: "-e" to execute code and "-E" to execute code in trace mode. These options should be mutually-exclusive; i.e., if both "-e" and "-E" are passed, you should just print the usage information and exit.

All functionality present in the first three projects should still be present in this project. In addition, you must implement the main six-stage von Neumann cycle as a loop in main() such that your program behaves as described above. Your program should start at the entry point and continue execution until the CPU status code is not AOK.

In the regular execute mode ("-e"), your interpreter should print the following as output, according the format demonstrated in the example below:

The address of the entry point.
The final state of the CPU, including all registers in unsigned hexadecimal, condition flags, and the status code.
The total number of instructions that were executed.

Here is a simple Y86 program:

.pos 0 code
    jmp _start

.pos 0x100 code
_start:
    irmovq _stack, %rsp
    call main
    halt

main:
    irmovq $0x2, %rax
    irmovq $0x3, %rcx
    addq %rcx, %rax
    ret

.pos 0xf00 stack
_stack:

Here is the expected interpreter output for the above program:

Beginning execution at 0x0100
Y86 CPU state:
  %rip: 0000000000000000   flags: Z0 S0 O0     HLT
  %rax: 0000000000000005    %rcx: 0000000000000003
  %rdx: 0000000000000000    %rbx: 0000000000000000
  %rsp: 0000000000000f00    %rbp: 0000000000000000
  %rsi: 0000000000000000    %rdi: 0000000000000000
   %r8: 0000000000000000     %r9: 0000000000000000
  %r10: 0000000000000000    %r11: 0000000000000000
  %r12: 0000000000000000    %r13: 0000000000000000
  %r14: 0000000000000000
Total execution count: 7

Observe that there has been a value written to memory (the return address that is pushed onto the stack when main is called) and several registers (rax, rcx, and rsp) have been changed to non-zero values. With respect to format, note that the memory addresses are only padded to four hex digits (with a "0x" prefix) while the register values are padded to their full width of 16 hex digits (without a prefix).

Here are some observations/clarifications regarding the Y86 semantics that you should implement in this project:

The values of registers should be initialized to zero and change only if an instruction requires it. The values from previous instructions are kept unless overwritten.
An instruction can leave the CPU can be in one of 4 states: AOK (normal operation), HLT (halting), ADR (address exception), and INS (instruction exception).
The Z (zero), S (sign), and O (overflow) bits should be set for any OPq operation based on the result. Note that overflow can only happen with addition or subtraction (and further note that this means that those two operations interpret their operands as signed 2's complement integers; otherwise it would be a carry flag). Halt will clear these bits (set them to 0). For other instructions (including conditional moves that don't actually happen), these bits should be untouched.
The iotrap instruction does not have formal semantics. See the "Input and Output" section of the specification below for details about how this instruction should operate.

In addition to the regular execution mode that only prints information about the entry point, memory writes, and the final state, you must implement a "trace" mode that prints the disassembly (reuse the disassemble function from P3!) of every instruction that executes. This mode should also dump the full CPU state after every instruction, as well as the entire memory address space (in the same format as "-M" in P2; reuse the dump_memory function!) after the program has finished executing. This mode provides more information than the regular mode, and this helps considerably while debugging Y86 programs.

Here is example output from the trace mode ("-E") when run on the simple program given above:

Beginning execution at 0x0100
Y86 CPU state:
  %rip: 0000000000000100   flags: Z0 S0 O0     AOK
  %rax: 0000000000000000    %rcx: 0000000000000000
  %rdx: 0000000000000000    %rbx: 0000000000000000
  %rsp: 0000000000000000    %rbp: 0000000000000000
  %rsi: 0000000000000000    %rdi: 0000000000000000
   %r8: 0000000000000000     %r9: 0000000000000000
  %r10: 0000000000000000    %r11: 0000000000000000
  %r12: 0000000000000000    %r13: 0000000000000000
  %r14: 0000000000000000

Executing: irmovq 0xf00, %rsp
Y86 CPU state:
  %rip: 000000000000010a   flags: Z0 S0 O0     AOK
  %rax: 0000000000000000    %rcx: 0000000000000000
  %rdx: 0000000000000000    %rbx: 0000000000000000
  %rsp: 0000000000000f00    %rbp: 0000000000000000
  %rsi: 0000000000000000    %rdi: 0000000000000000
   %r8: 0000000000000000     %r9: 0000000000000000
  %r10: 0000000000000000    %r11: 0000000000000000
  %r12: 0000000000000000    %r13: 0000000000000000
  %r14: 0000000000000000

Executing: call 0x114
Y86 CPU state:
  %rip: 0000000000000114   flags: Z0 S0 O0     AOK
  %rax: 0000000000000000    %rcx: 0000000000000000
  %rdx: 0000000000000000    %rbx: 0000000000000000
  %rsp: 0000000000000ef8    %rbp: 0000000000000000
  %rsi: 0000000000000000    %rdi: 0000000000000000
   %r8: 0000000000000000     %r9: 0000000000000000
  %r10: 0000000000000000    %r11: 0000000000000000
  %r12: 0000000000000000    %r13: 0000000000000000
  %r14: 0000000000000000

Executing: irmovq 0x2, %rax
Y86 CPU state:
  %rip: 000000000000011e   flags: Z0 S0 O0     AOK
  %rax: 0000000000000002    %rcx: 0000000000000000
  %rdx: 0000000000000000    %rbx: 0000000000000000
  %rsp: 0000000000000ef8    %rbp: 0000000000000000
  %rsi: 0000000000000000    %rdi: 0000000000000000
   %r8: 0000000000000000     %r9: 0000000000000000
  %r10: 0000000000000000    %r11: 0000000000000000
  %r12: 0000000000000000    %r13: 0000000000000000
  %r14: 0000000000000000

Executing: irmovq 0x3, %rcx
Y86 CPU state:
  %rip: 0000000000000128   flags: Z0 S0 O0     AOK
  %rax: 0000000000000002    %rcx: 0000000000000003
  %rdx: 0000000000000000    %rbx: 0000000000000000
  %rsp: 0000000000000ef8    %rbp: 0000000000000000
  %rsi: 0000000000000000    %rdi: 0000000000000000
   %r8: 0000000000000000     %r9: 0000000000000000
  %r10: 0000000000000000    %r11: 0000000000000000
  %r12: 0000000000000000    %r13: 0000000000000000
  %r14: 0000000000000000

Executing: addq %rcx, %rax
Y86 CPU state:
  %rip: 000000000000012a   flags: Z0 S0 O0     AOK
  %rax: 0000000000000005    %rcx: 0000000000000003
  %rdx: 0000000000000000    %rbx: 0000000000000000
  %rsp: 0000000000000ef8    %rbp: 0000000000000000
  %rsi: 0000000000000000    %rdi: 0000000000000000
   %r8: 0000000000000000     %r9: 0000000000000000
  %r10: 0000000000000000    %r11: 0000000000000000
  %r12: 0000000000000000    %r13: 0000000000000000
  %r14: 0000000000000000

Executing: ret
Y86 CPU state:
  %rip: 0000000000000113   flags: Z0 S0 O0     AOK
  %rax: 0000000000000005    %rcx: 0000000000000003
  %rdx: 0000000000000000    %rbx: 0000000000000000
  %rsp: 0000000000000f00    %rbp: 0000000000000000
  %rsi: 0000000000000000    %rdi: 0000000000000000
   %r8: 0000000000000000     %r9: 0000000000000000
  %r10: 0000000000000000    %r11: 0000000000000000
  %r12: 0000000000000000    %r13: 0000000000000000
  %r14: 0000000000000000

Executing: halt
Y86 CPU state:
  %rip: 0000000000000000   flags: Z0 S0 O0     HLT
  %rax: 0000000000000005    %rcx: 0000000000000003
  %rdx: 0000000000000000    %rbx: 0000000000000000
  %rsp: 0000000000000f00    %rbp: 0000000000000000
  %rsi: 0000000000000000    %rdi: 0000000000000000
   %r8: 0000000000000000     %r9: 0000000000000000
  %r10: 0000000000000000    %r11: 0000000000000000
  %r12: 0000000000000000    %r13: 0000000000000000
  %r14: 0000000000000000
Total execution count: 7

Contents of memory from 0000 to 1000:
  0000  70 00 01 00 00 00 00 00  00 00 00 00 00 00 00 00
... [lines omitted for brevity]
  0100  30 f4 00 0f 00 00 00 00  00 00 80 14 01 00 00 00
  0110  00 00 00 00 30 f0 02 00  00 00 00 00 00 00 30 f1
  0120  03 00 00 00 00 00 00 00  60 10 90 00 00 00 00 00
... [lines omitted for brevity]
  0ef0  00 00 00 00 00 00 00 00  13 01 00 00 00 00 00 00
... [lines omitted for brevity]
  0ff0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00

Input and Output

To provide interactivity, we have added the iotrap instruction to Y86. This instruction is not covered in your textbook so you should read this section carefully, and you should implement this functionality last after everything else. This instruction should implement basic buffered text I/O stream operations using a syscall-like trap interface.

Every iotrap includes an encoded trap ID. Like system calls, the semantics of the instruction depend on the ID:

0 - Write a single character from memory to the output buffer.
1 - Read a single character from standard input into memory.
2 - Write a single 64-bit integer in decimal (as text) from memory to the output buffer.
3 - Read a single 64-bit integer in decimal (as text) from standard input into memory.
4 - Write a null-terminated character string from memory to the output buffer.
5 - Flush (copy) the output buffer to standard output and clear the buffer.

All of the output traps (0, 2, and 4) use a source address stored in %rsi and all of the input traps (1 and 3) use a destination address stored in %rdi.

Normally, a trap would result in a transfer to kernel code implementing the low-level I/O operations. However, because we are not emulating kernel mode or kernel memory in our Y86 interpreter, you should just implement these functions using the printf and scanf functions in the C standard library. Because most of these traps read or write memory, all of this should happen during the Memory stage of the Y86 instruction cycle.

You will also need to maintain an output buffer that you can store using a character array (snprintf will be useful here). The buffer should be large enough to store 100 characters.

If any trap fails (e.g., if the decin trap does not read a valid integer), you should print "I/O Error" and halt execution, setting the CPU status to HLT.

Error checking

Robust software proactively checks for potential errors and responds appropriately when an error occurs. Failure to build robust software leads to security breaches, lost sales, and other problems; this failure is not acceptable. Our grading procedures will try to break your code. The following list is a sample (not complete) of the types of errors we may test:

Passing NULL values in pointer parameters.
Passing a file that contains invalid opcodes.
Passing a file with addresses that would extend beyond the maximum MEMSIZE value.
Passing a file with program headers that have invalid flags or types.
Passing invalid instructions to the required functions.

In the case of invalid parameters or an invalid instruction, decode_execute should set the CPU status code to INS. In the case of an invalid memory address, memory_wb_pc should set the CPU status code to ADR.

Additionally, your integration code should not even attempt to call decode_execute or memory_wb_pc if fetch sets the CPU status to ADR or INS, but should immediately exit; in other words, the invalid instruction should not be included in the final execution count. If trace mode is enabled, the program should print the error message "Invalid instruction at 0xYYYY" (where YYYY is the address of the invalid instruction) instead of the regular "Executing: " message.

The above list is not necessarily exhaustive, and you should think carefully about what sort of errors might occur so that you can prevent them or perform additional error checking as needed. In particular, we will also use valgrind to detect memory leaks. Failure to respond appropriately will result in grade reductions.

In P3 and P4, some tests are non-deterministic, meaning that they use random numbers to prevent hard-coding of solutions. You can see some examples by searching for calls to the rand() function in public.c. This means that it is even more important than ever to ensure that your solution is as general as possible.

Hints

After you have read the above description thoroughly, watch the P4 experimentation video for a less structured and more informal walkthrough of the "big picture" of this project before you dive into the details of your implementation. Note that this video was recorded in 2020 and that the PC update phase for the HALT instruction is different this year (there's no special case, it's just the same valP update as most other instructions).

Work incrementally. As always, start small and gradually add more functionality. First, implement the handling for HALT and NOP instructions, which are trivial. Then work your way up to C-level instructions, and then to B-level, and finally the full instruction set.
Make your own test cases by writing Y86 programs. This way you completely understand both the program and its expected behavior. Use test-driven development: never write code for which you do not have a test to determine if the code is working. Check your results against the canonical solution as well as the textbook's CPU simulator, both of which are provided in the /cs/students/cs261/y86 folder on stu.
Make helper functions. As with the last project, much of your code will look similar (large switch statements with similar bodies). You will make development and debugging significantly easier if you can write a few good helper methods (e.g., for accessing registers and memory).
For hints on condition code handling, see CS:APP section 3.6.1-3.6.3 and Figure 3.15. Addition and subtraction are signed in Y86, and the bitwise operations (AND and XOR) do not overflow. A subtraction overflows if the subtrahend is positive and the result is greater than the minuend, or if the subtrahend is negative and the result is less than the minuend.
Don't forget that the data movement instructions with memory operands support the use of the NOREG register ID to simulate an absolute addressing mode.

Grading

The following requirements are necessary to earn a grade of 'A' for this project:

Correctly execute halt and nop instructions (simple one-byte instructions) with appropriate error handling
Correctly execute irmovq and OPq instructions (computation w/ registers) with appropriate error handling (but ignoring condition codes)
Correctly execute rmmovq, mrmovq, pushq, popq, call, and ret instructions (memory and stack operations) with appropriate error handling
Correctly execute OPq, cmovXX, jXX, and iotrap instructions (condition codes and I/O) with appropriate error handling (note that there are very few unit tests for this level because of the complexity of encoding correct semantics; most functionality is exercised via integration tests)

Completing steps 1-3 are required to earn a grade of 'B', completing steps 1-2 are required to earn a grade of 'C', and only completing step 1 will yield a maximum grade of 'D'. You may receive a grade of 'A-' for completing all of steps 1-4 with the exception of correct iotrap instruction handling (as exercised by the A_iotrap_* integration tests). Note that these are the maximum grades you can earn. Inadequate error checking, failure to adhere to the coding standards, or deviations from the submit procedures will result in grade reductions as described in the project guide. Because this project requires more complex code than the previous projects, it is particularly important to include sufficient documentation.

Failure to submit code that compiles on stu.cs.jmu.edu may result in an automatic grade of 0. In particular, you SHOULD NOT modify the included header files; your code must compile without errors or warnings against the original headers. Also, your code must not depend on being compiled with your previous solutions to P1-P3. If you wish to use helpers that you wrote for those projects, please copy them into p4-interp.c.

Caution: No test suite is fully exhaustive, and the test suite distributed with this program is no exception. Be aware that we may test your code with new test cases created after the submission deadline. These test cases will not substantively change the project specification, but they may exercise your program more thoroughly than the current test suite does. You should treat the given test suite as providing your maximum possible base grade, and you should always anticipate how your program may fail to adhere to the project spec in ways that the initial test suite does not test.

Submission

Please see the project guide for general project help and grading policies. Please refer to the coding standards for coding practice guidelines.

Computer Systems I - Fall 2022