Lab 1: Machine Code

This lab is designed to reinforce core systems themes by introducing you to ELF machine code files on Linux.

  • THEME: Information = bits + context
  • THEME: Systems as foundational platforms
  • THEME: Use the right tool for the job

To begin this lab, you will need a terminal window. If your local machine runs Linux, you can simply open a terminal and begin. If you are using OS X or Windows, I recommend first connecting to stu with an SSH client. Setting up an SSH client is beyond the scope of this lab.


1. Assembly code is an intermediate step in compilation

Create a C program named asm.c containing the following functions. Compare mult() and mult2(), and note the value that main() returns.

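    /* Ordinary C: the compiler chooses the instructions. */
    int mult (int x, int y)
    {
        return x * y;
    }

    /* Hand-written assembly. There is deliberately no return statement:
       mult2() relies on the calling convention, reading y and x from
       %esi and %edi and leaving the product in %eax, where the caller
       expects the return value. gcc may warn about the missing return.
       The compiler does not know these asm statements touch %eax, so this
       is a fragile trick for this exercise only. */
    int mult2 (int x, int y)
    {
        asm ("mov %esi, %eax");
        asm ("imull %edi, %eax");
    }

    int main (void)
    {
        return mult(3, 4);
    }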

Compile the program to assembly using the "-S" flag:

  $ gcc -Og -S asm.c

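Examine asm.s and note the sequence of instructions in the mult() function. As an aside, note that under the x86-64 calling convention used on Linux (the System V AMD64 ABI), the first two integer parameters are passed in %edi and %esi (the low 32 bits of %rdi and %rsi), and the return value is passed back in %eax. The following command strips the directive lines (those whose first non-blank character is a period, which also catches local labels such as .LFB0:) from the assembly file: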

  $ grep -v -e "^\s*\..*" asm.s

Compile to machine code:

  $ gcc -Og -o asm asm.c

Run the program and examine its return value (the shell keeps the exit status of the last command in $?):

  $ ./asm
  $ echo $?

2. Executable files contain encoded instructions

Dump contents of file:

  $ hexdump -C asm

Where did mult() and mult2() go? Can we find them? Let's start by disassembling the executable:

  $ objdump -d asm

Note the contents of the mult() function and how nearly identical mult2() is. Here is an example of what mult() may look like:

  00000000004004d6 <mult>:
    4004d6:	89 f8                	mov    %edi,%eax
    4004d8:	0f af c6             	imul   %esi,%eax
    4004db:	c3                   	retq

Yours may look slightly different. Now, examine the ELF header, program headers, and symbols:

  $ readelf -h asm
  $ readelf -l asm
  $ readelf -s asm

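Note the address and size of the mult() function, as well as the file offset (Offset) and virtual address (VirtAddr) of the main code segment (the LOAD segment whose flags are R E, i.e. readable and executable). An instruction's file offset is its address minus the segment's virtual address plus the segment's file offset. In our example the code segment is loaded at 0x400000 from file offset 0, so mult() sits at 0x4004d6 - 0x400000 = 0x4d6 in the file. If your addresses are small (for example 0x1139), gcc built a position-independent executable; the same calculation applies using your segment's values.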

Now we can find the actual bytes in the file. Continuing our previous example:

  $ hexdump -C asm

  000004d0  5d e9 7a ff ff ff 89 f8  0f af c6 c3 89 f8 0f af  |].z.............|
                             |<--   mult()   -->|

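Finally, step through the program in gdb to watch the code work: set a breakpoint with "break mult", start the program with "run", then use "si" (execute one machine instruction) and "p" (print, e.g. "p $eax") to follow the values. You may want to enable the machine-code view of the TUI first, using the "CTRL-X 2" key combination or the "layout asm" command.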