## CS 261 Fall 2022

Mike Lam, Professor

Q. Why do assembly programmers need to know how to swim?

A. Because they work below C level!



#### x86-64 Miscellaneous Topics

## Topics

- Homogeneous data structures
  - Arrays
  - Nested / multidimensional arrays
- Heterogeneous data structures
  - Structs / records
  - Unions
- Floating-point code

#### Arrays

- An array is simply a block of memory (bits)
  - Fixed-sized homogeneous elements of a particular type (context)
  - Contiguous layout
  - Fixed length (not stored as part of the array!)



#### Arrays and pointers

- Array name is essentially a pointer to first element (base)
  - The *i*th element is at address (base + size \* *i*)
- C pointer arithmetic uses intervals of the element width
  - No need to explicitly multiply by size in C
  - "stuff+0" or "stuff" is the address of the first element
  - "stuff+1" is the address of the second element
  - "stuff+2" is the address of the third element
- Indexing = pointer arithmetic plus dereferencing
  - "stuff[i]" means "\*(stuff + i)"
  - In assembly, use the scaled index addressing mode
    - (base, index, scale)  $\rightarrow$  e.g., (%rbx, %rdi, 4) for 32-bit elements
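
The equivalence between indexing and pointer arithmetic can be checked directly in C. The following is a minimal sketch; the helper names `element_address` and `indexing_forms_agree` are made up for illustration and assume 32-bit elements, matching the scale of 4 in the addressing-mode example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of element i computed by hand in bytes: base + size * i. */
const int32_t *element_address(const int32_t *base, size_t i)
{
    const unsigned char *bytes = (const unsigned char *)base;
    return (const int32_t *)(bytes + sizeof(int32_t) * i);
}

/* Fetches element i three ways; returns 1 if all three agree. */
int indexing_forms_agree(const int32_t *stuff, size_t n, size_t i)
{
    assert(i < n);  /* the length is not stored in the array, so pass it in */
    int32_t by_index   = stuff[i];                    /* array indexing */
    int32_t by_pointer = *(stuff + i);                /* C scales i by the element width */
    int32_t by_bytes   = *element_address(stuff, i);  /* explicit base + size * i */
    return by_index == by_pointer && by_pointer == by_bytes;
}
```

Calling `indexing_forms_agree(stuff, 4, 2)` on a four-element `int32_t` array should return 1, since all three forms name the same four bytes.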

### Question

- Fill in the blank to correctly translate the following C code into x86-64:




## Nested / multidimensional arrays

- Generalizes cleanly to multiple dimensions
  - Think of the elements of outer dimensions as being arrays of inner dimensions
  - "Row-major" order: outer dimension specified first
  - E.g., "int16\_t grid[4][3]" is a 4-element array of 3-element arrays of 16-bit integers
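  - 2D: Address of (*i*,*j*)th element is (base + size \* (cols \* *i* + *j*))
  - 3D: Address of (*i*,*j*,*k*)th element is $(base + size \cdot ((n_{d1} \cdot n_{d2}) \cdot i + n_{d2} \cdot j + k))$, where $n_{d1}$ and $n_{d2}$ are the lengths of the two inner dimensions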
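
The 2D formula can be compared against what the compiler does for the `int16_t grid[4][3]` example. This is a small sketch; `offset_2d` and `actual_offset_2d` are hypothetical helper names, not part of the course code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ROWS = 4, COLS = 3 };

/* Byte offset of grid[i][j] from the base, by the formula size * (cols * i + j). */
size_t offset_2d(size_t i, size_t j)
{
    assert(i < ROWS && j < COLS);
    return sizeof(int16_t) * (COLS * i + j);
}

/* Byte offset the compiler actually uses for grid[i][j] (row-major layout). */
size_t actual_offset_2d(int16_t grid[ROWS][COLS], size_t i, size_t j)
{
    assert(i < ROWS && j < COLS);
    return (size_t)((unsigned char *)&grid[i][j] - (unsigned char *)grid);
}
```

For example, element (2,1) sits at byte offset 2 \* (3 \* 2 + 1) = 14 from the base, and both functions should report that value.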



#### **Structs**

- C structs are also just regions of memory
  - "Structured" heterogeneous regions--they're split into fields
  - Contiguous layout (w/ occasional gaps for alignment)
  - Offset of each field can be determined by the compiler
  - Sometimes called "records" generally



# Alignment

- Alignment restrictions require addresses to be divisible by *n*
  - E.g., 4-byte alignment means all addresses must be divisible by 4
  - Specified using an assembler directive
  - Improves memory performance if the hardware matches
  - Can be avoided in C using "\_\_attribute\_\_((packed))" (as in elf.h)
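
The effect of alignment padding on struct layout can be observed with `offsetof`. The sketch below assumes GCC or Clang (the packed attribute is a compiler extension); the struct names `padded` and `tight` and the helper `padding_before_i` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Default layout: the compiler may insert padding so each field is aligned. */
struct padded {
    char c;
    int  i;
};

/* Packed layout (GCC/Clang extension): no padding between fields. */
struct __attribute__((packed)) tight {
    char c;
    int  i;
};

/* Number of padding bytes between field c and field i in the default layout. */
size_t padding_before_i(void)
{
    /* field i must start at a multiple of its alignment */
    assert(offsetof(struct padded, i) % _Alignof(int) == 0);
    return offsetof(struct padded, i) - sizeof(char);
}
```

On a typical x86-64 system `int` has 4-byte alignment, so `padding_before_i()` would be expected to report 3 bytes of padding, while `struct tight` places `i` at offset 1.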



## Union

- C unions are also just regions of memory
  - Can store one "thing", but it could be multiple sizes depending on what kind of "thing" it currently is (so context is even more important!)
  - All "fields" start at offset zero
  - Generally a bad idea! (circumvents the type system in C)
  - Can be used to do OOP in C (i.e., polymorphism)

```
typedef enum { CHAR, INT, FLOAT } objtype_t;
typedef struct {
    objtype_t type;
    union {
        char c;
        int i;
        float f;
    } data;
} obj_t;
```
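
One way to use such a tagged union is to check the `type` field before touching `data`. The sketch below repeats the definitions so it compiles on its own; the constructor and formatting helpers (`make_char`, `make_int`, `make_float`, `obj_format`) are hypothetical names added for illustration.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { CHAR, INT, FLOAT } objtype_t;
typedef struct {
    objtype_t type;
    union {
        char c;
        int i;
        float f;
    } data;
} obj_t;

/* Constructors set the tag and the matching union field together. */
obj_t make_char(char c)   { obj_t o; o.type = CHAR;  o.data.c = c; return o; }
obj_t make_int(int i)     { obj_t o; o.type = INT;   o.data.i = i; return o; }
obj_t make_float(float f) { obj_t o; o.type = FLOAT; o.data.f = f; return o; }

/* Formats the object according to its tag; returns the snprintf result. */
int obj_format(const obj_t *o, char *buf, size_t n)
{
    assert(o != NULL && buf != NULL);
    switch (o->type) {
    case CHAR:  return snprintf(buf, n, "%c", o->data.c);
    case INT:   return snprintf(buf, n, "%d", o->data.i);
    case FLOAT: return snprintf(buf, n, "%.2f", o->data.f);
    }
    return -1;  /* unknown tag */
}
```

The `switch` on the tag plays the role of dynamic dispatch: the same call handles any kind of object, and the tag supplies the context needed to interpret the shared bytes.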

#### Aside: Enums

- Enumerations are types where all values are listed
  - Declared in C using enum keyword
  - In C, the actual values are stored as integers
  - Can assign integer values if desired
  - Primary advantage: named constants

```
typedef enum {
    MON = 1, TUE, WED, THU, FRI, SAT, SUN
} day_t;
// essentially the same as: int midterm_day = 3;
day_t midterm_day = WED;
```

## **Floating-point code**

- x87: extension of x86 for floating-point arithmetic
  - Originally for the 8087 floating-point co-processor
  - Adds new floating-point "stack" registers ST(0) through ST(7)
    - 80-bit extended double format (1 sign bit, 15 exponent bits, and a 64-bit significand with an explicit integer bit)
  - Push/pop with FLD and FSTP instructions (FST stores without popping)
  - Arithmetic: FADD, FMUL, FSQRT, etc.
  - Largely deprecated now in favor of new SIMD architectures

## **Floating-point code**

- Single-Instruction, Multiple-Data (SIMD)
  - Performs the same operation on multiple pairs of elements
  - Also known as vector instructions
- Various floating-point SIMD instruction sets
  - MMX, SSE, SSE2, SSE3, SSE4, SSE5, AVX, AVX2
  - 16 new extra-wide XMM (128-bit) or YMM (256-bit) registers for holding multiple elements
    - Floating-point arguments passed in %xmm0-%xmm7
    - Return value in %xmm0
    - All registers are caller-saved

## **Floating-point code**

- SSE (Streaming SIMD Extensions)
  - 128-bit XMM registers
    - Can store two 64-bit doubles or four 32-bit floats
  - New instructions for movement and arithmetic
    - General form: <op><s|p><s|d>
    - <s|p>: s=scalar (single data) p=packed (multiple data)
    - <s|d>: s=single (32-bit) d=double (64-bit)
    - E.g., "addsd" = add scalar 64-bit doubles
    - E.g., "mulps" = multiply packed 32-bit floats
- AVX (Advanced Vector Extensions)
  - 256-bit YMM registers
    - Can store four 64-bit doubles or eight 32-bit floats
  - Similar instructions as SSE (but with "v" prefix, e.g., vmulps)

Figure: %xmm0 (128 bits) holding either two 64-bit doubles (d, d) or four 32-bit floats (f, f, f, f).
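
The scalar/packed distinction can be illustrated with a plain C model of one XMM register viewed as four floats. This is only a software sketch of the SSE semantics, not real SIMD code; the type `xmm_model_t` and the `model_*` functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a 128-bit XMM register as four 32-bit float lanes. */
typedef struct { float lane[4]; } xmm_model_t;

/* Reads one lane of the modeled register. */
float lane_of(xmm_model_t r, size_t k)
{
    assert(k < 4);
    return r.lane[k];
}

/* Models "addss" (scalar single): only lane 0 is added; other lanes of dst are kept. */
xmm_model_t model_addss(xmm_model_t dst, xmm_model_t src)
{
    dst.lane[0] += src.lane[0];
    return dst;
}

/* Models "addps" (packed single): all four lanes are added. */
xmm_model_t model_addps(xmm_model_t dst, xmm_model_t src)
{
    for (size_t k = 0; k < 4; k++)
        dst.lane[k] += src.lane[k];
    return dst;
}
```

With the same two inputs, the scalar model changes one element and the packed model changes all four, which is the whole difference between the "s" and "p" in the second-to-last opcode letter.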

### SSE/AVX

#### Movement

- movss / movsd
- movaps / movapd

#### Conversion

- cvtsi2ss / cvtsi2sd
- cvtss2si / cvtsd2si
- cvtss2sd / cvtsd2ss

#### Arithmetic

- addss / addsd
- addps / addpd
  - ... (sub, mul, div, max, min, sqrt)
- andps / andpd
- xorps / xorpd

#### Comparison

- ucomiss / ucomisd

(AVX has "v\_\_\_\_" opcodes)

| YMM (bits 255-0) | XMM (bits 127-0) | Role |
|--------|--------|--------------------|
| %ymm0  | %xmm0  | 1st FP arg./Return |
| %ymm1  | %xmm1  | 2nd FP argument    |
| %ymm2  | %xmm2  | 3rd FP argument    |
| %ymm3  | %xmm3  | 4th FP argument    |
| %ymm4  | %xmm4  | 5th FP argument    |
| %ymm5  | %xmm5  | 6th FP argument    |
| %ymm6  | %xmm6  | 7th FP argument    |
| %ymm7  | %xmm7  | 8th FP argument    |
| %ymm8  | %xmm8  | Caller saved       |
| %ymm9  | %xmm9  | Caller saved       |
| %ymm10 | %xmm10 | Caller saved       |
| %ymm11 | %xmm11 | Caller saved       |
| %ymm12 | %xmm12 | Caller saved       |
| %ymm13 | %xmm13 | Caller saved       |
| %ymm14 | %xmm14 | Caller saved       |
| %ymm15 | %xmm15 | Caller saved       |

### Bitwise operations in SSE/AVX

- Assembly instructions provide low-level access to floating-point numbers
  - Some numeric operations can be done more efficiently with simple bitwise operations
- AKA: Floating-Point Hacks™
  - Set to zero (value XOR value)
  - Absolute value (value AND 0x7fffffff for a 32-bit float)
  - Additive inverse (value XOR 0x80000000 for a 32-bit float)
- Lesson: Information = Bits + Context
  - (even if it wasn't the intended context!)
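
The same tricks can be tried from C by copying a float's bits into an integer. This is a minimal sketch assuming 32-bit IEEE 754 floats; the helper names (`float_bits`, `bits_float`, `hack_abs`, `hack_neg`, `hack_zero`) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterprets the bits of a float as a 32-bit integer (no numeric conversion). */
uint32_t float_bits(float x)
{
    uint32_t b;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&b, &x, sizeof b);
    return b;
}

/* Reinterprets a 32-bit integer as a float. */
float bits_float(uint32_t b)
{
    float x;
    memcpy(&x, &b, sizeof x);
    return x;
}

/* Absolute value: clear the sign bit. */
float hack_abs(float x)  { return bits_float(float_bits(x) & 0x7fffffffu); }

/* Additive inverse: flip the sign bit. */
float hack_neg(float x)  { return bits_float(float_bits(x) ^ 0x80000000u); }

/* Zero: any value XORed with itself is all zero bits, which is +0.0. */
float hack_zero(float x) { uint32_t b = float_bits(x); return bits_float(b ^ b); }
```

Using `memcpy` rather than a pointer cast keeps the reinterpretation well defined in C; in assembly the equivalent is a single andps or xorps against a constant mask.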

#### Preview: Y86-64 ISA

| Instruction               | Byte 0 | Byte 1           | Bytes 2-9 |
|---------------------------|--------|------------------|-----------|
| halt                      | 0 0    |                  |           |
| nop                       | 1 0    |                  |           |
| rrmovq rA, rB             | 2 0    | rA rB            |           |
| irmovq V, rB              | 3 0    | F rB             | V         |
| rmmovq rA, D(rB)          | 4 0    | rA rB            | D         |
| mrmovq D(rB), rA          | 5 0    | rA rB            | D         |
| OPq rA, rB                | 6 fn   | rA rB            |           |
| jXX Dest                  | 7 fn   | Dest (bytes 1-8) |           |
| cmovXX rA, rB             | 2 fn   | rA rB            |           |
| call Dest                 | 8 0    | Dest (bytes 1-8) |           |
| ret                       | 9 0    |                  |           |
| pushq rA                  | A 0    | rA F             |           |
| popq rA                   | B 0    | rA F             |           |
| iotrap id (not in CS:APP) | C id   |                  |           |

| Operations | Code | Branches | Code | Moves  | Code |
|------------|------|----------|------|--------|------|
| addq       | 6 0  | jmp      | 7 0  | rrmovq | 2 0  |
| subq       | 6 1  | jle      | 7 1  | cmovle | 2 1  |
| andq       | 6 2  | jl       | 7 2  | cmovl  | 2 2  |
| xorq       | 6 3  | je       | 7 3  | cmove  | 2 3  |
|            |      | jne      | 7 4  | cmovne | 2 4  |
|            |      | jge      | 7 5  | cmovge | 2 5  |
|            |      | jg       | 7 6  | cmovg  | 2 6  |

| Number | Register name | Number | Register name |
|--------|---------------|--------|---------------|
| 0      | %rax          | 8      | %r8           |
| 1      | %rcx          | 9      | %r9           |
| 2      | %rdx          | 10     | %r10          |
| 3      | %rbx          | 11     | %r11          |
| 4      | %rsp          | 12     | %r12          |
| 5      | %rbp          | 13     | %r13          |
| 6      | %rsi          | 14     | %r14          |
| 7      | %rdi          |        |               |
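
To see how the encodings and register numbers fit together, here is a small sketch that encodes two Y86-64 instructions into bytes. The function and constant names are hypothetical; the sketch assumes the register byte packs rA into the high four bits and rB into the low four bits, and that the immediate is stored little-endian, as in CS:APP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A few register numbers from the table; RNONE is the "F" placeholder. */
enum { RAX = 0, RCX = 1, RDX = 2, RBX = 3, RNONE = 0xF };

/* Encodes "rrmovq rA, rB" as 2 bytes: 0x20, then rA:rB. Returns the length. */
size_t encode_rrmovq(uint8_t rA, uint8_t rB, uint8_t out[2])
{
    assert(rA < 15 && rB < 15);
    out[0] = 0x20;
    out[1] = (uint8_t)((rA << 4) | rB);
    return 2;
}

/* Encodes "irmovq V, rB" as 10 bytes: 0x30, F:rB, then V in little-endian order. */
size_t encode_irmovq(uint64_t v, uint8_t rB, uint8_t out[10])
{
    assert(rB < 15);
    out[0] = 0x30;
    out[1] = (uint8_t)((RNONE << 4) | rB);
    for (int k = 0; k < 8; k++)
        out[2 + k] = (uint8_t)(v >> (8 * k));  /* least significant byte first */
    return 10;
}
```

For example, `rrmovq %rax, %rcx` would encode to the bytes 20 01, and `irmovq $0x100, %rbx` to 30 f3 followed by 00 01 00 00 00 00 00 00.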

| Name | Meaning                         |
|------|---------------------------------|
| AOK  | Normal operation                |
| HLT  | halt instruction encountered    |
| ADR  | Invalid address encountered     |
| INS  | Invalid instruction encountered |

#### **RF: Program registers**

|      |      |      |      |
|------|------|------|------|
| %rax | %rsp | %r8  | %r12 |
| %rcx | %rbp | %r9  | %r13 |
| %rdx | %rsi | %r10 | %r14 |
| %rbx | %rdi | %r11 |      |

- CC: Condition codes (ZF, SF, OF)
- PC
- Stat: Program status
- DMEM: Memory