Pointers in C

In this lab, you will gain experience with pointer declaration and manipulation in C.

Objectives:

  • Define a "pointer" as used in C and explain dereferencing
  • Write code to declare and use pointers
  • Distinguish between data and pointers in existing code
  • Use the malloc and free functions to manage heap memory
  • Understand the link between arrays and pointers
  • Use pointers to allocate and access arrays on the heap
  • Use pointers to allocate and access structs on the heap

Setup

Begin this lab with an empty main.c and the standard makefile for your operating system. You will want to add the following includes at the top of your file:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

Of course, you will also want a basic definition of the main function. You should follow along with the code examples in this lab, adding them to your main function and testing them.

Review

As discussed in the first lecture, computer memory consists of a large bank of bytes. Every byte has a unique address. Until now, we have only been referring to a location in memory by a variable name:

    int x;
    x = 5;

The first line allocates space for an integer (typically 4 bytes on modern systems) and gives it a name: "x". In the second line, we store the value 5 in the memory location referenced by the name x. The association between x and the location never changes; i.e., x cannot be changed to reference a different location in memory.

You can see the actual address of the variable x using the "address-of" operator (&):

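    printf("Variable x is located at %p\n", (void*)&x);

This uses the %p format specifier, which prints an address (typically in hexadecimal). Strictly speaking, %p expects a void* argument (a generic pointer type discussed later in this lab), which is why the address is cast to void* here. On most modern systems, the above code will print a different address each time you run it, because the operating system places the variable x at a different location in memory during each execution.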

Pointers

We are now going to introduce a major new C concept: pointers. A pointer is a variable that holds a memory address. Pointers are declared using the asterisk (*) operator:

    int *p;

In the above example, the type of the variable p is int* (usually pronounced "int star" or "int pointer"). Like other variables, it is a fixed reference to an allocated place in memory. Unlike other variables, that memory location itself holds the address of another place in memory. That is what makes it a "pointer": it "points" to another location.

Until it is initialized, however, it doesn't point to anything in particular. To avoid a lot of headaches later on, pointers should always be initialized when they are declared. If you don't know immediately what a pointer should point to, you can always initialize it to a special value (zero) using the NULL constant:

    int *p = NULL;

Question: How would we initialize p so that it points to x?

WARNING: If you wish to declare multiple pointers on the same line, you must include an asterisk with each of them. For instance:

    int *p = NULL, *q = NULL;

In this course, we strongly recommend against declaring multiple pointers on a single line.

To assign pointers so that they point to existing variables, you can use the address-of operator (&) just as we used it earlier to print the address:

    p = &x;

After this line, p is now a pointer to the location that stores the value associated with the variable x. Note that x is NOT a pointer; it is just a variable name.

You can now print the value of p (the pointer) and verify that it is the same as the address of x:

    printf("Pointer value of p is %p\n", (void*)p);

Dereferencing pointers

To retrieve the value that a pointer points to, use the "dereference" operator (*). Yes, it's an asterisk, the same as we used to declare the pointer.

In the running example, we know that p is an int* (an "int-pointer" or a "pointer to an int"). We have also initialized it to point to the location of variable x. Thus, we can print the integer that p points to using the following code:

    printf("Dereferenced value of p is %d, and x = %d\n", *p, x);

Question: What would happen if we passed p instead of *p to printf?

Question: What would happen if we dereferenced a pointer that had the value of NULL?
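One common defensive habit is to compare a pointer against NULL before dereferencing it. The following is a minimal sketch of that habit; the helper name read_or_default and its fallback parameter are hypothetical and are not part of this lab's required code.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from the lab): returns the int that p points to,
 * or fallback if p is NULL, so that a NULL pointer is never dereferenced. */
int read_or_default(int *p, int fallback)
{
    if (p == NULL) {
        return fallback;
    }
    return *p;
}

/* Hypothetical demo; call it from your main function. */
void null_check_demo(void)
{
    int x = 5;
    int *p = &x;
    int *q = NULL;

    printf("%d\n", read_or_default(p, -1));   /* p points to x */
    printf("%d\n", read_or_default(q, -1));   /* q is NULL, so the fallback is used */
}
```

Calling null_check_demo() from main exercises both cases: one pointer that refers to a real variable and one that is NULL.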

Exercise: Tracing

What does the following code print? Step through it as if you were the machine. This is sometimes referred to as "tracing" the code. You may find it helpful to draw diagrams with boxes for memory locations and arrows for pointers.

    int a = 42;
    int b = 7;
    int c = 999;
    int *t = &a;
    int *u = NULL;
    printf("%d %d\n", a, *t);

    c = b;
    u = t;
    printf("%d %d\n", c, *u);

    a = 8;
    b = 8;
    printf("%d %d %d %d\n", b, c, *t, *u);

    *t = 123;
    printf("%d %d %d %d %d\n", a, b, c, *t, *u);

DO THIS: Copy the previous code into your file and test it. Then, add more lines of code to do the following:

  1. Update t to point to c. Use a pointer dereference to change the value of c to 555. Verify that it worked by adding a printout. Does this change any of the other values?

  2. Change the value of c again using a direct assignment. Verify that the pointer t still points to the value by printing the result of dereferencing it.

Question: What would happen if you tried to execute the following code? How could you fix it?

    int *v = &t;
    printf("%d\n", *v);

This illustrates an important concept: pointers can point to almost anything, even other pointers!
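As a small sketch of a pointer that points to another pointer, the following function uses variable names (value, ptr, pptr) that are purely illustrative. Each additional asterisk in the declaration adds one level of indirection, and each additional asterisk in an expression follows one more arrow.

```c
#include <stdio.h>

/* Hypothetical demo: writes to an int through a pointer to a pointer
 * and returns the final value of the int. */
int double_pointer_demo(void)
{
    int value = 10;
    int *ptr = &value;      /* ptr holds the address of value */
    int **pptr = &ptr;      /* pptr holds the address of ptr */

    **pptr = 20;            /* two dereferences: pptr -> ptr -> value */

    printf("value = %d, *ptr = %d, **pptr = %d\n", value, *ptr, **pptr);
    return value;
}
```

All three expressions in the printout refer to the same memory location, so they print the same number.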

Aside: The sizeof operator

There is a useful operator in C called the sizeof operator. You can use it to determine the size of a particular variable in memory. You can also use it to determine the size of a type.

DO THIS: Run the following code and take note of all the sizes.

    printf("sizeof(char): %zu\n",   sizeof(char));
    printf("sizeof(42): %zu\n",     sizeof(42));
    printf("sizeof(float): %zu\n",  sizeof(float));
    printf("sizeof(double): %zu\n", sizeof(double));
    printf("sizeof(NULL): %zu\n",   sizeof(NULL));

Note that parentheses are required when the operand of sizeof is a type name, such as sizeof(char). They are optional when the operand is an expression, but it is common to include them in every case for consistency.

Note also that we are using the %zu format specifier, which is the standard specifier for a size_t value (the unsigned integer type that sizeof produces).

Question: What is sizeof(int)? What is sizeof(3.14)?

You can also use sizeof to determine the size in bytes of fixed-size arrays:

    int d[3] = { 1, 2, 3 };
    printf("sizeof(d) = %zu\n", sizeof(d));
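Because sizeof reports the size of the whole array in bytes, dividing by the size of one element gives the number of elements. The sketch below shows that calculation; the function name fixed_array_length_demo is hypothetical. Note that this only works where the array declaration itself is visible, not on a pointer.

```c
#include <stddef.h>

/* Hypothetical demo: computes the number of elements in a fixed-size
 * array from its total size and the size of a single element. */
size_t fixed_array_length_demo(void)
{
    int d[3] = { 1, 2, 3 };

    size_t total_bytes = sizeof(d);        /* size of the whole array */
    size_t element_bytes = sizeof(d[0]);   /* size of one element */

    return total_bytes / element_bytes;    /* number of elements: 3 */
}
```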

Allocating memory

Now that we can track arbitrary locations in memory using pointers, we can begin allocating memory on the heap (a large chunk of memory set aside for programs to use during execution). Heap memory is allocated in C using a call to the malloc function (from stdlib.h), which takes a single size_t parameter: the number of bytes requested. It is conventional to use the sizeof operator to help request the correct amount of memory:

    int *r = (int*)malloc(sizeof(int));

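The malloc function returns a pointer to the allocated memory. Technically, its return type is void*, a void pointer, which is a generic pointer with no associated data type. In C, a void* converts implicitly to any other object pointer type, so the cast shown above is optional (it is required in C++); the examples in this lab write it explicitly. The conversion is safe because malloc returns memory that is suitably aligned for any object type, not because all pointers have the same width (C does not guarantee that).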

Note that malloc could potentially return NULL if it does not succeed (if the system is out of memory). Therefore, you should always check the return value to be safe:

    if (r == NULL) {
        printf("Out of memory!\n");
        exit(EXIT_FAILURE);
    }

The exit function terminates the program immediately and reports the provided status code to the operating system.

IMPORTANT: Unlike Java, C does not automatically de-allocate memory when it is no longer needed. Thus, you will need to manually de-allocate memory when you are done, using the free function:

    free(r);

If your program allocates memory and does not free it, your program has a memory leak. The leaked memory will be reclaimed when your program exits, but until then it will continue to clog up the system.

In this course, a memory leak is considered to be a software defect. You should thoroughly debug your program to find and fix all memory leaks. You may use tools like Valgrind/Memcheck to help you find memory leaks. Here is a short tutorial.

Question: What happens if you try to dereference a pointer after you free it?

It is conventional to set pointers to NULL immediately after you free them. On most systems, this will trigger a segmentation fault at the offending location if you ever attempt to dereference the pointer, making it easier to diagnose and fix software defects.
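Putting the pieces of this section together, the following sketch walks through one complete allocation lifecycle: allocate, check, use, free, and reset to NULL. The function name heap_int_demo is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical demo: stores a value in heap memory, copies it out,
 * and releases the memory before returning the copy. */
int heap_int_demo(void)
{
    int *r = (int*)malloc(sizeof(int));   /* request space for one int */
    if (r == NULL) {
        printf("Out of memory!\n");
        exit(EXIT_FAILURE);
    }

    *r = 5;                 /* store a value in the heap memory */
    int result = *r;        /* copy the value out before freeing */

    free(r);                /* give the memory back */
    r = NULL;               /* r no longer holds a stale address */

    return result;
}
```

The value is copied into an ordinary variable before the call to free, so nothing reads the heap memory after it has been released.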

Arrays and pointers

Here's a little secret: you've actually already been using pointers. In C, an array is not itself a pointer, but in most expressions an array variable name is automatically converted to a pointer to the array's first element (this is often called "decaying" to a pointer). In fact, you can find out the address of an array using the same syntax as above:

    int d[3] = { 1, 2, 3 };
    printf("The array starts at %p\n", (void*)d);

You can also use pointer arithmetic to access array elements:

    printf("The first element is %d\n", *d);
    printf("The second element is %d\n", *(d+1));
    printf("The third element is %d\n", *(d+2));

This allows you to use pointers explicitly to iterate over arrays. The following two loops are equivalent:

    for (int i=0; i<3; i++) {
        printf("Element: %d\n", d[i]);
    }

    for (int *w = d; w < (d+3); w++) {
        printf("Element: %d\n", *w);
    }

In fact, the syntax you're already familiar with for accessing array elements (i.e., the square brackets used in the first loop above) actually performs pointer dereferencing and arithmetic "under the hood."
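To see this equivalence directly, the sketch below compares d[i] with *(d+i), and &d[i] with d+i, for every element. The function name index_matches_arithmetic is hypothetical.

```c
#include <stdio.h>

/* Hypothetical demo: returns 1 if indexing and pointer arithmetic agree
 * for all n elements starting at d, or 0 otherwise. */
int index_matches_arithmetic(int *d, int n)
{
    for (int i = 0; i < n; i++) {
        if (d[i] != *(d + i)) {     /* same value? */
            return 0;
        }
        if (&d[i] != d + i) {       /* same address? */
            return 0;
        }
    }
    return 1;
}
```

Note that d + i advances by i elements, not i bytes; the compiler scales the offset by the size of the element type.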

IMPORTANT: As mentioned before, there is no automatic bounds checking in C, so you must be careful not to read outside the bounds of the array. Now you can see that this is because C uses pointers and pointer arithmetic "under the hood," and there is no sure-fire way to check whether a pointer is pointing to a location inside a particular array.

We can also allocate arrays on the heap, which is very useful when we don't know at compile time how big the array should be. For example:

    int *e = (int*)malloc(sizeof(int) * b);

This will allocate an array big enough to hold a number of elements equal to the value of b. It is conventional in such allocations to use the sizeof operator to determine the number of bytes needed for each element and then to multiply by the number of elements in the array.
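As a fuller sketch of the same idea, the following function allocates an array whose length is only known at run time, fills it, reads it back, and frees it. The function name heap_array_demo and the parameter count are hypothetical, and the sketch assumes count is greater than zero.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical demo: builds a heap array of count squares (0, 1, 4, ...)
 * and returns their sum. Assumes count > 0. */
int heap_array_demo(int count)
{
    int *e = (int*)malloc(sizeof(int) * count);
    if (e == NULL) {
        printf("Out of memory!\n");
        exit(EXIT_FAILURE);
    }

    for (int i = 0; i < count; i++) {
        e[i] = i * i;               /* heap arrays use the same [] syntax */
    }

    int sum = 0;
    for (int i = 0; i < count; i++) {
        sum += e[i];
    }

    free(e);                        /* one free for the whole array */
    e = NULL;

    return sum;
}
```

For example, heap_array_demo(4) stores 0, 1, 4, and 9 and returns 14.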

This now allows us to write functions that return arrays; in fact, they just return a pointer to the first element of an array. Note that you should never return a pointer to an ordinary (non-heap) array that is declared as a local variable inside a function, because that array is de-allocated when the function returns and the pointer becomes invalid. You should only return pointers to arrays that you allocate on the heap. Here is a function that returns a newly-allocated array:

char* alloc_alphabet_array()
{
    char *alphabet;
    alphabet = (char*)malloc(sizeof(char) * 27);    // 26 letters + terminator
    if (alphabet == NULL) {
        printf("Out of memory!\n");
        exit(EXIT_FAILURE);     // stop here; do not dereference NULL below
    }
    for (char c = 'a'; c <= 'z'; c++) {
        alphabet[c-'a'] = c;
    }
    alphabet[26] = '\0';    // null terminator
    return alphabet;
}

Question: What is sizeof(alph) if alph is the return value of this function?

Recall that in C, strings are just null-terminated arrays of characters. Thus, in this case we could describe alphabet using any of the following terms:

  • char * ("char star")
  • "pointer to array of chars"
  • "pointer to a C string"

For return values that are pointers, some people put the asterisk (*) with the type:

    char* alloc_alphabet_array()

Others put the asterisk with the function name:

    char *alloc_alphabet_array()

In this course, either approach is fine as long as you are consistent.

IMPORTANT: Don't forget to free these arrays when you're done with them!

Structs and pointers

As you might suspect by now, you can also create pointers to structs. Assuming that point_t is defined as it was in a previous lab, you can allocate a point on the heap:

    point_t *pt = (point_t*)malloc(sizeof(point_t));

To access member variables, you have to dereference the pointer:

    (*pt).x = 2;
    (*pt).y = 3;
    printf("The point is (%d, %d)\n", (*pt).x, (*pt).y);

Because this is cumbersome, C also provides a more concise way to dereference a pointer and access a member variable: the arrow operator (->). The following code is exactly equivalent to the above code:

    pt->x = 2;
    pt->y = 3;
    printf("The point is (%d, %d)\n", pt->x, pt->y);

Of course, you can also use pointers to store references to arrays of structs:

    point_t *pts = (point_t*)malloc(sizeof(point_t) * 4);
    if (pts == NULL) {
        printf("Out of memory!\n");
        exit(EXIT_FAILURE);
    }

    for (int i=0; i<4; i++) {
        pts[i].x = i*10;
        pts[i].y = i*20;
    }

    for (int i=0; i<4; i++) {
        printf("Point: (%d, %d)\n", pts[i].x, pts[i].y);
    }

Question: Is the type of pt different from the type of pts? Why or why not?

Question: Why didn't we need to use the -> operator to access the x and y member variables in the loops above?

We can also pass structs into functions using pointers, and we can return pointers to structs from functions:

point_t* midpoint(point_t *p1, point_t *p2)
{
    point_t* mid = (point_t*)malloc(sizeof(point_t));
    if (mid == NULL) {
        printf("Out of memory!\n");
        exit(EXIT_FAILURE);
    }
    mid->x = (p1->x + p2->x)/2;
    mid->y = (p1->y + p2->y)/2;
    return mid;
}

Because midpoint allocates memory and returns it, the calling function must remember to free the allocated memory. Otherwise, there will be a memory leak.

Here is some code to test the midpoint function:

    point_t pt1 = { .x =  0, .y =  0 };
    point_t pt2 = { .x = 10, .y = 10 };

    point_t *midpt = midpoint(&pt1, &pt2);

    printf("Midpoint: (%d, %d)\n", midpt->x, midpt->y);

Question: Why did we need to pass &pt1 and &pt2 into midpoint instead of just pt1 or pt2?

For large structs, this is usually faster than passing or returning the struct itself, because it is quicker to copy a pointer than the entire struct. However, the two approaches are not exactly equivalent, because passing a pointer does not create a copy.

IMPORTANT: If you pass a struct into a function using a pointer, any changes you make to the struct inside the function will persist AFTER the function completes. This is because you are actually modifying the memory values of the original struct.
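The difference can be seen in a small sketch with two hypothetical functions, shift_copy and shift_in_place. As before, point_t is assumed to have int members x and y.

```c
/* Assumed definition; the point_t from the previous lab may differ. */
typedef struct {
    int x;
    int y;
} point_t;

/* Receives a copy of the struct: the caller's point is not affected. */
void shift_copy(point_t p, int dx)
{
    p.x += dx;      /* modifies only the local copy */
}

/* Receives a pointer to the struct: the caller's point is modified. */
void shift_in_place(point_t *p, int dx)
{
    p->x += dx;     /* modifies the original struct */
}
```

If pt.x starts at 1, then after shift_copy(pt, 10) it is still 1, but after shift_in_place(&pt, 10) it is 11.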

Exercise: More tracing

What does the following code print? Step through it as if you were the machine. You may find it helpful to draw diagrams with boxes for memory locations and arrows for pointers.

typedef struct {
    int *a;
    int b;
} stuff_t;

void foo(stuff_t value)
{
    *(value.a) = 2;
    value.b = 3;
}

void bar(stuff_t *value)
{
    *(value->a) = 4;
    value->b = 5;
}

void do_stuff()
{
    stuff_t my_stuff;
    int temp = 0;

    my_stuff.a = &temp;
    my_stuff.b = 1;
    printf("a=%d b=%d\n", *(my_stuff.a), my_stuff.b);

    foo(my_stuff);
    printf("a=%d b=%d\n", *(my_stuff.a), my_stuff.b);

    bar(&my_stuff);
    printf("a=%d b=%d\n", *(my_stuff.a), my_stuff.b);
}

Exercise: Plug the leaks

Review your code from this lab; does it leak any memory?

We purposefully did not include all necessary free calls to give you experience with memory management. You should walk through your code and/or test it with a tool like Valgrind/Memcheck to find and fix any memory leaks. Here is a short tutorial.

(Optional) More exercises

You may find the following exercises helpful to reinforce your understanding of pointers: