Pointers in C
In this lab, you will gain experience with pointer declaration and manipulation in C.
Objectives:
- Define a "pointer" as used in C and explain dereferencing
- Write code to declare and use pointers
- Distinguish between data and pointers in existing code
- Use the malloc and free functions to manage heap memory
- Understand the link between arrays and pointers
- Use pointers to allocate and access arrays on the heap
- Use pointers to allocate and access structs on the heap
Setup
Begin this lab with an empty main.c
and the standard makefile for your operating system. You will want to add the following includes at the top of your file:
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
Of course, you will also want a basic definition of the main
function. You should follow along with the code examples in this lab, adding them to your main
function and testing them.
Review
As discussed in the first lecture, computer memory consists of a large bank of bytes. Every byte has a unique address. Until now, we have only been referring to a location in memory by a variable name:
int x;
x = 5;
The first line allocates space for an integer (usually 4 or 8 bytes), and gives it a name: "x". In the second line, we store the value 5
in the memory location referenced by the name x
. The association between x
and the location never changes; i.e., x
cannot be changed to reference a different location in memory.
You can see the actual address of the variable x
using the "address-of" operator (&
):
printf("Variable x is located at %p\n", &x);
This uses the %p
format specifier, which prints addresses (typically in hexadecimal). On most modern systems, the above code prints a different address each time you run it, because the variable x
may be placed at a different location in memory during each execution.
Pointers
We are now going to introduce a major new C concept: pointers. A pointer is a variable that holds a memory address. Pointers are declared using the asterisk (*
) operator:
int *p;
In the above example, the type of the variable p
is int*
(usually pronounced "int star" or "int pointer"). Like other variables, it is a fixed reference to an allocated place in memory. Unlike other variables, that memory location itself holds the address of another place in memory. That is what makes it a "pointer;" it "points" to another location.
Until it is initialized, however, it doesn't point to anything in particular. To avoid a lot of headaches later on, pointers should always be initialized when they are declared. If you don't yet know what a pointer should point to, you can always initialize it to a special value (zero) using the NULL
constant:
int *p = NULL;
Question: How would we initialize p
so that it points to x
?
WARNING: If you wish to declare multiple pointers on the same line, you must include an asterisk with each of them. For instance:
int *p = NULL, *q = NULL;
In this course, we strongly recommend against declaring multiple pointers on a single line.
To assign pointers so that they point to existing variables, you can use the address-of operator (&
) just as we used it earlier to print the address:
p = &x;
After this line, p
is now a pointer to the location that stores the value associated with the variable x
. Note that x
is NOT a pointer; it is just a variable name.
You can now print the value of p
(the pointer) and verify that it is the same as the address of x
:
printf("Pointer value of p is %p\n", p);
Dereferencing pointers
To retrieve the value that a pointer points to, use the "dereference" operator (*
). Yes, it's an asterisk, the same as we used to declare the pointer.
In the running example, we know that p
is an int*
(an "int-pointer" or a "pointer to an int"). We have also initialized it to point to the location of variable x
. Thus, we can print the integer that p
points to using the following code:
printf("Dereferenced value of p is %d, and x = %d\n", *p, x);
Question: What would happen if we passed p
instead of *p
to printf
?
Question: What would happen if we dereferenced a pointer that had the value of NULL
?
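Dereferencing works for writing as well as reading. As a minimal sketch, the following function swaps two integers by reading and writing through pointers. The name swap_ints and its parameter names are hypothetical and are not part of the lab's code.

```c
/* Hypothetical helper (not part of the lab): swaps the two ints that
 * first and second point to. Assumes both pointers are non-NULL. */
void swap_ints(int *first, int *second)
{
    int tmp = *first;   /* read the value stored at first */
    *first = *second;   /* overwrite it with the value stored at second */
    *second = tmp;      /* store the saved value at second */
}
```

If a and b are int variables, calling swap_ints(&a, &b) exchanges their values, because the function modifies the original memory locations rather than copies.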
Exercise: Tracing
What does the following code print? Step through it as if you were the machine. This is sometimes referred to as "tracing" the code. You may find it helpful to draw diagrams with boxes for memory locations and arrows for pointers.
int a = 42;
int b = 7;
int c = 999;
int *t = &a;
int *u = NULL;
printf("%d %d\n", a, *t);
c = b;
u = t;
printf("%d %d\n", c, *u);
a = 8;
b = 8;
printf("%d %d %d %d\n", b, c, *t, *u);
*t = 123;
printf("%d %d %d %d %d\n", a, b, c, *t, *u);
DO THIS: Copy the previous code into your file and test it. Then, add more lines of code to do the following:
- Update t to point to c. Use a pointer dereference to change the value of c to 555. Verify that it worked by adding a printout. Does this change any of the other values?
- Change the value of c again using a direct assignment. Verify that the pointer t still points to the value by printing the result of dereferencing it.
Question: What would happen if you tried to execute the following code? How could you fix it?
int *v = &t;
printf("%d\n", *v);
This illustrates an important concept: pointers can point to almost anything, even other pointers!
Aside: The sizeof
operator
There is a useful operator in C called the sizeof
operator. You can use it to determine the size of a particular variable in memory. You can also use it to determine the size of a type.
DO THIS: Run the following code and take note of all the sizes.
printf("sizeof(char): %zu\n", sizeof(char));
printf("sizeof(42): %zu\n", sizeof(42));
printf("sizeof(float): %zu\n", sizeof(float));
printf("sizeof(double): %zu\n", sizeof(double));
printf("sizeof(NULL): %zu\n", sizeof(NULL));
Note that parentheses are required when sizeof is applied to a type name. They are optional around an expression, but it is common to include them in both cases.
Note also that we are using the %zu format specifier, which is the standard specifier for a size_t value (the unsigned integer type that sizeof produces).
Question: What is sizeof(int)? What is sizeof(3.14)?
You can also use sizeof to detect the size of fixed-size arrays:
int d[3] = { 1, 2, 3 };
printf("sizeof(d) = %zu\n", sizeof(d));
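A common use of sizeof on an array is to compute the number of elements. The sketch below assumes the array is declared in the same scope where sizeof is applied. The function name count_elements_demo is hypothetical.

```c
#include <stddef.h>

/* Hypothetical demo (not part of the lab): returns the number of
 * elements in a local fixed-size array. */
size_t count_elements_demo(void)
{
    int d[3] = { 1, 2, 3 };
    /* total bytes in the array divided by the bytes in one element */
    return sizeof(d) / sizeof(d[0]);
}
```

Dividing by sizeof(d[0]) rather than sizeof(int) keeps the expression correct even if the element type of d changes later.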
Allocating memory
Now that we can track arbitrary locations in memory using pointers, we can begin allocating memory on the heap (a large chunk of memory set aside for programs to use during execution). Heap memory is allocated in C using a call to the malloc
function (from stdlib.h
), which takes a single size_t
parameter: the number of bytes requested. It is conventional to use the sizeof
operator to help request the correct amount of memory:
int *r = (int*)malloc(sizeof(int));
The malloc
function returns a pointer to the allocated memory. Technically, its return type is void*
, a void pointer. In C, a void* converts implicitly to any other object pointer type, so the cast shown here is optional (C++ requires it). We include it to make the intended pointer type explicit.
Note that malloc
could potentially return NULL
if it does not succeed (if the system is out of memory). Therefore, you should always check the return value to be safe:
if (r == NULL) {
printf("Out of memory!\n");
exit(EXIT_FAILURE);
}
The exit
function terminates the program immediately and reports the provided status code to the operating system.
IMPORTANT: Unlike Java, C does not automatically de-allocate memory when it is no longer needed. Thus, you will need to manually de-allocate memory when you are done, using the free
function:
free(r);
If your program allocates memory and does not free it, your program has a memory leak. The leaked memory will be reclaimed when your program exits, but until then it will continue to clog up the system.
In this course, a memory leak is considered to be a software defect. You should thoroughly debug your program to find and fix all memory leaks. You may use tools like Valgrind/Memcheck to help you find memory leaks. Here is a short tutorial.
Question: What happens if you try to dereference a pointer after you free
it?
It is conventional to set pointers to NULL immediately after you free them. On most systems, this will trigger a segmentation fault at the offending location if you ever attempt to dereference the pointer, making it easier to diagnose and fix software defects.
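Putting the steps of this section together, here is a minimal sketch of the full life cycle of a heap allocation: allocate, check, use, free, and reset to NULL. The function name heap_int_roundtrip is hypothetical and exists only for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical demo (not part of the lab): stores value in a
 * heap-allocated int, reads it back, and releases the memory.
 * Returns the value that was read from the heap. */
int heap_int_roundtrip(int value)
{
    int *r = (int*)malloc(sizeof(int));   /* request space for one int */
    if (r == NULL) {                      /* allocation can fail */
        printf("Out of memory!\n");
        exit(EXIT_FAILURE);
    }

    *r = value;          /* write to the heap through the pointer */
    int result = *r;     /* read the value back before freeing */

    free(r);             /* return the memory to the system */
    r = NULL;            /* prevent accidental use of the stale address */

    return result;
}
```

Note that the value is copied into the local variable result before the call to free, so nothing reads the heap memory after it has been released.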
Arrays and pointers
Here's a little secret: you've actually already been using pointers. In C, an array variable name used in most expressions is automatically converted to a pointer to the array's first element. In fact, you can print the address of an array using the same syntax as above:
int d[3] = { 1, 2, 3 };
printf("The array starts at %p\n", d);
You can also use pointer arithmetic to access array elements:
printf("The first element is %d\n", *d);
printf("The second element is %d\n", *(d+1));
printf("The third element is %d\n", *(d+2));
This allows you to use pointers explicitly to iterate over arrays. The following two loops are equivalent:
for (int i=0; i<3; i++) {
printf("Element: %d\n", d[i]);
}
for (int *w = d; w < (d+3); w++) {
printf("Element: %d\n", *w);
}
In fact, the syntax you're already familiar with for accessing array elements (i.e., the square brackets used in the first loop above) actually performs pointer dereferencing and arithmetic "under the hood."
IMPORTANT: As mentioned before, there is no automatic bounds checking in C, so you must be careful not to read outside the bounds of the array. Now you can see that this is because C uses pointers and pointer arithmetic "under the hood," and there is no sure-fire way to check whether a pointer is pointing to a location inside a particular array.
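Because a function receives only a pointer and cannot recover the array length from it, the length is usually passed as a separate parameter. The following sketch sums an array using pointer iteration. The name sum_ints and its parameters are hypothetical, and the code assumes that start points to at least count valid integers.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the lab): adds up count ints
 * beginning at start. ASSUMPTION: start points to at least count
 * valid ints; the function has no way to check this itself. */
int sum_ints(const int *start, size_t count)
{
    int total = 0;
    /* w walks from the first element to one past the last element */
    for (const int *w = start; w < start + count; w++) {
        total += *w;
    }
    return total;
}
```

For the array d above, sum_ints(d, 3) adds all three elements, while sum_ints(d + 1, 2) adds only the second and third.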
We can also allocate arrays on the heap, which is very useful when we don't know at compile time how big the array should be. For example:
int *e = (int*)malloc(sizeof(int) * b);
This will allocate an array big enough to hold a number of elements equal to the value of b
. It is conventional in such allocations to use the sizeof
operator to determine the number of bytes needed for each element and then to multiply by the number of elements.
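As a minimal sketch of this pattern, the following function allocates a heap array whose length is known only at run time, uses it, and frees it before returning. The name sum_first_n is hypothetical, and the zero-length case is handled separately because the result of malloc(0) varies between systems.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical demo (not part of the lab): fills a heap array with
 * 1, 2, ..., n and returns the sum of its elements. */
long sum_first_n(size_t n)
{
    if (n == 0) {
        return 0;   /* avoid malloc(0), whose result is system-dependent */
    }

    int *e = (int*)malloc(sizeof(int) * n);   /* n elements of int size */
    if (e == NULL) {
        printf("Out of memory!\n");
        exit(EXIT_FAILURE);
    }

    for (size_t i = 0; i < n; i++) {
        e[i] = (int)(i + 1);   /* square brackets work on heap arrays too */
    }

    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += e[i];
    }

    free(e);   /* the array is no longer needed */
    return total;
}
```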
This now allows us to write functions that return arrays; in fact, they just return a pointer to an array. Note that you should never return a pointer to an ordinary (non-heap) array that is local to a function, because the array will be de-allocated at the end of the function and the pointer will be invalid. You should only return pointers to arrays that you allocate on the heap. Here is a function that returns a newly-allocated array:
char* alloc_alphabet_array()
{
char *alphabet;
alphabet = (char*)malloc(sizeof(char) * 27);
if (alphabet == NULL) {
printf("Out of memory!\n");
exit(EXIT_FAILURE); // stop here; writing through a NULL pointer would crash
}
for (char c = 'a'; c <= 'z'; c++) {
alphabet[c-'a'] = c;
}
alphabet[26] = '\0'; // null terminator
return alphabet;
}
Question: What is sizeof(alph)
if alph
is the return value of this function?
Recall that in C, strings are just null-terminated arrays of characters. Thus, in this case we could describe alphabet
using any of the following terms:
- char * ("char star")
- "pointer to array of chars"
- "pointer to a C string"
For return values that are pointers, some people put the asterisk (*
) with the type:
char* alloc_alphabet_array()
Others put the asterisk with the function name:
char *alloc_alphabet_array()
In this course, either approach is fine as long as you are consistent.
IMPORTANT: Don't forget to free
these arrays when you're done with them!
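The same pattern applies to any function that builds a string on the heap. Here is a minimal sketch of a function that returns a heap-allocated copy of an existing C string. The name copy_string is hypothetical, and the code assumes that src is a valid null-terminated string.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the lab): returns a newly-allocated
 * copy of src. ASSUMPTION: src is a valid null-terminated string.
 * The caller owns the result and must free it. */
char *copy_string(const char *src)
{
    size_t len = strlen(src);   /* length without the null terminator */
    char *copy = (char*)malloc(sizeof(char) * (len + 1));
    if (copy == NULL) {
        printf("Out of memory!\n");
        exit(EXIT_FAILURE);
    }
    memcpy(copy, src, len + 1); /* copy the characters and the terminator */
    return copy;
}
```

The allocation requests len + 1 bytes because strlen does not count the null terminator, which still needs space in the copy.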
Structs and pointers
As you might suspect by now, you can also create pointers to structs. Assuming that point_t
is defined as it was in a previous lab, you can allocate a point on the heap:
point_t *pt = (point_t*)malloc(sizeof(point_t));
To access member variables, you have to dereference the pointer:
(*pt).x = 2;
(*pt).y = 3;
printf("The point is (%d, %d)\n", (*pt).x, (*pt).y);
Because this is cumbersome, C also provides a more concise way to dereference a pointer and access a member variable: the arrow operator (->
). The following code is exactly equivalent to the above code:
pt->x = 2;
pt->y = 3;
printf("The point is (%d, %d)\n", pt->x, pt->y);
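The allocation and the arrow operator can be combined in a small helper function. The sketch below is only an illustration: the point_t definition shown here is an assumed stand-in for the one from the previous lab (two int members named x and y), and the name new_point is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* ASSUMPTION: stand-in for the point_t type from the previous lab. */
typedef struct {
    int x;
    int y;
} point_t;

/* Hypothetical helper (not part of the lab): allocates a point on the
 * heap and initializes its members. The caller must free the result. */
point_t *new_point(int x, int y)
{
    point_t *pt = (point_t*)malloc(sizeof(point_t));
    if (pt == NULL) {
        printf("Out of memory!\n");
        exit(EXIT_FAILURE);
    }
    pt->x = x;   /* same as (*pt).x = x */
    pt->y = y;   /* same as (*pt).y = y */
    return pt;
}
```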
Of course, you can also use pointers to store references to arrays of structs:
point_t *pts = (point_t*)malloc(sizeof(point_t) * 4);
if (pts == NULL) {
printf("Out of memory!\n");
exit(EXIT_FAILURE);
}
for (int i=0; i<4; i++) {
pts[i].x = i*10;
pts[i].y = i*20;
}
for (int i=0; i<4; i++) {
printf("Point: (%d, %d)\n", pts[i].x, pts[i].y);
}
Question: Is the type of pt
different from the type of pts
? Why or why not?
Question: Why didn't we need to use the ->
operator to access the x
and y
member variables in the loops above?
We can also pass structs into functions using pointers, and we can return pointers to structs from functions:
point_t* midpoint(point_t *p1, point_t *p2)
{
point_t* mid = (point_t*)malloc(sizeof(point_t));
if (mid == NULL) {
printf("Out of memory!\n");
exit(EXIT_FAILURE);
}
mid->x = (p1->x + p2->x)/2;
mid->y = (p1->y + p2->y)/2;
return mid;
}
Because midpoint
allocates memory and returns it, the calling function must remember to free
the allocated memory. Otherwise, there will be a memory leak.
Here is some code to test the midpoint
function:
point_t pt1 = { .x = 0, .y = 0 };
point_t pt2 = { .x = 10, .y = 10 };
point_t *midpt = midpoint(&pt1, &pt2);
printf("Midpoint: (%d, %d)\n", midpt->x, midpt->y);
Question: Why did we need to pass &pt1
and &pt2
into midpoint
instead of just pt1
or pt2
?
For large structs, passing or returning a pointer is usually faster than passing or returning the struct itself, because it is quicker to copy a pointer than the entire struct. However, the two approaches are not exactly equivalent, because passing a pointer does not create a copy.
IMPORTANT: If you pass a struct into a function using a pointer, any changes you make to the struct inside the function will persist AFTER the function completes. This is because you are actually modifying the memory values of the original struct.
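To make the difference concrete, the following sketch contrasts a function that takes a pointer with one that takes a copy. The point_t definition is an assumed stand-in for the one from the previous lab, and both function names are hypothetical.

```c
/* ASSUMPTION: stand-in for the point_t type from the previous lab. */
typedef struct {
    int x;
    int y;
} point_t;

/* Hypothetical helper: moves the caller's point, because p holds the
 * address of the original struct. */
void translate_in_place(point_t *p, int dx, int dy)
{
    p->x += dx;
    p->y += dy;
}

/* Hypothetical helper: p is a private copy of the caller's struct, so
 * the original is untouched; the moved copy is returned instead. */
point_t translated_copy(point_t p, int dx, int dy)
{
    p.x += dx;
    p.y += dy;
    return p;
}
```

After translated_copy(pt1, 10, 20), the variable pt1 still holds its old coordinates; after translate_in_place(&pt1, 10, 20), pt1 itself has moved.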
Exercise: More tracing
What does the following code print? Step through it as if you were the machine. You may find it helpful to draw diagrams with boxes for memory locations and arrows for pointers.
typedef struct {
int *a;
int b;
} stuff_t;
void foo(stuff_t value)
{
*(value.a) = 2;
value.b = 3;
}
void bar(stuff_t *value)
{
*(value->a) = 4;
value->b = 5;
}
void do_stuff()
{
stuff_t my_stuff;
int temp = 0;
my_stuff.a = &temp;
my_stuff.b = 1;
printf("a=%d b=%d\n", *(my_stuff.a), my_stuff.b);
foo(my_stuff);
printf("a=%d b=%d\n", *(my_stuff.a), my_stuff.b);
bar(&my_stuff);
printf("a=%d b=%d\n", *(my_stuff.a), my_stuff.b);
}
Exercise: Plug the leaks
Review your code from this lab; does it leak any memory?
We purposefully did not include all necessary free
calls to give you experience with memory management. You should walk through your code and/or test it with a tool like Valgrind/Memcheck to find and fix any memory leaks. Here is a short tutorial.
(Optional) More exercises
You may find the following exercises helpful to reinforce your understanding of pointers: