10.6. Pointers and Dynamic Allocation¶

The previous section explored the use of pointers for call-by-reference parameters, which is a very common and effective use of pointers. A second common use of pointers is for dynamic memory allocation, which is a necessary aspect of even moderately sized programs. Dynamic memory allocation is used for several key purposes:

the exact size of a piece of data can only be determined at run-time based on features of the operating environment, including user input;

the size of the data needs to change over time;

the data manipulation involves combining or splitting the data, which is particularly common with string processing;

the data is extremely large;

the data needs persistence and scope that are not global, but are also not tied to the execution context of a single function.

The first three bullet points may seem intuitive, as they all involve the size of the data changing in one way or another. The fourth bullet point arises from the fact that all modern OS place a default maximum stack size for a thread, typically 8 MB. Declaring a large local variable, such as an array with thousands of elements, is likely to lead to a system crash when the stack runs out of space and overflows.

The final bullet point is common in many different types of languages, applications, or systems, though the terminology might be different. Code Listing A.31 provides an example in the Java programming language using the Factory design pattern. In the Factory pattern, there is a designated class (the factory) that is responsible for creating new instances of some class (here, just generic Objects). The Factory class only has a single method, typically with a name that starts with “get” or “create.” Later, in another part of the code (such as main()), a single instance of the Factory is created; this instance is then used to create the Objects over and over as needed.

/* Code Listing A.31:
   The Java Factory design pattern uses dynamic allocation for
   persistence and non-local scope
 */

public class Factory {
  public Object getObject () {
    return new Object();
  }
}

public class Demo {
  public static void main (String[] args) {
    Factory factory = new Factory();
    Object object = factory.getObject();
  }
}

Line 8 of the Factory class is an example of dynamic memory allocation for persistence and non-local scope. This line of code dynamically allocates a chunk of memory to store an Object. However, the Object is not tied to the scope of the getObject() method; it is intended to be used after getObject() has finished executing. In fact, we can generalize this example even further and suggest that the vast majority of uses of the new keyword in Java are semantically equivalent to dynamic memory allocation in C with malloc() or calloc().

Despite the fact that dynamic memory allocation serves several different purposes in C, there are only two standard mechanisms for doing this: the calloc() and malloc() functions. Both functions take parameters to indicate how much space is needed, allocate that much space on the heap, and return a pointer to the beginning of the space. The primary difference is that calloc() guarantees these bytes are initialized to all zero; malloc() performs no initialization on the bytes, so there is the possibility that they hold random initial values. The difference in the calling parameters is superficial. For malloc(), the only parameter is the total number of bytes needed. For calloc(), the first parameter specifies how many consecutive elements there are (similar to specifying the length of an array) and the second parameter indicates how many bytes are needed for each element; these two values are internally multiplied to determine the total size, and calloc() ends up allocating exactly that much space (just like malloc()). Because calloc() guarantees the data is initialized to zero, it is typically considered safer and preferred for general use. On the other hand, malloc() is faster since it does not do any initialization; this function is acceptable (or even preferred) if you know that the data will be entirely initialized before it will be otherwise used.

C library functions – <stdlib.h>

void * calloc(size_t count, size_t size);: Allocate count*size bytes of space on the heap and zero them out.
void * malloc(size_t size);: Allocate size bytes of space on the heap without initialization.
void * realloc(void *ptr, size_t size);: Resize a previously dynamically allocated memory space.
void free(void *ptr);: Mark the previously allocated space that ptr points to as available for future allocation.

Unlike Java, C does not have a garbage collector. When a piece of data is dynamically allocated with malloc() or calloc(), the programmer must determine when the data is no longer needed. Once you determine that the data can be safely deleted, you would need to call free() on the pointer to the space on the heap. This designates that space as available for future dynamic allocations.

Code Listing A.32 demonstrates the C equivalent of the Factory pattern from A.31. The get_integer() function dynamically allocates space on the heap and returns it. In C, the void* type acts comparably to the Object class in Java; any type of pointer can be implicitly cast into void*. Once the space is allocated on line 8, this space on the heap is not necessarily tied to the duration of any particular function. The integer variable name in main() is lexically bound to that function’s scope, but the heap space is not. Consequently, if main() returned without calling free() on integer (line 17), then this heap space would still be allocated but without a way to de-allocate it. That is, there is no other pointer reference to that space. This situation is known as a memory leak. With enough memory leaks over time, the heap would eventually run out of space that can be allocated; when that happens additional calls to malloc() and calloc() would return NULL to indicate the allocation failure.

/* Code Listing A.32:
   A C version of the Factory pattern from Code Listing A.31
 */

void *
get_integer (void)
{
  return calloc (1, sizeof (int));
}

int
main (void)
{
  int *integer = (int *) get_integer ();
  *integer = 5;

  free (integer);
  integer = NULL;

  return 0;
}

Bug Warning

Using a pointer after calling free() on it (called use-after-free) leads to unpredictable and unsafe behavior. While there is the possibility that the data stored on the heap remains intact, there is no guarantee that this is the case. As such, every call to free() should be followed by a line that sets the pointer to NULL, as shown on line 18 above. Once that is done, if the pointer is used again, there will be a segmentation fault. In this case, the segmentation fault would be better than the alternative: accessing a portion of memory that might unpredictably change at any point in the future.

Code Listing A.33 demonstrates an example of the first bullet-point use case described previously. In this case, the build_array() function is creating and initializing an array of int values. Since the length of the array is determined by the argument passed as length, this function uses dynamic allocation to create it as the correct size.

/* Code Listing A.33:
   Creating a dynamic-sized array
 */

int *
build_array (size_t length)
{
  int *array = calloc (length, sizeof (int));

  for (size_t i; i < length; i++)
    array[i] = i * i;

  return array;
}

As the program continues to run after dynamically allocating space, it frequently occurs that the space allocated needs to change. Perhaps the space is acting as an array that holds the list of records that make up a student’s grade for a class. As the student completes more and more assignments, the array may become full and not have space for additional records. The realloc() function provides a simple mechanism for changing the size of the space. The first parameter to realloc() is a pointer to the space to be resized (this space must be on the heap), and the second parameter is the new total size that is desired. If there is sufficient unused memory immediately after the existing space, the record for the space in the heap is simply modified. If there isn’t sufficient space, realloc() will allocate a new space that is big enough for the new size and copy the original data into this new space; the old space is automatically cleaned up. In either, realloc() returns a pointer to the newly resized heap space.

Code Listing A.34 uses the array from the function in A.33, but later determines that more capacity is needed. Initially, after lines 5 and 6, the array consists of 40 bytes (10 elements that are 4-byte int values). Line 9 doubles the value of len from 10 to 20, then passes this new length to realloc(). Like malloc(), realloc() takes a single size parameter to indicate the new total number of bytes; since the array will now be 80 bytes (20 elements of 4-byte int values), we have to specifically multiply len by sizeof(int) to get the correct size request. Finally, line 10 uses a new variable name to capture the return value from realloc(). If, for some reason, realloc() failed to allocate the new space, it would return NULL. However, the original array would still be allocated and we might need that original pointer; storing the return value of realloc() directly into array would lose the reference to the original space on the heap, creating a memory leak. Once line 11 ensures that the return value was not NULL, line 12 updates array as needed.

/* Code Listing A.34:
   Resizing the array built from A.33
 */

size_t len = 10;
int *array = build_array (len);

/* Double the capacity of the array */
len *= 2;
int *new_array = realloc (array, len * sizeof (int));
assert (new_array != NULL);
array = new_array;

Bug Warning

The ptr variable passed to realloc() must point to the heap; pointers to other memory segments will produce run-time exceptions. For instance, a common mistake arises when ptr is currently pointing to a string constant (e.g., char *ptr = "hello world"); realloc() cannot be used to resize this string.

Another common bug with realloc() arises when the old pointer is freed:

int *newptr = realloc (oldptr, newsize);
free (oldptr);
*newptr = 5;

When realloc() runs, there are two possibilities: 1) the data does not move, so oldptr and newptr point to the same place, and 2) the data moves. In case 1, freeing oldptr means that newptr is also freed, making line 3 an example of a use-after-free bug. In the second case, realloc() transparently frees the old space for you; line 2 then becomes an example of a double free in which the same heap space is freed twice, which triggers a run-time exception.

Lastly, note that realloc() behaves like malloc() when the resizing means increasing the size of the space. The additional bytes of memory allocated are not guaranteed to be initialized to any particular value.

Finally, as a general piece of guidance, you should only dynamically allocate data if you need to, according to the five scenarios identified above. One common mistake—not technically a bug, but unnecessarily complex code—follows from misapplication of the semantic equivalence of Java’s new keyword with malloc(). In Java, all objects (i.e., any variable that is not a primitive type) must be allocated with the new keyword. However, in C, any local variable will automatically be allocated space in the stack frame. If an instance of a struct does not need to persist beyond the scope of the current function, dynamically allocating space for it on the heap is a waste of both time and space.

Code Listing A.35 compares these two styles of creating a local struct instance. In this case, the style of declaring the jasmine instance is preferred. When the function begins executing, the stack pointer is moved once to allocate all of the space needed for local variables; this takes the same amount of time regardless of how many variables there are. Consequently, the space for jasmine is automatically allocated by the function calling semantics itself. The allocation of philippe, on the other hand, requires significant extra work. Specifically, malloc() has to find available space, requiring modifications to the internal data structures that define the heap; then, then programmer must remember to free() the space later, causing more changes to the heap. Ultimately, this extra work is done for no reason; in fact, the changes to the heap might actually produce even worse performance later, because the free() might not return the heap to its original state (i.e., if there are multiple threads involved). It is true that C structs perform a similar function as Java objects; however, that does not automatically require the use of dynamic allocation.

/* Code Listing A.35:
   Unnecessarily allocating a struct instance with malloc()
 */

struct user {
  char *name;
  int id;
};

int
main (void)
{
  struct user jasmine;
  jasmine.name = "Jasmine";
  jasmine.id = 50;

  struct user *philippe = malloc (sizeof (struct user));
  philippe->name = "Philippe";
  philippe->id = 75;
  free (philippe);

  return 0;
}