The previous section explored the use of pointers for call-by-reference parameters, which is a very common and effective use of pointers. A second common use of pointers is for dynamic memory allocation, which is a necessary aspect of even moderately sized programs. Dynamic memory allocation is used for several key purposes:
- the exact size of a piece of data can only be determined at run-time based on features of the operating environment, including user input;
- the size of the data needs to change over time;
- the data manipulation involves combining or splitting the data, which is particularly common with string processing;
- the data is extremely large;
- the data needs persistence and scope that are not global, but are also not tied to the execution context of a single function.
The first three bullet points may seem intuitive, as they all involve the size of the data changing in one way or another. The fourth bullet point arises from the fact that all modern OS place a default maximum stack size for a thread, typically 8 MB. Declaring a large local variable, such as an array with thousands of elements, is likely to lead to a system crash when the stack runs out of space and overflows.
The final bullet point is common in many different types of languages, applications, or systems,
though the terminology might be different. Code Listing A.31 provides an example in the
Java programming language using the Factory design pattern. In the Factory pattern, there is a
designated class (the factory) that is responsible for creating new instances of some class (here,
just generic Objects). The Factory class only has a single method, typically with a name that starts
with “get” or “create.” Later, in another part of the code (such as main()
), a single instance
of the Factory is created; this instance is then used to create the Objects over and over as needed.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | /* Code Listing A.31:
The Java Factory design pattern uses dynamic allocation for
persistence and non-local scope
*/
public class Factory {
public Object getObject () {
return new Object();
}
}
public class Demo {
public static void main (String[] args) {
Factory factory = new Factory();
Object object = factory.getObject();
}
}
|
Line 8 of the Factory class is an example of dynamic memory allocation for persistence and non-local
scope. This line of code dynamically allocates a chunk of memory to store an Object. However, the
Object is not tied to the scope of the getObject()
method; it is intended to be used after
getObject()
has finished executing. In fact, we can generalize this example even further and
suggest that the vast majority of uses of the new keyword in Java are semantically equivalent to
dynamic memory allocation in C with malloc()
or calloc()
.
Despite the fact that dynamic memory allocation serves several different purposes in C, there are
only two standard mechanisms for doing this: the calloc()
and malloc()
functions. Both
functions take parameters to indicate how much space is needed, allocate that much space on the
heap, and return a pointer to the beginning of the space. The primary difference is that
calloc()
guarantees these bytes are initialized to all zero; malloc()
performs no initialization
on the bytes, so there is the possibility that they hold random initial values. The difference in
the calling parameters is superficial. For malloc(),
the only parameter is the total number of
bytes needed. For calloc()
, the first parameter specifies how many consecutive elements there are
(similar to specifying the length of an array) and the second parameter indicates how many bytes are
needed for each element; these two values are internally multiplied to determine the total size, and
calloc()
ends up allocating exactly that much space (just like malloc()
). Because calloc()
guarantees the data is initialized to zero, it is typically considered safer and preferred for
general use. On the other hand, malloc()
is faster since it does not do any initialization; this
function is acceptable (or even preferred) if you know that the data will be entirely initialized
before it will be otherwise used.
C library functions – <stdlib.h>
void * calloc(size_t count, size_t size);
void * malloc(size_t size);
void * realloc(void *ptr, size_t size);
void free(void *ptr);
Unlike Java, C does not have a garbage collector. When a piece of data is dynamically
allocated with malloc()
or calloc(),
the programmer must determine when the data is no
longer needed. Once you determine that the data can be safely deleted, you would need to call
free()
on the pointer to the space on the heap. This designates that space as available for
future dynamic allocations.
Code Listing A.32 demonstrates the C equivalent of the Factory pattern from A.31.
The get_integer()
function dynamically allocates space on the heap and returns it.
In C, the void*
type acts comparably to the Object class in Java; any type of pointer can be
implicitly cast into void*
. Once the space is allocated on line 8, this space on the heap is not
necessarily tied to the duration of any particular function. The integer variable name in main()
is lexically bound to that function’s scope, but the heap space is not. Consequently, if main()
returned without calling free()
on integer (line 17), then this heap space would still be
allocated but without a way to de-allocate it. That is, there is no other pointer reference to that
space. This situation is known as a memory leak. With enough memory leaks over time, the
heap would eventually run out of space that can be allocated; when that happens additional calls to
malloc()
and calloc()
would return NULL
to indicate the allocation failure.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | /* Code Listing A.32:
A C version of the Factory pattern from Code Listing A.31
*/
void *
get_integer (void)
{
return calloc (1, sizeof (int));
}
int
main (void)
{
int *integer = (int *) get_integer ();
*integer = 5;
free (integer);
integer = NULL;
return 0;
}
|
Bug Warning
Using a pointer after calling free()
on it (called use-after-free) leads to unpredictable and
unsafe behavior. While there is the possibility that the data stored on the heap remains intact,
there is no guarantee that this is the case. As such, every call to free()
should be followed
by a line that sets the pointer to NULL
, as shown on line 18 above. Once that is done, if the
pointer is used again, there will be a segmentation fault. In this case, the segmentation fault
would be better than the alternative: accessing a portion of memory that might unpredictably change
at any point in the future.
Code Listing A.33 demonstrates an example of the first bullet-point use case described
previously. In this case, the build_array()
function is creating and initializing an array of
int
values. Since the length of the array is determined by the argument passed as length
,
this function uses dynamic allocation to create it as the correct size.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | /* Code Listing A.33:
Creating a dynamic-sized array
*/
int *
build_array (size_t length)
{
int *array = calloc (length, sizeof (int));
for (size_t i; i < length; i++)
array[i] = i * i;
return array;
}
|
As the program continues to run after dynamically allocating space, it frequently occurs that the
space allocated needs to change. Perhaps the space is acting as an array that holds the list of
records that make up a student’s grade for a class. As the student completes more and more
assignments, the array may become full and not have space for additional records. The realloc()
function provides a simple mechanism for changing the size of the space. The first parameter to
realloc()
is a pointer to the space to be resized (this space must be on the heap), and the
second parameter is the new total size that is desired. If there is sufficient unused memory
immediately after the existing space, the record for the space in the heap is simply modified. If
there isn’t sufficient space, realloc()
will allocate a new space that is big enough for the new
size and copy the original data into this new space; the old space is automatically cleaned up. In
either, realloc()
returns a pointer to the newly resized heap space.
Code Listing A.34 uses the array from the function in A.33, but later
determines that more capacity is needed. Initially, after lines 5 and 6, the array
consists of
40 bytes (10 elements that are 4-byte int
values). Line 9 doubles the value of len
from 10
to 20, then passes this new length to realloc()
. Like malloc()
, realloc()
takes a single
size parameter to indicate the new total number of bytes; since the array will now be 80 bytes (20
elements of 4-byte int
values), we have to specifically multiply len
by sizeof(int)
to
get the correct size request. Finally, line 10 uses a new variable name to capture the return value
from realloc()
. If, for some reason, realloc()
failed to allocate the new space, it would
return NULL
. However, the original array
would still be allocated and we might need that
original pointer; storing the return value of realloc()
directly into array
would lose the
reference to the original space on the heap, creating a memory leak. Once line 11 ensures that the
return value was not NULL
, line 12 updates array as needed.
1 2 3 4 5 6 7 8 9 10 11 12 | /* Code Listing A.34:
Resizing the array built from A.33
*/
size_t len = 10;
int *array = build_array (len);
/* Double the capacity of the array */
len *= 2;
int *new_array = realloc (array, len * sizeof (int));
assert (new_array != NULL);
array = new_array;
|
Bug Warning
The ptr
variable passed to realloc()
must point to the heap; pointers to other memory
segments will produce run-time exceptions. For instance, a common mistake arises when ptr
is
currently pointing to a string constant (e.g., char *ptr = "hello world"
); realloc()
cannot
be used to resize this string.
Another common bug with realloc()
arises when the old pointer is freed:
1 2 3 | int *newptr = realloc (oldptr, newsize);
free (oldptr);
*newptr = 5;
|
When realloc()
runs, there are two possibilities: 1) the data does not move, so oldptr
and
newptr
point to the same place, and 2) the data moves. In case 1, freeing oldptr
means that
newptr
is also freed, making line 3 an example of a use-after-free bug. In the second case,
realloc()
transparently frees the old space for you; line 2 then becomes an example of a
double free in which the same heap space is freed twice, which triggers a run-time exception.
Lastly, note that realloc()
behaves like malloc()
when the resizing means increasing the
size of the space. The additional bytes of memory allocated are not guaranteed to be initialized to
any particular value.
Finally, as a general piece of guidance, you should only dynamically allocate data if you need to,
according to the five scenarios identified above. One common mistake—not technically a bug, but
unnecessarily complex code—follows from misapplication of the semantic equivalence of Java’s new
keyword with malloc()
. In Java, all objects (i.e., any variable that is not a primitive type)
must be allocated with the new keyword. However, in C, any local variable will automatically be
allocated space in the stack frame. If an instance of a struct
does not need to persist beyond
the scope of the current function, dynamically allocating space for it on the heap is a waste of both time and space.
Code Listing A.35 compares these two styles of creating a local struct
instance. In
this case, the style of declaring the jasmine
instance is preferred. When the function begins
executing, the stack pointer is moved once to allocate all of the space needed for local variables;
this takes the same amount of time regardless of how many variables there are. Consequently, the
space for jasmine
is automatically allocated by the function calling semantics itself. The
allocation of philippe,
on the other hand, requires significant extra work. Specifically,
malloc()
has to find available space, requiring modifications to the internal data structures
that define the heap; then, then programmer must remember to free()
the space later, causing
more changes to the heap. Ultimately, this extra work is done for no reason; in fact, the changes to
the heap might actually produce even worse performance later, because the free()
might not
return the heap to its original state (i.e., if there are multiple threads involved). It is true
that C structs perform a similar function as Java objects; however, that does not automatically
require the use of dynamic allocation.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | /* Code Listing A.35:
Unnecessarily allocating a struct instance with malloc()
*/
struct user {
char *name;
int id;
};
int
main (void)
{
struct user jasmine;
jasmine.name = "Jasmine";
jasmine.id = 50;
struct user *philippe = malloc (sizeof (struct user));
philippe->name = "Philippe";
philippe->id = 75;
free (philippe);
return 0;
}
|