10.3. Basic Types and Pointers

C defines several basic integer types, as illustrated in Table A.2. For each of these types, there is both a signed (the default) and unsigned version (considered distinct types). [1] (Note that short, long, and long long can also be called short int, long int, and long long int, respectively.) For each type, there is a designated minimum range of values that the type must be capable of storing. The ranges are shown in the middle two columns. When printing a variable of the specified type, there are both %d (signed) and %u (unsigned) format specifiers with added h (half) or l (long) characters to vary the size. The char type can use %c to print the ASCII interpretation of the byte, whereas the %hhd and %hhu strings will produce the integer format.

Type	Unsigned range (minimum)	Signed range (minimum)	Format strings
`char`	[0, 255]	[-127, 127]	`%c`, `%hhd`, `%hhu`
`short`	[0, 65535]	[-32767, 32767]	`%hd`, `%hu`
`int`	[0, 65535]	[-32767, 32767]	`%d`, `%u`
`long`	[0, 4294967295]	[-2147483647, 2147483647]	`%ld`, `%lu`
`long long`	[0, 18446744073709551615]	[-9223372036854775807, 9223372036854775807]	`%lld`, `%llu`

Table A.2: C primitive integer types

Note

The basic char type is built on the ASCII character standard. The original version of this standard uses seven bits (the eighth bit was often used as a parity check to detect errors), allowing for the representation of 128 values: 127 unique characters plus the reserved null byte. This limited size is clearly not sufficient to support international communication, which requires thousands of unique characters. ASCII has been succeeded by other standards, with Unicode being the current international standard. Unicode supports 16-bit (UTF-16) and 32-bit (UTF-32) character representations, exceeding the range of the char type. C has since introduced the wchar_t (wide character) type to support these larger character representations.

There are some important points to note about the ranges shown in Table A.2. First, the table does not contain a typo; the short and int types are required to cover the exact same ranges. These ranges can be represented in 16 bits for either type. However, in a typical modern system, a short is a 16-bit (2 byte) variable, whereas an int is 32-bit (4 bytes). To add to the confusion, the ranges required for long can be represented in 32 bits, but a long is typically 64-bit (8 bytes) on 64-bit Linux and macOS systems (64-bit Windows keeps long at 32 bits). And finally, the range for long long can be represented in 64 bits, which is the typical size (8 bytes) for this type. In other words, short and int variables are required to represent the same range of values, but they are typically different sizes; on the other hand, long and long long are required to represent different ranges of values, but they are often the same size!

Another subtle aspect of these ranges is an apparent discrepancy between the number of signed and unsigned values. For instance, take the char type; the unsigned version can represent 256 different numbers (0 through 255), whereas the signed version is only required to represent 255 (-127 through 127). The reason is that the minimum ranges were chosen so that they can be satisfied by a one's complement (or sign-and-magnitude) integer representation, not only by the two's complement representation typical in modern architectures. As such, the value -128 (0x80) can typically be stored in a signed char, but it is technically outside the range that the standard guarantees for this type.

Note

Until the C99 standard was published, C had no built-in Boolean variable type to indicate true or false. Instead, C used an integer for this purpose, with the convention that 0 means false and any non-zero value means true. This convention is still frequently exploited, particularly as a shorthand for condition checks. For instance, the strcmp() function, which is used to compare two strings, returns 0 if the strings are identical; if not, strcmp() returns a non-zero value (negative or positive, depending on which string sorts first). Based on the convention that 0 means false, it is common to see code written as follows:

if (! strcmp (str1, str2))
  printf ("Strings are identical\n");

If the strings str1 and str2 are identical, strcmp() returns 0 (false). The logical negation operator (!) then flips this value; if the strings are identical, the condition in the if-statement is true. If the strings are not identical, strcmp() will return a non-zero value (negative or positive), which evaluates to true; applying the ! flips this value and the condition evaluates to false, skipping the body of the if-statement. Logically, the condition shown above is equivalent to testing if (strcmp (str1, str2) == 0).

C99 introduced a new header file, stdbool.h. This header defines a bool type, as well as the constants true and false. This new type is generally preferred, as it makes for more readable code.

10.3.1. C99 Fixed-width Types

In systems programming, this sort of inconsistent definition can lead to serious mistakes. If an embedded microcontroller (such as a tiny wireless sensor monitoring a car’s velocity) needs to send data to a centralized controller, it would be beneficial for both devices to agree on the size of various types of data. To fix this, systems code typically uses explicitly named types. These types, defined in stdint.h, are shown in Table A.3.

Signed type	`INTn_MIN`	`INTn_MAX`	Unsigned type	`UINTn_MAX`
`int8_t`	-128	127	`uint8_t`	255
`int16_t`	-32768	32767	`uint16_t`	65535
`int32_t`	-INT32_MAX-1	2147483647	`uint32_t`	4294967295U
`int64_t`	-INT64_MAX-1	9223372036854775807LL	`uint64_t`	18446744073709551615ULL

Table A.3: C99 fixed-width integer types

The type name is constructed as intn_t or uintn_t, where the n is one of 8, 16, 32, or 64. For each size, there is a signed minimum (INTn_MIN), a signed maximum (INTn_MAX), and an unsigned maximum (UINTn_MAX). (The unsigned minimum is 0 for all sizes.) Note that INT32_MIN and INT64_MIN cannot be represented directly because of the way that C defines numeric constants. Instead, they have to be written in relation to the corresponding INTn_MAX values, but they are the values -2,147,483,648 and -9,223,372,036,854,775,808, respectively. Also observe that the large values need to be appended with U, LL, or ULL so that the C compiler will interpret the values correctly.

Code Listing A.5 illustrates a few key points about using these standard and fixed types. Lines 6 – 10 declare a char variable and print it both in ASCII (using %c) and as an integer value (in decimal, octal, and hexadecimal format). This code would work identically even if the declaration of the achar variable was changed to uint8_t, the 8-bit unsigned integer type. Lines 13 – 16 demonstrate the standard printing for an int type.

/* Code Listing A.5:
   Using and printing various integer types
 */

/* Use char, printing it in 4 formats */
char achar = 'A'; 
printf ("achar has value %c (character)\n", achar);
printf ("achar has value %hhd (dec)\n", achar);
printf ("achar has value %hho (oct)\n", achar);
printf ("achar has value %04hhx (hex)\n\n", achar);

/* Use basic int type, printing value in 3 formats */
int intval = 42;
printf ("intval has value %d (dec)\n", intval);
printf ("intval has value %o (oct)\n", intval);
printf ("intval has value %04x (hex)\n\n", intval);

/* Use 32-bit signed integer, printing value */
int32_t i32val = 42;
printf ("i32val has value %d (dec)\n", i32val);
printf ("i32val has value %" PRId32 " (dec)\n", i32val);
printf ("i32val has value %" PRIo32 " (oct)\n", i32val);
printf ("i32val has value %04" PRIx32 " (hex)\n\n", i32val);

/* Use 64-bit unsigned integer, printing value */
uint64_t ui64val = 42;
printf ("ui64val has value %d (dec)\n", ui64val);
printf ("ui64val has value %lld (dec)\n", ui64val);
printf ("ui64val has value %" "lld" " (dec)\n", ui64val);
printf ("ui64val has value %" PRId64 " (dec)\n", ui64val);
printf ("ui64val has value %" PRIo64 " (oct)\n", ui64val);
printf ("ui64val has value %04" PRIx64 " (hex)\n\n", ui64val);

The remaining lines (19 – 32) demonstrate similar handling using the fixed-width integer types. These format strings initially look rather confusing, because they exploit a feature of string constants that is often overlooked (or not taught) to those new to C: consecutive string constants are concatenated automatically by the compiler. For example, the code printf ("hello world"); is treated identically to printf ("hello " "world"); by the compiler. (The several white spaces between the strings are ignored and used here only for effect.)

The format specifiers for these integer types are defined in the inttypes.h header. Within this file, there are preprocessor definitions that translate PRId32 or PRIx64 into the appropriate character strings (for example, "d" and "llx") from Table A.2 for the target machine. In general, the format specifier has the format PRIfn, where n is the size (8, 16, 32, or 64 bits) and f is the desired integer format (d or i for signed decimal, u for unsigned decimal, o for octal, x or X for hexadecimal). Consequently, on a machine where PRId64 is defined as "lld", the C compiler would treat lines 29 and 30 in Code Listing A.5 identically to line 28. The PRId64 is translated by the preprocessor into "lld", so lines 29 and 30 literally become identical when the compiler begins to work with the code; the compiler then transparently concatenates the string constants to create the string shown in line 28. When compiling this code, observe that line 27 produces a warning, as the variable and the corresponding format specifier are specifying different sizes (64-bit for uint64_t and 32-bit for %d).

Bug Warning

The quotation marks around the C99 format strings are a common point of confusion that can lead to frustrating compiler errors. Specifically, note that there is a quote after the % character that begins the format specifier. This quote is necessary, as it terminates the string constant up to that point. Any additional characters that modify the format specifier (such as 0 to indicate leading zeros, a number to specify a fixed display width, or a - to indicate left-justification within that space) must appear before the quotation mark, as these are not part of the C99 format processing.

One common task when working with C types is using the sizeof() operator [2] to examine the number of bytes allocated for a variable or a type. In the case of a type, sizeof() indicates how many bytes would be allocated for such a variable instance. Code Listing A.6 demonstrates the use of sizeof(), based on the variable declarations in Code Listing A.5. As the value returned from sizeof() is of the unsigned type size_t, the %zu format specifier should be used when printing these values (%zd is meant for the corresponding signed type). On most systems, lines 7, 8, 12, and 13 will all produce the same size result (4), as int and int32_t are typically identical.

/* Code Listing A.6:
   Using sizeof() for introspection of variable and type sizes
 */

/* Use sizeof to get the size of a variable name */
size_t size = sizeof (intval);
printf ("intval has size %zu\n", size);
printf ("i32val has size %zu\n", sizeof (i32val));
printf ("ui64val has size %zu\n\n", sizeof (ui64val));

/* Use sizeof to get the size of a variable type */
printf ("Primitive int type has size %zu\n", sizeof (int));
printf ("C99 int32_t type has size %zu\n", sizeof (int32_t));
printf ("C99 int64_t type has size %zu\n", sizeof (int64_t));
printf ("Primitive size_t type has size %zu\n", sizeof (size_t));

10.3.2. Pointer Basics

As this Appendix is intended as a re-introduction to C, readers are generally assumed to have had some prior exposure to key features of the language, including pointers. However, few concepts in programming cause as many problems or as much frustration as pointers; as such, we will reiterate the basic concept here, particularly as it relates to the preceding discussion on sizeof(). Later in this Appendix, we will also introduce some intermediate or advanced techniques of working with pointers. To start, we emphasize the following key idea:

A pointer is a variable that stores an address.

When a pointer variable is declared, its declaration generally indicates what type of data is stored at that address. This target type is generally included when discussing a pointer’s type, but it does not change the fact that the pointer stores an address. For instance, by declaring int *x = NULL;, we have declared the variable named x to be a pointer to an int. That is, we might write that x is an int*. It cannot be overstated that this statement is different from saying that x is an int. Part of what makes this subtly complex is that addresses are, fundamentally, integer values that have a particular meaning. Code Listing A.7 illustrates this confusion in a common manner.

/* Code Listing A.7:
   A subtle segfault bug
 */

int ival = 10;
int *iptr = &ival; // iptr points to ival

*iptr = 0; // sets ival to 0
printf ("iptr = %p; *iptr = %d\n", (void *) iptr, *iptr);

iptr = 0; // makes iptr point to nowhere (i.e., NULL)
/* next line causes a segfault since iptr is now NULL */
printf ("iptr = %p; *iptr = %d\n", (void *) iptr, *iptr);

Lines 5 and 6 declare two variables, one (ival) has the type int and the other (iptr) has the type int*. The value of ival is initialized to 10, whereas the value of iptr is the address of ival (&ival). That is, iptr is initialized so that it points to ival. We might also say that iptr is an indirect reference to ival. Line 8 dereferences the pointer (i.e., follows the arrow to the data pointed to) and sets that portion of memory to 0; specifically, by dereferencing the pointer, line 8 is changing the value of ival from 10 to 0. As such, line 9 runs without error, printing the value of iptr (which is the address of ival) along with the new value of ival (because we are again dereferencing the pointer).

Line 11 is fundamentally different, however. Since line 11 is not dereferencing the pointer (note that there is no * on this line), it is changing the value of iptr itself. Recall from above that a pointer stores an address; if we change a pointer’s value, we are changing the memory address that we are pointing to. By setting iptr to 0, the pointer now points to nothing. That is, iptr no longer points to ival; it is now a NULL pointer (in C, assigning the constant 0 to a pointer produces the null pointer, which is what the NULL macro denotes). As such, line 13 causes a segmentation fault on typical systems, because it tries to dereference a null pointer (*iptr).

A common misunderstanding about this example is that setting the pointer to NULL is what causes the error; it is not. Code Listing A.8 extends A.7, with the exception that a new line is added just before the final printf() statement. In Code Listing A.7, the code set iptr to 0 and immediately called printf(); in Code Listing A.8, line 7 occurs between these steps, setting iptr to point back to ival. In restoring iptr’s value to point to a valid location, Code Listing A.8 removes the segmentation fault that occurred in A.7. In short, setting a pointer’s value to NULL does not cause a segmentation fault; the fault only occurs once the pointer is dereferenced.

/* Code Listing A.8:
   Restoring successful execution to Code Listing A.7
 */

iptr = 0; // makes iptr point to nowhere (i.e., NULL)

iptr = &ival; // makes iptr point to ival again

/* next line no longer segfaults */
printf ("iptr = %p; *iptr = %d\n", (void *) iptr, *iptr);

To return to the discussion of type sizes, a common misunderstanding arises when we combine the concepts of pointers and sizeof(). Recall two facts: sizeof() returns the number of bytes required for a variable of a particular type, and all pointers store addresses (and nothing else!). As such, on typical systems, calling sizeof() on a pointer will return the same answer regardless of the pointer’s type: the size of an address. From the perspective of sizeof(), pointer declarations are all identical. Code Listing A.9 demonstrates this fact. The variables ival and cval have distinct sizes (typically, 4 bytes for ival as an int and 1 byte for cval as a char). In contrast, iptr and cptr have different types, but they are the same size (8 bytes, assuming this code runs on a 64-bit CPU architecture). As such, the assertions on lines 17 and 18 are both true.

/* Code Listing A.9:
   Using sizeof() with pointers produces different results from basic types
 */

int ival = 10;
int *iptr = &ival;
printf ("Size of ival = %zu\n", sizeof (ival));
printf ("Size of iptr = %zu\n", sizeof (iptr));
printf ("Size of *iptr = %zu\n\n", sizeof (*iptr));

char cval = 'a';
char *cptr = &cval;
printf ("Size of cval = %zu\n", sizeof (cval));
printf ("Size of cptr = %zu\n", sizeof (cptr));
printf ("Size of *cptr = %zu\n", sizeof (*cptr));

assert (sizeof (int *) == sizeof (char *));
assert (sizeof (int) != sizeof (char));
assert (sizeof (*iptr) != sizeof (*cptr));

Lastly, observe that sizeof() can be called successfully on pointer dereferences, as shown in lines 9 and 15. In this case, the size returned is the number of bytes for the referenced type, not the pointer. Line 9 will print that the size is 4, as that is the typical size for an int (which is the type of *iptr, also known as ival). Line 15 will print that the size is 1, given that *cptr (also known as cval) is a char. Consequently, the assertion on line 19 holds, as sizeof(*iptr) gets the size of an int, whereas sizeof(*cptr) is the size of a char.

Given that a pointer is a variable that stores an address, an intuitive follow-up question is whether or not a pointer can store the address of a pointer. The answer is yes, and this is common in a variety of circumstances that we will examine later in this Appendix. Code Listing A.10 illustrates an example of using pointers to pointers. Line 6 starts by declaring iptr (abbreviation of “integer pointer”) as an int* and making it point to ival. Line 7 takes this a step further, making pptr (“pointer pointer”) be a pointer to a pointer to an int (int**), and initializing it to point to iptr; this initialization is correct, as pptr is pointing to an int*. Line 8 introduces the use of void with a pointer structure; specifically, the declaration indicates that vptr (“void pointer”) is a pointer to a pointer to a pointer to something, but the type of that something is unknown.

/* Code Listing A.10:
   Multiple layers of indirection with pointers to pointers
 */

int ival = 10;
int *iptr = &ival;
int **pptr = &iptr;
void ***vptr = (void***) &pptr; // this is okay

/* Multiple layers of redirection to the same place */
printf ("%d = %d = %d\n", ival, *iptr, **pptr);
// NOT ALLOWED: printf ("%d\n", ***vptr);

/* The size of any pointer is always the same */
assert (sizeof (iptr) == sizeof (pptr));
assert (sizeof (iptr) == sizeof (vptr));

/* Just for the full, ridiculous effect */
assert (sizeof (char*) == sizeof (int******));
assert (sizeof (void*) == sizeof (double***));

Figure 10.3.4 illustrates the relationship of the pointers as declared in lines 5 – 8. Line 11 prints the value 10 multiple times by dereferencing the pointers as needed. Specifically, line 11 starts by printing ival directly, and the value of ival is 10. Then, the value of ival is printed again by dereferencing iptr; since iptr is an int* and the value of iptr is &ival, dereferencing it once (as *iptr) ends up at the int variable ival. Similarly, the value of ival is printed one more time, as pptr is dereferenced twice (**pptr); the first dereference * follows the arrow from pptr to iptr and the second * follows the arrow from iptr to ival.

Figure 10.3.4: The pointer structure of Code Listing A.10

The comment on line 12 indicates that we cannot simply dereference vptr three times to get the same value. The problem is that dereferencing requires knowing the type that we are pointing to, and three dereferences of a void*** arrive at void, which has no size or value. The void pointer type (regardless of how many layers of indirection are present) serves a similar purpose in C as the Object class in Java; it acts as a catch-all type for passing around pointers. We could successfully dereference vptr by first casting it (i.e., by using ***((int***)vptr)).

Lines 14 – 20 highlight another key insight about pointers: on a given architecture, all pointers to data have the same size. It is a very common misunderstanding to think that the type of the pointer (int* vs. char*) has some influence on the size of the pointer. This is not true. All pointers are allocated exactly the amount of space required to store an address on the target architecture. The pointer’s type is only important for dereferencing the pointer. Note that the assertion on line 20 also holds. A common misunderstanding is that void* either has no size or it points to nothing; neither statement is true (unless the pointer is specifically initialized to NULL, in which case the latter is true). The void* type means that we have a pointer variable and its size is the same as any other pointer, but we do not know the size of what it is pointing to. We can cast a void* to be any other type of pointer; doing so would indicate how to use the bytes being pointed to when dereferencing the pointer.

[1] To be pedantic, char and signed char are distinct types in C. The char type is defined as the smallest addressable unit for containing a machine’s basic character set; as this basic character set is ASCII for modern systems, the smallest addressable unit is an 8-bit byte. The signed char is defined in terms of the range shown.

[2] The sizeof() operator looks like a function and generally behaves like one, taking an input argument and returning a value. This appearance is deceiving, though, as sizeof() does not execute according to the standard function calling semantics. Instead, sizeof() is an operator that is built into the language, like + to indicate addition.