C defines several basic integer types, as illustrated in Table A.2. For each of these types, there is both a signed (the default) and an unsigned version (considered distinct types). [1] (Note that `short`, `long`, and `long long` can also be called `short int`, `long int`, and `long long int`, respectively.) For each type, there is a designated minimum range of values that the type must be capable of storing. The ranges are shown in the middle two columns. When printing a variable of the specified type, there are both `%d` (signed) and `%u` (unsigned) format specifiers, with added `h` (half) or `l` (long) characters to vary the size (doubled as `hh` and `ll` for the smallest and largest types). The `char` type can use `%c` to print the ASCII interpretation of the byte, whereas the `%hhd` and `%hhu` strings will produce the integer format.
Type | Unsigned range (minimum) | Signed range (minimum) | Format strings |
---|---|---|---|
`char` | [0, 255] | [-127, 127] | `%c`, `%hhd`, `%hhu` |
`short` | [0, 65535] | [-32767, 32767] | `%hd`, `%hu` |
`int` | [0, 65535] | [-32767, 32767] | `%d`, `%u` |
`long` | [0, 4294967295] | [-2147483647, 2147483647] | `%ld`, `%lu` |
`long long` | [0, 18446744073709551615] | [-9223372036854775807, 9223372036854775807] | `%lld`, `%llu` |
Note

On most systems, the basic `char` type holds characters encoded with the ASCII character standard. The original version of this standard uses seven bits (the eighth bit of a byte was used as a parity check to detect errors), allowing for 128 distinct codes, or 127 characters besides the reserved null byte. This limited size is clearly not sufficient to support international communication, which requires thousands of unique characters. ASCII has been succeeded by other standards, with Unicode being the current international standard. Unicode's UTF-16 and UTF-32 encodings use 16-bit and 32-bit code units, exceeding the range of the `char` type (UTF-8, by contrast, encodes each character as a sequence of 8-bit units). C has since introduced the `wchar_t` (wide character) type to support these larger character representations.
There are some important points to note about the ranges shown in Table A.2. First, the table does not contain a typo; the `short` and `int` types are required to cover the exact same ranges. These ranges can be represented in 16 bits for either type. However, in a typical modern system, a `short` is a 16-bit (2-byte) variable, whereas an `int` is 32-bit (4 bytes). To add to the confusion, the ranges required for `long` can be represented in 32 bits, but a `long` is typically 64-bit (8 bytes) on 64-bit Linux and macOS systems (64-bit Windows, by contrast, keeps `long` at 32 bits). And finally, the range for `long long` can be represented in 64 bits, which is the typical size (8 bytes) for this type. In other words, `short` and `int` variables are required to represent the same range of values, but they are typically different sizes; on the other hand, `long` and `long long` are required to represent different ranges of values, but they are often the same size!
Another subtle aspect of these ranges is that there seems to be a discrepancy in the number of signed and unsigned values. For instance, take the `char` type: the unsigned version can represent 256 different numbers (0 through 255), whereas the signed version is only guaranteed to represent 255 (-127 through 0 to 127). In all of these cases, the minimum signed ranges were chosen so that machines using one's complement or sign-and-magnitude integer representations (each of which has both a +0 and a -0) would still conform, in contrast to the two's complement arithmetic typical in modern architectures. As such, the value -128 (0x80) can typically be stored in a signed `char`, but this is technically outside the minimum range that the standard guarantees for this type.
Note

Until the C99 standard was published, C had no built-in Boolean type to indicate true or false. Instead, C used an integer for this purpose, with the convention that 0 means false and any non-zero value means true. This convention is still frequently exploited, particularly as shorthand for condition checks. For instance, the `strcmp()` function, which is used to compare two strings, returns 0 if the strings are identical; if not, `strcmp()` returns a negative or a positive value (not necessarily -1 or 1), depending on which string sorts first. Based on the convention that 0 means false, it is common to see code written as follows:

```c
if (! strcmp (str1, str2))
  printf ("Strings are identical\n");
```

If the strings `str1` and `str2` are identical, `strcmp()` returns 0 (false). The logical negation operator (`!`) then flips this value, so the condition in the if-statement is true. If the strings are not identical, `strcmp()` returns a non-zero value, which evaluates to true; applying the `!` flips this value and the condition evaluates to false, skipping the body of the if-statement. Logically, the condition shown above is equivalent to testing `if (strcmp (str1, str2) == 0)`.
C99 introduced a new header file, `stdbool.h`. This header defines a `bool` type (an alias for the built-in `_Bool` type that C99 added), as well as the constants `true` and `false`. This new type is generally preferred, as it makes for more readable code.
In systems programming, the inconsistent sizes of the standard integer types can lead to serious mistakes. If an embedded microcontroller (such as a tiny wireless sensor monitoring a car's velocity) needs to send data to a centralized controller, it would be beneficial for both devices to agree on the size of various types of data. To avoid this ambiguity, systems code typically uses explicitly sized types. These types, defined in `stdint.h`, are shown in Table A.3.
Signed type | INTn_MIN | INTn_MAX | Unsigned type | UINTn_MAX |
---|---|---|---|---|
`int8_t` | -128 | 127 | `uint8_t` | 255 |
`int16_t` | -32768 | 32767 | `uint16_t` | 65535 |
`int32_t` | -INT32_MAX-1 | 2147483647 | `uint32_t` | 4294967295U |
`int64_t` | -INT64_MAX-1 | 9223372036854775807LL | `uint64_t` | 18446744073709551615ULL |

Table A.3: C99 fixed-width integer types
The type name is constructed as `intn_t` or `uintn_t`, where n is one of 8, 16, 32, or 64. For each size, there is a signed minimum (`INTn_MIN`), a signed maximum (`INTn_MAX`), and an unsigned maximum (`UINTn_MAX`). (The unsigned minimum is 0 for all sizes.) Note that `INT32_MIN` and `INT64_MIN` cannot be written directly as decimal constants because of the way that C defines numeric constants: C has no negative integer literals, so `-2147483648` is parsed as the unary minus operator applied to `2147483648`, which is too large for a 32-bit `int` and therefore takes on a wider type. Instead, they have to be written in relation to the corresponding `INTn_MAX` values, but they are the values -2,147,483,648 and -9,223,372,036,854,775,808, respectively. Also observe that the large values need to be suffixed with U, LL, or ULL so that the C compiler will interpret the values with the intended type.
Code Listing A.5 illustrates a few key points about using these standard and fixed-width types. Lines 6 – 10 declare a `char` variable and print it both in ASCII (using `%c`) and as an integer value (in decimal, octal, and hexadecimal format). This code would work identically even if the declaration of the `achar` variable were changed to `uint8_t`, the 8-bit unsigned integer type. Lines 13 – 16 demonstrate the standard printing for an `int` type.
```c
 1  /* Code Listing A.5:
 2     Using and printing various integer types
 3   */
 4
 5  /* Use char, printing it in 4 formats */
 6  char achar = 'A';
 7  printf ("achar has value %c (character)\n", achar);
 8  printf ("achar has value %hhd (dec)\n", achar);
 9  printf ("achar has value %hho (oct)\n", achar);
10  printf ("achar has value %04hhx (hex)\n\n", achar);
11
12  /* Use basic int type, printing value in 3 formats */
13  int intval = 42;
14  printf ("intval has value %d (dec)\n", intval);
15  printf ("intval has value %o (oct)\n", intval);
16  printf ("intval has value %04x (hex)\n\n", intval);
17
18  /* Use 32-bit signed integer, printing value */
19  int32_t i32val = 42;
20  printf ("i32val has value %d (dec)\n", i32val);
21  printf ("i32val has value %" PRId32 " (dec)\n", i32val);
22  printf ("i32val has value %" PRIo32 " (oct)\n", i32val);
23  printf ("i32val has value %04" PRIx32 " (hex)\n\n", i32val);
24
25  /* Use 64-bit unsigned integer, printing value */
26  uint64_t ui64val = 42;
27  printf ("ui64val has value %d (dec)\n", ui64val); /* size mismatch: warning */
28  printf ("ui64val has value %lld (dec)\n", ui64val);
29  printf ("ui64val has value %" "lld" " (dec)\n", ui64val);
30  printf ("ui64val has value %" PRId64 " (dec)\n", ui64val);
31  printf ("ui64val has value %" PRIo64 " (oct)\n", ui64val);
32  printf ("ui64val has value %04" PRIx64 " (hex)\n\n", ui64val);
```
The remaining lines (19 – 32) demonstrate similar handling using the fixed-width integer types. These format strings initially look rather confusing, because they exploit a feature of string constants that is often overlooked by (or not taught to) those new to C: adjacent string constants are concatenated automatically by the compiler. For example, the code `printf ("hello world");` is treated identically to `printf ("hello " "world");` by the compiler. (Any whitespace between the string constants is ignored; the spacing in Code Listing A.5 is used only for readability.)
The format specifier macros for these integer types are defined in the `inttypes.h` header. Within this file, there is a preprocessor definition that will translate `PRId32` or `PRIx64` into the appropriate character strings from Table A.2 for the target machine: for example, `"d"` and `"llx"` on a platform where `int64_t` is defined as a `long long`, or `"d"` and `"lx"` on 64-bit Linux, where `int64_t` is a `long`. In general, the macro name has the form `PRIfn`, where n is the size (8, 16, 32, or 64 bits) and f is the desired integer format (`d` or `i` for signed decimal, `u` for unsigned decimal, `o` for octal, `x` or `X` for hexadecimal). Consequently, on a platform where `PRId64` expands to `"lld"`, the C compiler would treat lines 29 and 30 in Code Listing A.5 identically to line 28. The preprocessor translates `PRId64` into `"lld"`, so lines 29 and 30 literally become identical when the compiler begins to work with the code; the compiler then transparently concatenates the string constants to create the string shown in line 28. (On platforms where the expansion differs, only the macro-based lines 30 – 32 remain portable, which is precisely why the macros exist.) When compiling this code, observe that line 27 produces a warning, as the variable and the corresponding format specifier are specifying different sizes (64-bit for `uint64_t` and 32-bit for `%d`). Strictly speaking, the unsigned forms `%llu` and `PRIu64` are the correct match for a `uint64_t`; the signed forms used here print the same result only because 42 is representable in both types.
Bug Warning

The quotation marks around the C99 format strings are a common point of confusion that can lead to frustrating compiler errors. Specifically, note that there is a quote after the `%` character that begins the format specifier. This quote is necessary, as it terminates the string constant up to that point. Any additional characters that modify the format specifier (such as 0 to indicate leading zeros, a number to specify a fixed display width, or a `-` to indicate left-justification within that space) must appear before the quotation mark, as these are not part of the strings supplied by the C99 format macros.
One common task when working with C types is using the `sizeof()` operator [2] to examine the number of bytes allocated for a variable or a type. In the case of a type, `sizeof()` indicates how many bytes would be allocated for an instance of that type. Code Listing A.6 demonstrates the use of `sizeof()`, based on the variable declarations in Code Listing A.5. As the value returned from `sizeof()` has the unsigned type `size_t`, the `%zu` format specifier is the correct one when printing these values (`%zd` is intended for the corresponding signed type). On most systems, lines 7, 8, 12, and 13 will all produce the same size result (4), as `int` and `int32_t` are typically identical.

```c
 1  /* Code Listing A.6:
 2     Using sizeof() for introspection of variable and type sizes
 3   */
 4
 5  /* Use sizeof to get the size of a variable name */
 6  size_t size = sizeof (intval);
 7  printf ("intval has size %zu\n", size);
 8  printf ("i32val has size %zu\n", sizeof (i32val));
 9  printf ("ui64val has size %zu\n\n", sizeof (ui64val));
10
11  /* Use sizeof to get the size of a variable type */
12  printf ("Primitive int type has size %zu\n", sizeof (int));
13  printf ("C99 int32_t type has size %zu\n", sizeof (int32_t));
14  printf ("C99 int64_t type has size %zu\n", sizeof (int64_t));
15  printf ("Primitive size_t type has size %zu\n", sizeof (size_t));
```
As this Appendix is intended as a re-introduction to C, readers are generally assumed to have had some prior exposure to key features of the language, including pointers. However, few concepts in programming cause as many problems or as much frustration as pointers; as such, we will reiterate the basic concept here, particularly as it relates to the preceding discussion on `sizeof()`.

Later in this Appendix, we will also introduce some intermediate or advanced techniques for working with pointers. To start, we emphasize the following key idea:
A pointer is a variable that stores an address.
When a pointer variable is declared, its declaration generally indicates what type of data is stored at that address. This target type is generally included when discussing a pointer's type, but it does not change the fact that the pointer stores an address. For instance, by declaring `int *x = NULL;`, we have declared the variable named `x` to be a pointer to an `int`. That is, we might write that `x` is an `int*`. It cannot be overstated that this statement is different from saying that `x` is an `int`. Part of what makes this subtly complex is that addresses are, fundamentally, integer values that have a particular meaning. Code Listing A.7 illustrates this confusion in a common manner.
```c
 1  /* Code Listing A.7:
 2     A subtle segfault bug
 3   */
 4
 5  int ival = 10;
 6  int *iptr = &ival; // iptr points to ival
 7
 8  *iptr = 0; // sets ival to 0
 9  printf ("iptr = %p; *iptr = %d\n", (void *) iptr, *iptr);
10
11  iptr = 0; // makes iptr point to nowhere (i.e., NULL)
12  /* next line typically segfaults since iptr is now NULL */
13  printf ("iptr = %p; *iptr = %d\n", (void *) iptr, *iptr);
```
Lines 5 and 6 declare two variables: one (`ival`) has the type `int` and the other (`iptr`) has the type `int*`. The value of `ival` is initialized to 10, whereas the value of `iptr` is the address of `ival` (`&ival`). That is, `iptr` is initialized so that it points to `ival`.

We might also say that `iptr` is an indirect reference to `ival`. Line 8 dereferences the pointer (i.e., follows the arrow to the data pointed to) and sets that portion of memory to 0; specifically, by dereferencing the pointer, line 8 is changing the value of `ival` from 10 to 0. As such, line 9 runs without error, printing the value of `iptr` (which is the address of `ival`) along with the new value of `ival` (because we are again dereferencing the pointer).

Line 11 is fundamentally different, however. Since line 11 is not dereferencing the pointer (note that there is no `*` on this line), it is changing the value of `iptr` itself. Recall from above that a pointer stores an address; if we change a pointer's value, we are changing the memory address that we are pointing to. Assigning the constant 0 to a pointer makes it a null pointer, which points to no valid object (on typical systems, it literally holds address 0). That is, `iptr` no longer points to `ival`; it is now a `NULL` pointer (`NULL` is a macro for a null pointer constant, typically defined as `0` or `((void *) 0)`). As such, line 13 causes a segmentation fault on typical systems, because it tries to dereference a null pointer (`*iptr`); formally, the C standard classifies this dereference as undefined behavior.
A common misunderstanding about this example is that setting the pointer to `NULL` necessarily causes an error. It does not. Code Listing A.8 extends A.7, with the exception that a new line is added just before the final `printf()` statement. In Code Listing A.7, the code set `iptr` to 0 and immediately called `printf()`; in Code Listing A.8, a new assignment occurs between these steps, setting `iptr` to point back to `ival`. By restoring `iptr`'s value to point to a valid location, Code Listing A.8 removes the segmentation fault that occurred in A.7. In short, setting a pointer's value to `NULL` does not cause a segmentation fault; the fault only occurs once the pointer is dereferenced.
```c
1  /* Code Listing A.8:
2     Restoring successful execution to Code Listing A.7
3   */
4
5  iptr = 0;     // makes iptr point to nowhere (i.e., NULL)
6  iptr = &ival; // makes iptr point to ival again
7  /* next line no longer segfaults */
8  printf ("iptr = %p; *iptr = %d\n", (void *) iptr, *iptr);
```
To return to the discussion of type sizes, a common misunderstanding arises when we combine the concepts of pointers and `sizeof()`. Recall two facts: `sizeof()` returns the number of bytes required for a variable of a particular type, and all pointers store addresses (and nothing else!). As such, on mainstream architectures, calling `sizeof()` on any data pointer returns the same answer: the size of an address. From the perspective of `sizeof()`, pointer declarations are all identical in practice. (The C standard itself permits different pointer types to have different sizes, but modern desktop and server platforms do not make use of this freedom.)

Code Listing A.9 demonstrates this fact. The variables `ival` and `cval` have distinct sizes (typically, 4 bytes for `ival` as an `int` and 1 byte for `cval` as a `char`). In contrast, `iptr` and `cptr` have different types, but they are the same size (8 bytes, assuming this code runs on a 64-bit CPU architecture). As such, the assertions on lines 17 and 18 are both true.
```c
 1  /* Code Listing A.9:
 2     Using sizeof() with pointers produces different results from basic types
 3   */
 4
 5  int ival = 10;
 6  int *iptr = &ival;
 7  printf ("Size of ival = %zu\n", sizeof (ival));
 8  printf ("Size of iptr = %zu\n", sizeof (iptr));
 9  printf ("Size of *iptr = %zu\n\n", sizeof (*iptr));
10
11  char cval = 'a';
12  char *cptr = &cval;
13  printf ("Size of cval = %zu\n", sizeof (cval));
14  printf ("Size of cptr = %zu\n", sizeof (cptr));
15  printf ("Size of *cptr = %zu\n", sizeof (*cptr));
16
17  assert (sizeof (int *) == sizeof (char *));
18  assert (sizeof (int) != sizeof (char));
19  assert (sizeof (*iptr) != sizeof (*cptr));
```
Lastly, observe that `sizeof()` can be applied successfully to pointer dereferences, as shown in lines 9 and 15. In this case, the size returned is the number of bytes for the referenced type, not the pointer. Line 9 will print that the size is 4, as that is the typical size for an `int` (which is the type of `*iptr`, also known as `ival`). Line 15 will print that the size is 1, given that `*cptr` (also known as `cval`) is a `char`. Consequently, the assertion on line 19 holds, as `sizeof(*iptr)` gets the size of an `int`, whereas `sizeof(*cptr)` is the size of a `char`.
Given that a pointer is a variable that stores an address, an intuitive follow-up question is whether or not a pointer can store the address of a pointer. The answer is yes, and this is common in a variety of circumstances that we will examine later in this Appendix.

Code Listing A.10 illustrates an example of using pointers to pointers. Line 6 starts by declaring `iptr` (abbreviation of “integer pointer”) as an `int*` and making it point to `ival`. Line 7 takes this a step further, making `pptr` (“pointer pointer”) a pointer to a pointer to an `int` (`int**`), and initializing it to point to `iptr`; this initialization is correct, as `pptr` is pointing to an `int*`. Line 8 introduces the use of `void` with a pointer structure; specifically, the declaration indicates that `vptr` (“void pointer”) is a pointer to a pointer to a pointer to something, but the type of that something is unknown.
```c
 1  /* Code Listing A.10:
 2     Multiple layers of indirection with pointers to pointers
 3   */
 4
 5  int ival = 10;
 6  int *iptr = &ival;
 7  int **pptr = &iptr;
 8  void ***vptr = (void***) &pptr; // this is okay
 9
10  /* Multiple layers of indirection to the same place */
11  printf ("%d = %d = %d\n", ival, *iptr, **pptr);
12  // NOT ALLOWED: printf ("%d\n", ***vptr);
13
14  /* On mainstream platforms, every data pointer has the same size */
15  assert (sizeof (iptr) == sizeof (pptr));
16  assert (sizeof (iptr) == sizeof (vptr));
17
18  /* Just for the full, ridiculous effect */
19  assert (sizeof (char*) == sizeof (int******));
20  assert (sizeof (void*) == sizeof (double***));
```
Figure 10.3.4 illustrates the relationship of the pointers as declared in lines 5 – 8. Line 11 prints the value 10 multiple times by dereferencing the pointers as needed. Specifically, line 11 starts by printing `ival` directly, and the value of `ival` is 10. Then, the value of `ival` is printed again by dereferencing `iptr`; since `iptr` is an `int*` and the value of `iptr` is `&ival`, dereferencing it once (as `*iptr`) ends up at the `int` variable `ival`. Similarly, the value of `ival` is printed one more time, as `pptr` is dereferenced twice (`**pptr`); the first dereference `*` follows the arrow from `pptr` to `iptr` and the second `*` follows the arrow from `iptr` to `ival`.
The comment on line 12 indicates that we cannot simply dereference `vptr` three times to get the same value. The problem is that dereferencing requires knowing the type that we are pointing to: dereferencing `vptr` twice yields a `void*`, and dereferencing that once more would produce an expression of type `void`, which has no value that can be printed. The `void*` pointer type serves the same purpose in C as the `Object` class in Java; it acts as a catch-all type for passing around pointers. We could successfully dereference `vptr` by first casting it (i.e., by using `***((int***)vptr)`).
Lines 14 – 20 highlight another key insight about pointers: on mainstream architectures, all data pointers have the same size. It is a very common misunderstanding to think that the type of the pointer (`int*` vs. `char*`) has some influence on the size of the pointer. On such platforms, this is not true: every pointer is allocated exactly the amount of space required to store an address on the target architecture. The pointer's type is only important for dereferencing the pointer. Note that the assertion on line 20 also holds. A common misunderstanding is that `void*` either has no size or points to nothing; neither statement is true (unless the pointer is specifically initialized to `NULL`, in which case the latter is true). The `void*` type means that we have a pointer variable whose size is the same as any other pointer, but we do not know the size of what it is pointing to. We can cast a `void*` to any other type of data pointer; doing so indicates how to interpret the bytes being pointed to when dereferencing the pointer.
[1] | To be pedantic, `char`, `signed char`, and `unsigned char` are three distinct types in C, and whether plain `char` behaves as signed or unsigned is implementation-defined. The `char` type is defined as the smallest addressable unit large enough to contain the machine's basic character set; on modern systems, this unit is an 8-bit byte. The `signed char` type is defined in terms of the range shown. |
[2] | The `sizeof()` operator looks like a function and generally behaves like one, taking an input argument and returning a value. This appearance is deceiving, though, as `sizeof()` does not execute according to the standard function calling semantics. Instead, `sizeof()` is an operator that is built into the language, like `+` to indicate addition. |