The Internet provides many ways to find documentation, and web searches frequently lead to sites like Stack Overflow. This approach can be helpful when the provided code is easily adaptable, but it can also be frustrating when the explanation is incomplete or incorrect. [1] In particular, solutions found in this way often demonstrate what the correct approach is without clearly identifying the source of the error or misunderstanding. System manuals (referred to as man pages in the UNIX tradition) and command-line debuggers become powerful tools once they are learned.
Throughout this book, we have generally adhered to the POSIX.1-2017 specification for the C programming interface (also known officially as IEEE Std 1003.1, 2017 Edition and The Open Group Technical Standard Base Specifications, Issue 7). This specification is published freely online by The Open Group at:
Often, however, it is convenient or even necessary to access the system manual directly from the command line. For instance, recall that macOS is a UNIX OS based on BSD UNIX, while Linux is a UNIX-like OS that was developed independently. As such, there are slight differences in the C interfaces of the two (particularly in relation to IPC). When these differences arise, it is necessary to consult the documentation that is specific to that particular OS. The man command-line utility provides that interface. Documentation for any C function or system call can (generally, with an exception described below) be found by typing man followed by the name of the function. To get started with man, you can read its own man page (use the arrow keys to move up/down and press 'q' to quit):
$ man man
One issue that arises with man pages is that there are naming conflicts between command-line utilities and C functions. These conflicts can be resolved by specifying the section of the manual as an integer parameter before the function name. The most common sections of interest for systems programming are sections 1 (executable programs and command-line utilities), 2 (system calls provided by the kernel), and 3 (C standard library functions that are not system calls). As an example, compare the following two man page entries; the former brings up the page for the mkdir command-line utility, while the latter brings up the C function documentation:
$ man mkdir
$ man 2 mkdir
The header of the man page uses a more precise naming convention to indicate the section under consideration. Using the examples above, the default behavior for man mkdir is to find mkdir(1), the command-line utility, as opposed to mkdir(2), the system call. Besides a header and footer that document the function’s section of the manual, man pages for C functions generally follow a specified format (though a given page may not contain all of these fields):
Field title | Purpose of the field |
---|---|
NAME | Quick description of the function or utility |
LIBRARY | Which libraries must be linked to the compiled code (sometimes included as part of the SYNOPSIS) |
SYNOPSIS | Required header #include statements and the function prototype |
DESCRIPTION | A detailed description of what the function does, with key usage issues or considerations highlighted |
RETURN VALUES | How to interpret possible values returned from the function |
ERRORS | A list of constants that the function might assign to errno when an error occurs; these constants begin with 'E' and are printed in brackets |
SEE ALSO | Other functions that serve related purposes |
STANDARDS | Which POSIX standard defines the function |
HISTORY | When the function was introduced to UNIX |
BUGS | Possible input sources that cause known bugs |
Beyond just providing information about what the function is or does, these sections provide hints for how to deal with errors. Specifically, when a problem arises with compilation or a run-time crash occurs, the following man page fields often point to a quick solution:
SYNOPSIS – Many functions rely on a particular struct declaration that is defined in a standard header file. This field lists all of the headers that must be included with #include statements to use the function. This field is also particularly helpful for making sure that arguments are being passed in the correct order.
LIBRARY – Some functions require linking to additional libraries. For instance, the pow() function (used to raise a base to some power) is in the C math library; some systems require explicitly linking executables with the -lm flag for gcc.
RETURN VALUES – Some functions return a simple binary value to indicate success or failure, while others return a quantitative value (such as the number of bytes read). Treating return values incorrectly can lead to many bugs in systems code.
ERRORS – Many functions use a generic return value to indicate an error has occurred. For instance, read() returns -1 to indicate that the requested operation failed; the global variable errno is set to explain why the failure occurred. In the case of read(2), the possible errors include EAGAIN (the file is marked for non-blocking I/O, but no data is ready to read), EINTR (the call was interrupted by a signal before any data was read), EINVAL (the file descriptor refers to an object that is unsuitable for reading), or EBADF (the file descriptor is not valid or is not open for reading). The exact list and wording vary between systems, which is one more reason to consult the local man page. Comparing errno with these constants in the case of a failure to read can be an important clue in debugging.
One last note about man pages is that non-standard libraries typically provide an interface that can be consulted as above. The primary difference is that these libraries might be installed in places that the man command does not know to check. The -M flag can fix this problem. As an example, on macOS, the Homebrew [2] utility can be used to install OpenSSL. The default behavior of Homebrew is to install such libraries in /usr/local/opt, which is not searched by man. As such, the documentation for OpenSSL functions can be found by specifying the path as follows:
$ man -M /usr/local/opt/openssl/share/man EVP_EncryptInit
One hurdle that all programmers must overcome is that errors and crashes indicate that there is a difference between what the code does and what the code should do (or rather, what the programmer thinks the code should do). Debugging is the art of bridging this gap. When a program crashes or produces obviously incorrect output, the programmer must figure out where in the code this gap exists and what is causing it. The GNU debugger (GDB) is a powerful tool that can help to bridge this gap. Besides just stepping through code, GDB provides a number of built-in tools that can assist with debugging. More documentation on GDB can be found at:
In systems programming, particularly with languages like C, the lack of exception handling makes debugging challenging. Often, the only indication that an error has occurred is the notoriously vague “Segmentation fault.” To get started with GDB, we will use it to examine Code Listing A.1.
/* Code Listing A.1:
   Code for tracing watchpoints and breakpoints in GDB
 */

#include <stdio.h>

int
helper (int input)
{
  return input + 1;
}

int
main (int argc)
{
  int x = argc;
  int y = helper (x);
  x = 10;
  printf ("x = %d; y = %d\n", x, y);
  return 0;
}
To get started with GDB, we must compile our program with the -g flag for gcc, indicating that we want to include debugging symbols. Without debugging symbols, we cannot refer to variables or (in certain circumstances) functions by their names. In the example code above, the steps to compile it, run it (without GDB), and run it again (with GDB) are as follows (note that GDB requires passing the --args option if there are command-line arguments):
$ gcc -g -o watch watch.c
$ ./watch 5
x = 10; y = 3
$ gdb --args ./watch 5
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
The following GDB session illustrates how to use watchpoints and breakpoints. In GDB, we can set a watchpoint on a variable and be notified any time that variable’s value changes. (In fact, we can set watchpoints for any arbitrary memory location, but that is beyond our current scope.) We can set a breakpoint for a function name and be notified any time that function is called. In this first GDB session, we set a breakpoint for the helper() function and a watchpoint for the x variable in main().
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
[GDB license info omitted...]
Reading symbols from ./watch...done.
(gdb) start
Temporary breakpoint 1 at 0x661: file watch.c, line 12.
Starting program: /home/csf/watch

Temporary breakpoint 1, main (argc=2) at watch.c:12
13        int x = argc;
(gdb) break helper
Breakpoint 2 at 0x555555554651: file watch.c, line 6.
(gdb) watch x
Hardware watchpoint 3: x
(gdb) watch input
No symbol "input" in current context.
(gdb)
For simplicity, we have omitted the several lines of introductory text that GDB shows regarding its license and documentation. The GDB commands appear on the lines that begin with the (gdb) prompt, with our input shown after it. On line 4 of this session, we use the start command to begin the program’s execution, which will pause when the main() function begins execution. When GDB executes lines of code as the result of a command, it displays the next line of code that will be executed. So on line 9, GDB is indicating that it paused just before line 12 of watch.c, the line of Code Listing A.1 that initializes the x variable.
Lines 10 and 12 of the GDB session set up the breakpoint and watchpoint with their respective commands, break and watch. Note that these commands require the target symbols to be visible to GDB when they are run. That means that the program must be compiled with debugging symbols included and the variable must be in scope. On line 14 of this GDB session, we cannot set a watchpoint for the input variable, which is only defined within the scope of the helper() function; GDB is currently executing inside the main() function scope.
The main commands to execute code in GDB are step, next, and continue. The step and next commands (not shown here) would simply execute the next line of code. In the session above, running either of these commands would execute line 12 of watch.c, which initializes the variable x. The difference between these two commands arises when the next line of code is a function call. The step command would step into the called function’s body, whereas the next command treats the function call as an opaque box, executing the entire function as one step. Picking up from the previous session, we use the continue command, which allows the program to run until it is interrupted.
(gdb) continue
Continuing.

Hardware watchpoint 3: x

Old value = 0
New value = 2
main () at watch.c:13
14        int y = helper (x);
(gdb) continue
Continuing.

Breakpoint 2, helper (input=2) at watch.c:6
7         return input + 1;
(gdb) continue
Continuing.

Hardware watchpoint 3: x

Old value = 2
New value = 10
main () at watch.c:15
16        printf ("x = %d; y = %d\n", x, y);
(gdb) print x
$1 = 10
(gdb)
In this session, we can observe the effects of the watchpoint and breakpoint that we set previously. When GDB encounters code that changes the value of a variable being observed with a watchpoint, it will pause the execution to indicate the old and new values (x has changed from 0 to 2), as well as the next line of code when execution resumes (line 13 of watch.c, which calls the helper() function). By continuing a second time (line 10), this session encounters the breakpoint when the execution of helper() begins. Using breakpoints like this makes it easy to skip over large chunks of code and pause right before the execution of a function that we wish to focus on. Line 15 of this session performs another continue, pausing on line 15 of the code; as x’s value changes from 2 to 10, the watchpoint is triggered again. Finally, note that we can also print in-scope variables at any point to observe their current value (line 24).
To illustrate the end of a GDB session, we execute the continue command two more times, as shown in the session below. The first continue causes an interrupt to arise when the return from main() occurs. Once this return happens, the variable x no longer exists, so the watchpoint is deleted (lines 5 and 6). Note that this deletion implies GDB can distinguish between the variable being temporarily out of scope (as x is not in scope while helper() is executing) and being no longer needed. Lines 7 – 9 of this session can be ignored here, as they are simply GDB informing us (indirectly) that the source code containing the next line to execute cannot be found; this typically happens when GDB pauses during the execution of the C standard library, as these libraries are not compiled with debugging symbols included. The last continue (line 10) runs the program until it finishes, with line 12 informing us that the process has been terminated.
(gdb) continue
Continuing.
x = 10; y = 3

Watchpoint 3 deleted because the program has left the block in
which its expression is valid.
__libc_start_main (main=0x555555554659 <main>, argc=2, argv=0x7fffffffea18, init=<optimized out>,
fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffea08) at ../csu/libc-start.c:344
344 ../csu/libc-start.c: No such file or directory.
(gdb) continue
Continuing.
[Inferior 1 (process 16005) exited normally]
(gdb)
One significant source of frustration for systems programming is the lack of information regarding segmentation faults. Backtrace is an essential tool for debugging segmentation faults, such as the code shown in Code Listing A.2.
/* Code Listing A.2:
   Dereferencing a NULL pointer to trigger a segmentation fault
 */

void
segfault (int value)
{
  int *nullptr = (int *) value;
  *nullptr = 5;
}

int
main (void)
{
  segfault (0);
  return 0;
}
Compiling and running this program would produce the following unhelpful results: [3]
$ gcc -g -o segfault segfault.c
segfault.c: In function ‘segfault’:
segfault.c:4:18: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
int *nullptr = (int *) value;
^
$ ./segfault
Segmentation fault
Diagnosing and fixing segmentation faults can be very difficult. One reason is that the message above provides no information about what line of code was executing when the segmentation fault occurred. Many programmers, particularly novices, try to address this with “printf() debugging,” the ad hoc approach of adding printf() statements to the code. This approach is inefficient and ineffective for a number of reasons. First, it requires re-editing and re-compiling the program. Second, adding any code, including printf(), changes the program. In some cases, this seemingly innocuous change will cause the segmentation fault to disappear, but only while the printf() stays in the code; removing the printf() brings the segmentation fault back. Lastly, adding code increases the opportunity for bugs to cause misleading results. For instance, changing the main() of Code Listing A.2 to that shown in Code Listing A.3 will produce identical results; without a '\n', the STDOUT buffer will not be flushed before the crash and the message in the printf() will never be displayed on the screen.
/* Code Listing A.3:
   Printing without newlines may prevent the display of the message
 */

#include <stdio.h>   /* needed for printf() */

int
main (void)
{
  printf ("Hello from main()");
  segfault (0);
  return 0;
}
GDB’s backtrace utility provides a more efficient and effective approach to debugging. Line 10 of the GDB session below informs us that GDB encountered a segmentation fault (SIGSEGV) while executing line 5 of segfault.c, the assignment through the pointer in Code Listing A.2. Line 11 also provides information about the function that was executing at the time (segfault()) and what its current input arguments were (value=0). That information might be enough if the function is only called once or if the relationship between the arguments and the segmentation fault is clear. However, it is often the case that more context is needed; in order to debug the segmentation fault, we need to know what specific call (including the arguments passed and where it was called from) produced these results. This information is provided by backtrace, which shows the sequence of function calls that led to the segmentation fault.
(gdb) start
Temporary breakpoint 1 at 0x61b: file segfault.c, line 11.
Starting program: /home/csf/segfault

Temporary breakpoint 1, main () at segfault.c:11
11        segfault (0);
(gdb) continue
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x000055555555460e in segfault (value=0) at segfault.c:5
5         *nullptr = 5;
(gdb) backtrace
#0  0x000055555555460e in segfault (value=0) at segfault.c:5
#1  0x0000555555554625 in main () at segfault.c:11
(gdb)
Lines 14 and 15 show that the segmentation fault occurred while executing line 5 of segfault.c. This line exists within the segfault() function, which was specifically called with the argument value=0. The call to segfault(0) occurred on line 11 of the code, which exists as part of the call to main(). Unless the contents of the stack have become corrupted (which goes beyond our discussion here), this history will show the full trace [4] back to the main() context. That information provides more context that the programmer can use to determine what might be the root cause of the segmentation fault.
Debugging segmentation faults (and other bugs) in programs with multiple processes can be particularly challenging. For one thing, the asynchronous timing of the execution might intersperse output messages in odd ways. Even more frustrating, a crash in a child process can go completely unreported: the “Segmentation fault” message is normally printed by the shell when a process that it started is killed by a signal, and the shell does not monitor processes created by our own calls to fork(). Consider the example shown in Code Listing A.4, which uses a slight variant on the same segfault() function from Code Listing A.2.
/* Code Listing A.4:
   The child's "Segmentation fault" will not appear on screen
 */

#include <assert.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

void
segfault (int *ptr)
{
  *ptr = 5;
}

int
main (void)
{
  int *nullptr = NULL;
  pid_t child = fork ();
  assert (child >= 0);

  if (child == 0)
    segfault (nullptr);
  else
    wait (NULL);

  printf ("Goodbye\n");
  return 0;
}
As with the previous cases, running this code should result in a segmentation fault. However, the results would actually look as follows:
$ gcc -g -o forkfault forkfault.c
$ ./forkfault
Goodbye
In this case, the parent process calls fork() to create a child process. Depending on the nondeterministic timing of process scheduling, either the parent or the child runs first. This nondeterminism is not an issue here, though; if the parent runs first, it blocks in wait() until the child terminates. Consequently, by the time the parent executes its printf() call, the child is guaranteed to have terminated, and it definitely terminated with a segmentation fault because of the assignment in segfault(). The reason the child’s error message does not appear on the terminal is that the “Segmentation fault” message is printed by the shell, which only monitors the process that it launched. The child’s termination status is instead delivered to the parent’s wait() call, which discards it because NULL was passed as the status argument.
To debug these kinds of errors in multiple processes, GDB allows you to specify which process to follow when a fork() occurs. Specifically, the default behavior is that GDB will follow the parent; if you use next or continue to move past a fork(), GDB will show a message about a new process but continue executing the parent. To change this behavior, the set command can be used to change GDB’s follow-fork-mode setting as shown below. Based on this new setting, GDB will switch to the child process after the call to fork(); additional GDB commands will be sent to the child process rather than the parent.
(gdb) start
Temporary breakpoint 2 at 0x6f7: file forkfault.c, line 15.
Starting program: /home/csf/forkfault
Breakpoint 1, main () at forkfault.c:15
15 int *nullptr = NULL;
(gdb) set follow-fork-mode child
(gdb) continue
Continuing.
[New process 9666]
Thread 2.1 "forkfault" received signal SIGSEGV, Segmentation fault.
[Switching to process 9666]
0x0000555555554726 in segfault (ptr=0x0) at forkfault.c:9
9 *ptr = 5;
(gdb) backtrace
#0 0x0000555555554726 in segfault (ptr=0x0) at forkfault.c:9
#1 0x000055555555477e in main () at forkfault.c:20
(gdb)
The full GDB manual provides much more information about debugging multiple processes. One particularly useful command is info proc, which can be used to examine information about the status of a process, including any memory map regions, memory usage, number of threads, and many others. Information on info proc can be found at:
[1] | The use of web searches and sites like Stack Overflow also raises serious ethical concerns about attribution and plagiarism. Reusing others’ code, particularly without properly citing these sources or documenting that the use is permitted, can lead to legal ramifications in professional practice and to academic misconduct charges. |
[2] | Homebrew can be installed from https://brew.sh . Once Homebrew is installed, the OpenSSL library can be installed by running brew install openssl . |
[3] | First, note that the compiler tries to help us avoid the segmentation fault. Warnings are quite frequently an indication that the code is syntactically legal but semantically wrong. Technically, we’re allowed to do something, but it’s probably not going to work out correctly. |
[4] | This type of trace might look familiar to readers with experience in other languages, such as Java or Python. Those languages include built-in exception-handling mechanisms, using the same mechanisms that GDB relies on, that provide this information for free. |