10.2. Documentation and Debugging

The Internet provides many ways to find documentation, and web searches often lead to sites such as Stack Overflow. This approach can be helpful when the provided code is easily adaptable, but it can also be frustrating when the explanation is incomplete or incorrect. [1] In particular, solutions found in this way often demonstrate a correct approach without clearly identifying the source of the error or misunderstanding. System manuals (referred to as man pages in the UNIX tradition) and command-line debuggers are powerful tools once learned.

10.2.1. Man Pages

Throughout this book, we have generally adhered to the POSIX.1-2017 specification for the C programming interface (also known officially as IEEE Std 1003.1, 2017 Edition and The Open Group Technical Standard Base Specifications, Issue 7). This specification is published freely online by The Open Group at:

https://publications.opengroup.org/standards/unix/c181

https://pubs.opengroup.org/onlinepubs/9699919799/

Often, however, it is convenient or even necessary to access the system manual directly from the command line. For instance, recall that macOS is a UNIX OS based on BSD UNIX, while Linux is a UNIX-like OS that was developed independently. As such, there are slight differences between the C interfaces of the two (particularly in relation to IPC). When these differences arise, it is necessary to consult the documentation that is specific to that particular OS. The man command-line utility provides that interface. Documentation for any C function or system call can (generally, with an exception described below) be found by typing man followed by the name of the function. To get started with man, you can read its own man page (use arrows to move up/down and press 'q' to quit):

$ man man

One issue that arises with man pages is that there are naming conflicts between command-line utilities and C functions. These conflicts can be resolved by specifying the section of the manual as an integer parameter before the function name. The most common sections of interest for systems programming are sections 1 (executable programs and command-line utilities), 2 (system calls provided by the kernel), and 3 (C standard library functions that are not system calls). As an example, compare the following two man page entries; the former brings up the page for the mkdir command-line utility, while the latter brings up the documentation for the C function:

$ man mkdir
$ man 2 mkdir

The header of the man page uses a more precise naming convention to indicate the section under consideration. Using the examples above, the default behavior for man mkdir is to find mkdir(1), the command-line utility, as opposed to mkdir(2), the system call. Besides a header and footer that document the function’s section of the manual, man pages for C functions generally follow a standard structure, although a given page may not contain all of these fields:

Field title	Purpose of the field
`NAME`	Quick description of the function or utility
`LIBRARY`	Which libraries must be linked to the compiled code (sometimes included as part of the `SYNOPSIS`)
`SYNOPSIS`	Required header `#include` statements and the function prototype
`DESCRIPTION`	A detailed description of what the function does, with key usage issues or considerations highlighted
`RETURN VALUES`	How to interpret possible values returned from the function
`ERRORS`	A list of constants that the function might assign to `errno` when an error occurs; these constants begin with `'E'` and are printed in brackets
`SEE ALSO`	Other functions that serve related purposes
`STANDARDS`	Which POSIX standard defines the function
`HISTORY`	When the function was introduced to UNIX
`BUGS`	Possible input sources that cause known bugs

Table A.1: Common fields of a man page

Beyond just providing information about what the function is or does, these sections provide hints for how to deal with errors. Specifically, when a problem arises with compilation or a run-time crash occurs, the following man page fields often point to a quick solution:

SYNOPSIS – Many functions rely on a particular struct declaration that is defined in a standard header file. This field enumerates all of the headers that are required to be set as #include statements to use the function. This field is also particularly helpful for making sure that arguments are being passed in the correct order.

LIBRARY – Some functions require linking to additional libraries. For instance, the pow() function (used to calculate raising a base to some power) is in the C math library; some systems require explicitly linking executables with the -lm flag for gcc.

RETURN VALUES – Some functions return a simple binary value to indicate success or failure, while others return a quantitative value (such as the number of bytes read). Treating return values incorrectly can lead to many bugs in systems code.

ERRORS – Many functions use a generic return value to indicate an error has occurred. For instance, read() returns -1 to indicate that the requested operation failed; the global variable errno is set to explain why the failure occurred. In the case of read(2), the possible errors include EAGAIN (file is marked for non-blocking I/O, but no data is ready to read), EINTR (the call was interrupted by a signal before any data was read), EINVAL (the arguments were invalid for the file, such as a negative file offset), or EBADF (the file descriptor is not valid or is not open for reading). Comparing errno with these constants when a read fails can provide an important clue in debugging.

One last note about man pages is that non-standard libraries typically provide an interface that can be consulted as above. The primary difference is that these libraries might be installed in places that the man command does not know to check. The -M flag can fix this problem. As an example, on macOS, the Homebrew [2] utility can be used to install OpenSSL. The default behavior of Homebrew is to install such libraries in /usr/local/opt, which is not searched by man. As such, the documentation for OpenSSL functions can be found by specifying the path as follows:

$ man -M /usr/local/opt/openssl/share/man EVP_EncryptInit

10.2.2. Debugging with GDB

One hurdle that all programmers must overcome is that errors and crashes indicate that there is a difference between what the code does and what the code should do (or rather, what the programmer thinks the code should do). Debugging is the art of bridging this gap. When a program crashes or produces obviously incorrect output, the programmer must figure out where in the code this gap exists and what is causing it. The GNU debugger (GDB) is a powerful tool that can help to bridge this gap. Besides just stepping through code, GDB provides a number of built-in tools that can assist with debugging. More documentation on GDB can be found at:

https://www.gnu.org/software/gdb/documentation/

10.2.2.1. Breakpoints and Watchpoints

In the systems programming field, particularly with languages like C, the lack of exception handling makes debugging challenging. Often, the only indication that an error has occurred is the notoriously vague “Segmentation fault.” To get started with GDB, we will use it to examine Code Listing A.1.

/* Code Listing A.1:
   Code for tracing watchpoints and breakpoints in GDB 
 */

#include <stdio.h>

int
helper (int input)
{
  return input + 1;
}

int
main (int argc)
{
  int x = argc;
  int y = helper (x);
  x = 10;
  printf ("x = %d; y = %d\n", x, y);

  return 0;
}

To get started with GDB, we must compile our program with the -g flag for gcc, indicating that we want to include debugging symbols. Without debugging symbols, we cannot refer to variables or (in certain circumstances) functions by their names. In the example code above, the steps to compile it, run it (without GDB), and run it again (with GDB) are as follows (note that GDB requires passing the --args option if there are command-line arguments):

$ gcc -g -o watch watch.c
$ ./watch 5
x = 10; y = 3
$ gdb --args ./watch 5
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git

The following GDB session illustrates how to use watchpoints and breakpoints. In GDB, we can set a watchpoint on a variable and be notified any time that variable’s value changes. (In fact, we can set watchpoints for any arbitrary memory location, but that is beyond our current scope.) We can set a breakpoint for a function name and be notified any time that function is called. In this first GDB session, we set a breakpoint for the helper() function and a watchpoint for the x variable in main().

GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
[GDB license info omitted...]
Reading symbols from ./watch...done.
(gdb) start
Temporary breakpoint 1 at 0x661: file watch.c, line 12.
Starting program:  /home/csf/watch 

Temporary breakpoint 1, main (argc=2) at watch.c:12
12	  int x = argc;
(gdb) break helper
Breakpoint 2 at 0x555555554651: file watch.c, line 6.
(gdb) watch x
Hardware watchpoint 3: x
(gdb) watch input
No symbol "input" in current context.
(gdb)

For simplicity, we have omitted the several lines of introductory text that GDB shows regarding its license and documentation. The GDB command prompts are shown as bolded lines, where (gdb) indicates the prompt and our input is shown afterwards. On line 4 of this session, we use the start command to begin the program’s execution, which will pause when the main() function begins execution. When GDB executes lines of code as the result of a command, it displays the next line of code to be executed. So on line 9, GDB is indicating that it paused just before line 12 of Code Listing A.1, which initializes the x variable.

Lines 10 and 12 of the GDB session set up the breakpoint and watchpoint with their respective commands, break and watch. Note that these commands require the target symbols to be visible by GDB when they are run. That means that the program must be compiled with debugging symbols included and the variable must be in scope. On line 14 of this GDB session, we cannot set a watchpoint for the input variable, which is only defined within the scope of the helper() function; GDB is currently executing inside the main() function scope.

The main commands to execute code in GDB are step, next, and continue. The step and next commands (not shown here) would simply execute the next line of code. In the session above, running either of these commands would execute line 12 of Code Listing A.1, which sets the variable x to the value of argc (2 in this run). The difference between these two commands arises when the next line of code is a function call. The step command would step into the called function’s body, whereas the next command treats the function call as an opaque box, executing the entire function as one step. Based on the previous session, we use the continue command, which allows the program to run until it is interrupted.

(gdb) continue
Continuing.

Hardware watchpoint 3: x

Old value = 0
New value = 2
main (argc=2) at watch.c:13
13	  int y = helper (x);
(gdb) continue
Continuing.

Breakpoint 2, helper (input=2) at watch.c:6
6	  return input + 1;
(gdb) continue
Continuing.

Hardware watchpoint 3: x

Old value = 2
New value = 10
main (argc=2) at watch.c:15
15	  printf ("x = %d; y = %d\n", x, y);
(gdb) print x
$1 = 10
(gdb)

In this session, we can observe the effects of the watchpoint and breakpoint that we set previously. When GDB encounters code that changes the value of a variable being observed with a watchpoint, it will pause the execution to indicate the old and new values (x has changed from 0 to 2), as well as the next line of code when execution resumes (line 13 to call the helper() function). By continuing a second time (line 10), this session encounters the breakpoint when the execution of helper() begins. Using breakpoints like this makes it easy to skip over large chunks of code and pause right before the execution of a function that we wish to focus on. Line 15 of this session performs another continue, pausing on line 15 of the code; as x’s value changes from 2 to 10, the watchpoint is triggered again. Finally, note that we can also print in-scope variables at any point to observe their current value (line 24).

To illustrate the end of a GDB session, we execute the continue command two more times. The first continue runs until the return from main() occurs, at which point GDB pauses again. Once this return happens, the variable x no longer exists, so the watchpoint is deleted (lines 5 and 6). Note that this deletion implies GDB can distinguish between the variable being temporarily out of scope (as x is not in scope while helper() is executing) and being no longer needed. Lines 7 – 9 of this session can be ignored here, as they are simply GDB informing us (indirectly) that the source code containing the next line to execute cannot be found; this typically happens when GDB pauses during the execution of the C standard library, as these libraries are not compiled with debugging symbols included. The last continue (line 10) runs the program until it finishes, with line 12 informing us that the process has been terminated.

(gdb) continue
Continuing.
x = 10; y = 3

Watchpoint 3 deleted because the program has left the block in
which its expression is valid.
__libc_start_main (main=0x555555554659 <main>, argc=2, argv=0x7fffffffea18, init=<optimized out>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffea08) at ../csu/libc-start.c:344
344	../csu/libc-start.c: No such file or directory.
(gdb) continue
Continuing.
[Inferior 1 (process 16005) exited normally]
(gdb)

10.2.2.2. Backtrace

One significant source of frustration for systems programming is the lack of information regarding segmentation faults. Backtrace is an essential tool for debugging segmentation faults, such as the code shown in Code Listing A.2.

/* Code Listing A.2:
   Dereferencing a NULL pointer to trigger a segmentation fault
 */

void
segfault (int value)
{
  int *nullptr = (int *) value;
  *nullptr = 5;
}

int
main (void)
{
  segfault (0);
  return 0;
}

Compiling and running this program would produce the following unhelpful results: [3]

$ gcc -g -o segfault segfault.c
segfault.c: In function ‘segfault’:
segfault.c:4:18: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
   int *nullptr = (int *) value;
                  ^
$ ./segfault
Segmentation fault

Diagnosing and fixing segmentation faults can be very difficult. One reason is that the message above provides no information about what line of code was executing when the segmentation fault occurred. Many programmers, particularly novices, try to address this with “printf() debugging,” the ad hoc approach of adding printf() statements to the code. This approach is inefficient and ineffective for a number of reasons. First, it requires re-editing and re-compiling the program. Second, adding any code, including printf(), changes the program. In some cases, this seemingly innocuous change will cause the segmentation fault to disappear—but only if the printf() stays in the code; removing the printf() brings the segmentation fault back. Lastly, adding code increases the opportunity for bugs to cause misleading results. For instance, changing the main() of Code Listing A.2 to that shown in Code Listing A.3 will produce identical results; without a '\n', the STDOUT buffer will not be flushed and the message in the printf() will never be displayed to the screen.

/* Code Listing A.3:
   Printing without newlines may prevent the display of the message
 */

int
main (void)
{
  printf ("Hello from main()");
  segfault ();
  return 0;
}

GDB’s backtrace utility provides a more efficient and effective approach to debugging. Line 10 of this GDB session informs us that GDB encountered a segmentation fault (SIGSEGV) while executing line 5 of Code Listing A.2. Line 11 also provides information about the function that was executing at the time (segfault()), and what its current input arguments were (value=0). That information might be enough if the function is only called once or if the relationship between the arguments and the segmentation fault is clear. However, it is often the case that more context is needed; in order to debug the segmentation fault, we need to know what specific call—including the arguments passed and where it was called from—produced these results. This information is provided by backtrace, which shows the sequence of function calls that led to the segmentation fault.

(gdb) start
Temporary breakpoint 1 at 0x61b: file segfault.c, line 11.
Starting program: /home/csf/segfault 

Temporary breakpoint 1, main () at segfault.c:11
11	  segfault (0);
(gdb) continue
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x000055555555460e in segfault (value=0) at segfault.c:5
5	  *nullptr = 5;
(gdb) backtrace
#0  0x000055555555460e in segfault (value=0) at segfault.c:5
#1  0x0000555555554625 in main () at segfault.c:11
(gdb)

Lines 14 and 15 show that the segmentation fault occurred while executing line 5 of Code Listing A.2. This line exists within the segfault() function, which was specifically called with the argument value=0. The call to segfault(0) occurred on line 11 of the code, which exists as part of the call to main(). Unless the contents of the stack have become corrupted (a topic that goes beyond our discussion here), this history will show the full trace [4] back to the main() context. That information provides more context that the programmer can use to determine what might be the root cause of the segmentation fault.

10.2.2.3. Tracing multiple processes

Debugging segmentation faults (and other bugs) in programs with multiple processes can be particularly challenging. For one thing, the asynchronous timing of the execution might intersperse output messages in odd ways. Even more frustrating, a crash in a child process may produce no message at all unless the parent explicitly checks how the child terminated. Consider the example shown in Code Listing A.4, which uses a slight variant on the same segfault() function from Code Listing A.2.

/* Code Listing A.4:
   The child's "Segmentation fault" will not appear on screen
 */

#include <assert.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

void
segfault (int *ptr)
{
  *ptr = 5;
}

int
main (void)
{
  int *nullptr = NULL;
  pid_t child = fork ();
  assert (child >= 0);

  if (child == 0)
    segfault (nullptr);
  else
    wait (NULL);

  printf ("Goodbye\n");
  return 0;
}

As with the previous cases, running this code should result in a segmentation fault. However, the results would actually look as follows:

$ gcc -g -o forkfault forkfault.c
$ ./forkfault
Goodbye

In this case, the parent process calls fork() to create a child process. Based on the nondeterministic timing of process scheduling, either the parent continues executing or the child runs first. This nondeterminism is not an issue here, though; if the parent runs first, it blocks in wait() until the child runs to completion. Consequently, when the parent executes the printf() call on line 24, the child is guaranteed to have finished; that child process definitely terminated with a segmentation fault because of the code on line 9. The child’s error message does not appear on the terminal because the “Segmentation fault” message is printed by the shell, and only for the process that the shell started itself. Here, the parent discards the child’s termination status by passing NULL to wait() and then exits normally, so the shell has nothing to report.

To debug these kinds of errors in multiple processes, GDB allows you to specify which process to follow when a fork() occurs. Specifically, the default behavior is that GDB will follow the parent; if you use next or continue to move past a fork(), GDB will show a message about a new process but continue executing the parent. To change this behavior, the set command can be used to change GDB’s follow-fork-mode setting as shown below. Based on this new setting, GDB will switch to the child process after the call to fork(); additional GDB commands will be sent to the child process rather than the parent.

(gdb) start
Temporary breakpoint 2 at 0x6f7: file forkfault.c, line 15.
Starting program:  /home/csf/forkfault 

Breakpoint 1, main () at forkfault.c:15
15	  int *nullptr = NULL;
(gdb) set follow-fork-mode child
(gdb) continue
Continuing.
[New process 9666]

Thread 2.1 "forkfault" received signal SIGSEGV, Segmentation fault.
[Switching to process 9666]
0x0000555555554726 in segfault (ptr=0x0) at forkfault.c:9
9	  *ptr = 5;
(gdb) backtrace
#0  0x0000555555554726 in segfault (ptr=0x0) at forkfault.c:9
#1  0x000055555555477e in main () at forkfault.c:20
(gdb) 

The full GDB manual provides much more information about debugging multiple processes. One particularly useful command is info proc, which can be used to examine information about the status of a process, including any memory map regions, memory usage, number of threads, and many others. Information on info proc can be found at:

https://sourceware.org/gdb/current/onlinedocs/gdb/Process-Information.html

[1]	The use of web searches and sites like Stack Overflow also raises serious ethical concerns about attribution and plagiarism. Reusing others’ code, particularly without properly citing these sources or documenting that the use is permitted, can lead to legal ramifications in professional practice and to academic misconduct charges.

[2]	Homebrew can be installed from `https://brew.sh`. Once Homebrew is installed, the OpenSSL library can be installed by running `brew install openssl`.

[3]	First, note that the compiler tries to help us avoid the segmentation fault. Warnings are quite frequently an indication that the code is syntactically legal but semantically wrong. Technically, we’re allowed to do something, but it’s probably not going to work out correctly.

[4]	This type of trace might look familiar to readers with experience in other languages, such as Java or Python. Those languages include built-in exception-handling mechanisms, using the same mechanisms that GDB relies on, that provide this information for free.