.. _Debugging:
.. raw:: html
.. |--| unicode:: U+2013 .. en dash
.. |---| unicode:: U+2014 .. em dash, trimming surrounding whitespace
:trim:
.. This file is part of the OpenCSF eTextbook project. It was
.. auto-generated by scripts from the OpenDSA eTextbook project.
.. See https://OpenCSF.org for more details. OpenCSF is distributed
.. under a Creative Commons Attribution-NonCommercial 4.0 International
.. License (see http://creativecommons.org/licenses/by-nc/4.0/),
.. Copyright (c) 2019-2021 by Michael S. Kirkpatrick. OpenDSA is
.. distributed under an MIT open source license, Copyright (c) 2012-2021
.. by the OpenDSA Project Contributors.
.. avmetadata::
:author: Michael S. Kirkpatrick
:requires:
:satisfies:
:topic:
Documentation and Debugging
===========================
The Internet provides many ways to find documentation, most often through web searches that lead to sites such as Stack Overflow.
This approach can be helpful when the provided code is easily adaptable, but it can also be
frustrating when the explanation is incomplete or incorrect. [#f51]_ In particular, solutions found
in this way often demonstrate the correct approach without clearly identifying the source
of the error or misunderstanding. System manuals (referred to as man pages in the UNIX tradition)
and command-line debuggers become powerful tools once learned.
Man Pages
---------
Throughout this book, we have generally adhered to the POSIX.1-2017 specification for the C
programming interface (also known officially as IEEE Std 1003.1, 2017 Edition and The Open Group
Technical Standard Base Specifications, Issue 7). This specification is published freely online by
The Open Group at:
* https://publications.opengroup.org/standards/unix/c181
* https://pubs.opengroup.org/onlinepubs/9699919799/
Often, however, it is convenient or even necessary to access the system manual directly from the
command line. For instance, recall that macOS is a UNIX OS based on BSD UNIX, while Linux is a
UNIX-like OS that was developed independently. As such, there are slight differences between the C
interfaces of the two (particularly in relation to IPC). When these differences arise, it is
necessary to consult the documentation that is specific to that particular OS. The ``man``
command-line utility provides that interface. Documentation for any C function or system call can
(generally, with an exception described below) be found by typing ``man`` followed by the name of
the function. To get started with ``man``, you can read its own ``man`` page (use arrows to move
up/down and press ``'q'`` to quit):
.. codeinclude:: CLang/ManMan.txt
One issue that arises with man pages is that there are naming conflicts between command-line
utilities and C functions. These conflicts can be resolved by specifying the *section* of the
manual as an integer parameter before the function name. The most common sections of interest for
systems programming are sections 1 (executable programs and command-line utilities), 2 (system calls
provided by the kernel), and 3 (C standard library functions that are not system calls). As an
example, compare the following two ``man`` page entries; the former brings up the page for the
``mkdir`` command-line utility, while the latter brings up the C function documentation:
.. codeinclude:: CLang/ManMkdir.txt
The header of the ``man`` page uses a more precise naming convention to indicate the section
under consideration. Using the examples above, the default behavior for ``man mkdir`` is to find
``mkdir(1)``, the command-line utility, as opposed to ``mkdir(2)``, the system call. Besides a
header and footer that document the function's section of the manual, ``man`` pages for C
functions generally follow a standard format (though a given page may not contain all of these fields):
.. _tbla_1:
.. list-table:: Table A.1: Common fields of a ``man`` page
   :header-rows: 1

   * - Field title
     - Purpose of the field
   * - ``NAME``
     - Quick description of the function or utility
   * - ``LIBRARY``
     - Which libraries must be linked to the compiled code (sometimes included as part of the ``SYNOPSIS``)
   * - ``SYNOPSIS``
     - Required header ``#include`` statements and the function prototype
   * - ``DESCRIPTION``
     - A detailed description of what the function does, with key usage issues or considerations highlighted
   * - ``RETURN VALUES``
     - How to interpret possible values returned from the function
   * - ``ERRORS``
     - A list of constants that the function might assign to ``errno`` when an error occurs; these constants begin with 'E' and are printed in brackets
   * - ``SEE ALSO``
     - Other functions that serve related purposes
   * - ``STANDARDS``
     - Which POSIX standard defines the function
   * - ``HISTORY``
     - When was the function introduced to UNIX
   * - ``BUGS``
     - Possible input sources that cause known bugs
Beyond just providing information about what the function is or does, these sections provide hints
for how to deal with errors. Specifically, when a problem arises with compilation or a run-time
crash occurs, the following ``man`` page fields provide a quick solution:
* ``SYNOPSIS`` – Many functions rely on a particular ``struct`` declaration that is defined in a
standard header file. This field enumerates all of the headers that are required to be set as
``#include`` statements to use the function. This field is also particularly helpful for making
sure that arguments are being passed in the correct order.
* ``LIBRARY`` – Some functions require linking to additional libraries. For instance, the ``pow()``
function (used to raise a base to some power) is in the C math library; some systems
require explicitly linking executables with the ``-lm`` flag for ``gcc``.
* ``RETURN VALUES`` – Some functions return a simple binary value to indicate success or failure,
while others return a quantitative value (such as the number of bytes read). Treating return values
incorrectly can lead to many bugs in systems code.
* ``ERRORS`` – Many functions use a generic return value to indicate an error has occurred. For
instance, ``read()`` returns -1 to indicate that the requested operation failed; the global
variable ``errno`` is set to explain why the failure occurred. In the case of ``read(2)``, the
possible errors include ``EAGAIN`` (file is marked for non-blocking I/O, but no data is ready to
read), ``EINTR`` (the call was interrupted by a signal before any data was read), ``EINVAL`` (on
Linux, the file descriptor refers to an object that is unsuitable for reading; macOS reports it when
the file offset is negative), or ``EBADF`` (the file descriptor is not valid or is not open for reading). Comparing ``errno`` with
these constants in the case of a failure to read can be an important clue in debugging.
One last note about ``man`` pages is that non-standard libraries typically provide their own pages that
can be consulted as above. The primary difference is that these libraries might be installed in
places that the ``man`` command does not know to check. The ``-M`` flag can fix this problem. As an
example, on macOS, the Homebrew [#f52]_ utility can be used to install OpenSSL. The default behavior
of Homebrew on Intel-based Macs is to install such libraries in ``/usr/local/opt``, which is not searched by ``man``. As
such, the documentation for OpenSSL functions can be found by specifying the path as follows:
.. codeinclude:: CLang/ManEncrypt.txt
Debugging with GDB
------------------
One hurdle that all programmers must overcome is that errors and crashes indicate that there is a
difference between what the code *does* and what the code *should do* (or rather, what the
programmer *thinks* the code should do). Debugging is the art of bridging this gap. When a program
crashes or produces obviously incorrect output, the programmer must figure out where in the code
this gap exists and what is causing it. The GNU debugger (GDB) is a powerful tool that can help to
bridge this gap. Besides just stepping through code, GDB provides a number of built-in tools that
can assist with debugging. More documentation on GDB can be found at:
* https://www.gnu.org/software/gdb/documentation/
Breakpoints and Watchpoints
~~~~~~~~~~~~~~~~~~~~~~~~~~~
In the systems programming field, particularly with languages like C, the lack of exception handling
makes debugging challenging. Often, the only indication that an error has occurred is the
notoriously vague "Segmentation fault." To see how GDB can help, we will use it to examine
`Code Listing A.1 <#cla-1>`_.
.. _cla-1:
.. codeinclude:: CLang/CodeListing-A-1.c
:linenos: true
To get started with GDB, we must compile our program with the ``-g`` flag for ``gcc``, indicating
that we want to include debugging symbols. Without debugging symbols, we cannot refer to variables
or (in certain circumstances) functions by their names. In the example code above, the steps to
compile it, run it (without GDB), and run it again (with GDB) are as follows (note that GDB requires
passing the ``--args`` option if there are command-line arguments):
.. codeinclude:: CLang/GDB.txt
The following GDB session illustrates how to use *watchpoints* and *breakpoints*. In
GDB, we can set a watchpoint on a variable and be notified any time that variable's value changes.
(In fact, we can set watchpoints for any arbitrary memory location, but that is beyond our current
scope.) We can set a breakpoint for a function name and be notified any time that function is
called. In this first GDB session, we set a breakpoint for the ``helper()`` function and a
watchpoint for the ``x`` variable in ``main()``.
.. codeinclude:: CLang/GDBSession.txt
:linenos: true
For simplicity, we have omitted the several lines of introductory text that GDB shows regarding its
license and documentation. The GDB command prompts are shown as bolded lines, where ``(gdb)``
indicates the prompt and our input is shown afterwards. On line 4 of this session, we use the
``start`` command to begin the program's execution, which will pause when the ``main()`` function
begins execution. When GDB executes lines of code as the result of a command, it displays the next
line of code that will be executed. So on line 9, GDB is indicating that it paused
just before line 12 of `Code Listing A.1 <#cla-1>`_, which initializes the ``x`` variable.
Lines 10 and 12 of the GDB session set up the breakpoint and watchpoint with their respective
commands, ``break`` and ``watch``. Note that these commands require the target symbols to be visible
to GDB when they are run. That means that the program must be compiled with debugging symbols
included and the variable must be in scope. On line 14 of this GDB session, we cannot set a
watchpoint for the ``input`` variable, which is only defined within the scope of the ``helper()``
function; GDB is currently executing inside the ``main()`` function scope.
The main commands to execute code in GDB are ``step``, ``next``, and ``continue``. The ``step`` and
``next`` commands (not shown here) would simply execute the next line of code. In the session above,
running either of these commands would execute line 12 of `Code Listing A.1 <#cla-1>`_, which sets the
variable ``x`` to 5. The difference between these two commands arises when the next line of code is
a function call. The ``step`` command would *step into* the called function's body, whereas the
``next`` command treats the function call as an opaque box, executing the entire function as one
step. Based on the previous session, we use the ``continue`` command, which allows the program to
run until it is interrupted.
.. codeinclude:: CLang/GDBSession2.txt
:linenos: true
In this session, we can observe the effects of the watchpoint and breakpoint that we set previously.
When GDB encounters code that changes the value of a variable being observed with a watchpoint, it
will pause the execution to indicate the old and new values (``x`` has changed from 0 to 5), as well
as the next line of code when execution resumes (line 13 to call the ``helper()`` function). By
continuing a second time (line 10), this session encounters the breakpoint when the execution of
``helper()`` begins. Using breakpoints like this makes it easy to skip over large chunks of code and
pause right before the execution of a function that we wish to focus on. Line 15 of this session
performs another ``continue``, pausing on line 15 of the code; as ``x``'s value changes from 5 to 10,
the watchpoint is triggered again. Finally, note that we can also print in-scope variables at any
point to observe their current value (line 24).
To illustrate the end of a GDB session, we execute the ``continue`` command two more times. The
first continue causes an interrupt to arise when the ``return`` from ``main()`` occurs. Once this
``return`` happens, the variable ``x`` no longer exists, so the watchpoint can be deleted (lines 5
and 6). Note that this deletion implies GDB can distinguish between the variable being temporarily
out of scope (as ``x`` is not in scope while ``helper()`` is executing) and being no longer needed.
Lines 7 – 9 of this session can be ignored here, as they are simply GDB informing us (indirectly) that
the source code containing the next line to execute cannot be found; this typically happens when GDB
pauses during the execution of the C standard library, as this library is typically not compiled with
debugging symbols included. The last continue (line 10) runs the program until it finishes, with
line 12 informing us that the process has been terminated.
.. codeinclude:: CLang/GDBSession3.txt
:linenos: true
Backtrace
~~~~~~~~~
One significant source of frustration in systems programming is the lack of information regarding
segmentation faults. *Backtrace* is an essential tool for debugging segmentation faults, such
as the code shown in `Code Listing A.2 <#cla-2>`_.
.. _cla-2:
.. codeinclude:: CLang/CodeListing-A-2.c
:linenos: true
Compiling and running this program would produce the following unhelpful results: [#f53]_
.. codeinclude:: CLang/Segfault.txt
Diagnosing and fixing segmentation faults can be very difficult. One reason is that the message above provides
no information about what line of code was executing when the segmentation fault occurred. Many
programmers, particularly novices, try to address this with "``printf()`` debugging," the ad hoc
approach of adding ``printf()`` statements to the code. This approach is inefficient and ineffective
for a number of reasons. First, it requires re-editing and re-compiling the program. Second, adding
any code, including ``printf()``, **changes the program**. In some cases, this seemingly innocuous
change will cause the segmentation fault to disappear—but only if the ``printf()`` stays in the
code; removing the ``printf()`` brings the segmentation fault back. Lastly, adding code increases
the opportunity for bugs to cause misleading results. For instance, changing the ``main()`` of `Code
Listing A.2 <#cla-2>`_ to that shown in `Code Listing A.3 <#cla-3>`_ will produce identical results;
without a ``'\n'``, the ``STDOUT`` buffer will not be flushed and the message in the ``printf()``
will never be displayed to the screen.
.. _cla-3:
.. codeinclude:: CLang/CodeListing-A-3.c
:linenos: true
GDB's ``backtrace`` utility provides a more efficient and effective approach to debugging. Line 10
of the GDB session below informs us that GDB encountered a segmentation fault (``SIGSEGV``) while
executing line 5 of `Code Listing A.2 <#cla-2>`_. Line 11 also provides information about the
function that was executing at the time (``segfault()``), and what its current input arguments were
(``value=0``). That information might be enough if the function is only called once or if the
relationship between the arguments and the segmentation fault is clear. However, it is often the
case that more context is needed; in order to debug the segmentation fault, we need to know what
specific call—including the arguments passed and where it was called from—produced these results.
This information is provided by ``backtrace``, which shows the sequence of function calls that led
to the segmentation fault.
.. codeinclude:: CLang/SegvBacktrace.txt
:linenos: true
Lines 14 and 15 show that the segmentation fault occurred while executing line 5 of `Code Listing
A.2 <#cla-2>`_. This line exists within the ``segfault()`` function, which was specifically called
with the argument ``value=0``. The call to ``segfault(0)`` occurred on line 11 of the code, which
exists as part of the call to ``main()``. Unless the contents of the stack have become corrupted
(a situation that goes beyond our discussion here), this history will show the full trace [#f54]_ back to the
``main()`` context. That information provides more context that the programmer can use to determine
what might be the root cause of the segmentation fault.
Tracing multiple processes
~~~~~~~~~~~~~~~~~~~~~~~~~~
Debugging segmentation faults—and other bugs—with multiple processes can be particularly
challenging. For one thing, the asynchronous timing of the execution might intersperse output
messages in odd ways. Even more frustrating, the error message that normally reports a crash may
never appear when the crash occurs in a child process. Consider the example shown in `Code Listing A.4
<#cla-4>`_, which uses a slight variant on the same ``segfault()`` function from `Code Listing A.2 <#cla-2>`_.
.. _cla-4:
.. codeinclude:: CLang/CodeListing-A-4.c
:linenos: true
As with the previous cases, running this code should result in a segmentation fault. However, the
results would actually look as follows:
.. codeinclude:: CLang/Forkfault.txt
In this case, the parent process calls ``fork()`` to create a child process. Based on the
nondeterministic timing of process scheduling, either the parent or the child may run first.
This nondeterminism is not an issue here, though; if the parent runs first, it calls
``wait()``, which blocks until the child terminates. Consequently, when the parent executes the
``printf()`` call on line 24, the child is guaranteed to have terminated; that child process
definitely terminated with a segmentation fault because of the code on line 13. The reason no
error message appears on the terminal is that the "Segmentation fault" message is printed by the
shell when the process it launched is killed by a signal. Here, the shell launched the parent, which
exits normally, and the child's termination status is not displayed unless the parent inspects and
prints it.
To debug these kinds of errors in multiple processes, GDB allows you to specify which process to
follow when a ``fork()`` occurs. Specifically, the default behavior is that GDB will follow the
parent; if you use ``next`` or ``continue`` to move past a ``fork()``, GDB will show a message about a new
process but continue executing the parent. To change this behavior, the ``set`` command can be used to
change GDB's ``follow-fork-mode`` setting as shown below. Based on this new setting, GDB will switch
to the child process after the call to ``fork()``; additional GDB commands will be sent to the child
process rather than the parent.
.. codeinclude:: CLang/Forkfault2.txt
The full GDB manual provides much more information about debugging multiple processes. One
particularly useful command is ``info proc``, which can be used to examine information about the
status of a process, including its memory map regions, memory usage, number of threads, and much
more. Information on ``info proc`` can be found at:
* https://sourceware.org/gdb/current/onlinedocs/gdb/Process-Information.html
.. [#f51] The use of web searches and sites like Stack Overflow also raises serious ethical concerns
about attribution and plagiarism. Reusing others' code, particularly without properly citing these
sources or documenting that the use is permitted, can lead to legal ramifications in professional
practice and to academic misconduct charges.
.. [#f52] Homebrew can be installed from ``https://brew.sh``. Once Homebrew is installed, the OpenSSL library can be installed by running ``brew install openssl``.
.. [#f53] First, note that the compiler tries to help us avoid the segmentation fault. Warnings are
quite frequently an indication that the code is syntactically legal but semantically wrong.
Technically, we're allowed to do something, but it's probably not going to work out correctly.
.. [#f54] This type of trace might look familiar to readers with experience in other languages, such
as Java or Python. Those languages include built-in exception-handling mechanisms that walk the call
stack, much as GDB does, to provide this information for free.