.. _Strings:

.. raw:: html

.. |--| unicode:: U+2013   .. en dash
.. |---| unicode:: U+2014  .. em dash, trimming surrounding whitespace
   :trim:

.. This file is part of the OpenCSF eTextbook project. It was
.. auto-generated by scripts from the OpenDSA eTextbook project.
.. See https://OpenCSF.org for more details. OpenCSF is distributed
.. under a Creative Commons Attribution-NonCommercial 4.0 International
.. License (see http://creativecommons.org/licenses/by-nc/4.0/),
.. Copyright (c) 2019-2021 by Michael S. Kirkpatrick. OpenDSA is
.. distributed under an MIT open source license, Copyright (c) 2012-2021
.. by the OpenDSA Project Contributors.

.. avmetadata::
   :author: Michael S. Kirkpatrick
   :requires:
   :satisfies:
   :topic:

Strings
=======

A seemingly simple idea or design often turns out to be surprisingly
complicated. We saw one example of this previously in the discussion of
pointers. Equating a pointer with an address seems straightforward, but the
implications of using pointers for dynamic memory allocation,
call-by-reference parameters, variable sizes, and so forth quickly become
challenging for the programmer. The same can be said for strings in C. As with
pointers, we start with a basic definition:

.. raw:: html

   A string is an array of characters ending in the null byte.

To put it slightly differently, the C programming language does not actually
have a string type in the intuitive sense that humans expect. Instead, C
provides only a thin interface for working with fixed-size arrays of ``char``
data. A string in the C sense consists of the array of ``char``\s that are
(typically) observable to a human reader, with one additional ``char`` added to
the end of the array. `Code Listing A.36 <#cla-36>`_ illustrates this fact by
defining the string ``"Hello"`` in a very unusual manner: as an array of six
``uint8_t`` values. One key idea here is that everything in the machine is just
a number. The interpretation of those bytes as the string ``"Hello"`` comes
from the ``%s`` format specifier, which tells ``printf()`` to present the ASCII
interpretation of the bytes to the user instead of their numeric values.

.. _cla-36:

.. codeinclude:: CLang/CodeListing-A-36.c
   :linenos: true

Since the string is an array, its individual elements can be accessed and
modified; line 8 changes the ``'e'`` to ``'a'``, line 9 changes the first
``'l'`` to ``'!'``, and line 10 changes the second ``'l'`` to the null byte
``'\0'`` (literally the number 0). These changes cause line 11 to print the
string as ``"Ha!"`` instead of the original ``"Hello"``. These lines did not
change the ``'o'`` byte stored as ``string[4]``, nor did the original ``'\0'``
stored in ``string[5]`` change; both bytes are still there in memory as part of
the original array. The only reason line 11 does not print them is, again, the
``%s`` format specifier, which tells ``printf()`` to stop printing at the first
null byte. `Table A.5 <#tblA-5>`_ shows the memory contents of this array of
``char``\s before and after the modifications, under three different
formatting interpretations. Note that the ASCII interpretation ``'\0'`` is not
displayed on the screen, but is shown here for completeness.

.. _tblA-5:

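
Before the modifications:

+---------------------------------------+--------+--------+--------+--------+--------+--------+
| Array index                           | 0      | 1      | 2      | 3      | 4      | 5      |
+=======================================+========+========+========+========+========+========+
| ASCII interpretation ``%c`` or ``%s`` | H      | e      | l      | l      | o      | ``\0`` |
+---------------------------------------+--------+--------+--------+--------+--------+--------+
| Hexadecimal format ``%x``             | 48     | 65     | 6c     | 6c     | 6f     | 00     |
+---------------------------------------+--------+--------+--------+--------+--------+--------+
| Decimal format ``%d``                 | 72     | 101    | 108    | 108    | 111    | 0      |
+---------------------------------------+--------+--------+--------+--------+--------+--------+

After the modifications:

+---------------------------------------+--------+--------+--------+--------+--------+--------+
| Array index                           | 0      | 1      | 2      | 3      | 4      | 5      |
+=======================================+========+========+========+========+========+========+
| ASCII interpretation ``%c`` or ``%s`` | H      | a      | !      | ``\0`` | o      | ``\0`` |
+---------------------------------------+--------+--------+--------+--------+--------+--------+
| Hexadecimal format ``%x``             | 48     | 61     | 21     | 00     | 6f     | 00     |
+---------------------------------------+--------+--------+--------+--------+--------+--------+
| Decimal format ``%d``                 | 72     | 97     | 33     | 0      | 111    | 0      |
+---------------------------------------+--------+--------+--------+--------+--------+--------+

Table A.5: Three interpretations of the bytes that make up the strings from Code Listing A.36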
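
The three interpretations in Table A.5 can be reproduced with a short sketch. Code Listing A.36 is included from a separate file, so this is not that listing; the helper names ``render_bytes``, ``modify_hello``, and ``visible_length`` are hypothetical and exist only for illustration. The sketch assumes a six-byte ``uint8_t`` array holding ``"Hello"`` followed by the null byte, as described above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Code Listing A.36): write each of the
   len bytes into out as hexadecimal (base 16) or decimal (any other base),
   separated by spaces. out must have room for outsz bytes. */
void render_bytes(const uint8_t *bytes, size_t len, int base,
                  char *out, size_t outsz)
{
    assert(bytes != NULL && out != NULL && outsz > 0);
    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < len && used < outsz; i++) {
        int n;
        if (base == 16)
            n = snprintf(out + used, outsz - used, "%02x ", (unsigned)bytes[i]);
        else
            n = snprintf(out + used, outsz - used, "%d ", (int)bytes[i]);
        if (n < 0)
            break;              /* encoding error: stop early */
        used += (size_t)n;
    }
}

/* Apply the three modifications described in the text to a six-byte
   array that holds "Hello" plus the null byte. */
void modify_hello(uint8_t string[6])
{
    string[1] = 'a';    /* 'e' becomes 'a' */
    string[2] = '!';    /* first 'l' becomes '!' */
    string[3] = '\0';   /* second 'l' becomes the null byte */
}

/* Like the %s format specifier, strlen() stops at the first null byte,
   so it reports only the "visible" part of the array. */
size_t visible_length(const uint8_t *bytes)
{
    return strlen((const char *)bytes);
}
```

For example, given ``uint8_t string[6] = {72, 101, 108, 108, 111, 0};``, calling ``printf("%s\n", (char *) string)`` prints ``Hello`` before ``modify_hello(string)`` and ``Ha!`` afterward, while ``render_bytes()`` still reports all six bytes in both cases.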

Original string contents (the blank cell holds a space character):

+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+--------+
| W | e | d | , | J | a | n | , | 0 | 1 | , | 2 | 0 | 2 | 0 | , | N | e | w |   | Y | e | a | r | ' | s | ``\0`` |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+--------+

After line 5 of Code Listing A.41:

+---+---+---+--------+---+---+---+--------+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+--------+
| W | e | d | ``\0`` | J | a | n | ``\0`` | 0 | 1 | , | 2 | 0 | 2 | 0 | , | N | e | w |   | Y | e | a | r | ' | s | ``\0`` |
+---+---+---+--------+---+---+---+--------+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+--------+

Table A.6: The contents of the ``line`` variable before line 4 and after line 5 of Code Listing A.41
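
One way to produce the change shown in Table A.6 is the standard ``strtok()`` function, which overwrites each delimiter it consumes with a null byte. Code Listing A.41 is not reproduced here, so the use of ``strtok()`` is an assumption, and the helper name ``split_two_fields`` is hypothetical. The sketch splits off only the first two comma-separated fields, which matches the two commas replaced in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption, not Code Listing A.41): split off the
   first two comma-separated fields of line in place. strtok() overwrites
   each delimiter it consumes with '\0', so line must be a writable array
   rather than a string literal. */
void split_two_fields(char *line, char **first, char **second)
{
    assert(line != NULL && first != NULL && second != NULL);
    *first = strtok(line, ",");    /* first ',' becomes '\0' */
    *second = strtok(NULL, ",");   /* second ',' becomes '\0' */
}
```

After the call, ``*first`` and ``*second`` point into the original array, so no characters are copied. The remaining bytes, including the later commas, are untouched and can still be reached through the original array.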