More Strings and I/O in C
In this lab, you will gain more experience with strings and I/O in C.
Objectives:
- Use common string functions from the standard string.h C library
- Use common I/O functions (such as scanf) from the stdio.h C library
- Open, read from, write to, and close a text file
- Read input in from delimited text files using space as a delimiter
Setup
Begin this lab with an empty main.c and the standard makefile for your operating system. You will want to add the following includes at the top of your file:
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
Of course, you will also want a basic definition of the main function. You should follow along with the code examples in this lab, adding them to your main function and testing them.
C Strings
As we've already mentioned, C strings work very differently than Java strings. You are probably used to writing code like this:
String name = "John";
name = name + " Smith";
String anon = "Anonymous";
if (name.equals("John Smith")) {
anon = name;
}
Most of this will not work as-written in C. This is because Java handles strings as objects, and does a lot of behind-the-scenes bookkeeping and management for you. That makes it easier to use strings in Java, but it really obscures what is happening in memory.
Recall that in C, a string is quite literally just an array of characters. Recall as well that arrays are of a fixed size in C.
Thus, it is not possible to dynamically add characters to the end of an array (as in line 2 of the above example). If you want to compare strings, you would have to use a character-by-character comparison rather than a simple equality operation (as in line 4 of the above example). And you can't assign one array to another without explicitly copying each character (as in line 5 of the above example). Also, all of these pieces of functionality would need to be aware of how much memory is allocated for the strings.
You could write functions to do all of these things, and they would mostly involve loops over the characters. Thankfully, the C standard library provides a lot of built-in string manipulation functions. For instance, the above example could be written in C as follows:
char name[128]; // allocate space for 'name'
strncpy(name, "John", 128); // copy "John" into 'name'
// add " Smith" to the end of 'name', leaving room for the null terminator
strncat(name, " Smith", 128 - strlen(name) - 1);
char anon[256]; // allocate space for 'anon'
strncpy(anon, "Anonymous", 256); // copy "Anonymous" into 'anon'
// compare 'name' with the literal "John Smith"
if (strcmp(name, "John Smith") == 0) {
strncpy(anon, name, 256); // copy 'name' into 'anon'
}
Perhaps now you are beginning to appreciate all of the work that Java was doing for you. One goal of this course is to expose you to the inner workings of the machine. Even if you don't end up doing a lot of work in C, the experience of working in a lower-level language will help you to write more efficient code in any language.
Working with strings (and to a lesser extent I/O) in C will probably take some getting used to, but it will eventually become more natural. Until then, we encourage you to keep references to the string.h and stdio.h libraries handy.
Aside: The const keyword
You can declare constants in C using the const keyword. For example:
const int MAX_STR_LEN = 256;
We recommend that you use constants like this for the lengths of string arrays. That way you can easily increase or decrease the size later without having to update many different places in your code.
String Library Functions
Let's examine some of the standard functions available in the string.h library.
String Length: strlen
Recall that in the last string lab, we calculated the length of the string programmatically. In fact, the standard string library provides a built-in function with the same functionality:
size_t strlen(const char *str)
This returns the length of the string (as a size_t value), as calculated by a linear search for the terminating null character ('\0'). The terminator itself is not counted.
Example:
char string3[] = "how long has this been going on";
size_t size = strlen(string3);
printf("The size of your string is %lu\n", (unsigned long) size);
String Concatenation: strncat
char* strncat(char *destination, const char *source, size_t num)
This function appends at most num characters of source to destination, and then adds a terminating null character. IMPORTANT: To avoid buffer overflows, you should make sure that num is less than the number of bytes left in destination (i.e., its capacity minus its length), so that there is still room for the null terminator.
If num is less than the length of source, the function will only copy num characters. For this reason, we prefer this function over the less safe strcat function, which is vulnerable to buffer overflow attacks.
Example:
char string3[20] = "Now you see me ";
char string4[10] = ""; // must hold a valid (empty) string before appending
strncat(string4, string3, 8);
printf("%s",string3);
printf("%s",string4);
printf("don't\n");
String Copying: strncpy
char* strncpy(char *destination, const char *source, size_t num)
This function copies the first num characters of the source string into the destination string. IMPORTANT: To avoid buffer overflows, you should make sure that num is no greater than the allocated size of destination.
If the length of the source string is less than num, then the destination is padded with null characters up to num. Otherwise, the function copies exactly num characters and does not add a null terminator, so you must terminate the destination yourself in that case.
Again, it is for this reason that we prefer strncpy over strcpy, which is vulnerable to buffer overflows.
Example:
char string1[10] = "first";
char string2[10] = "last";
printf("The %s ", string2);
printf("shall be %s ", string1);
printf("and the %s ", string1);
strncpy(string1, string2, 10); // string1 becomes "last"
printf("shall be %s\n", string1);
Exercise: Partial Copy
Given the following string:
char string[] = "Madison James ";
Copy only the first 8 characters into a second string and print both strings.
String Comparison: strcmp
int strcmp(const char *string1, const char *string2)
Compares string1 to string2. Returns 0 if both strings are equal. Otherwise, the return value is negative if the first character that does not match has a lower value in string1 than in string2, and positive if it has a greater value in string1 than in string2.
Here is a summary:
Return | Meaning |
---|---|
<0 | string1 < string2 |
0 | string1 == string2 |
>0 | string1 > string2 |
Partial String Comparison: strncmp
int strncmp(const char *string1, const char *string2, size_t num)
Compares up to num characters of string1 to those of string2. The return values have the same meaning as for strcmp.
Exercise: These are not the droids you're looking for.
DO THIS: Given the following input array, write code to look for droid class names that start with C3 and R2.
char droids[][5] = {"C3PO" , "BB8", "R5D4", "R4G9", "R5J2", "R2D2"};
Output a message to the user for each class of droids that is found.
Standard I/O Library Functions
Overview
In previous labs, we have written programs that produced output using the puts or printf functions. We are now going to introduce program input capabilities in C.
Most C I/O functions work with file handles, which are values of type FILE* (defined in stdio.h). Given a valid file handle, you can read from that file using fscanf and fgets, and write to that file using fprintf.
Like Java, C has standard input and output streams that are merely special cases of general input and output file handles: stdin and stdout, respectively. The stdio library provides shortcut functions for handling these streams: scanf and printf (which you have already used).
Here is a table-based overview of all the I/O functions we will be using:
Function | Description |
---|---|
fscanf | formatted input |
scanf | formatted input from stdin |
fgets | line-based input |
fprintf | formatted output |
printf | formatted output to stdout |
You may notice the lack of a shortcut gets function that corresponds to the fgets function in the same way that printf corresponds to fprintf. This is because the gets function does not perform bounds checking and is highly vulnerable to buffer overflow attacks (just like some of the string functions mentioned above). For this reason, we will not be using gets in this course.
First, let's examine the scanf function:
Formatted Standard Input: scanf
int scanf(const char *format, ...);
This function reads data from stdin into variables. Like printf, this function is somewhat unusual in that it accepts a variable number of parameters, and thus it should come as no surprise that you use the same technique for specifying input formats via the format parameter.
Here is an example:
int val1;
printf("Enter a signed integer: ");
scanf("%d", &val1);
printf("Your number is %d.\n", val1);
Note the "address-of" operator ('&') in front of val1 in the call to scanf. Because scanf needs to save input values to variables, it needs to know where those variables are located.
Here is the list of specifiers given in lab 2 for printf that also work for scanf.
Code | Description |
---|---|
%d | signed integer (int) |
%lu | unsigned long integer (size_t) |
%f | floating-point number (float; use %lf to read a double with scanf) |
%e | scientific notation (float; use %le to read a double with scanf) |
%c | character (char) |
%s | character string (char[]) |
%p | pointer |
Consult the scanf reference for details of these formats.
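IMPORTANT: The scanf function does not consume the newline that follows a value, so you must be careful using it for input from the keyboard. You can discard whitespace characters (including a leftover newline) before a value by adding a space before the format specifier. For example:
scanf(" %d", &val1);
Most specifiers, including %d, %f, and %s, already skip leading whitespace on their own; the leading space matters most with %c, which otherwise reads the leftover newline as its character.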
Finally, scanf returns an integer indicating how many items were successfully read. If the function encounters the end of the file before any items were read, then a special value (EOF) is returned.
Exercise: Oldest, Youngest, Middle?
DO THIS: Write code that asks the user to input their age and their number of siblings. Your code should then prompt the user for the age of each sibling and determine if the user is the oldest, middle, or youngest sibling. The code should then print this information.
Line-based File Input: fgets
The scanf function is useful for reading individual pieces of data, but occasionally it is useful to read an entire line of text into a single string for later processing. For this, we will use the fgets function:
char* fgets(char *str, int num, FILE *stream)
This function reads up to num - 1 characters from the stream/file and stores them into str. If a newline ('\n') is encountered, fgets will copy it and then stop reading. In any case, the function will append a null terminator ('\0') to end the string.
If the function encounters the end of the file before reading any characters, it returns a null pointer. Otherwise, it returns str.
The correct way to use this function is to pass the size of str as num. This way, the function reads a line of input unless it reaches the capacity of str (at which point it would halt). This prevents buffer overruns (array out-of-bounds accesses), to which its more dangerous gets cousin is vulnerable.
IMPORTANT: The saved string will retain the newline character ('\n').
Consult the fgets reference for more details.
DO THIS: Run the following code in your main function:
char address[100];
printf("Enter your street address: ");
fgets(address, 100, stdin);
printf("Your address is: %s\n", address);
Play with the initial capacity of 100.
Question: What happens if you enter more characters than the capacity?
IMPORTANT: Because scanf and fgets have different behaviors with regard to newline characters (scanf leaves the trailing newline in the input stream, but fgets consumes it and stores it in the string), it is not recommended that you attempt to mix them when reading from the same source.
File I/O Library Functions
Thus far, we have only used standard input (stdin) and standard output (stdout). Now, we will expand our I/O capabilities and discuss reading and writing to files.
Opening and Closing Files: fopen and fclose
You can open files on your hard drive for reading and writing in C using the fopen function. You should remember to close any files that you open using the fclose function.
FILE* fopen(const char *filename, const char *mode)
The fopen function opens a file (filename) and, depending on the mode parameter given, allows the user to read ("r"), write ("w"), or append ("a") to that file. The function returns a FILE* handle to use with later calls to functions like fgets, fprintf, or fclose; it returns NULL if the file could not be opened.
Question: Why are there different modes for opening a file?
int fclose(FILE *handle)
The fclose function closes the file associated with the given handle.
DO THIS: Run the following code in your main function:
FILE *fout;
fout = fopen("bottle.txt", "w"); // open your bottle
fprintf(fout, "Help!\n"); // put your message in the bottle
fclose(fout); // cork your bottle
Verify that the bottle.txt file contains your message.
Reading and Writing Files: fscanf, fgets, and fprintf
To read formatted input from a file, use the fscanf function, which works the same as scanf except that it takes as the first parameter a file handle to read from:
int fscanf(FILE *handle, const char *format, ...);
The fgets function also works the same as with standard input; just send it your file handle instead of stdin.
To send formatted output to a file, use the fprintf function, which works the same as printf except that (like fscanf) it takes as the first parameter a file handle to print to.
Exercise: Mail Merge
This exercise will emulate a "Mail Merge" routine in Microsoft Word, taking data from a data source file and merging it into a form letter.
DO THIS: Write code to read space-delimited data from a file named names.txt.
You will read the data from three columns as follows:
Column 1 = Title
Column 2 = First name
Column 3 = Last name
Here is a sample file:
Dr. Phil Goode
Mr. John Vayne
Ms. Janet Jackson
Dr. Andre Young
Mr. Rock Deejay
Ms. Mary Sunshine
Dr. John Hu
Mr. Steve Mister
Ms. Mary Murder
Hon. Judith Sheindlin
Your code should greet each person in the file in a different way depending on their title:
Title | Greeting |
---|---|
Ms. | "Hello, Ms. LASTNAME" |
Mr. | "Hey there, FIRSTNAME" |
Dr. | "Salutations, Dr. FIRSTNAME LASTNAME" |
Hon. | "Greetings to the honorable FIRSTNAME LASTNAME" |
(Optional) More exercises
Exercise: String Validation
DO THIS: When submitting information via web forms, we often need to validate the form of that information. Write a phone number validation function using this signature:
bool valid_string(char string[]);
The function should determine if the phone number looks like this: (540)-568-3555.
HINT: You may want to use the isdigit function from ctype.h.
Exercise: Initializing turtles or strings to turtle letters
DO THIS: Write some turtle functions to draw the letters for your initials. Assume a 'font height' of 100 for a capital 'O' (i.e., 'O' is 100 tall by 50 wide).
void draw_a()
{
    // code to draw letter 'a' using turtle graphics
}
These could be called by a function of this form:
void draw_word(char string[25])
{
    for (char *c = string; *c != '\0'; c++) {
        if (*c == 'a') {
            draw_a();
        } else if (*c == 'b') {
            draw_b();
            // etc.
        }
    }
}
Example: if your name is James Earl Jones, you would create draw_j and draw_e functions.
We will attempt to crowdsource the entire alphabet in a thread on Piazza, thus allowing a draw_word function to accurately print the turtle graphics representation for any word entered by the user.
(Challenge) Exercise: Search and Replace
Write a program that asks the user for two strings of the same length (such as "and" and "the"), reads a file called search.txt, and writes a file called replace.txt with all occurrences of the first string replaced by the second string. You should use an array for this exercise.
Question: Why is it important that this exercise have you use strings of equal length?