More Strings and I/O in C

In this lab, you will gain more experience with strings and I/O in C.

Objectives:

  • Use common string functions within the standard string.h C library
  • Use common I/O functions such as scanf from the standard stdio.h C library
  • Open, read from, write to, and close a text file
  • Read input from delimited text files using space as a delimiter

Setup

Begin this lab with an empty main.c and the standard makefile for your operating system. You will want to add the following includes at the top of your file:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

Of course, you will also want a basic definition of the main function. You should follow along with the code examples in this lab, adding them to your main function and testing them.

C Strings

As we've already mentioned, C strings work very differently than Java strings. You are probably used to writing code like this:

    String name = "John";
    name = name + " Smith";
    String anon = "Anonymous";
    if (name.equals("John Smith")) {
        anon = name;
    }

Most of this will not work as-written in C. This is because Java handles strings as objects, and does a lot of behind-the-scenes bookkeeping and management for you. That makes it easier to use strings in Java, but it really obscures what is happening in memory.

Recall that in C, a string is quite literally just an array of characters. Recall as well that arrays are of a fixed size in C.

Thus, it is not possible to dynamically add characters to the end of an array (as in line 2 of the above example). If you want to compare strings, you would have to use a character-by-character comparison rather than a simple equality operation (as in line 4 of the above example). And you can't assign one array to another without explicitly copying each character (as in line 5 of the above example). Also, all of these pieces of functionality would need to be aware of how much memory is allocated for the strings.

You could write functions to do all of these things, and they would mostly involve loops over the characters. Thankfully, the C standard library provides a lot of built-in string manipulation functions. For instance, the above example could be written in C as follows:

    char name[128];                     // allocate space for 'name'
    strncpy(name, "John", 128);         // copy "John" into 'name'

    // add " Smith" to the end of 'name', leaving room for the '\0'
    strncat(name, " Smith", 128 - strlen(name) - 1);

    char anon[256];                     // allocate space for 'anon'
    strncpy(anon, "Anonymous", 256);    // copy "Anonymous" into 'anon'

    // compare 'name' with the literal "John Smith"
    if (strcmp(name, "John Smith") == 0) {
        strncpy(anon, name, 256);       // copy 'name' into 'anon'
    }

Perhaps now you are beginning to appreciate all of the work that Java was doing for you. One goal of this course is to expose you to the inner workings of the machine. Even if you don't end up doing a lot of work in C, the experience of working in a lower-level language will help you to write more efficient code in any language.

Working with strings (and to a lesser extent I/O) in C will probably take some getting used to, but it will eventually become more natural. Until then, we encourage you to keep handy references to the string.h and stdio.h libraries.

Aside: The const keyword

You can declare constants in C using the const keyword. For example:

    const int MAX_STR_LEN = 256;

We recommend that you use constants like this for the lengths of string arrays. That way you can easily increase or decrease the size later without having to update many different places in your code.

String Library Functions

Let's examine some of the standard functions available in the string.h library.

String Length: strlen

Recall that in the last string lab, we calculated the length of the string programmatically. In fact, the standard string library provides a built-in function with the same functionality:

size_t strlen(const char *str)

This returns the length of the string (as a size_t value), as calculated by a linear search for the terminating null character ('\0'). The terminator itself is not counted.

Example:

    char string3[] = "how long has this been going on";
    size_t size = strlen(string3);      // strlen returns a size_t, not an int
    printf("The size of your string is %zu\n", size);
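
To make the "linear search" concrete, here is a rough sketch of how a function like strlen could be written by hand. The name my_strlen is hypothetical and used only for illustration; the real library version is usually more heavily optimized.

```c
#include <stddef.h>

/* Hypothetical hand-written equivalent of strlen, for illustration only. */
size_t my_strlen(const char *str)
{
    size_t len = 0;
    while (str[len] != '\0') {   /* stop at the terminating null character */
        len++;
    }
    return len;
}
```

Because the loop has to walk the whole string, calling it (or strlen) repeatedly on a long string inside another loop can be surprisingly expensive.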

String Concatenation: strncat

char* strncat(char *destination, const char *source, size_t num)

This function appends at most num characters of source to destination, and then adds a terminating null character. IMPORTANT: To avoid buffer overflows, you should make sure that num is less than the number of characters left in destination (i.e., its capacity minus its length).

If num is less than the length of source, the function will only copy up to num characters. For this reason, we prefer this function over the less safe strcat function, which is vulnerable to buffer overflow attacks.

Example:

    char string3[20] = "Now you see me ";
    char string4[10] = "";              // must start as a valid (empty) string
    strncat(string4, string3, 8);
    printf("%s",string3);
    printf("%s",string4);
    printf("don't\n");
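
The capacity arithmetic described above can be wrapped up in a small helper. This is only a sketch: append_if_fits is a hypothetical name, and it assumes that dest already holds a valid null-terminated string and that capacity is the full size of the dest array.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: append src to dest only if it fits in dest's capacity. */
bool append_if_fits(char *dest, size_t capacity, const char *src)
{
    size_t used = strlen(dest);
    size_t room = capacity - used - 1;   /* reserve one byte for '\0' */
    if (strlen(src) > room) {
        return false;                    /* would not fit; dest is unchanged */
    }
    strncat(dest, src, room);
    return true;
}
```

A call such as append_if_fits(name, sizeof name, " Smith") works when name is an array declared in the same function, since sizeof then gives its full capacity.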

String Copying: strncpy

char* strncpy(char *destination, const char *source, size_t num)

This function copies the first num characters of the source string into the destination string. IMPORTANT: To avoid buffer overflows, you should make sure that num is no greater than the allocated size of destination.

If the length of the source string is less than num, then the rest of the destination (up to num characters in total) is padded with null characters. Otherwise, the function copies exactly num characters and stops; in that case it does not add a null terminator, so you must add one yourself before using the destination as a string.

Again, it is for this reason that we prefer strncpy over strcpy, which is vulnerable to buffer overflows.

Example:

    char string1[10] = "first";
    char string2[10] = "last";
    printf("The %s ", string2);
    printf("shall be %s ", string1);
    printf("and the %s ", string1);
    strncpy(string1, string2, 10); // string1 becomes "last"
    printf("shall be %s", string1);
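
Since strncpy may leave the destination without a terminator, a common pattern is to copy one character fewer than the capacity and then terminate explicitly. The helper below is a sketch of that pattern; copy_truncated is a hypothetical name, and it assumes capacity is at least 1.

```c
#include <string.h>

/* Hypothetical helper: copy src into dest (of the given capacity, at least 1),
   truncating if necessary, and always leave dest null-terminated. */
void copy_truncated(char *dest, size_t capacity, const char *src)
{
    strncpy(dest, src, capacity - 1);   /* may leave dest unterminated... */
    dest[capacity - 1] = '\0';          /* ...so terminate it explicitly */
}
```

If src is longer than the destination can hold, the result is silently shortened; whether that is acceptable depends on your program.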

Exercise: Partial Copy

Given the following string:

    char string[] = "Madison James ";

Copy only the first 8 characters into a second string and print both strings.

String Comparison: strcmp

int strcmp(const char *string1, const char *string2)

Compares string1 to string2. Returns 0 if both values are equal. Otherwise, the return is negative if the first character that does not match has a lower value in string1 than string2. The return is positive if the first character that does not match has a greater value in string1 than in string2.

Here is a summary:

Return    Meaning
<0        string1 < string2
0         string1 == string2
>0        string1 > string2

Partial String Comparison: strncmp

int strncmp(const char *string1, const char *string2, size_t num)

Compares up to num characters of string1 to those of string2. The return values have the same meaning as for strcmp.
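
Here is a brief sketch of both comparison functions in use. The helper names compare_sign and starts_with are hypothetical. Note that only the sign of the result is meaningful (the exact nonzero value varies between systems), and that characters are compared by their numeric codes, so in ASCII every uppercase letter sorts before every lowercase letter.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: reduce strcmp's result to -1, 0, or 1. */
int compare_sign(const char *a, const char *b)
{
    int result = strcmp(a, b);
    return (result > 0) - (result < 0);
}

/* Hypothetical helper: true if str begins with prefix. */
bool starts_with(const char *str, const char *prefix)
{
    /* compare only as many characters as the prefix contains */
    return strncmp(str, prefix, strlen(prefix)) == 0;
}
```

A common mistake is to write if (strcmp(a, b)) when testing for equality; because equal strings return 0, that condition is true only when the strings differ.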

Exercise: These are not the droids you're looking for.

DO THIS: Given the following input array, write code to look for droid class names that start with C3 or R2.

    char droids[][5] = {"C3PO", "BB8", "R5D4", "R4G9", "R5J2", "R2D2"};

Output a message to the user for each class of droids that is found.

Standard I/O Library Functions

Overview

In previous labs, we have written programs that produced output using the puts or printf functions. We are now going to introduce program input capabilities in C.

Most C I/O functions work with file handles, which are values of type FILE* (defined in stdio.h). Given a valid file handle, you can read from that file using fscanf and fgets, and write to that file using fprintf.

Like Java, C has standard input and output streams that are merely special cases of general input and output file handles: stdin and stdout, respectively. The stdio library provides shortcut functions for working with these streams: scanf and printf (which you have already used).

Here is a table-based overview of all the I/O functions we will be using:

Function    Description
fscanf      formatted input
scanf       formatted input from stdin
fgets       line-based input
fprintf     formatted output
printf      formatted output to stdout

You may notice the lack of a shortcut gets function that corresponds to the fgets function in the same way that printf corresponds to fprintf. This is because the gets function does not perform bounds checking and is highly vulnerable to buffer overflow attacks (just like some of the string functions mentioned above). For this reason, we will not be using gets in this course.

First, let's examine the scanf function:

Formatted Standard Input: scanf

int scanf(const char *format, ... );

This function reads data from stdin into variables. Like printf, this function is somewhat unusual in that it accepts a variable number of parameters, and thus it should come as no surprise that you use the same technique for specifying input formats via the format parameter.

Here is an example:

    int val1;
    printf("Enter a signed integer: ");
    scanf("%d", &val1);
    printf("Your number is %d.\n", val1);

Note the "address-of" operator ('&') in front of val1 in the call to scanf. Because scanf needs to save input values to variables, it needs to know where those variables are located.

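Here is the list of specifiers given in lab 2 for printf; most of them work the same way for scanf. One important difference: with scanf, %f and %e store into a float, and you must use %lf (or %le) to read into a double.

Code    Description
%d      signed integer (int)
%lu     unsigned long integer (size_t)
%f      floating-point number (float or double; float only for scanf)
%e      scientific notation (float or double; float only for scanf)
%c      character (char)
%s      character string (char[]); scanf reads one whitespace-delimited word
%p      pointer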

Consult the scanf reference for details of these formats.

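IMPORTANT: The scanf function does not consume the newline that ends a line of keyboard input; it stays in the input buffer for the next read. Most specifiers, including %d and %s, skip leading whitespace on their own, but %c does not. You can explicitly discard whitespace characters (including newlines) before a value by adding a space before the format specifier. For example: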

    scanf(" %d", &val1);

Finally, scanf returns an integer indicating how many items were successfully read. If the function encounters the end of the file before any items were read, then a special value (EOF) is returned.
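
Checking that return value is the usual way to tell when input has run out or is malformed. The sketch below uses fscanf so that it works on any stream; calling scanf(...) is equivalent to calling fscanf(stdin, ...). The helper name sum_ints is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read integers from a stream until input ends or
   something that is not an integer is encountered; return their sum
   and store how many were read in *count. */
int sum_ints(FILE *in, int *count)
{
    int value;
    int sum = 0;
    *count = 0;
    /* fscanf returns the number of items converted: 1 on success here,
       0 on a mismatch, EOF at end of input */
    while (fscanf(in, "%d", &value) == 1) {
        sum += value;
        (*count)++;
    }
    return sum;
}
```

Calling sum_ints(stdin, &n) reads from the keyboard. Comparing against the expected item count (here 1) rather than against EOF also stops the loop when the user types something that is not a number.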

Exercise: Oldest, Youngest, Middle?

DO THIS: Write code that asks the user to input their age and their number of siblings. Your code should then prompt the user for the age of each sibling and determine if the user is the oldest, middle, or youngest sibling. The code should then print this information.

Line-based File Input: fgets

The scanf function is useful for reading individual pieces of data, but occasionally it is useful to read an entire line of text into a single string for later processing. For this, we will use the fgets function:

char* fgets(char *str, int num, FILE *stream)

This function reads up to num - 1 characters from the stream/file and stores them into str. If a newline ('\n') is encountered, fgets will copy it and then stop reading. In any case, the function will append a null terminator ('\0') to end the string.

If the function encounters the end of the file before reading any characters, it returns a null pointer. Otherwise, it returns str.

The correct way to use this function is to pass the size of str as num. This way, the function reads a line of input unless it reaches the capacity of str (at which point it would halt). This prevents buffer overruns (array out-of-bound accesses), to which its more dangerous gets cousin is vulnerable.

IMPORTANT: The saved string will retain the newline character ('\n').

Consult the fgets reference for more details.
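
Because the newline is kept, programs often remove it before using the line. One possible way to do that is sketched below; strip_newline is a hypothetical helper name.

```c
#include <string.h>

/* Hypothetical helper: remove one trailing newline, if present. */
void strip_newline(char *line)
{
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n') {
        line[len - 1] = '\0';
    }
}
```

The check for the newline matters: if the input line was longer than the buffer, or the input ended without a final newline, the last character is real data and should not be removed.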

DO THIS: Run the following code in your main():

    char address[100];
    printf("Enter your street address: ");
    fgets(address, 100, stdin);
    printf("Your address is: %s\n", address);

Play with the initial capacity of 100.

Question: What happens if you enter more characters than the capacity?

IMPORTANT: Because scanf and fgets have different behaviors with regard to newline characters (scanf leaves the trailing newline unread in the input, while fgets reads it and stores it in the string), it is not recommended that you attempt to mix them when reading from the same source.

File I/O Library Functions

Thus far, we have only used standard input (stdin) and standard output (stdout). Now, we will expand our I/O capabilities and discuss reading and writing to files.

Opening and Closing Files: fopen and fclose

You can open files on your hard drive for reading and writing in C using the function fopen. You should remember to close any files that you open using the fclose function.

FILE* fopen(const char *filename, const char *mode)

The fopen function opens a file (filename) and, depending on the mode parameter given, allows the user to read ("r"), write ("w"), or append ("a") to that file. The function returns a FILE* handle to use with later calls to functions like fgets, fprintf, or fclose. If the file cannot be opened, it returns NULL instead.

Question: Why are there different modes for opening a file?

int fclose(FILE *handle)

The fclose function closes the file associated with the given handle.

DO THIS: Run the following code in your main():

    FILE *fout;
    fout = fopen("bottle.txt", "w");    // open your bottle
    fprintf(fout, "Help!\n");           // put your message in the bottle
    fclose(fout);                       // cork your bottle

Verify that the bottle.txt file contains your message.
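
In a real program, opening a file can fail (for example, if the path does not exist or you lack permission), in which case fopen returns NULL. The sketch below shows one way to guard against that; write_line is a hypothetical helper name, not part of the standard library.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: write one line of text to the named file.
   Returns false if the file could not be opened or closed cleanly. */
bool write_line(const char *filename, const char *text)
{
    FILE *fout = fopen(filename, "w");
    if (fout == NULL) {
        return false;               /* e.g., no permission or bad path */
    }
    fprintf(fout, "%s\n", text);
    return fclose(fout) == 0;       /* fclose returns 0 on success */
}
```

Passing a NULL handle to fprintf or fclose is an error, so the check should come before any other use of the handle.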

Reading and Writing Files: fscanf, fgets, and fprintf

To read formatted input from a file, use the fscanf function, which works the same as scanf except that it takes as the first parameter a file handle to read from:

int fscanf(FILE *handle, const char *format, ... );

The fgets function also works the same as with standard input; just send it your file handle instead of stdin.

To send formatted output to a file, use the fprintf function, which works the same as printf except that (like fscanf) it takes as the first parameter a file handle to print to.
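
As a small illustration of reading space-delimited data with fscanf, the sketch below scans "name score" pairs from an open stream. The two-column layout, the helper name highest_score, and the assumption that scores are non-negative are all made up for this example. The width in %63s limits how many characters are stored, so a long word cannot overflow the name array.

```c
#include <stdio.h>

/* Hypothetical helper: read "name score" pairs from a stream and
   return the highest score, or -1 if no valid pair was found. */
int highest_score(FILE *in)
{
    char name[64];
    int score;
    int best = -1;
    /* %63s reads at most 63 characters, leaving room for '\0' */
    while (fscanf(in, "%63s %d", name, &score) == 2) {
        if (score > best) {
            best = score;
        }
    }
    return best;
}
```

Note that name is passed without the address-of operator, because an array name already refers to the location of its first element, while score still needs the &.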

Exercise: Mail Merge

This exercise will emulate a "Mail Merge" routine in Microsoft Word, taking data from a data source file and merging it into a form letter.

DO THIS: Write code to read space-delimited data from a file names.txt.

You will read the data from three columns as follows:

Column 1 = Title
Column 2 = First name
Column 3 = Last name

Here is a sample file:

Dr. Phil Goode
Mr. John Vayne
Ms. Janet Jackson
Dr. Andre Young
Mr. Rock Deejay
Ms. Mary Sunshine
Dr. John Hu
Mr. Steve Mister
Ms. Mary Murder
Hon. Judith Sheindlin

Your code should greet each person in the file in a different way depending on their title:

Title    Greeting
Ms.      "Hello, Ms. LASTNAME"
Mr.      "Hey there, FIRSTNAME"
Dr.      "Salutations, Dr. FIRSTNAME LASTNAME"
Hon.     "Greetings to the honorable FIRSTNAME LASTNAME"

(Optional) More exercises

Exercise: String Validation

DO THIS: When submitting information via web forms, we often need to validate the form of that information. Write a phone number validation function using this signature:

bool valid_string(char string[]);

The function should determine if the phone number looks like this: (540)-568-3555.

HINT: You may want to use the isdigit function in ctype.h.

Exercise: Initializing turtles or strings to turtle letters

DO THIS: Write some turtle functions to draw the letters for your initials. Assume a 'font height' of 100 for a capital 'O' (i.e., 'O' is 100 tall by 50 wide).

void draw_a()
{
    // code to draw letter 'a' using turtle graphics
}

These could be called by a function of this form:

void draw_word(char string[25])
{
    for (char *c = string; *c != '\0'; c++) {
        if (*c == 'a') {
            draw_a();
        } else if (*c == 'b') {
            draw_b();
        }
        // etc.
    }
}

Example: if your name is James Earl Jones you would create draw_j and draw_e functions.

We will attempt to crowdsource the entire alphabet in a thread on Piazza, thus allowing a draw_word function to draw the turtle graphics representation of any word entered by the user.

(Challenge) Exercise: Search and Replace

Write a program that asks the user for two strings of the same length (for example, "and" and "the"), reads a file called search.txt, and writes a file called replace.txt in which all occurrences of the first string are replaced by the second string. You should use an array for this exercise.

Question: Why is it important that this exercise have you use strings of equal length?