Project 1: Building a shell

In this project, you will build a command-line shell that operates similarly to the bash shell.

Set up a repository on stu.cs.jmu.edu based on the instructions in the CS 361 Submission Procedures, using the submit directory p1-sh.git and the file name p1-sh.tar.gz.

Background information

At each implementation level, you will implement three different kinds of requirements:

Utility programs
These are stand-alone programs that are written and compiled independently of the shell. For example, consider the ls program that lists files in a directory or the gcc compiler. These programs are not part of the shell. Instead, the shell will try to locate the executable program file in a directory based on the $PATH environment variable. If the program is found, the shell uses fork() and execlp() (or better yet, posix_spawn()) to run the program in a separate process.
Shell built-ins
These are commands that look like distinct programs, but they actually are not. Instead, the shell recognizes a small number of commands and calls an internal function rather than creating a new process. For instance, consider the cd command that can be used to change to a different directory. Since the notion of "current working directory" only exists within the context of the shell, this command is a built-in that just changes an internal variable in the shell process.
Additional shell features
In addition to running programs, shells provide a number of features that create an interactive experience. For instance, when a process exits (i.e., the code does return 0; from main() or calls exit (0);), the user might wish to know this return value. As an example, if you try to run the ls program on a file, the return code 0 would indicate the file exists and the listing was successful, while 1 could indicate the file wasn't found or permission was denied. Other features include the ability to pipe output streams to other processes or using environment variables to customize the user's experience.

Project directory structure

You will mainly be working with files in the p1-sh/src directory. These source code files will be compiled as object files into p1-sh/build, then linked into the dukesh executable.

In addition, you will be implementing the utility programs in the p1-sh/utils directory. There is a Makefile in that directory, so you can run make there to build these programs. The compiled executables will be in p1-sh/bin. (Note that running make in the main p1-sh directory will also compile these programs.)

Within the p1-sh/tests directory structure, there are symbolic links to p1-sh/bin and p1-sh/data. These links help to ensure that the testing infrastructure (which runs with p1-sh/tests as the current working directory) can access these files appropriately. That is, once you have implemented the ls utility program, running ./bin/ls from either p1-sh or p1-sh/tests will run the same program.

The utility programs do not have their own test cases provided. Instead, they are tested indirectly when the shell test cases try to run them.

Lastly, the p1-sh/tests/scripts directory contains a number of text files that are shell scripts. A shell script is a file that contains a list of commands, with each line corresponding to a single command that could be typed at the shell prompt. These shell scripts can be passed to dukesh using the -b flag (for "batch processing") to launch the shell, run a few programs, then shut down.


Implementation Requirements

Your first task is to build a minimally functional shell that can run external programs and built-in commands, as well as provide access to return codes. From there, you will incrementally add features to control environment variables, pipe data between processes, and redirect output to persistent files.


Phase 1: Shell loop (partial credit)

For this stage, you will need to implement the following features:

  • Interactive shell and batch processing:
    • Shell prompt - print the "$ " string (no trailing newline) and read a command from STDIN.
    • Shell scripts - if dukesh is run with -b, read the commands from the file instead of STDIN.
  • Built-ins:
    • echo - print the given input to the screen exactly as entered. Escape sequences and environment variables are ignored.
    • quit - exit the shell.

For both the prompt and scripts, you should use fgets() to read a line of input. The difference is that the script will use fopen() to open a file as a stream while the interactive prompt uses stdin as the stream.
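
As a rough sketch of this structure, the same read-and-dispatch loop can serve both modes by choosing the stream up front. The function and buffer names below are illustrative, not part of the provided starter code:

#include <stdio.h>
#include <string.h>

// Sketch: run the read-eval loop over either stdin or an open script file.
static void
shell_loop (FILE *source, int interactive)
{
  char line[256];
  while (1)
    {
      if (interactive)
        {
          printf ("$ ");    // prompt with no trailing newline
          fflush (stdout);  // make sure the prompt appears before reading
        }
      if (fgets (line, sizeof (line), source) == NULL)
        break;              // EOF: end of script or Ctrl-D at the prompt
      line[strcspn (line, "\n")] = '\0'; // strip the trailing newline
      if (strcmp (line, "quit") == 0)
        break;
      // ... tokenize the line and dispatch built-ins or utilities here ...
    }
}

// Interactive mode: shell_loop (stdin, 1);
// Batch mode (-b):  shell_loop (fopen (script_name, "r"), 0);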

This phase will be graded both automatically and manually. Be sure to test your code by running dukesh without a script; being able to do so will also make testing easier in later phases.


Phase 2: Shell built-ins and processes (C requirements)

For this stage, you will need to implement the following features:

  • Built-ins:
    • cd - change the current working directory.
    • echo - print the given input to the screen. If the input contains the escape sequence "\n", then replace these two bytes with the newline character ('\n'). If the input contains '$', this is a reference to a return value or environment variable. See below.
    • pwd - print the current working directory to the screen.
    • which - print the executable source of a command. If the command is a built-in, print cmd: dukesh built-in command (assuming cmd is the command name). If the command begins with ./ (e.g., which ./bin/ls), check if it is executable. Otherwise, traverse through the $PATH for an executable file that matches the command (a sketch of this search appears after this list).
  • Utilities:
    • ls - list the files in a requested directory. If no directory is specified, use the current working directory. Add support for the flags as specified in the usage() function.
    • head - print the first several lines of a file. The default is to print 5 lines, but the -n flag can specify a different number. If no file is specified on the command line, read from standard input.
    • cut - tokenize each line of input text from a file and print out a specified field. The default is to split lines based on spaces, but this can be changed with the -d flag. The default is to print the first field, but the -f flag changes this. For example, -f 2 for the input "hello, world it's me" would print world. Using the flags -d , -f 1 for the same input would print hello. If no file is specified on the command line, read from standard input.
  • Other features:
    • Spawning processes - run utility programs in a separate process using either fork()/exec() or posix_spawn(). (A sketch appears after the echo example below.)

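For the which built-in, one common approach to the $PATH traversal is to copy the $PATH string, split it on colons, and test each candidate with access(). The following is only a sketch with illustrative names; how you structure this is up to you:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// Sketch: search $PATH for cmd; print the full path and return 1 if found.
static int
search_path (const char *cmd)
{
  char *path = getenv ("PATH");
  if (path == NULL)
    return 0;
  char *copy = strdup (path); // strtok modifies its argument, so copy first
  for (char *dir = strtok (copy, ":"); dir != NULL; dir = strtok (NULL, ":"))
    {
      char candidate[1024];
      snprintf (candidate, sizeof (candidate), "%s/%s", dir, cmd);
      if (access (candidate, X_OK) == 0) // an executable file exists here
        {
          printf ("%s\n", candidate);
          free (copy);
          return 1;
        }
    }
  free (copy);
  return 0;
}
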
For the echo command, at this point, you should print the remaining arguments with a single space between the words as shown below:

$ echo hello world
hello world
$ echo hello world    how        are you
hello world how are you
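
For the spawning-processes requirement above, the following is a minimal sketch (with illustrative names) of running a utility in a child process with fork() and execvp() and collecting its exit status with waitpid(); posix_spawn() is an equally valid route:

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Sketch: run argv[0] as a child process and return its exit code.
// argv must be a NULL-terminated array, e.g. { "./bin/ls", "-s", "data", NULL }.
static int
run_utility (char *argv[])
{
  pid_t child = fork ();
  if (child < 0)
    return -1;                   // fork failed
  if (child == 0)
    {
      execvp (argv[0], argv);    // only returns if the program cannot run
      _exit (1);
    }
  int status = 0;
  waitpid (child, &status, 0);   // wait for the child to finish
  if (WIFEXITED (status))
    return WEXITSTATUS (status); // the return code (later reported as $?)
  return -1;
}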

Phase 3: Environment variables (B requirements)

For this stage, you will need to implement the following features:

  • Built-ins:
    • export - set an environment variable. You may use the provided hash table to keep track of all environment variables. NOTE: You will need to change the way that you execute programs from the partial implementation. You will need to use execve() or posix_spawn() so that you can pass environment variables to the new process. By default, you must always pass the $PATH. (You can use getenv() to get your real $PATH and pass it along.)
    • unset - unset an environment variable. If using the provided hash table, remove the key from the table.
    • echo - extend the implementation so that environment variables can be printed. All environment variable names must be wrapped in curly braces as shown below.
  • Utilities:
    • env - set an environment variable for the execution of one particular process. This program builds the envp array from the environment variables given as arguments (in the form VAR=VAL with no space) and passes that array as the environment for a new process based on the rest of the command line (see below). The envp array should contain the variables specified as arguments as well as any that had been set by the export command above.
    • repeat - print out the specified environment variables, each repeated a requested number of times (see the examples below).
  • Other features:
    • Return codes - using echo $? will print the return code from the previous utility or built-in command.

For the echo command, $? indicates that we want to print the return code. As an example, consider the following sequence of commands (note that the $ at the beginning of a line is the prompt and the other lines are the output produced):

$ ./bin/ls -sa data
4096 .
4096 ..
0 empty.txt
5 .hidden.txt
9 pwd.txt
$ echo $?
0
$ ./bin/ls flubberdub
$ echo $?
1

Recall that environment variables get passed to processes along with their value as a string. If you set the $ALPHA environment variable to have the value beta, then the string ALPHA=beta gets passed in the envp array. Or, you can retrieve the variable in the new process by calling getenv ("ALPHA"), which will return the string "beta".
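
As a sketch of how an envp array might be assembled, each entry is a single "NAME=value" string and the array is NULL-terminated before being handed to execve() or posix_spawn(). The function name and fixed-size buffer below are illustrative; a full implementation would add one entry per exported variable (for example, by looping over the hash table keys):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

// Sketch: build a small NULL-terminated envp array and pass it to a program.
static void
spawn_with_env (char *path, char *argv[])
{
  const char *real_path = getenv ("PATH"); // always pass the real $PATH along
  char path_entry[1024];
  snprintf (path_entry, sizeof (path_entry), "PATH=%s",
            real_path != NULL ? real_path : "");
  char *envp[] = { path_entry, "ALPHA=beta", NULL };
  execve (path, argv, envp); // in practice, call this in the forked child
  perror ("execve");         // only reached if execve fails
}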

The following lines illustrate the expected behavior for environment variables in this project. The first echo tries to look up the $VAR variable, which does not exist yet. Then the variable gets set with export and the second echo finds the value.

$ echo VAR=${VAR}
VAR=
$ export VAR=hello
$ echo VAR=${VAR}
VAR=hello
$ ./bin/repeat 2 VAR
VAR=hello
VAR=hello
$ ./bin/env ALPHA=beta ./bin/repeat 1 ALPHA 1 VAR
ALPHA=beta
VAR=hello

Phase 4: Piping STDIN and STDOUT (A requirements)

  • Utilities:
    • cat - open a file and print out its contents.
  • Other features:
    • Piping commands together - use a pipe to redirect the STDOUT from one process to become the STDIN of the next.
    • Fixing output streams - The initial test cases for pipes do not produce the same output as running the commands manually. Fix this by making child processes write their output to a pipe instead of their normal STDOUT.

Consider the (abbreviated) contents of p1-sh/tests/expected/C_binaries.txt:

empty.txt
pwd.txt
$ ./bin/ls data
$ quit

According to this file, the output of the ./bin/ls command appears before the command is actually called. This isn't some form of magically predicting the future. Instead, it's a quirk of the file redirection used by the test cases.

Specifically, no output from a process is written to the output file until all of that process's output is produced. In this case, we are dealing with two processes: dukesh (the shell) and the child process ./bin/ls. Since the ./bin/ls program runs to completion before the quit command is entered, its output is written to the p1-sh/tests/outputs/C_binaries.txt file. Then, when the dukesh shell exits, all of its output gets appended even though it actually produced some output first.

The fix for this is conceptually straightforward: You must use dup2() to redirect the STDOUT for every child process back to the shell. Thus, instead of the ./bin/ls writing to its STDOUT (i.e., the screen), it will write to a pipe that the shell reads from. The shell then needs to print the output it reads to the screen.
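
A minimal sketch of that redirection follows (illustrative names, error handling omitted): the shell creates a pipe, the child dup2()s the write end over its STDOUT, and the shell reads from the read end and prints whatever it receives:

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

// Sketch: capture a child's STDOUT through a pipe and print it ourselves.
static void
run_and_echo (char *argv[])
{
  int fds[2];
  pipe (fds);                       // fds[0] = read end, fds[1] = write end
  if (fork () == 0)
    {
      dup2 (fds[1], STDOUT_FILENO); // the child's stdout now goes to the pipe
      close (fds[0]);
      close (fds[1]);
      execvp (argv[0], argv);
      _exit (1);
    }
  close (fds[1]);                   // the shell keeps only the read end
  char buffer[256];
  ssize_t bytes;
  while ((bytes = read (fds[0], buffer, sizeof (buffer))) > 0)
    fwrite (buffer, 1, (size_t) bytes, stdout); // echo to the real screen
  close (fds[0]);
  wait (NULL);
}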

Where this gets tricky (in a real shell) is when there are pipes involved. If the command after the pipe is a built-in, there is no separate process; instead, the shell must detect this and only use dup2() after a pipe if the command is not a built-in. Similarly, the command before the pipe might be a built-in, so the internal implementation would require (temporarily) redirecting that functionality to a pipe.

You do not need to handle these complex cases. In this project, the command before the pipe can only be one of the utilities, and the command after the pipe must also be a program run in a separate process. As such, both the command before and the command after the pipe involve calling fork(), setting up argv (by tokenizing the arguments), using dup2() to redirect either stdin or stdout, then calling execvp().
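
As a sketch of this simpler case (illustrative names; tokenizing and error handling omitted), the left command's STDOUT and the right command's STDIN are both dup2()ed onto the same pipe:

#include <sys/wait.h>
#include <unistd.h>

// Sketch: run left_argv | right_argv, e.g. "./bin/ls data | ./bin/head -n 1".
static void
run_pipeline (char *left_argv[], char *right_argv[])
{
  int fds[2];
  pipe (fds);
  if (fork () == 0)
    {
      dup2 (fds[1], STDOUT_FILENO); // left command writes into the pipe
      close (fds[0]);
      close (fds[1]);
      execvp (left_argv[0], left_argv);
      _exit (1);
    }
  if (fork () == 0)
    {
      dup2 (fds[0], STDIN_FILENO);  // right command reads from the pipe
      close (fds[0]);
      close (fds[1]);
      execvp (right_argv[0], right_argv);
      _exit (1);
    }
  close (fds[0]);                   // the shell must close both ends so the
  close (fds[1]);                   // right command sees EOF when the left exits
  wait (NULL);
  wait (NULL);
}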

The following examples illustrate how this would work.

$ ./bin/ls data | ./bin/head -n 1
empty.txt
$ ./bin/cat data/empty.txt | ./bin/head -n 1
pwd
$ quit

Tips

The following tips may be helpful for completing this project on time.

Using the hash table

To assist with the environment variable handling, a hash table implementation has been included in the distribution. For simplicity, the hash table only accepts strings as keys and values. Also, the hash table is a global variable, so you can access it from anywhere. You can consult src/hash.h for the variable types, but the following code demonstrates the core functionality:

hash_init (100); // store up to 100 items initially
hash_insert ("key", "10"); // create mapping hash[key] = 10
char *value = hash_find ("key"); // search for hash[key]
hash_remove ("key"); // removes hash[key] from the table
hash_destroy (); // remove all keys and free the resources

If you try to find a key that does not exist, hash_find() will return NULL. You must detect the NULL return value so that you do not cause a segmentation fault; instead, just print the empty string.
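
For example, when echo expands ${VAR}, the lookup might look like the following sketch (variable_name is a placeholder for whatever name you parsed out of the braces):

char *value = hash_find (variable_name); // NULL if the variable was never set
printf ("%s", (value != NULL) ? value : ""); // print nothing rather than crash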

The hash table also provides a hash_dump() function that you can use to print all values (for debugging purposes). You can also loop through all of the keys as follows:

char **keys = hash_keys (); // get all keys as a NULL-terminated array
for (size_t i = 0; keys[i] != NULL; i++) // length is unknown, so go until NULL key is found
  {
    char *key = keys[i]; // get the key
    char *value = hash_find (key); // entry should be found since this is a key
    printf ("hash[%s] = %s\n", key, value); // print the mapping
  }

Managing with a partner

At each stage of this project, there are several components that can be implemented independently. In particular, the utility programs can be developed without having a functioning shell, and the shell built-ins and features can be done without supporting processes. As such, there is a natural structure for dividing the work.

As a word of caution, you should agree to a completion schedule. The integration of these components is not trivial, particularly at the Basic implementation level. As such, you should set your own intermediate deadlines for each level. If you plan on 10-14 days for each level, you will have enough time to complete at least the Intermediate implementation.





© 2011-2024 Michael S. Kirkpatrick.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.