Project 1: Building a shell

In this project, you will build a command-line shell that operates similarly to the bash shell.

Set up a repository on stu.cs.jmu.edu based on the instructions in the CS 361 Submission Procedures, using the submit directory p1-sh.git and the file name p1-sh.tar.gz.

Background information

Throughout this project, you will be implementing three different types of requirements:

Shell built-ins
These are commands that look like distinct programs, but they actually are not. Instead, the shell recognizes a small number of commands and calls an internal function rather than creating a new process. For instance, consider the cd command that can be used to change to a different directory. Since the notion of "current working directory" only exists within the context of the shell, this command is a built-in that just changes an internal variable in the shell process.
Utility programs
These are stand-alone programs that are written and compiled independently of the shell. For example, consider the ls program that lists files in a directory or the gcc compiler. These programs are not part of the shell. Instead, the shell uses fork() and execvp() (or better yet, posix_spawn()) to run the program in a separate process.
Additional shell features
In addition to running programs, shells provide a number of features that create an interactive experience. For instance, when a process exits (i.e., the code does return 0; from main() or calls exit (0);), the user might wish to know this return value. As an example, if you try to run the ls program on a file, the return code 0 would indicate the file exists and the listing was successful, while 1 could indicate the file wasn't found or permission was denied. Other features include the ability to pipe output streams to other processes or using environment variables to customize the user's experience.

Project directory structure

The project directory structure is mostly described in the Testing Documentation. You will be modifying the files in p1-sh/src to implement the shell (dukesh).

In addition to the p1-sh/src files that make up the shell source code, you will be implementing a number of command-line utilities in p1-sh/utils. These utilities can be compiled by running make utils in the p1-sh directory, which will put the compiled programs in p1-sh/bin. You can also compile a single utility by running make bin/ls (for the ls.c program).

The utility programs do not have their own test cases provided. Instead, they are tested indirectly when the shell test cases try to run them. You can (and should) test and debug them yourself by running them on the normal command line, such as:

$ ./bin/ls data

In the p1-sh/tests directory, there are subdirectories that are used as input data (tests/data, tests/code, and tests/cut_data). There is also a tests/scripts directory contain a number of text files that are shell scripts. A shell script is a text file that contains a list of commands, with each line corresponding to a single command that could be typed at the shell prompt. These shell scripts can be passed to dukesh using the -b flag (for "batch processing") to launch the shell, run a few programs, then shut down.


Implementation Requirements

Your first task is to build a minimally functional shell that can run external programs and built-in commands, as well as providing access to return codes. From there, you will incrementally add features to control environment variables, pipe data between processes, and redirect output to persistent files.

Getting started: BASIC requirements

To get started, you will build a minimally functional shell with the following components, which are all part of a single shell feature:

  • Interactive shell and batch processing:
    • Shell prompt - print the "$ " string (no trailing newline) and read a command from STDIN.
    • Shell scripts - if dukesh is run with -b, read the commands from the file instead of STDIN.
  • Built-ins:
    • quit - exit the shell.
    • echo - print the given input to the screen exactly as entered. Escape sequences and environment variables are ignored.
  • Utilities:
    • Process creation - if a provided command is not a recognized built-in, run the command as a separate process using fork() and execvp(). In this stage, you only need to handle a program with at most one command-line argument.

The provided source code includes the basic structure of the loop to execute to display the prompt and read the command line. Your first task is to parse the command line to use either STDIN or the provided shell script file as input.

MIN requirements

For this stage, you will need to implement the following features:

  • FSM-based parsing (fsm)
    • FSM implementation - integrate the parsing finite state machine from Lab 2, which creates an extensible framework for adding features in later stages. It is recommended to have the code in src/cmd.c return an array of strings for one command or (more complex) an array of arrays of strings, denoting a sequence of commands.
    • FSM transition debugging - if the -f flag is passed, print out the transition debugging messages as in the lab and do not execute the command.
  • Built-ins: (builtins)
    • cd - change the current working directory.
    • echo - print the given input to the screen. If the input contains the escape sequence "\n", then replace these two bytes with the newline character ('\n'). If the input contains '$', this is a reference to a return value or environment variable. See below.
    • pwd - print the current working directory to the screen.
    • type - print the executable source of a command. If the command is a built-in, print cmd is a shell built-in (assuming cmd is the command name). If not, traverse $PATH for an executable file that matches the command.
    • Return codes - using echo $? will print the return code from the previous built-in command.
  • Utilities: (procs)
    • Process execution - extend the execution of separate processes to pass an arbitrary number of arguments.
    • Return codes - using echo $? will print the return code from the previous child process.
    • ls - list the files in a requested directory. If no directory is specified, use the current working directory. Add support for the flags as specified in the usage() function.
    • head - print the first several lines of a file. The default is to print 5 lines, but the -n flag can specify a different number. If no file is specified on the command line, read from standard input.

For the echo command, at this point, you should print the remaining arguments with a single space between the words as shown below. If the -d flag is passed, use a character delimiter instead of a space:

$ echo hello world
hello world
$ echo hello world    how        are you
hello world how are you
$ echo -d: hello world    how        are you
hello:world:how:are:you

For the echo command, $? indicates that we are wanting to print the return code. As an example, consider the following sequence of commands (note that the $ at the beginning of the line is the prompt and other lines are output produced):

$ ./bin/ls -sa data
0 empty.txt
6 FIRST.txt
5 .hidden.txt
9 pwd.txt
22 yat.txt
$ echo $?
0
$ ./bin/ls flubberdub
$ echo $?
1

INTER requirements (choose two or three)

Both projects in this course have four INTER features each. Earning a project grade of B or A requires completing five or six of these eight features. You may choose which features to implement. For this project stage, you will need to choose from the following features:

  • Extended FSM processing (fsmext)
    • Multiple pipes - handle a sequence of commands separated by more than one pipe.
    • Background processes - the & token at the end of the line indicates a command should run in the background.
    • I/O redirections - send STDOUT or STDERR to a file or pipe.
    • Parsing errors - detect and reject invalid command-line syntax.
  • Environment variables (env)
    • export - built-in to set an environment variable. You may use the provided hash table to keep track of all environment variables. CAUTION: You will need to change the way that you execute programs from the partial implementation. You will need to use execve() or posix_spawn() so that you can pass environment variables to the new process. By default, you must always pass the $PATH. (You can use getenv() to get your real $PATH and pass it along.)
    • unset - built-in to unset an environment variable. If using the provided hash table, remove the key from the table.
    • echo - extend the implementation so that environment variables can be printed. All environment variable names must be wrapped in curly braces as shown below.
    • repeat - print out the specified environment variables a repeated number of times.
    • Single-command variables - pass environment variables to processes or built-ins without persistence. For instance, $ A=5 ./bin/repeat 2 A would pass A=5 as an environment variable to ./bin/repeat, but it would not be stored as with export.
  • File utilities: (util)
    • chmod - change the permissions on a file. User, group, and other permissions are passed as separate strings of the form "rwx" with - in place of a permission not granted. You will need to process these strings and create the appropriate mode_t value to pass to the chmod() C function.
    • file - displays information about the type of file, such as indicating if a file is binary, an executable, or a script. See below for more information.
    • rm - delete a file.
    • touch - create an empty file or update the timestamp on an existing file.
    • mkdir - create a directory.
  • I/O redirections (io)
    • STDOUT redirection - write a child process's output to a file.
    • Pipe redirection - write a child process's output to a pipe and redirect another child process's input from the pipe.
    • tee - utility program that reads input from STDIN or a file, then writes the data to both an output file and STDOUT.

Recall that environment variables get passed to processes along with their value as a string. If you set the $ALPHA environment variable to have the value beta, then the string ALPHA=beta gets passed in the envp array. Or, you can retrieve the variable in the new process by calling getenv ("ALPHA"), which will return the string "beta".

The following lines illustrate the expected behavior for environment variables in this project. The first echo tries to look up the $VAR variable, which does not exist yet. Then the variable gets set with export and the second echo finds the value.

$ echo VAR=${VAR}
VAR=
$ export VAR=hello
$ echo VAR=${VAR}
VAR=hello
$ ./bin/repeat 2 VAR
VAR=hello
VAR=hello
$ ./bin/env ALPHA=beta ./bin/repeat 1 ALPHA 1 VAR
ALPHA=beta
VAR=hello

For the file utility, you will need to open the file and examine its contents. For instance, a file is considered an ASCII file if every byte is between 0 and 127. For binary executables, you will need to examine the first few bytes of the ELF or Mach-O formats (you should not need to look at more than the first eight bytes for either). For scripts, the first line (often called the shebang operator) indicates the location of the interpreter to invoke.

ADV requirements (choose one or two)

Both projects in this course have four ADV features each. Earning a project grade of B or A requires completing two or four of these eight features. You may choose which features to implement. Some depend on specific INTER features that must be implemented first. For this project stage, you will need to choose from the following features:

  • Process control: (pcntl)
    • Background processes - if a command-line ends in &, the shell should run the program without calling waitpid(). Keep track of up to 10 background processes.
    • fg - bring a background process to the foreground (causing the shell to wait for completion).
    • kill - send a signal to a background process. Note that the command line will NOT use the standard PID to identify the process. Instead, it will be an index number based on the order of background processes created.
    • jobs - list the current background process command lines.
  • STDOUT redirections: (ioext)
    • Multiple pipes - chain together a sequence of more than two processes that are linked by pipes.
    • Pipes with built-ins - temporarily redirect the shell's STDOUT to a pipe while a built-in command runs. HINT: Use dup() to get a file descriptor to keep track of the original STDOUT, then restore it later with dup2().
    • STDOUT with built-ins - temporarily redirect the shell's STDOUT to an output file while a built-in command runs.
  • STDERR redirections: (ioerr)
    • Process STDERR merging - the 2>&1 token causes STDERR to be merged into STDERR, causing printf() and fprintf(stderr, ...) to write to STDOUT.
    • Built-in STDERR merging - cause a temporary merging of both file streams while a built-in command runs.
  • Message passing: (ipc)
    • Store up to 10 messages in a priority-based message queue between separate command lines. If any queues are still open when the quit command is entered, close them.
    • mqopen - open a named message queue, assigning an ID starting at 0.
    • mqsend - send a message to be stored in the shell's message queue. If the -p flag is passed, use a different priority other than the default value of 10. (Lower numbers mean higher priority and should be retrieved earlier.)
    • mqrecv - receive a message from the queue and store it in an environment variable.
    • mqclose - free up the message queue's resources. If there are stored messages that have not been received, print them to the display.

Tips

The following tips may be helpful for completing this project on time.

Using the hash table

To assist with the environment variable handling, a hash table implementation has been included in the distribution. For simplicity, the hash table only accepts strings as keys and values. Also, the hash table is a global variable, so you can access it from anywhere. You can consult src/hash.h for the variable types, but the following code demonstrates the core functionality:

1
2
3
4
5
hash_init (100); // store up to 100 items initially
hash_insert ("key", "10"); // create mapping hash[key] = 10
char *value = hash_find ("key"); // search for hash[key]
hash_remove ("key"); // removes hash[key] from the table
hash_destroy (); // remove all keys and free the resources

The hash table also provides a hash_dump() function that you can use to print all values (for debugging purposes). You can also loop through all of the keys as follows:

1
2
3
4
5
6
7
char **keys = hash_keys (); // get all keys as a NULL-terminated array
for (size_t i = 0; keys[i] != NULL; i++) // length is unknown, so go until NULL key is found
  {
    char *key = keys[i]; // get the key
    char *value = hash_lookup (key); // entry should be found since this is a key
    printf ("hash[%s] = %s\n", key, value); // print the mapping
  }

Managing with a partner

At each stage of this project, there are several components that can be implemented independently. In particular, the utility programs can be developed without having a functioning shell, and the shell built-ins and features can be done without supporting processes. As such, there is a natural structure for dividing the work.

As a word of caution, you should agree to a completion schedule. The integration of these components is not trivial, particularly at the Phase 2 implementation level. As such, you should set your own intermediate deadlines for each level. If you plan on 10-14 days for each level, you will have enough time to complete at least the Phase 3 implementation.



James Madison University logo


© 2011-2026 Michael S. Kirkpatrick.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.