WARNING: Before starting on this project, you
should be familiar with the Testing Procedures
and the Project Submission Procedures, as these
are different from those used in CS 261.
Background information
Throughout this project, you will be implementing three different types of
requirements:
- Shell built-ins
- These are commands that look like distinct programs, but
they actually are not. Instead, the shell recognizes a small number
of commands and calls an internal function rather than creating a
new process. For instance, consider the
cd command that
can be used to change to a different directory. Since the notion of
"current working directory" only exists within the context of the
shell, this command is a built-in that just changes an internal
variable in the shell process.
- Utility programs
- These are stand-alone programs that are written and compiled
independently of the shell. For example, consider the
ls
program that lists files in a directory or the gcc
compiler. These programs are not part of the shell. Instead, the shell
uses fork() and execvp() (or better yet,
posix_spawn()) to run the program in a
separate process.
- Additional shell features
- In addition to running programs, shells provide a number of features
that create an interactive experience. For instance, when a process
exits (i.e., the code does
return 0; from
main() or calls exit (0);), the user might
wish to know this return value. As an example, if you try
to run the ls program on a file, the return code 0 would
indicate the file exists and the listing was successful, while 1
could indicate the file wasn't found or permission was denied. Other
features include the ability to pipe output streams to other processes
or using environment variables to customize the user's experience.
Project directory structure
The project directory structure is mostly described in the
Testing Documentation.
You will be modifying the files in p1-sh/src to implement the
shell (dukesh).
In addition to the p1-sh/src files that make up the
shell source code, you will be implementing a number of command-line utilities
in p1-sh/utils. These utilities can be compiled by running
make utils in the p1-sh directory, which will put
the compiled programs in p1-sh/bin. You can also compile a
single utility by running make bin/ls (for the ls.c
program).
The utility programs do not have their own test
cases provided. Instead, they are tested indirectly when the shell test
cases try to run them. You can (and should) test and debug
them yourself by running them on the normal command line, such as:
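(The following invocations are illustrative only; the exact flags and files
depend on your usage() implementations and the test data.)
$ ./bin/ls -a tests/data
$ ./bin/head -n 3 tests/data/pwd.txt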
In the p1-sh/tests directory, there are subdirectories that
are used as input data (tests/data, tests/code, and
tests/cut_data). There is also a tests/scripts
directory containing a number of text files that are shell scripts. A
shell script is a text file that contains a list of commands, with each line
corresponding to a single command that could be typed at the shell prompt.
These shell scripts can be passed to dukesh using the
-b flag (for "batch processing") to launch the shell, run a few
programs, then shut down.
Implementation Requirements
Your first task is to build a minimally functional shell that can run
external programs and built-in commands, as well as providing access to
return codes. From there, you will incrementally add features to control
environment variables, pipe data between processes, and redirect output
to persistent files.
Getting started: BASIC requirements
To get started, you will build a minimally functional shell with the
following components, which are all part of a single shell
feature:
- Interactive shell and batch processing:
- Shell prompt - print the
"$ " string (no trailing
newline) and read a command from STDIN.
- Shell scripts - if
dukesh is run with -b,
read the commands from the file instead of STDIN.
- Built-ins:
quit - exit the shell.
echo - print the given input to the screen exactly as
entered. Escape sequences and environment variables are ignored.
- Utilities:
- Process creation - if a provided command is not a recognized
built-in, run the command as a separate process using
fork() and execvp(). In this stage, you
only need to handle a program with at most one command-line argument.
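A minimal sketch of that fork()/execvp()/waitpid() pattern appears below
(error handling is abbreviated, and the argument vector is assumed to come
from your command-line parsing):
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

// Run an external program in a child process and wait for it to finish.
// Returns the child's exit code so the shell can remember it for echo $?.
static int
run_external (char *const argv[])
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;                      // fork failed

  if (pid == 0)
    {
      execvp (argv[0], argv);       // search $PATH for the program
      perror (argv[0]);             // only reached if execvp() fails
      exit (EXIT_FAILURE);
    }

  int status = 0;
  waitpid (pid, &status, 0);        // parent waits for the child
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}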
The provided source code includes the basic structure of the loop to
execute to display the prompt and read the command line. Your first task is
to parse the command line to use either STDIN or the provided shell script
file as input.
A common mistake at this point is to try to
implement all of this code in one place, such as src/main.c. Do
not do this. Instead, become familiar with the separate files in the
src directory and put separate functionality in appropriate
places. For instance, src/cmd.c should focus only on
parsing the command line, while code for executing the commands should be in
src/builtins.c or src/process.c as appropriate.
You should also start early by decomposing your code into small functions
that focus on one very specific task. Ideally, the entire body of a function
should fit on one screen of text.
MIN requirements
STOP! Before
going on to this next phase, you need to make sure that you have
correctly submitted the
BASIC claim and gotten credit for it. That includes doing an
interactive demo during my office hours.
For this stage, you will need to implement the following features:
- FSM-based parsing (
fsm)
- FSM implementation - integrate the parsing finite state machine
from Lab 2, which creates an
extensible framework for adding features in later stages. It is
recommended to have the code in
src/cmd.c return an
array of strings for one command or (more complex) an array of
arrays of strings, denoting a sequence of commands.
- FSM transition debugging - if the
-f flag is passed,
print out the transition debugging messages as in the lab and do not
execute the command.
- Built-ins: (
builtins)
cd - change the current working directory (a sketch of cd and pwd appears after this list).
echo - print the given input to the screen. If the
input contains the escape sequence "\n", then replace
these two bytes with the newline character ('\n').
If the input contains '$', this is a reference to a
return value or environment variable. See below.
pwd - print the current working directory to the
screen.
type - print the executable source of a command. If the
command is a built-in, print cmd is a shell built-in
(assuming cmd is the command name). If not, traverse
$PATH for an executable file that matches the command.
- Return codes - using
echo $? will print the return
code from the previous built-in command.
- Utilities: (
procs)
- Process execution - extend the execution of separate processes to
pass an arbitrary number of arguments.
- Return codes - using
echo $? will print the return
code from the previous child process.
ls - list the files in a requested directory. If
no directory is specified, use the current working directory. Add
support for the flags as specified in the usage()
function.
head - print the first several lines of a file.
The default is to print 5 lines, but the -n flag can
specify a different number. If no file is specified on the command
line, read from standard input.
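For the cd and pwd built-ins above, the underlying system calls are chdir()
and getcwd(). A minimal sketch follows (the names builtin_cd and builtin_pwd
are placeholders for however you organize src/builtins.c, and the return
values assume you record them so echo $? can report them):
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

// cd built-in: change the shell's working directory.
// Returns 0 on success, 1 on failure.
static int
builtin_cd (const char *path)
{
  if (chdir (path) != 0)
    {
      perror ("cd");
      return 1;
    }
  return 0;
}

// pwd built-in: print the current working directory.
static int
builtin_pwd (void)
{
  char buf[PATH_MAX];
  if (getcwd (buf, sizeof (buf)) == NULL)
    {
      perror ("pwd");
      return 1;
    }
  printf ("%s\n", buf);
  return 0;
}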
For the echo command, at this point, you should print the
remaining arguments with a single space between the words as shown below. If
the -d flag is passed, use a character delimiter instead of a
space:
$ echo hello world
hello world
$ echo hello world how are you
hello world how are you
$ echo -d: hello world how are you
hello:world:how:are:you
For the echo command, $? indicates that we want
to print the return code. As an example, consider the following
sequence of commands (note that the $ at the beginning of the
line is the prompt and other lines are output produced):
$ ./bin/ls -sa data
0 empty.txt
6 FIRST.txt
5 .hidden.txt
9 pwd.txt
22 yat.txt
$ echo $?
0
$ ./bin/ls flubberdub
$ echo $?
1
WARNING:
For the ls command, the order that the files are retrieved
from the directory can vary based on when files were created. You must
print the files in alphabetical order. This can be a challenge with the
-a flag, as hidden files begin with a . but
must be ordered alphabetically. E.g., the correct order should be
a.txt, .b.txt, c.txt for those
three files. Note that . (current directory) and ..
(parent directory) should not be listed.
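One way to get that ordering is a custom comparator passed to scandir() that
ignores a single leading dot (a sketch; whether the comparison should be
case-sensitive is determined by the expected output files, and the other
flags such as -s are omitted here):
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Compare two entries alphabetically, skipping a single leading '.' so that
// hidden files sort as if the dot were absent (a.txt, .b.txt, c.txt).
static int
cmp_names (const struct dirent **a, const struct dirent **b)
{
  const char *na = (*a)->d_name;
  const char *nb = (*b)->d_name;
  if (na[0] == '.')
    na++;
  if (nb[0] == '.')
    nb++;
  return strcmp (na, nb);
}

// List a directory in sorted order, skipping "." and "..".
static void
list_sorted (const char *dir, int show_hidden)
{
  struct dirent **entries = NULL;
  int n = scandir (dir, &entries, NULL, cmp_names);
  for (int i = 0; i < n; i++)
    {
      const char *name = entries[i]->d_name;
      if (strcmp (name, ".") != 0 && strcmp (name, "..") != 0
          && (show_hidden || name[0] != '.'))
        printf ("%s\n", name);
      free (entries[i]);
    }
  free (entries);
}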
NOTE:
Except for repeat (next phase), all of the built-in
commands and utility programs in this project are
real. You can use man to read the documentation for each
(such as man which to learn about the which
command). However, the interfaces provided are deliberately different
from the real commands. To get acquainted with what each command should
do and produce as output, you need to look at the shell scripts used
in the test cases and the corresponding expected output files. Consult
the testing documentation if you are
unfamiliar or rusty with the testing infrastructure.
INTER requirements (choose two or three)
Both projects in this course have four INTER features each.
Earning a project grade of B or A requires completing five or six of these
eight features. You may choose which features to implement. For this project
stage, you will need to choose from the following features:
- Extended FSM processing
(
fsmext)
- Multiple pipes - handle a sequence of commands separated by more
than one pipe.
- Background processes - the
& token at the end of
the line indicates a command should run in the background.
- I/O redirections - send STDOUT or STDERR to a file or pipe.
- Parsing errors - detect and reject invalid command-line syntax.
- Environment variables
(
env)
export - built-in to set an environment variable. You
may use the provided hash table to keep track of all environment
variables. CAUTION: You will need
to change the way that you execute programs from the partial
implementation. You will need to use execve() or
posix_spawn() so that you can pass environment
variables to the new process. By default, you must always pass
the $PATH. (You can use getenv() to
get your real $PATH and pass it along.)
unset - built-in to unset an environment variable. If
using the provided hash table, remove the key from the table.
echo - extend the implementation so that environment
variables can be printed. All environment variable names must be
wrapped in curly braces as shown below.
repeat - print out the specified environment variables
a repeated number of times.
- Single-command variables - pass environment variables to processes
or built-ins without persistence. For instance,
$ A=5 ./bin/repeat 2 A would pass A=5 as an
environment variable to ./bin/repeat, but it would not be
stored as with export.
- File utilities: (
util)
chmod - change the permissions on a file. User,
group, and other permissions are passed as separate strings of the
form "rwx" with - in place of a permission
not granted. You will need to process these strings and create the
appropriate mode_t value to pass to the chmod()
C function.
file - displays information about the type of file,
such as indicating if a file is binary, an executable, or a script.
See below for more information.
rm - delete a file.
touch - create an empty file or update the timestamp
on an existing file.
mkdir - create a directory.
- I/O redirections (
io)
- STDOUT redirection - write a child process's output to a file.
- Pipe redirection - write a child process's output to a pipe and
redirect another child process's input from the pipe (see the sketch
after this list).
tee - utility program that reads input from STDIN or a
file, then writes the data to both an output file and STDOUT.
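For the pipe redirection item above, the core pattern is pipe(), two fork()
calls, and dup2() in each child. A sketch (error handling is omitted, and
argvl/argvr stand for the two parsed argument vectors):
#include <sys/wait.h>
#include <unistd.h>

// Connect two child processes with a pipe: left's STDOUT feeds right's STDIN.
static void
run_pipeline (char *const argvl[], char *const argvr[])
{
  int fds[2];
  if (pipe (fds) != 0)
    return;

  if (fork () == 0)
    {
      dup2 (fds[1], STDOUT_FILENO);  // left child writes into the pipe
      close (fds[0]);
      close (fds[1]);
      execvp (argvl[0], argvl);
      _exit (1);
    }

  if (fork () == 0)
    {
      dup2 (fds[0], STDIN_FILENO);   // right child reads from the pipe
      close (fds[0]);
      close (fds[1]);
      execvp (argvr[0], argvr);
      _exit (1);
    }

  // The parent must close both ends or the reader never sees EOF.
  close (fds[0]);
  close (fds[1]);
  while (wait (NULL) > 0)
    ;
}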
Recall that each environment variable gets passed
to a process together with its value as a single string. If you set the
$ALPHA environment variable to have the value beta,
then the string ALPHA=beta gets passed in the envp
array. Or, you can retrieve the variable in the new process by calling
getenv ("ALPHA"), which will return the string "beta".
The following lines illustrate the expected behavior for environment
variables in this project. The first echo tries to look up
the $VAR variable, which does not exist yet. Then the variable
gets set with export and the second echo finds
the value.
$ echo VAR=${VAR}
VAR=
$ export VAR=hello
$ echo VAR=${VAR}
VAR=hello
$ ./bin/repeat 2 VAR
VAR=hello
VAR=hello
$ ./bin/env ALPHA=beta ./bin/repeat 1 ALPHA 1 VAR
ALPHA=beta
VAR=hello
CAUTION:
Your code must NOT
depend on manually setting environment variables before running
make test. Doing so means you are using bash's
implementation of environment variables and not your own. If you want to
make sure that your code works and is not inheriting this implementation,
you should open a new terminal window and run make clean &&
make test.
For the file utility, you will need to open the file and
examine its contents. For instance, a file is considered an ASCII file if
every byte is between 0 and 127. For binary executables, you will need to
examine the first few bytes of the
ELF
or Mach-O formats (you
should not need to look at more than the first eight bytes for either).
For scripts, the first line (often called the shebang
line, beginning with #!)
indicates the location of the interpreter to invoke.
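A sketch of those magic-byte checks appears below (the returned strings are
placeholders, the Mach-O magic numbers and the all-ASCII scan still need to
be added, and the actual output must match the expected test output files):
#include <stdio.h>
#include <string.h>

// Classify a file by examining its first few bytes.
static const char *
classify (const char *path)
{
  unsigned char buf[8] = { 0 };
  FILE *fp = fopen (path, "rb");
  if (fp == NULL)
    return "cannot open";
  size_t n = fread (buf, 1, sizeof (buf), fp);
  fclose (fp);

  if (n >= 4 && memcmp (buf, "\x7f" "ELF", 4) == 0)
    return "ELF executable";         // check the Mach-O magic similarly
  if (n >= 2 && buf[0] == '#' && buf[1] == '!')
    return "script";                 // the rest of the line names the interpreter
  return "unknown (add the ASCII byte check here)";
}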
ADV requirements (choose one or two)
Both projects in this course have four ADV features each.
Earning a project grade of B or A requires completing two or four of these
eight features. You may choose which features to implement. Some depend on
specific INTER features that must be implemented first. For this
project stage, you will need to choose from the following features:
- Process control: (
pcntl)
- Background processes - if a command-line ends in
&,
the shell should run the program without calling waitpid().
Keep track of up to 10 background processes.
fg - bring a background process to the foreground
(causing the shell to wait for completion).
kill - send a signal to a background process. Note that
the command line will NOT use the standard PID to
identify the process. Instead, it will be an index number based on the
order of background processes created.
jobs - list the current background process command
lines.
- STDOUT redirections: (
ioext)
- Multiple pipes - chain together a sequence of more than two processes
that are linked by pipes.
- Pipes with built-ins - temporarily redirect the shell's STDOUT to a
pipe while a built-in command runs. HINT: Use dup() to get a file
descriptor to keep track of the original STDOUT, then restore it later
with dup2() (see the sketch after this list).
- STDOUT with built-ins - temporarily redirect the shell's STDOUT to an
output file while a built-in command runs.
- STDERR redirections: (
ioerr)
- Process STDERR merging - the
2>&1 token causes
STDERR to be merged into STDOUT, so that both printf() and
fprintf(stderr, ...) write to STDOUT.
- Built-in STDERR merging - cause a temporary merging of both file
streams while a built-in command runs.
- Message passing: (
ipc)
- Store up to 10 messages in a priority-based message queue between
separate command lines. If any queues are still open when the
quit command is entered, close them.
mqopen - open a named message queue, assigning an ID
starting at 0.
mqsend - send a message to be stored in the shell's
message queue. If the -p flag is passed, use a priority
other than the default value of 10. (Lower numbers mean
higher priority and should be retrieved earlier.)
mqrecv - receive a message from the queue and store it
in an environment variable.
mqclose - free up the message queue's resources. If
there are stored messages that have not been received, print them to
the display.
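For the dup()/dup2() hint above, the save-and-restore pattern for a built-in
writing to a file is sketched below. Here run_builtin() is a hypothetical
stand-in for your own dispatch code, error checks are omitted, and the pipe
case is the same idea with a pipe descriptor in place of fd:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// Hypothetical dispatch function that runs an already-parsed built-in.
int run_builtin (char *const argv[]);

// Run a built-in with its output redirected to a file, then restore the
// shell's own STDOUT afterward.
static void
builtin_to_file (const char *filename, char *const argv[])
{
  int saved = dup (STDOUT_FILENO);    // remember the real STDOUT
  int fd = open (filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);

  fflush (stdout);                    // flush before switching descriptors
  dup2 (fd, STDOUT_FILENO);           // the built-in's printf() now hits the file
  close (fd);

  run_builtin (argv);

  fflush (stdout);
  dup2 (saved, STDOUT_FILENO);        // put the original STDOUT back
  close (saved);
}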
Tips
The following tips may be helpful for completing this project on time.
Using the hash table
To assist with the environment variable handling, a hash table
implementation has been included in the distribution. For simplicity,
the hash table only accepts strings as keys and values. Also, the hash
table is a global variable, so you can access it from anywhere. You can
consult src/hash.h for the variable types, but the following
code demonstrates the core functionality:
hash_init (100); // store up to 100 items initially
hash_insert ("key", "10"); // create mapping hash[key] = 10
char *value = hash_find ("key"); // search for hash[key]
hash_remove ("key"); // removes hash[key] from the table
hash_destroy (); // remove all keys and free the resources
CAUTION: If you try to find a
key that does not exist, hash_find() will return NULL. You need
to detect the NULL return value and make sure you do not cause a
segmentation fault. In that case, just print an empty string.
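For example (name stands for whatever key the user referenced):
char *value = hash_find (name);
printf ("%s", value == NULL ? "" : value); // an unset variable prints as empty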
The hash table also provides a hash_dump() function that
you can use to print all values (for debugging purposes). You can also
loop through all of the keys as follows:
char **keys = hash_keys (); // get all keys as a NULL-terminated array
for (size_t i = 0; keys[i] != NULL; i++) // length is unknown, so go until NULL key is found
{
char *key = keys[i]; // get the key
char *value = hash_find (key); // entry should be found since this is a key
printf ("hash[%s] = %s\n", key, value); // print the mapping
}
Managing with a partner
At each stage of this project, there are several components that can
be implemented independently. In particular, the utility programs can
be developed without having a functioning shell, and the shell built-ins
and features can be done without supporting processes. As such, there
is a natural structure for dividing the work.
As a word of caution, you should agree on a completion schedule.
The integration of these components is not trivial, particularly at the
Phase 2 implementation level. As such, you should set your own intermediate
deadlines for each level. If you plan on 10-14 days for each level, you
will have enough time to complete at least the Phase 3 implementation.