Background information
At each implementation level, you will implement three different kinds
of requirements:
- Utility programs - These are stand-alone programs that are written
  and compiled independently of the shell. For example, consider the
  ls program that lists files in a directory or the gcc compiler.
  These programs are not part of the shell. Instead, the shell will
  try to locate the executable program file in a directory based on
  the $PATH environment variable. If the program is found, the shell
  uses fork() and execlp() (or better yet, posix_spawn()) to run the
  program in a separate process.
- Shell built-ins - These are commands that look like distinct
  programs, but they actually are not. Instead, the shell recognizes
  a small number of commands and calls an internal function rather
  than creating a new process. For instance, consider the cd command
  that can be used to change to a different directory. Since the
  notion of "current working directory" only exists within the
  context of the shell, this command is a built-in that just changes
  an internal variable in the shell process.
- Additional shell features - In addition to running programs, shells
  provide a number of features that create an interactive experience.
  For instance, when a process exits (i.e., the code does return 0;
  from main() or calls exit (0);), the user might wish to know this
  return value. As an example, if you try to run the ls program on a
  file, the return code 0 would indicate the file exists and the
  listing was successful, while 1 could indicate the file wasn't
  found or permission was denied. Other features include the ability
  to pipe output streams to other processes or using environment
  variables to customize the user's experience.
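The fork-and-exec pattern described above can be sketched as follows.
This is a minimal illustration, not the required implementation:
run_program is a hypothetical helper name, and it assumes the path to
the executable has already been resolved (e.g., by a $PATH lookup).

```c
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

// Run an external program in a child process and report its exit code.
// path is an already-resolved executable path; argv is NULL-terminated.
int
run_program (char *path, char *argv[])
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1; // fork failed

  if (pid == 0)
    {
      execv (path, argv); // replace the child with the program
      exit (127);         // only reached if execv() failed
    }

  int status = 0;
  waitpid (pid, &status, 0); // parent waits for the child to finish
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

The return value here is exactly what a shell would later report
through echo $?.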
Project directory structure
You will mainly be working with files in the p1-sh/src directory.
These source code files will be compiled as object files into
p1-sh/build, then linked into the dukesh executable.

In addition, you will be implementing the utility programs in the
p1-sh/utils directory. There is a Makefile in that directory, so you
can run make there to build these programs. The compiled executables
will be in p1-sh/bin. (Note that running make in the main p1-sh
directory will also compile these programs.)

Within the p1-sh/tests directory structure, there are symbolic links
to p1-sh/bin and p1-sh/data. These links help to ensure that the
testing infrastructure (which runs with p1-sh/tests as the current
working directory) can access these files appropriately. That is,
once you have implemented the ls utility program, running ./bin/ls
from either p1-sh or p1-sh/tests will run the same program. The
utility programs do not have their own test cases provided. Instead,
they are tested indirectly when the shell test cases try to run them.

Lastly, the p1-sh/tests/scripts directory contains a number of text
files that are shell scripts. A shell script is a file that contains
a list of commands, with each line corresponding to a single command
that could be typed at the shell prompt. These shell scripts can be
passed to dukesh using the -b flag (for "batch processing") to launch
the shell, run a few programs, then shut down.
Implementation Requirements
Your first task is to build a minimally functional shell that can run
external programs and built-in commands, as well as provide access to
return codes. From there, you will incrementally add features to
control environment variables, pipe data between processes, and
redirect output to persistent files.
Phase 1: Shell loop (partial credit)
For this stage, you will need to implement the following features:
- Interactive shell and batch processing:
  - Shell prompt - print the "$ " string (no trailing newline) and
    read a command from STDIN.
  - Shell scripts - if dukesh is run with -b, read the commands from
    the file instead of STDIN.
- Built-ins:
  - echo - print the given input to the screen exactly as entered.
    Escape sequences and environment variables are ignored.
  - quit - exit the shell.

For both the prompt and scripts, you should use fgets() to read a
line of input. The difference is that the script will use fopen() to
open a file as a stream while the interactive prompt uses stdin as
the stream.

This phase will be graded both automatically and manually. You need
to test your code by running dukesh without a script. This will be
helpful later because you will be able to test your code more easily.
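The read loop for both modes can be sketched as below. This is only
an outline under stated assumptions: shell_loop is a hypothetical
helper (a NULL script argument means interactive mode), and command
dispatch is left as a stub. Here it returns the number of lines read,
purely to make the sketch easy to check.

```c
#include <stdio.h>
#include <string.h>

// Minimal read loop: read from a script file in batch (-b) mode,
// otherwise prompt on stdin. Returns lines read, or -1 on open failure.
int
shell_loop (const char *script)
{
  FILE *input = script ? fopen (script, "r") : stdin;
  if (input == NULL)
    return -1;

  int count = 0;
  char buffer[1024];
  while (1)
    {
      if (input == stdin)
        printf ("$ "); // prompt only when interactive, no trailing newline
      if (fgets (buffer, sizeof (buffer), input) == NULL)
        break;                               // EOF or read error
      buffer[strcspn (buffer, "\n")] = '\0'; // strip the trailing newline
      count++;
      if (strcmp (buffer, "quit") == 0)
        break;
      // ... tokenize the buffer and dispatch built-ins or programs ...
    }

  if (input != stdin)
    fclose (input);
  return count;
}
```

The only difference between the two modes is which stream fgets()
reads from, which is why a single loop can serve both.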
Phase 2: Shell built-ins and processes (C requirements)
For this stage, you will need to implement the following features:
- Built-ins:
  - cd - change the current working directory.
  - echo - print the given input to the screen. If the input contains
    the escape sequence "\n", then replace these two bytes with the
    newline character ('\n'). If the input contains '$', this is a
    reference to a return value or environment variable. See below.
  - pwd - print the current working directory to the screen.
  - which - print the executable source of a command. If the command
    is a built-in, print cmd: dukesh built-in command (assuming cmd
    is the command name). If the command begins with ./ (e.g., which
    ./bin/ls), check if it is executable. Otherwise, traverse through
    the $PATH for an executable file that matches the command.
- Utilities:
  - ls - list the files in a requested directory. If no directory is
    specified, use the current working directory. Add support for the
    flags as specified in the usage() function.
  - chmod - change the permissions on a file. User, group, and other
    permissions are passed as separate strings of the form "rwx" with
    '-' in place of a permission not granted. You will need to
    process these strings and create the appropriate mode_t value to
    pass to the chmod() C function.
  - head - print the first several lines of a file. The default is to
    print 5 lines, but the -n flag can specify a different number. If
    no file is specified on the command line, read from standard
    input.
  - cut - tokenize each line of input text from a file and print out
    a specified field. The default is to split lines based on spaces,
    but this can be changed with the -d flag. The default is to print
    the first field, but the -f flag changes this. For example, -f 2
    for the input "hello, world it's me" would print world. Using the
    flag -f 1 for the same input would print hello. If no file is
    specified on the command line, read from standard input.
- Other features:
  - Spawning processes - run utility programs in a separate process
    using either fork()/exec() or posix_spawn().
In the past, many students have implemented the command-line parsing
with HUGE chunks of spaghetti code, all crammed into shell.c. Don't
do this! Instead, integrate the code that you implemented in Lab 2.
That lab was specifically designed to walk you through a more
extensible approach to implementing the command line. (Note that you
can modify the Makefile to add other object modules to the MODS
line.)

Once you've got the FSM code integrated, much of your work will
become significantly easier. Instead of handling each built-in
command separately, you could simply include a function that checks
if the command (the first token from the buffer) is a built-in. If
not, assume it's an executable and run it with the argument list the
FSM builds.
For the utility programs, you should not look for an exact name of
the program. You simply need to check if a command is a built-in; if
it is not, then assume it is the name of a program that is either an
absolute path or can be found with a $PATH lookup. This is necessary
for some of the later phases, as we will be using your shell to run
other standard utilities that are not part of your implementation
requirements.
For the echo command, at this point, you should print the remaining
arguments with a single space between the words as shown below:

$ echo hello world
hello world
$ echo hello world how are you
hello world how are you
For the ls command, the order that the files are retrieved from the
directory can vary based on when files were created. You must print
the files in alphabetical order. This can be a challenge with the -a
flag, as hidden files begin with a . but must be ordered
alphabetically. E.g., the correct order should be a.txt, .b.txt,
c.txt for those three files. Also note that . (current directory) and
.. (parent directory) should be listed first.
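One way to get this ordering is a qsort() comparator that skips a
single leading dot, as sketched below. name_cmp is a hypothetical
helper name; handling . and .. (which are printed first) is left to
the caller.

```c
#include <stdlib.h>
#include <string.h>

// Compare two file names alphabetically, ignoring a single leading
// '.' so hidden files interleave with regular ones (a.txt, .b.txt,
// c.txt). The caller handles "." and ".." separately.
static int
name_cmp (const void *a, const void *b)
{
  const char *s1 = *(const char *const *) a;
  const char *s2 = *(const char *const *) b;
  if (s1[0] == '.' && s1[1] != '\0')
    s1++; // skip the dot on hidden files
  if (s2[0] == '.' && s2[1] != '\0')
    s2++;
  return strcmp (s1, s2);
}
```

Passing this comparator to qsort() over the array of names collected
from readdir() produces the required order.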
Except for repeat, all of the built-in commands and utility programs
in this project are real. You can use man to read the documentation
for each (such as man which to learn about the which command).
However, the interfaces provided are deliberately different from the
real commands. To get acquainted with what each command should do and
produce as output, you need to look at the shell scripts used in the
test cases and the corresponding expected output files. Consult the
testing documentation if you are unfamiliar or rusty with the testing
infrastructure.
Phase 3: Environment variables (B requirements)
For this stage, you will need to implement the following features:
- Built-ins:
  - export - set an environment variable. You may use the provided
    hash table to keep track of all environment variables. CAUTION:
    You will need to change the way that you execute programs from
    the partial implementation. You will need to use execve() or
    posix_spawn() so that you can pass environment variables to the
    new process. By default, you must always pass the $PATH. (You can
    use getenv() to get your real $PATH and pass it along.)
  - unset - unset an environment variable. If using the provided hash
    table, remove the key from the table.
  - echo - extend the implementation so that environment variables
    can be printed. All environment variable names must be wrapped in
    curly braces as shown below.
- Utilities:
  - env - set an environment variable for the execution of one
    particular process. This program will create the envp array of
    environment variables passed as arguments (passed as VAR=VAL with
    no space) and pass that array as the environment for a new
    process based on the rest of the command line (see below). The
    envp array should contain the variables specified as arguments as
    well as any that had been set by the export command above.
  - repeat - print out the specified environment variables a repeated
    number of times.
- Other features:
  - Return codes - using echo $? will print the return code from the
    previous utility or built-in command.
For the echo command, $? indicates that we want to print the return
code. As an example, consider the following sequence of commands
(note that the $ at the beginning of a line is the prompt and other
lines are output produced):

$ ./bin/ls -sa data
0 empty.txt
6 FIRST.txt
5 .hidden.txt
9 pwd.txt
22 yat.txt
$ echo $?
0
$ ./bin/ls flubberdub
$ echo $?
1
Recall that environment variables get passed to processes along with
their values as strings. If you set the $ALPHA environment variable
to have the value beta, then the string ALPHA=beta gets passed in the
envp array. Or, you can retrieve the variable in the new process by
calling getenv ("ALPHA"), which will return the string "beta".
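Building that envp array can be sketched as follows. build_envp is a
hypothetical helper; in your shell, the names and values might come
from the provided hash table rather than parallel arrays.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Build a NULL-terminated envp array from parallel arrays of variable
// names and values. Each entry is a malloc'd "NAME=VALUE" string
// suitable for passing to execve() or posix_spawn().
char **
build_envp (char **names, char **values, size_t count)
{
  char **envp = malloc ((count + 1) * sizeof (char *));
  for (size_t i = 0; i < count; i++)
    {
      size_t len = strlen (names[i]) + strlen (values[i]) + 2; // '=' + '\0'
      envp[i] = malloc (len);
      snprintf (envp[i], len, "%s=%s", names[i], values[i]);
    }
  envp[count] = NULL; // execve() requires NULL termination
  return envp;
}
```

Remember that $PATH should always be among the entries so the child
process can locate other executables.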
The following lines illustrate the expected behavior for environment
variables in this project. The first echo tries to look up the $VAR
variable, which does not exist yet. Then the variable gets set with
export and the second echo finds the value.

$ echo VAR=${VAR}
VAR=
$ export VAR=hello
$ echo VAR=${VAR}
VAR=hello
$ ./bin/repeat 2 VAR
VAR=hello
VAR=hello
$ ./bin/env ALPHA=beta ./bin/repeat 1 ALPHA 1 VAR
ALPHA=beta
VAR=hello
CAUTION: Your code must NOT depend on manually setting environment
variables before running make test. Doing so means you are using
bash's implementation of environment variables and not your own. If
you want to make sure that your code works and is not inheriting this
implementation, you should open a new terminal window and run make
clean && make test.
Phase 4: Piping STDIN and STDOUT (A requirements)
- Utilities:
  - cat - open a file and print out its contents.
- Other features:
  - Piping commands together - use a pipe to redirect the STDOUT from
    one process to become the STDIN of the next.

Consider the following command line:

$ ./bin/ls data
empty.txt
FIRST.txt
pwd.txt
yat.txt
When this command line is executed, the shell will create a new
process to run the ./bin/ls executable program. This program uses
printf() to write its output to STDOUT. The shell (both the one you
are implementing and the underlying Linux shell) is set up so that
STDOUT refers to an internal buffer that is used to write text to the
screen.
In this phase, you will change this behavior. Consider the following
lines as examples:

$ ./bin/ls data | ./bin/head -n 1
empty.txt
$ ./bin/cat data/pwd.txt | ./bin/head -n 1
pwd

In each case, your shell is creating two new processes instead of
just one. One process executes ./bin/ls while the other process
executes ./bin/head. However, the added twist is that the STDOUT from
./bin/ls becomes the STDIN for ./bin/head. Importantly, this data
does not go to the shell and it doesn't appear on the screen.
Instead, the shell only receives (and displays) the output from the
last process on each line (./bin/head).
The fix for this is conceptually straightforward: use dup2() to
redirect the first process's STDOUT to use a pipe and another dup2()
to redirect the second process's STDIN to read from the pipe. In
principle, this can be done with any number of processes forming a
chain with pipes connecting their output/input.

Where this gets tricky (in a real shell) is when some of the commands
in this chain are shell built-ins rather than separate processes.
When this occurs, the shell must know to read from or write to the
pipe directly rather than relying on dup2(). Doing so can be
particularly nasty once you start encountering exceptional cases
where some of the commands have errors or do not produce output.

You do not need to handle these complicated cases. In this project,
both of the commands will be utility programs and you do not have to
support pipes for built-in commands. As such, you will just need to
set up the pipe and use dup2() as appropriate to redirect standard
I/O in the child processes.
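The two-process case can be sketched as below. run_pipeline is a
hypothetical helper that takes two NULL-terminated argument vectors;
it is a minimal illustration of pipe() plus dup2(), not a complete
solution (error handling is abbreviated).

```c
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

// Connect two child processes with a pipe: left's STDOUT feeds
// right's STDIN. Returns the exit code of the right-hand process.
int
run_pipeline (char *left[], char *right[])
{
  int fds[2];
  if (pipe (fds) < 0)
    return -1;

  pid_t lhs = fork ();
  if (lhs == 0)
    {
      dup2 (fds[1], STDOUT_FILENO); // write end replaces stdout
      close (fds[0]);
      close (fds[1]);
      execvp (left[0], left);
      exit (127); // only reached if execvp() failed
    }

  pid_t rhs = fork ();
  if (rhs == 0)
    {
      dup2 (fds[0], STDIN_FILENO); // read end replaces stdin
      close (fds[0]);
      close (fds[1]);
      execvp (right[0], right);
      exit (127);
    }

  // The parent must close both ends, or the reader never sees EOF.
  close (fds[0]);
  close (fds[1]);
  int status = 0;
  waitpid (lhs, NULL, 0);
  waitpid (rhs, &status, 0);
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Note that the shell only reports the return code of the last command
in the pipeline, which matches the behavior described above.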
Tips
The following tips may be helpful for completing this project on time.
Using the hash table
To assist with the environment variable handling, a hash table
implementation has been included in the distribution. For simplicity,
the hash table only accepts strings as keys and values. Also, the
hash table is a global variable, so you can access it from anywhere.
You can consult src/hash.h for the variable types, but the following
code demonstrates the core functionality:

hash_init (100);                 // store up to 100 items initially
hash_insert ("key", "10");       // create mapping hash[key] = 10
char *value = hash_find ("key"); // search for hash[key]
hash_remove ("key");             // removes hash[key] from the table
hash_destroy ();                 // remove all keys and free the resources

If you try to find a key that does not exist, hash_find() will return
NULL. You need to detect the NULL return value and make sure you do
not cause a segmentation fault. Instead, you just print the empty
string.
The hash table also provides a hash_dump() function that you can use
to print all values (for debugging purposes). You can also loop
through all of the keys as follows:

char **keys = hash_keys ();              // get all keys as a NULL-terminated array
for (size_t i = 0; keys[i] != NULL; i++) // length is unknown, so go until NULL key is found
  {
    char *key = keys[i];           // get the key
    char *value = hash_find (key); // entry should be found since this is a key
    printf ("hash[%s] = %s\n", key, value); // print the mapping
  }
Managing with a partner
At each stage of this project, there are several components that can
be implemented independently. In particular, the utility programs can
be developed without having a functioning shell, and the shell
built-ins and features can be done without supporting processes. As
such, there is a natural structure for dividing the work.

As a word of caution, you should agree on a completion schedule. The
integration of these components is not trivial, particularly at the
Phase 2 implementation level. As such, you should set your own
intermediate deadlines for each level. If you plan on 10-14 days for
each level, you will have enough time to complete at least the Phase
3 implementation.