Project 1: Building a shell
In this project, you will build a command-line shell that operates similarly to the bash shell.
Set up a repository on stu.cs.jmu.edu based on the instructions in the CS 361 Submission Procedures, using the submit directory p1-sh.git and the file name p1-sh.tar.gz.
Background information
At each implementation level, you will implement three different kinds of requirements:
- Utility programs
- These are stand-alone programs that are written and compiled independently of the shell. For example, consider the ls program that lists files in a directory or the gcc compiler. These programs are not part of the shell. Instead, the shell will try to locate the executable program file in a directory based on the $PATH environment variable. If the program is found, the shell uses fork() and execlp() (or better yet, posix_spawn()) to run the program in a separate process.
- Shell built-ins
- These are commands that look like distinct programs, but they actually are not. Instead, the shell recognizes a small number of commands and calls an internal function rather than creating a new process. For instance, consider the cd command that can be used to change to a different directory. Since the notion of "current working directory" only exists within the context of the shell, this command is a built-in that just changes an internal variable in the shell process.
- Additional shell features
- In addition to running programs, shells provide a number of features that create an interactive experience. For instance, when a process exits (i.e., the code does return 0; from main() or calls exit (0);), the user might wish to know this return value. As an example, if you try to run the ls program on a file, the return code 0 would indicate the file exists and the listing was successful, while 1 could indicate the file wasn't found or permission was denied. Other features include the ability to pipe output streams to other processes or using environment variables to customize the user's experience.
Project directory structure
You will mainly be working with files in the p1-sh/src directory. These source code files will be compiled as object files into p1-sh/build, then linked into the dukesh executable.

In addition, you will be implementing the utility programs in the p1-sh/utils directory. There is a Makefile in that directory, so you can run make there to build these programs. The compiled executables will be in p1-sh/bin. (Note that running make in the main p1-sh directory will also compile these programs.)
Within the p1-sh/tests directory structure, there are symbolic links to p1-sh/bin and p1-sh/data. These links help to ensure that the testing infrastructure (which runs with p1-sh/tests as the current working directory) can access these files appropriately. That is, once you have implemented the ls utility program, running ./bin/ls from either p1-sh or p1-sh/tests will run the same program.
The utility programs do not have their own test cases provided. Instead, they are tested indirectly when the shell test cases try to run them.
Lastly, the p1-sh/tests/scripts directory contains a number of text files that are shell scripts. A shell script is a file that contains a list of commands, with each line corresponding to a single command that could be typed at the shell prompt. These shell scripts can be passed to dukesh using the -b flag (for "batch processing") to launch the shell, run a few programs, then shut down.
Implementation Requirements
Your first task is to build a minimally functional shell that can run external programs and built-in commands and provide access to return codes. From there, you will incrementally add features to control environment variables, pipe data between processes, and redirect output to persistent files.
Phase 1: Shell loop (partial credit)
For this stage, you will need to implement the following features:
- Interactive shell and batch processing:
  - Shell prompt - print the "$ " string (no trailing newline) and read a command from STDIN.
  - Shell scripts - if dukesh is run with -b, read the commands from the file instead of STDIN.
- Built-ins:
  - echo - print the given input to the screen exactly as entered. Escape sequences and environment variables are ignored.
  - quit - exit the shell.
For both the prompt and scripts, you should use fgets() to read a line of input. The difference is that the script will use fopen() to open a file as a stream while the interactive prompt uses stdin as the stream.
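To illustrate, here is a minimal sketch of that input layer. The helper names read_command and run_shell are mine, not part of the project's required interface:

```c
#include <stdio.h>
#include <string.h>

/* Read one command line from the given stream into buffer, stripping the
   trailing newline. Returns 1 on success, 0 on EOF. */
int read_command (FILE *source, char *buffer, size_t capacity)
{
  if (fgets (buffer, (int) capacity, source) == NULL)
    return 0;
  buffer[strcspn (buffer, "\n")] = '\0';
  return 1;
}

/* Core shell loop: prompt (in interactive mode), read, and dispatch until
   quit or EOF. Returns the number of non-quit commands read. */
int run_shell (FILE *source, int interactive)
{
  char buffer[1024];
  int count = 0;
  while (1)
    {
      if (interactive)
        printf ("$ ");                 /* prompt, no trailing newline */
      if (!read_command (source, buffer, sizeof (buffer)))
        break;                         /* EOF ends the loop */
      if (strcmp (buffer, "quit") == 0)
        break;                         /* the quit built-in */
      /* ... dispatch other built-ins or spawn a process here ... */
      count++;
    }
  return count;
}
```

Interactive mode would call run_shell (stdin, 1); batch mode opens the script with fopen() and passes that stream with interactive set to 0.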
This phase will be graded both automatically and manually. You need to test your code by running dukesh without a script. This will be helpful later because you will be able to test your code more easily.
Phase 2: Shell built-ins and processes (C requirements)
For this stage, you will need to implement the following features:
- Built-ins:
  - cd - change the current working directory.
  - echo - print the given input to the screen. If the input contains the escape sequence "\n", then replace these two bytes with the newline character ('\n'). If the input contains '$', this is a reference to a return value or environment variable. See below.
  - pwd - print the current working directory to the screen.
  - which - print the executable source of a command. If the command is a built-in, print cmd: dukesh built-in command (assuming cmd is the command name). If the command begins with ./ (e.g., which ./bin/ls), check if it is executable. Otherwise, traverse through the $PATH for an executable file that matches the command.
- Utilities:
  - ls - list the files in a requested directory. If no directory is specified, use the current working directory. Add support for the flags as specified in the usage() function.
  - chmod - change the permissions on a file. User, group, and other permissions are passed as separate strings of the form "rwx" with - in place of a permission not granted. You will need to process these strings and create the appropriate mode_t value to pass to the chmod() C function.
  - head - print the first several lines of a file. The default is to print 5 lines, but the -n flag can specify a different number. If no file is specified on the command line, read from standard input.
  - cut - tokenize each line of input text from a file and print out a specified field. The default is to split lines based on spaces, but this can be changed with the -d flag. The default is to print the first field, but the -f flag changes this. For example, -f 2 for the input "hello, world it's me" would print world. Using the flags -d , -f 1 for the same input would print hello. If no file is specified on the command line, read from standard input. For simplicity, you may assume there are no repeated delimiters (e.g., "a,,,,b,,,c") and none at the beginning or end of the line (e.g., " a b ").
- Other features:
  - Spawning processes - run utility programs in a separate process using either fork()/exec() or posix_spawn().
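As an example of the utility-side work, the chmod permission strings can be converted into a mode_t roughly as shown below. This is a sketch; the function name perms_to_mode is my own, and the real utility also needs argument parsing and error handling:

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Convert three "rwx"-style permission strings (with '-' for a permission
   not granted) into a mode_t suitable for the chmod() system call. */
mode_t perms_to_mode (const char *user, const char *group, const char *other)
{
  mode_t mode = 0;
  if (user[0] == 'r') mode |= S_IRUSR;
  if (user[1] == 'w') mode |= S_IWUSR;
  if (user[2] == 'x') mode |= S_IXUSR;
  if (group[0] == 'r') mode |= S_IRGRP;
  if (group[1] == 'w') mode |= S_IWGRP;
  if (group[2] == 'x') mode |= S_IXGRP;
  if (other[0] == 'r') mode |= S_IROTH;
  if (other[1] == 'w') mode |= S_IWOTH;
  if (other[2] == 'x') mode |= S_IXOTH;
  return mode;
}
```

The resulting value is passed directly to the system call, e.g., chmod (path, perms_to_mode (u, g, o)).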
In the past, many students have implemented the command line parsing with HUGE chunks of spaghetti code, all crammed into shell.c. Don't do this! Instead, integrate the code that you implemented in Lab 2. That lab was specifically designed to walk you through a more extensible approach to implementing the command line. (Note that you can modify the Makefile to add other object modules to the MODS line.)
Once you've got the FSM code integrated, much of your work will become significantly easier. Instead of handling each built-in command separately, you could simply include a function that checks if the command (the first token from the buffer) is a built-in. If not, assume it's an executable and run it with the argument list the FSM builds.
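For the process-spawning piece, the flow can be sketched as below, assuming the FSM has already produced a NULL-terminated argument list (spawn_and_wait is an illustrative name, not a required interface):

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

/* Run an external program in a child process and return its exit code.
   argv must be a NULL-terminated argument list, with argv[0] as the path
   to the executable. This sketches the fork()/exec() approach;
   posix_spawn() would work equally well. */
int spawn_and_wait (char *const argv[])
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;                       /* fork failed */
  if (pid == 0)
    {
      execv (argv[0], argv);         /* only returns on failure */
      exit (127);                    /* conventional "command not found" */
    }
  int status = 0;
  waitpid (pid, &status, 0);
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

The returned exit code is what your shell should store so that echo $? can report it later.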
Make sure your shell performs the full $PATH lookup when locating executables. This is necessary for some of the later phases, as we will be using your shell to run other standard utilities that are not part of your implementation requirements.
For the echo command, at this point, you should print the remaining arguments with a single space between the words as shown below:

$ echo hello world
hello world
$ echo hello world how are you
hello world how are you
For the ls command, the order in which the files are retrieved from the directory can vary based on when the files were created. You must print the files in alphabetical order. This can be a challenge with the -a flag, as hidden files begin with a . but must be ordered alphabetically. E.g., the correct order should be a.txt, .b.txt, c.txt for those three files. Also note that . (current directory) and .. (parent directory) should be listed first.
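A qsort() comparator can handle the hidden-file ordering by skipping a leading dot before comparing. The expected outputs in this project suggest case-insensitive ordering, so this sketch uses strcasecmp(); verify the exact behavior against the test cases:

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Compare two file names for qsort(), ignoring a leading '.' so hidden
   files sort into place (a.txt, .b.txt, c.txt). The caller is expected
   to handle "." and ".." separately, since they must be listed first. */
int compare_names (const void *a, const void *b)
{
  const char *left = *(const char *const *) a;
  const char *right = *(const char *const *) b;
  if (left[0] == '.')
    left++;                          /* skip the dot on hidden files */
  if (right[0] == '.')
    right++;
  return strcasecmp (left, right);
}
```

Usage: qsort (names, count, sizeof (char *), compare_names);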
Except for repeat, all of the built-in commands and utility programs in this project are real. You can use man to read the documentation for each (such as man which to learn about the which command). However, the interfaces provided are deliberately different from the real commands. To get acquainted with what each command should do and produce as output, you need to look at the shell scripts used in the test cases and the corresponding expected output files. Consult the testing documentation if you are unfamiliar or rusty with the testing infrastructure.
Phase 3: Environment variables (B requirements)
For this stage, you will need to implement the following features:
- Built-ins:
  - export - set an environment variable. You may use the provided hash table to keep track of all environment variables. CAUTION: You will need to change the way that you execute programs from the partial implementation. You will need to use execve() or posix_spawn() so that you can pass environment variables to the new process. By default, you must always pass the $PATH. (You can use getenv() to get your real $PATH and pass it along.)
  - unset - unset an environment variable. If using the provided hash table, remove the key from the table.
  - echo - extend the implementation so that environment variables can be printed. All environment variable names must be wrapped in curly braces as shown below.
- Utilities:
  - env - set an environment variable for the execution of one particular process. This program will create the envp array of environment variables passed as arguments (passed as VAR=VAL with no space) and pass that array as the environment for a new process based on the rest of the command line (see below). The envp array should contain the variables specified as arguments as well as any that had been set by the export command above.
  - repeat - print out the specified environment variables a repeated number of times.
- Other features:
  - Return codes - using echo $? will print the return code from the previous utility or built-in command.
For the echo command, $? indicates that we want to print the return code. As an example, consider the following sequence of commands (note that the $ at the beginning of a line is the prompt and other lines are output produced):

$ ./bin/ls -sa data
0 empty.txt
6 FIRST.txt
5 .hidden.txt
9 pwd.txt
22 yat.txt
$ echo $?
0
$ ./bin/ls flubberdub
$ echo $?
1
Recall that environment variables get passed to processes along with their value as a string. If you set the $ALPHA environment variable to have the value beta, then the string ALPHA=beta gets passed in the envp array. Or, you can retrieve the variable in the new process by calling getenv ("ALPHA"), which will return the string "beta".
The following lines illustrate the expected behavior for environment variables in this project. The first echo tries to look up the $VAR variable, which does not exist yet. Then the variable gets set with export and the second echo finds the value.

$ echo VAR=${VAR}
VAR=
$ export VAR=hello
$ echo VAR=${VAR}
VAR=hello
$ ./bin/repeat 2 VAR
VAR=hello
VAR=hello
$ ./bin/env ALPHA=beta ./bin/repeat 1 ALPHA 1 VAR
ALPHA=beta
VAR=hello
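The ${VAR} expansion in echo can be done with a small scanner like the sketch below. It uses getenv() as a stand-in for your own variable table, and expand_word is an illustrative name; note that unset variables expand to the empty string:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Expand ${VAR} references in a word, writing the result into out.
   Unset variables expand to the empty string. */
void expand_word (const char *word, char *out, size_t capacity)
{
  size_t used = 0;
  out[0] = '\0';
  while (*word != '\0' && used + 1 < capacity)
    {
      if (word[0] == '$' && word[1] == '{')
        {
          const char *end = strchr (word + 2, '}');
          if (end != NULL)
            {
              char name[256] = { 0 };
              strncpy (name, word + 2, (size_t) (end - word - 2));
              const char *value = getenv (name);   /* stand-in lookup */
              if (value != NULL)
                used += snprintf (out + used, capacity - used, "%s", value);
              word = end + 1;       /* skip past the closing brace */
              continue;
            }
        }
      out[used++] = *word++;        /* ordinary character: copy through */
      out[used] = '\0';
    }
}
```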
CAUTION: Your code must NOT depend on manually setting environment variables before running make test. Doing so means you are using bash's implementation of environment variables and not your own. If you want to make sure that your code works and is not inheriting this implementation, you should open a new terminal window and run make clean && make test.
Phase 4: Piping STDIN and STDOUT (A requirements)
- Utilities:
  - cat - open a file and print out its contents.
- Other features:
  - Piping commands together - use a pipe to redirect the STDOUT from one process to become the STDIN of the next.
Consider the following command line:

$ ./bin/ls data
empty.txt
FIRST.txt
pwd.txt
yat.txt
When this command line is executed, the shell will create a new process to run the ./bin/ls executable program. This program uses printf() to write its output to STDOUT. The shell (both the one you are implementing and the underlying Linux shell) is set up so that STDOUT refers to an internal buffer that is used to write text to the screen.
In this phase, you will change this behavior. Consider the following lines as examples.
$ ./bin/ls data | ./bin/head -n 1
empty.txt
$ ./bin/cat data/pwd.txt | ./bin/head -n 1
pwd
In each case, your shell is creating two new processes instead of just one. One process executes ./bin/ls while the other process executes ./bin/head. However, the added twist is that the STDOUT from ./bin/ls becomes the STDIN for ./bin/head. Importantly, this data does not go to the shell and it doesn't appear on the screen. Instead, the shell only receives (and displays) the output from the last process on each line (./bin/head).
The fix for this is conceptually straightforward: Use dup2() to redirect the first process's STDOUT to use a pipe and another dup2() to redirect the second process's STDIN to read from the pipe. In principle, this can be done with any number of processes forming a chain with pipes connecting their output/input.
Where this gets tricky (in a real shell) is when some of the commands in this chain are shell built-ins rather than separate processes. When this occurs, the shell must know to read from or write to the pipe directly rather than relying on dup2(). Doing so can be particularly nasty once you start encountering exceptional cases where some of the commands have errors or do not produce output.
You do not need to handle these complicated cases. In this project, both of the commands will be utility programs, and you do not have to support pipes for built-in commands. As such, you will just need to set up the pipe and use dup2() as appropriate to redirect standard I/O in the child processes.
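Putting the pieces together, a two-command pipeline can be sketched as follows (run_pipeline is an illustrative name; a real shell would reuse its existing spawning code):

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

/* Run left | right: the left program's STDOUT becomes the right program's
   STDIN. Returns the exit code of the right-hand process. Error handling
   is minimal for brevity. */
int run_pipeline (char *const left[], char *const right[])
{
  int fds[2];
  if (pipe (fds) != 0)
    return -1;
  pid_t lhs = fork ();
  if (lhs == 0)
    {
      dup2 (fds[1], STDOUT_FILENO);  /* left writes into the pipe */
      close (fds[0]);
      close (fds[1]);
      execv (left[0], left);
      exit (127);
    }
  pid_t rhs = fork ();
  if (rhs == 0)
    {
      dup2 (fds[0], STDIN_FILENO);   /* right reads from the pipe */
      close (fds[0]);
      close (fds[1]);
      execv (right[0], right);
      exit (127);
    }
  close (fds[0]);                    /* parent must close both ends so the */
  close (fds[1]);                    /* right side sees EOF when left exits */
  int status = 0;
  waitpid (lhs, NULL, 0);
  waitpid (rhs, &status, 0);
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Note that the parent closing both pipe ends is essential; otherwise the right-hand process never sees end-of-file on its STDIN.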
Tips
The following tips may be helpful for completing this project on time.
Using the hash table
To assist with the environment variable handling, a hash table
implementation has been included in the distribution. For simplicity,
the hash table only accepts strings as keys and values. Also, the hash
table is a global variable, so you can access it from anywhere. You can
consult src/hash.h
for the variable types, but the following
code demonstrates the core functionality:
(The code listing is not reproduced here; consult src/hash.h for the exact function declarations.)
If you try to find a key that does not exist, hash_find() will return NULL. You need to detect the NULL return value and make sure you do not cause a segmentation fault. Instead, you just print the empty string.
The hash table also provides a hash_dump() function that you can use to print all values (for debugging purposes). You can also loop through all of the keys as follows:
(The code listing is not reproduced here; consult src/hash.h for the exact function declarations.)
Managing with a partner
At each stage of this project, there are several components that can be implemented independently. In particular, the utility programs can be developed without having a functioning shell, and the shell built-ins and features can be done without supporting processes. As such, there is a natural structure for dividing the work.
As a word of caution, you and your partner should agree on a completion schedule. The integration of these components is not trivial, particularly at the Phase 2 implementation level. As such, you should set your own intermediate deadlines for each level. If you plan on 10-14 days for each level, you will have enough time to complete at least the Phase 3 implementation.