After completing this assignment, students should be able to:
Implement reverse-mode automatic differentiation for scalar-valued computation graphs.
Illustrate the advantages of rectified linear units (ReLUs) over sigmoid units in deep neural networks.
Complete the following stubbed-out ScalarFlow library (scalarflow.py) so that all public methods and attributes correspond to the provided docstring comments.
You can read a nicely-formatted version of the scalarflow documentation here.
The scalarflow.py module depends on the networkx library for maintaining the graph structure. You'll need to install networkx:

    pip install networkx
Note that the starter code already includes functionality for building and visualizing computation graphs. For example, the following snippet should work without making any modifications to scalarflow.py:
    import scalarflow as sf

    graph = sf.Graph()

    with graph:
        x = sf.Variable(2.0, name='x')
        y = sf.Variable(4.0, name='y')

        x_squared = sf.Pow(x, 2)
        y_squared = sf.Pow(y, 2)

        xy_sum = sf.Add(x_squared, y_squared)

        func = sf.Pow(xy_sum, .5)  # (Square root)

    graph.gen_dot("sample.dot")
This code creates a computation graph corresponding to the formula \(\sqrt{x^2 + y^2}\). The resulting dot file can be used to generate a nicely-formatted image representing the structure of the graph.
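If you have Graphviz installed, for example, the dot command-line tool can render the file to an image:

    dot -Tpng sample.dot -o sample.png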
The provided code does not include any functionality for actually performing the calculations represented by the graph. You are free to add any helper methods or private instance variables that you find useful for implementing the required functionality.
Use good OO programming style. For example, each operator class should be responsible for performing its own forward and backward calculations. If you find yourself writing long if/elif blocks checking node types, you are probably on the wrong track. There are also opportunities to avoid code repetition by putting common functionality into the appropriate superclasses.
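As a rough sketch of what this looks like (all class, method, and attribute names below are my own illustrative assumptions, not the starter code's actual API; follow the docstrings in scalarflow.py for the real names), each operator performs its own local computations:

    # Hypothetical illustration of per-operator responsibility: each node
    # computes its own value and its own local gradient contributions.
    class BinaryOp:
        """Stand-in base class for two-operand nodes."""
        def __init__(self, operand1, operand2):
            self.operand1, self.operand2 = operand1, operand2
            self.value = None
            self.grad = 0.0

    class Add(BinaryOp):
        def forward(self):
            self.value = self.operand1.value + self.operand2.value

        def backward(self):
            # d(a + b)/da = d(a + b)/db = 1, so pass the gradient through.
            self.operand1.grad += self.grad
            self.operand2.grad += self.grad

    class Multiply(BinaryOp):
        def forward(self):
            self.value = self.operand1.value * self.operand2.value

        def backward(self):
            # d(a * b)/da = b and d(a * b)/db = a.
            self.operand1.grad += self.grad * self.operand2.value
            self.operand2.grad += self.grad * self.operand1.value

With this structure, the forward and backward passes never need to inspect node types: they simply call each node's own methods.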
Avoid the temptation to perform the forward and backward passes recursively. This can be made to work, but it is inefficient and error prone. The provided code includes functionality for obtaining a topologically sorted list of ancestors for a particular node. Both the forward and backward passes should involve iterating over this list.
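Concretely, both passes can be simple loops over that sorted list; something like the following sketch (the method and attribute names are again hypothetical, and ancestor_list stands in for the provided topological-sort functionality, assumed here to include the node itself as the last entry):

    def run_forward(graph, node):
        # Visit nodes in topological order so that every node's operands
        # are computed before the node itself.
        for n in graph.ancestor_list(node):
            n.forward()
        return node.value

    def run_backward(graph, node):
        nodes = graph.ancestor_list(node)
        for n in nodes:
            n.grad = 0.0          # clear gradients from any earlier pass
        node.grad = 1.0           # seed: d(node)/d(node) = 1
        for n in reversed(nodes):
            n.backward()          # push gradients down to operands

Input nodes such as variables and constants can simply inherit no-op forward and backward implementations.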
The following files provide a simple machine learning library built on top of the ScalarFlow toolkit:
sf_util.py - Useful utility functions for developing ML algorithms using ScalarFlow.
sf_classifiers.py - MLP and Logistic Regression classifiers.
sf_classifier_examples.py - Demos of the classifiers on small synthetic datasets.
If scalarflow.py is completed correctly, the demo methods in sf_classifier_examples.py should reliably learn the two classification tasks.

Even without completing the classes in scalarflow.py, it is possible to initialize the models in sf_classifiers.py and visualize the structure of the corresponding computation graphs.
We have discussed the fact that sigmoid nonlinearities can lead to vanishing gradients when used in deep neural networks (more than three layers). Rectified linear units (ReLUs) are widely used because they help to avoid the vanishing gradient problem: while sigmoids have near-zero derivatives across much of their domain, ReLUs have a derivative of exactly one for all positive inputs.
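To make that claim quantitative: the sigmoid derivative is \(\sigma'(x) = \sigma(x)(1 - \sigma(x))\), which is at most \(1/4\) (attained at \(x = 0\)), so backpropagating through \(k\) sigmoid layers multiplies the gradient by a factor of at most \((1/4)^k\). The ReLU derivative is \(1\) for \(x > 0\) and \(0\) for \(x < 0\), so gradients pass through active ReLU units unattenuated, however deep the network.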
For Part 2 of this assignment, complete the following steps:
Add a ReLU node type to scalarflow.py. (A hedged sketch of one possibility appears after these steps.)
Update the MLP class in sf_classifiers.py to accept 'relu' as an option for the activation argument of the constructor. Make sure to change the weight initialization code to He initialization for relu units.
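Here is a hedged sketch of what the new pieces might look like, continuing the hypothetical naming from the earlier sketches (the real names must match the scalarflow.py docstrings):

    import numpy as np

    class UnaryOp:
        """Stand-in base class for one-operand nodes."""
        def __init__(self, operand):
            self.operand = operand
            self.value = None
            self.grad = 0.0

    class ReLU(UnaryOp):
        def forward(self):
            self.value = max(0.0, self.operand.value)

        def backward(self):
            # The derivative is 1 where the input is positive and 0
            # elsewhere, so gradient only flows through active units.
            if self.operand.value > 0:
                self.operand.grad += self.grad

    def he_weight(fan_in):
        # He initialization: draw each weight from a normal distribution
        # with standard deviation sqrt(2 / fan_in), which keeps activation
        # variance roughly constant across relu layers.
        return np.random.randn() * np.sqrt(2.0 / fan_in)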
Test your modified MLP implementation to make sure that it can reliably learn the xor task with ten hidden units using both sigmoid and relu activation functions. I found that it took some experimentation with learning rates to get both versions of the network to work well. It seems that the sigmoid activation function requires a significantly higher learning rate than relu.
Create two figures illustrating the learning curves for each classifier across 10 training runs. Each figure should show the epoch number on the x-axis and training loss on the y-axis. The first figure should include ten lines, each representing a single training run with sigmoid nonlinearities. The second figure should show the same information for the relu network. The captions should explain the data in the figures and should include the learning rate that was used.
The point of these figures is to illustrate that either activation function is effective for a three-layer network.
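Since AI help is explicitly allowed for figure code, here is a minimal matplotlib sketch in any case; the function name and the losses_per_run format are my own assumptions about how you might organize the recorded data:

    import matplotlib.pyplot as plt

    def plot_learning_curves(losses_per_run, title, filename):
        # losses_per_run: a list of 10 lists, each holding the training
        # loss recorded at every epoch of one training run.
        fig, ax = plt.subplots()
        for losses in losses_per_run:
            ax.plot(range(1, len(losses) + 1), losses)
        ax.set_xlabel('Epoch')
        ax.set_ylabel('Training loss')
        ax.set_title(title)
        fig.savefig(filename)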
Create two additional figures by replicating the experiment above using networks with five hidden layers, each with 10 hidden units. The question here is whether the relu network does a better job of learning with a much deeper network. Again, the captions for the figures should explain the data and include the learning rates. (You should use the same learning rates here as in the previous experiment.)
Combine your figures into a single pdf document for submission.
The classification code provided above only works for binary classification tasks. It would be an interesting exercise to create a new classifier that is able to handle multiclass problems. This would require implementing softmax activation and cross-entropy loss (the sf_util.py file contains function stubs for both) and adding a new multiclass classifier to sf_classifiers.py.
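For reference, these are the standard definitions (not copied from the stubs): given logits \(z_1, \dots, z_k\), softmax produces \(p_i = e^{z_i} / \sum_j e^{z_j}\), and the cross-entropy loss for a true class \(c\) is \(-\log p_c\). For numerical stability it is standard to subtract \(\max_j z_j\) from every logit before exponentiating; this leaves the softmax output unchanged.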
If you choose this option, you can demonstrate your implementation by applying it to the Iris Data Set. This is a relatively tiny three-class classification task that should be manageable for our very slow classifiers.
This alternative Part 2 is likely to be significantly more challenging than the default option. Let me know if you choose this option so I can provide some help in the form of unit tests or driver code.
This assignment may be completed individually or in pairs. My expectation for pairs is that both members are actively involved in, and take full responsibility for, all aspects of the project.
If you intend to work with a partner, you must inform me before you start working on the project.
The use of AI tools for code generation or analysis is strictly forbidden for this assignment. You must not copy the provided starter code into any AI tool and you must not use AI tools to generate the code that you submit. The only exception to this restriction is that you may use AI to help with the code for generating figures for Part 2.
You are welcome to consult AI tools to help address any conceptual questions that may arise as you work on the assignment.
Submit the following files through Gradescope:
Part 1
scalarflow.py
Part 2
The pdf document containing your figures
Grades will be calculated according to the following distribution.
Readability/Style 10%
Your code should follow PEP8 conventions. It should be well documented and well organized.
Passes Functionality Tests 70%
Note that I will also test your code using sf_classifiers.py. For the provided examples in sf_classifier_examples.py, each epoch should require less than a second to complete.
Part 2 Submission 20%