After completing this assignment, students should be able to:
Implement reverse-mode automatic differentiation for scalar-valued computation graphs.
Illustrate the advantages of rectified linear units (ReLUs) over sigmoid units in deep neural networks.
Complete the following stubbed-out ScalarFlow library (scalarflow.py) so that all public methods and attributes correspond to the provided docstring comments.
You can read a nicely-formatted version of the scalarflow documentation here.
The scalarflow.py module depends on the networkx library for maintaining the graph structure. You'll need to install networkx:

    pip install networkx
Note that the starter code already includes functionality for building and visualizing computation graphs. For example, the following snippet should work without making any modifications to scalarflow.py:
    import scalarflow as sf

    graph = sf.Graph()

    with graph:
        x = sf.Variable(2.0, name='x')
        y = sf.Variable(4.0, name='y')

        x_squared = sf.Pow(x, 2)
        y_squared = sf.Pow(y, 2)

        xy_sum = sf.Add(x_squared, y_squared)

        func = sf.Pow(xy_sum, .5)  # (Square root)

    graph.gen_dot("sample.dot")
This code creates a computation graph corresponding to the formula \(\sqrt{x^2 + y^2}\). The resulting dot file can be used to generate a nicely-formatted image representing the structure of the graph.
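If you have Graphviz installed, for example, the dot command-line tool can render the file to an image:

    dot -Tpng sample.dot -o sample.png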
The provided code does not include any functionality for actually performing the calculations represented by the graph. You are free to add any helper methods or private instance variables that you find useful for implementing the required functionality.
Use good OO programming style. For example, each operator class should be responsible for performing its own forward and backward calculations. If you find yourself writing long if/elif blocks checking node types, you are probably on the wrong track. There are also opportunities to avoid code repetition by putting common functionality into the appropriate superclasses.
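As a rough sketch of what this looks like (all class, method, and attribute names below are my own illustrative assumptions, not the starter code's actual API; follow the docstrings in scalarflow.py for the real names), each operator performs its own local computations:

    # Hypothetical illustration of per-operator responsibility: each node
    # computes its own value and its own local gradient contributions.
    class BinaryOp:
        """Stand-in base class for two-operand nodes."""
        def __init__(self, operand1, operand2):
            self.operand1, self.operand2 = operand1, operand2
            self.value = None
            self.grad = 0.0

    class Add(BinaryOp):
        def forward(self):
            self.value = self.operand1.value + self.operand2.value

        def backward(self):
            # d(a + b)/da = d(a + b)/db = 1, so pass the gradient through.
            self.operand1.grad += self.grad
            self.operand2.grad += self.grad

    class Multiply(BinaryOp):
        def forward(self):
            self.value = self.operand1.value * self.operand2.value

        def backward(self):
            # d(a * b)/da = b and d(a * b)/db = a.
            self.operand1.grad += self.grad * self.operand2.value
            self.operand2.grad += self.grad * self.operand1.value

With this structure, the forward and backward passes never need to inspect node types: they simply call each node's own methods.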
Avoid the temptation to perform the forward and backward passes recursively. This can be made to work, but it is inefficient and error prone. The provided code includes functionality for obtaining a topologically sorted list of ancestors for a particular node. Both the forward and backward passes should involve iterating over this list.
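Concretely, both passes can be simple loops over that sorted list; something like the following sketch (the method and attribute names are again hypothetical, and ancestor_list stands in for the provided topological-sort functionality, assumed here to include the node itself as the last entry):

    def run_forward(graph, node):
        # Visit nodes in topological order so that every node's operands
        # are computed before the node itself.
        for n in graph.ancestor_list(node):
            n.forward()
        return node.value

    def run_backward(graph, node):
        nodes = graph.ancestor_list(node)
        for n in nodes:
            n.grad = 0.0          # clear gradients from any earlier pass
        node.grad = 1.0           # seed: d(node)/d(node) = 1
        for n in reversed(nodes):
            n.backward()          # push gradients down to operands

Input nodes such as variables and constants can simply inherit no-op forward and backward implementations.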
The following files provide a simple machine learning library built on top of the ScalarFlow toolkit:
sf_util.py - Useful utility functions for developing ML algorithms using ScalarFlow.
sf_classifiers.py - MLP and Logistic Regression classifiers.
sf_classifier_examples.py - Demos of the classifiers on small synthetic datasets.
If scalarflow.py is completed correctly, the demo methods in sf_classifier_examples.py should reliably learn the two classification tasks.

Even without completing the classes in scalarflow.py, it is possible to initialize the models in sf_classifiers.py and visualize the structure of the corresponding computation graphs.
We have discussed the fact that sigmoid nonlinearities can lead to vanishing gradients when used in deep neural networks (more than three layers). Rectified linear units (ReLUs) are widely used because they help to avoid the vanishing gradient problem: while sigmoids have near-zero derivatives across much of their domain, ReLUs have a derivative of exactly one for all positive inputs.
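To make that claim quantitative: the sigmoid derivative is \(\sigma'(x) = \sigma(x)(1 - \sigma(x))\), which is at most \(1/4\) (attained at \(x = 0\)), so backpropagating through \(k\) sigmoid layers multiplies the gradient by a factor of at most \((1/4)^k\). The ReLU derivative is \(1\) for \(x > 0\) and \(0\) for \(x < 0\), so gradients pass through active ReLU units unattenuated, however deep the network.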
For Part 2 of this assignment, complete the following steps:
Add a ReLU node type to scalarflow.py. (A hedged sketch of one possibility appears after these steps.)
Update the MLP class in sf_classifiers.py to accept 'relu' as an option for the activation argument of the constructor. Make sure to change the weight initialization code to He initialization for relu units.
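Here is a hedged sketch of what the new pieces might look like, continuing the hypothetical naming from the earlier sketches (the real names must match the scalarflow.py docstrings):

    import numpy as np

    class UnaryOp:
        """Stand-in base class for one-operand nodes."""
        def __init__(self, operand):
            self.operand = operand
            self.value = None
            self.grad = 0.0

    class ReLU(UnaryOp):
        def forward(self):
            self.value = max(0.0, self.operand.value)

        def backward(self):
            # The derivative is 1 where the input is positive and 0
            # elsewhere, so gradient only flows through active units.
            if self.operand.value > 0:
                self.operand.grad += self.grad

    def he_weight(fan_in):
        # He initialization: draw each weight from a normal distribution
        # with standard deviation sqrt(2 / fan_in), which keeps activation
        # variance roughly constant across relu layers.
        return np.random.randn() * np.sqrt(2.0 / fan_in)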
Test your modified MLP implementation to make sure that it can reliably learn the xor task with ten hidden units using both sigmoid and relu activation functions. I found that it took some experimentation with learning rates to get both versions of the network to work well. It seems that the sigmoid activation function requires a significantly higher learning rate than relu.
Create two figures illustrating the learning curves for each classifier across 10 training runs. Each figure should show the epoch number on the x-axis and training loss on the y-axis. The first figure should include ten lines, each representing a single training run with sigmoid nonlinearities. The second figure should show the same information for the relu network. The captions should explain the data in the figures and should include the learning rate that was used.
The point of these figures is to illustrate that either activation function is effective for a three-layer network.
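Since AI help is explicitly allowed for figure code, here is a minimal matplotlib sketch in any case; the function name and the losses_per_run format are my own assumptions about how you might organize the recorded data:

    import matplotlib.pyplot as plt

    def plot_learning_curves(losses_per_run, title, filename):
        # losses_per_run: a list of 10 lists, each holding the training
        # loss recorded at every epoch of one training run.
        fig, ax = plt.subplots()
        for losses in losses_per_run:
            ax.plot(range(1, len(losses) + 1), losses)
        ax.set_xlabel('Epoch')
        ax.set_ylabel('Training loss')
        ax.set_title(title)
        fig.savefig(filename)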
Create two additional figures by replicating the experiment above using networks with five hidden layers, each with 10 hidden units. The question here is whether the relu network does a better job of learning with a much deeper network. Again, the captions for the figures should explain the data and include the learning rates. (You should use the same learning rates here as in the previous experiment.)
Combine your figures into a single pdf document for submission.
The classification code provided above only works for binary classification tasks. It would be an interesting exercise to create a new classifier that is able to handle multiclass problems. This would require implementing softmax activation and cross-entropy loss (the sf_util.py file contains function stubs for both) and adding a new multiclass classifier to sf_classifiers.py.
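For reference, these are the standard definitions (not copied from the stubs): given logits \(z_1, \dots, z_k\), softmax produces \(p_i = e^{z_i} / \sum_j e^{z_j}\), and the cross-entropy loss for a true class \(c\) is \(-\log p_c\). For numerical stability it is standard to subtract \(\max_j z_j\) from every logit before exponentiating; this leaves the softmax output unchanged.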
If you choose this option, you can demonstrate your implementation by applying it to the Iris Data Set. This is a relatively tiny three-class classification task that should be manageable for our very slow classifiers.
This alternative Part 2 is likely to be significantly more challenging than the default option. Let me know if you choose this option so I can provide some help in the form of unit tests or driver code.
This assignment may be completed individually or in pairs. My expectation for pairs is that both members are actively involved in, and take full responsibility for, all aspects of the project.
If you intend to work with a partner, you must inform me before you start working on the project.
The use of AI tools for code generation or analysis is strictly forbidden for this assignment. You must not copy the provided starter code into any AI tool and you must not use AI tools to generate the code that you submit. The only exception to this restriction is that you may use AI to help with the code for generating figures for Part 2.
You are welcome to consult AI tools to help address any conceptual questions that may arise as you work on the assignment.
Submit the following files through Gradescope:
Part 1
scalarflow.py
Part 2
The pdf document containing your figures
Grades will be calculated according to the following distribution.
Readability/Style 10%
Your code should follow PEP8 conventions. It should be well documented and well organized.
Passes Functionality Tests 70%
Note that I will also test your code using sf_classifiers.py. For the provided examples in sf_classifier_examples.py, each epoch should require less than a second to complete.
Part 2 Submission 20%