Enter the names of all group members in the cell below:

YOUR ANSWER HERE

# PyTorch Fundamentals

See also: https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html

Learning Objectives:

* Gain experience with low-level PyTorch operations


## Calculating Gradients

In a previous exercise, we practiced calculating partial derivatives on the following example:

$$f(x,y) = \sqrt{x^2 + y^2}$$

$$\frac{\partial f}{\partial x} = \frac{x}{\sqrt{x^2 + y^2}}$$

$$\frac{\partial f}{\partial y} = \frac{y}{\sqrt{x^2 + y^2}}$$

### Question
Take a second to calculate the following by hand:

*   $\displaystyle f(3, 4) = ??$
    

*   $\displaystyle \frac{\partial f(3, 4)}{\partial x} = ??$
    
   
*   $\displaystyle \frac{\partial f(3, 4)}{\partial y} = ??$
   


YOUR ANSWER HERE

At its core, PyTorch is a library for representing mathematical operations as computational graphs and automating the computation of partial derivatives through those graphs.  We can use PyTorch to write NumPy-style mathematical operations:


In [None]:
import torch
import numpy as np

def f(x, y):
    return torch.sqrt(x**2 + y**2)
    
x = torch.tensor(3.0)
y = torch.tensor(4.0)

print(f(x, y))
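
As a small sketch of the NumPy-style behavior, the same `f` also works element-wise on tensors that hold several values at once. The names `xs` and `ys` and the sample values are illustrative assumptions, not part of the exercise.

```python
import numpy as np
import torch

def f(x, y):
    # Same definition as in the cell above.
    return torch.sqrt(x**2 + y**2)

# Hypothetical sample inputs: three (x, y) pairs stored as 1-D tensors.
xs = torch.tensor([3.0, 5.0, 8.0])
ys = torch.tensor([4.0, 12.0, 15.0])

# PyTorch applies the arithmetic element-wise, just as NumPy would.
torch_result = f(xs, ys)

# The equivalent NumPy computation on the same values.
numpy_result = np.sqrt(xs.numpy()**2 + ys.numpy()**2)

print(torch_result)
print(numpy_result)
```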

More interestingly, we can specify that PyTorch should track the partial derivatives of our calculations with respect to some tensors:

In [None]:
x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(4.0, requires_grad=True)

fxy = f(x, y) # Evaluate the function.

fxy.backward() # Calculate the partial derivatives.

print("f(3, 4) = {:.5}".format(fxy))
print("df(3, 4)/dx = {:.5}".format(x.grad))
print("df(3, 4)/dy = {:.5}".format(y.grad))
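
One way to build confidence in the result is to compare the gradients that `backward()` stores against the closed-form partial derivatives given earlier. The sketch below does that; the names `x_check`, `y_check`, `analytic_dx`, and `analytic_dy` are introduced here for illustration only. It also prints `grad_fn`, the graph node that PyTorch recorded for the final operation.

```python
import torch

# Separate tensors so this check does not disturb x and y from the cell above.
x_check = torch.tensor(3.0, requires_grad=True)
y_check = torch.tensor(4.0, requires_grad=True)

f_check = torch.sqrt(x_check**2 + y_check**2)

# Every tensor produced by a tracked operation remembers the node that made it.
print(f_check.grad_fn)

# Populate x_check.grad and y_check.grad.
f_check.backward()

# Closed-form partial derivatives; no_grad() keeps this out of the graph.
with torch.no_grad():
    norm = torch.sqrt(x_check**2 + y_check**2)
    analytic_dx = x_check / norm
    analytic_dy = y_check / norm

print(torch.isclose(x_check.grad, analytic_dx))
print(torch.isclose(y_check.grad, analytic_dy))
```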

Once we have the partial derivatives, we can minimize our function using gradient descent.  Use the cell below to find the x and y that minimize $f(x,y) = \sqrt{x^2 + y^2}$.  Adjust the learning rate and the number of iterations (*not* the starting x and y values) until the code converges to something close to the minimum value for the function.

In [None]:
learning_rate = .01  # ADJUST THIS!
iterations = 10      # AND/OR THIS!

x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(4.0, requires_grad=True)

for iteration in range(iterations):
    fxy = f(x, y)
    
    print('current "loss": {}'.format(fxy))
    
    # One step of gradient descent...
    fxy.backward()
    x.data = x.data - learning_rate * x.grad
    y.data = y.data - learning_rate * y.grad
    
    # By default, gradients accumulate across calls to backward().
    # We need to zero them out each iteration to get a fresh result.
    x.grad.zero_()
    y.grad.zero_()


print("\nx: {}".format(x))
print("y: {}".format(y))
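
The accumulation behavior mentioned in the comments can be seen in isolation. This is a minimal sketch on a different, hypothetical one-variable function, $w^2$, chosen so that it does not give away the tuning exercise above; the names `w`, `first`, `accumulated`, and `fresh` are assumptions made for the example.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# First backward pass: d(w^2)/dw = 2w.
loss = w**2
loss.backward()
first = w.grad.clone()

# Second backward pass WITHOUT zeroing: the new gradient is added to the old one.
loss = w**2
loss.backward()
accumulated = w.grad.clone()

# Zero the stored gradient, then run backward again to get a fresh value.
w.grad.zero_()
loss = w**2
loss.backward()
fresh = w.grad.clone()

print("first: {}".format(first))
print("accumulated: {}".format(accumulated))
print("fresh: {}".format(fresh))
```

The second value is twice the first, which is why the loop above calls `zero_()` on each gradient at the end of every iteration.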
    

### Questions:

* What learning rate and iteration count did you settle on? 
* Where does this function have its minimum? (Note that this is a case where we don't *need* to use gradient descent to find the solution. You should be able to determine the minimum value without executing the code above.)

YOUR ANSWER HERE

## Exercise

Use gradient descent to minimize the following function:

$$f(x, y) = \sin(5x + 3) + x^2 + \cos(2y + 1) +y^2$$

It looks like this:


In [None]:
import matplotlib.pyplot as plt
%matplotlib notebook
x = np.linspace(-2, 2, 100)
y = np.linspace(-2, 2, 100)
x, y = np.meshgrid(x, y)
z = np.sin(5 * x + 3) + x**2 + np.cos(2 * y + 1) + y**2
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(x, y, z, cmap=plt.cm.coolwarm,
                       linewidth=0, antialiased=False)
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()


In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
from numpy.testing import assert_almost_equal
assert_almost_equal(x.data, .317, decimal=3)
assert_almost_equal(y.data, .690, decimal=3)

## Bonus Material: `torch.utils.data.DataLoader`

See also: https://pytorch.org/tutorials/beginner/basics/data_tutorial.html

In machine learning, training data is often too large to fit in memory on a single machine.  We may also want to pre-process the data as it is loaded.  The `torch.utils.data.DataLoader` class provides a standard interface for feeding data to a machine learning model.  `DataLoader` objects are Python iterables: a `for` loop over one yields the data one batch at a time.
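
To make the pre-processing idea concrete, here is a rough sketch of a custom `Dataset` that transforms each item only when it is requested. The class `ScaledArrayDataset`, its `scale` parameter, and the small in-memory array (standing in for data that would normally live on disk) are all hypothetical and exist only for this illustration.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class ScaledArrayDataset(Dataset):
    """Hypothetical dataset that rescales each row at the moment it is read."""

    def __init__(self, array, scale):
        self.array = array
        self.scale = scale

    def __len__(self):
        return len(self.array)

    def __getitem__(self, index):
        # Pre-processing happens here, one item at a time.
        row = self.array[index] * self.scale
        return torch.tensor(row, dtype=torch.float32)

# Stand-in data: six rows with two values each.
raw = np.arange(12, dtype=np.float64).reshape(6, 2)
scaled_loader = DataLoader(ScaledArrayDataset(raw, scale=0.1), batch_size=2)

for scaled_batch in scaled_loader:
    print(scaled_batch)
```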

We can create a `DataLoader` from a NumPy array using the `TensorDataset` class:


In [None]:
from torch.utils.data import TensorDataset, DataLoader

# Generate 6 random two-dimensional elements as column vectors:

features = np.round(np.random.random((6, 2, 1)), 2)
print("Numpy array of data:\n")
print(features)
  
# Build a dataset:

dataset = TensorDataset(torch.tensor(features))
loader = DataLoader(dataset)

# Iterate over the elements in the dataset:

print("\nIterate over the corresponding Dataset:\n")
for element in dataset:
    print(element)
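
The cell above loops over the `Dataset` itself. Looping over a `DataLoader` instead yields batches, and with the default `batch_size=1` each batch gains a leading dimension of size 1. The following sketch shows the difference in shapes; it uses its own `demo_`-prefixed names and fixed values (assumptions for this example) so that it does not overwrite `features`, `dataset`, or `loader`.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Six 2x1 column vectors with predictable values.
demo_features = torch.tensor(np.arange(12.0).reshape(6, 2, 1))
demo_dataset = TensorDataset(demo_features)
demo_loader = DataLoader(demo_dataset)  # default batch_size is 1

# Indexing the dataset returns a tuple holding one un-batched element.
print("Dataset element shape: {}".format(demo_dataset[0][0].shape))

# Iterating the loader returns batches with an extra leading dimension.
for (demo_batch,) in demo_loader:
    print("Loader batch shape: {}".format(demo_batch.shape))
```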

## Batches

It is usually more efficient to process data in *batches* than individually. Here is an example of PyTorch code that multiplies each element in our dataset by an appropriately sized weight vector and sums the results.  In this example each element is processed individually.

In [None]:
total = torch.tensor([[0.0]])
weights = torch.tensor(np.random.random((2,1)))

for element in dataset:
    total = total + weights.T @ element[0]
    print("Total so far: {}".format(total))

print("\nFinal Total: {}".format(total))

Instead of processing one data element per iteration, we can batch the dataset and process multiple elements per iteration.  Many PyTorch operators, including the matrix multiplication operator, are "batch-aware": they treat the leading dimension as the batch dimension and apply the operation to each element of the batch.  Let's look at a batched version of our dataset:

In [None]:
loader_batched = DataLoader(dataset, batch_size=3)
for batch in loader_batched:
    print("Shape: {}\n".format(batch[0].shape))
    print("Elements:\n {}\n".format(batch[0]))

In [None]:
total = torch.tensor([[0.0]])

for batch in loader_batched:
    batch_of_products = weights.T @ batch[0]
    total = total + torch.sum(batch_of_products)
    print("Total so far: {}".format(total))

print("\nFinal Total: {}".format(total))
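
As a sanity check, the batched and one-at-a-time computations should produce the same total. The sketch below runs both on the same data and compares them. The `demo_`-prefixed names, the fixed seed, and the use of `torch.rand` in place of `np.random.random` are assumptions made to keep the example self-contained.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
demo_features = torch.rand(6, 2, 1, dtype=torch.float64)
demo_weights = torch.rand(2, 1, dtype=torch.float64)
demo_dataset = TensorDataset(demo_features)

# One element per iteration: (1, 2) @ (2, 1) -> (1, 1)
total_single = torch.zeros(1, 1, dtype=torch.float64)
for (element,) in demo_dataset:
    total_single = total_single + demo_weights.T @ element

# Three elements per iteration: (1, 2) @ (3, 2, 1) -> (3, 1, 1)
total_batched = torch.zeros(1, 1, dtype=torch.float64)
for (batch,) in DataLoader(demo_dataset, batch_size=3):
    products = demo_weights.T @ batch
    total_batched = total_batched + products.sum()

print("Shape of one batch of products: {}".format(products.shape))
print("Totals match: {}".format(torch.allclose(total_single, total_batched)))
```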