Recurrent Neural Networks In Keras

Learning Objectives:

  • Improve understanding of the basic structure of recurrent neural networks.
  • Gain experience working with simple recurrent neural networks in Keras.

In this activity we will consider the problem of making real-valued predictions from one-dimensional sequence data. For example, we might record a temperature reading every hour and then predict the next hour's temperature from the previous 24 readings. The cell below contains a Python function for creating a synthetic data set of numerical sequences.

(If you are running this notebook on Google Colab, make sure to select GPU acceleration under Edit->Notebook Settings.)

In [1]:
%load_ext tensorboard

import os
from datetime import datetime
import matplotlib.pyplot as plt
import numpy as np

import tensorflow as tf
from tensorflow.keras import layers

def sin_dataset(num, length, noise=0.005):
    """ Return a sequence dataset composed of many short snippets of noisy
        sin functions.  The snippets all have randomly generated periods
        and offsets.

        Arguments:
           num - number of sequences to return
           length - length of the sequences
           noise - std of normal noise added to each point

        Returns: x, y
                Where x is a numpy array with shape
                (num, length, 1), and y is an array with shape (num,).
                The values in y are the (noisy) next values for the
                corresponding sequences in x.
    """

    x = np.zeros((num, length, 1))
    y = np.zeros(num)
    for i in range(num):
        start = np.random.random() * 2 * np.pi
        dur = np.random.random() * np.pi * 6 + np.pi
        offset = np.random.random() * 10 - 5
        scale = .1
        data = np.sin(np.linspace(start, start + dur, length + 1))
        data = (data + offset) * scale + np.random.randn(*data.shape) * noise
        x[i, :, 0] = data[:-1]
        y[i] = data[-1]
    return x, y

Training Set

Execute the cell below to generate a training data set and visualize some of the input sequences.

In [2]:
seq_length = 30
num_seqs = 200000
    
sequences, targets = sin_dataset(num_seqs, seq_length)
print(targets.shape)

plt.plot(sequences[0:10, :, 0].T, '.-')

plt.show()
(200000,)

Exploring Network Structure

The cell below constructs an extremely simple RNN for the prediction task described above. As you will see when you execute the cell, this network has exactly 11 weights.

Questions

  • On a separate sheet of paper, draw the structure of the network created in the cell below. Every weight in the network should have a corresponding arrow in your diagram. There should be exactly 11 arrows.
  • Also on a separate sheet of paper, draw the "unrolled" version of this network for three time steps. If drawn correctly, there should be exactly 33 arrows in your diagram (with four arrows pointing to the nonexistent hidden units from the fourth time step).
In [3]:
model = tf.keras.Sequential()

# This is a simple RNN with two hidden units.  
# The input shape is (None, 1), indicating that the sequence length could be
# anything and the dimensionality of the input is 1.
model.add(layers.SimpleRNN(2, input_shape=(None, 1)))
model.add(layers.Dense(1))

model.summary() 
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
simple_rnn (SimpleRNN)       (None, 2)                 8         
_________________________________________________________________
dense (Dense)                (None, 1)                 3         
=================================================================
Total params: 11
Trainable params: 11
Non-trainable params: 0
_________________________________________________________________
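
Where do these parameter counts come from? A SimpleRNN layer with u hidden units and d-dimensional input has u·d input weights, u·u recurrent weights, and u biases; here that is 2 + 4 + 2 = 8. The Dense layer adds 2 weights and 1 bias, for 3 more and 11 in total. As an optional sanity check (a minimal sketch, assuming the model defined above is still in scope), the counts can be recomputed directly from the layer weights:

# Recompute each layer's parameter count from its weight arrays.
for layer in model.layers:
    n_params = sum(int(np.prod(w.shape)) for w in layer.get_weights())
    print(layer.name, n_params)   # expect: simple_rnn 8, dense 3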

Training

The cell below contains standard code for fitting a Keras model. Go ahead and train the model now...

In [ ]:
model.compile(optimizer='adam', loss='mse')
logdir = os.path.join("logs", datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir)

model.fit(sequences, targets, epochs=30, batch_size=32, shuffle=True, 
          validation_split=.2, callbacks=[tensorboard_callback])

TensorBoard

The cell below will start up tensorboard and allow you to examine the loss values. If you train multiple times you can refresh the page using the circular arrow on the upper-right.

In [ ]:
%tensorboard --logdir logs

Examining The Results

The cell below will allow us to visualize the predictions made by our model. First, we predict the single next value in the sequence, then we append that value to our sequence and use the resulting sequence to predict the next value. This way we can make predictions arbitrarily far into the future. Unfortunately, our predictions are likely to get farther and farther off as small initial errors compound. You can re-run the cell multiple times to see what happens with different initial sequences.

In [ ]:
num_predict = 50

# Generate one long sequence; the first seq_length points will seed the model.
test_input, test_y = sin_dataset(1, seq_length + num_predict)
cur_input = test_input[:, 0:seq_length, :]
predictions = []
indices = []
for i in range(num_predict):
    indices.append(i + seq_length)
    # Predict the next value from the current window.
    y = model.predict(cur_input)[0][0]
    # Slide the window forward: drop the oldest point and append the prediction.
    cur_input = np.append(cur_input[:, 1:, :],
                          np.reshape(y, (1, 1, 1)), axis=1)
    predictions.append(y)

plt.plot(test_input[0, :, 0], '.-')
plt.plot(indices, predictions, '.-')
    
plt.show()
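
To go a step beyond eyeballing the plot, we can also quantify how quickly the multi-step predictions drift. A minimal sketch (reusing test_input, predictions, and seq_length from the cell above) compares the predicted values to the true continuation of the sequence:

# True continuation of the test sequence over the predicted horizon.
true_future = test_input[0, seq_length:, 0]
errors = np.abs(np.array(predictions) - true_future)

print("multi-step MSE:", np.mean(errors ** 2))

# Plot how the absolute error grows as we predict farther ahead.
plt.plot(errors, '.-')
plt.xlabel('steps ahead')
plt.ylabel('absolute error')
plt.show()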

Improving the Model

There are many things we could try that might improve the performance of our current model:

  • Add more training data
  • Add more hidden units to our RNN
  • Try fancier RNN units like LSTM or GRU
  • Add more RNN layers

Note that this code isn't the correct way to add more layers to our current model:

model = tf.keras.Sequential()
model.add(layers.SimpleRNN(2, input_shape=(None, 1)))
model.add(layers.SimpleRNN(2)) # Doesn't work!
model.add(layers.Dense(1))

The problem is that, by default, a Keras RNN layer only returns its output for the final time step of the input sequence, not for every time step. This means that, in the code above, the first layer hands the second layer a single vector rather than a sequence, so the second SimpleRNN has no sequence to process. This can be fixed using the return_sequences argument:

model = tf.keras.Sequential()
model.add(layers.SimpleRNN(2, input_shape=(None, 1), return_sequences=True))
model.add(layers.SimpleRNN(2)) # OK!
model.add(layers.Dense(1))
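
Putting several of these ideas together, here is one possible starting point (a sketch only; the layer type and sizes are arbitrary choices, not tuned recommendations) that stacks two larger LSTM layers:

model = tf.keras.Sequential()
model.add(layers.LSTM(32, input_shape=(None, 1), return_sequences=True))
model.add(layers.LSTM(32))
model.add(layers.Dense(1))

After rebuilding the model this way, the compile, fit, and prediction cells above can be re-run unchanged.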

Activity

Experiment with improving performance on the prediction task above. How low can you get the validation error? Can you significantly improve the quality of the predictions over multiple time steps?

In [ ]: