The goal for this project is to build a classifier that can distinguish between pictures of birds and pictures of non-birds. The training and testing data for this task is adapted from CIFAR-10 and CIFAR-100. These are widely used computer vision data sets that together contain 120,000 labeled images drawn from 110 different categories.
The subset of images that we will be working with contains 10,000 labeled training images. Half of these are images of birds while the other half have been randomly selected from the remaining 109 image categories.
Here are some examples of images that contain birds:
Here are some examples of images that do not contain birds:
The data can be downloaded from the project GitHub page. You will submit your labels through the project Kaggle page for evaluation.
For full credit you must apply at least three different learning algorithms to this problem and provide a comparison of the results. You do not need to implement all three algorithms from scratch. There are a number of mature machine learning libraries available for Python. The most popular is:
You do need to provide your own implementation of at least one learning algorithm for this problem. You are welcome to use the single-layer neural network that we worked on as an in-class exercise, or you may implement something else if you prefer.
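As a rough illustration of what a from-scratch implementation might look like, here is a minimal sketch of a single-layer network (one sigmoid output unit trained by batch gradient descent on cross-entropy loss). It is not the in-class exercise code. The function names, learning rate, and epoch count are illustrative assumptions, and the data is a synthetic stand-in rather than the project images.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, mapping real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def train_single_layer(X, y, lr=0.1, epochs=200):
    """Fit weights w and bias b by batch gradient descent on cross-entropy loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)                 # predicted probability of class 1
        error = p - y                          # loss gradient w.r.t. the pre-activation
        w -= lr * (X.T @ error) / n_samples    # average gradient over the batch
        b -= lr * error.mean()
    return w, b


def predict(X, w, b):
    """Return hard 0/1 labels by thresholding the predicted probability at 0.5."""
    return (sigmoid(X @ w + b) >= 0.5).astype(int)


# Synthetic stand-in data (NOT the project images): two Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(200, 5)),
               rng.normal(+1.0, 1.0, size=(200, 5))])
y = np.concatenate([np.zeros(200), np.ones(200)])

w, b = train_single_layer(X, y)
train_accuracy = (predict(X, w, b) == y).mean()
print("training accuracy:", train_accuracy)
```

For the real task, X would hold the flattened, rescaled images and y the bird / non-bird labels. A linear model of this kind on raw pixels is a baseline and is unlikely to be the strongest of your three classifiers.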
For full credit, you must achieve a classification rate above 80% with at least one classifier.
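Because the test labels are only evaluated through Kaggle, one way to estimate the classification rate locally is to hold out part of the training set. The sketch below assumes that "classification rate" means the fraction of correctly labeled examples. The helper names and the 20% hold-out fraction are illustrative choices, not project requirements.

```python
import numpy as np


def classification_rate(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def holdout_split(n_samples, holdout_fraction=0.2, seed=0):
    """Return shuffled (train_idx, val_idx) index arrays for one hold-out split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_val = int(n_samples * holdout_fraction)
    return order[n_val:], order[:n_val]


# Toy labels standing in for real predictions (two of ten are wrong).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])
rate = classification_rate(y_true, y_pred)
print("classification rate:", rate)

# Index arrays for splitting the 10,000 training images.
train_idx, val_idx = holdout_split(10_000)
print(len(train_idx), len(val_idx))
```

The index arrays can be used to slice both the image array and the label array, so the same examples are held out for every classifier you compare.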
You must submit your completed Python code along with a README that includes clear instructions for reproducing your results.
Along with your code, you must also submit a short (2-3 page) report describing your approach to the problem and your results. Your report must include results for all three algorithms. Your report will be graded on the basis of content as well as style. Your writing should be clear, concise, well-organized, and grammatically correct. Your report should include at least one figure illustrating your results.
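One possible way to organize the three-algorithm comparison is sketched below using scikit-learn. The choice of library, the three classifiers, and their hyperparameters are all assumptions made for illustration, and the data is synthetic rather than the project images.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for flattened images (NOT the project data).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Hold out a quarter of the examples for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)

classifiers = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    results[name] = clf.score(X_val, y_val)   # mean accuracy on held-out data

# Print the classifiers from best to worst validation accuracy.
for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:22s} {acc:.3f}")
```

A dictionary like results maps naturally onto a table or bar chart, which could serve as the figure in the report.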
    import numpy as np

    train_data = np.load('data/train_data.npy')  # load all the training data
    train_data = train_data / 255.0              # now it is in the range 0-1
State-of-the-art solutions for tasks like this are based on convolutional neural networks. The easiest library to get started with is probably Keras. Keras isn't installed on the lab machines, but you should be able to install it into your account using the following commands:
    pip install --user wheel
    pip install --user tensorflow-gpu==1.4
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH":/cs/students/cs444-s18/lib/cuda/lib64/"
This installs TensorFlow, which includes Keras. The file keras_example.py shows an example of using Keras to create a simple three-layer neural network.
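For orientation, a small fully connected network in Keras might look roughly like the sketch below. This is not the contents of keras_example.py. It assumes a recent TensorFlow 2.x release with the tf.keras API rather than the 1.4 version installed above, it reads "three-layer" as three dense layers, and the layer sizes, optimizer, and epoch count are arbitrary. The data is random and has only the same shape as flattened 32x32x3 images.

```python
import numpy as np
import tensorflow as tf

# Random stand-in data with the same layout as flattened 32x32x3 images.
rng = np.random.default_rng(0)
x = rng.random((256, 32 * 32 * 3)).astype("float32")   # values in [0, 1)
y = rng.integers(0, 2, size=256).astype("float32")     # 0 = not bird, 1 = bird

# Three dense layers: two hidden ReLU layers and one sigmoid output unit.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32 * 32 * 3,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),    # probability of "bird"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# A couple of epochs only, to show the training call; real training needs more.
model.fit(x, y, epochs=2, batch_size=32, verbose=0)

probs = model.predict(x, verbose=0)    # one probability per image
labels = (probs[:, 0] >= 0.5).astype(int)
```

A convolutional network would replace the dense hidden layers with convolution and pooling layers and take the images in their unflattened (32, 32, 3) shape.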
    import numpy as np
    import scipy.misc
    from matplotlib import pyplot as plt

    train_data = np.load('data/train_data.npy')  # load all the training data
    img = train_data[0, :]                       # grab the 0th image
    img = img.reshape(32, 32, 3)                 # "unflatten" the image
    # Note: scipy.misc.imresize was removed in SciPy 1.3, so this call needs an older SciPy.
    small_img = scipy.misc.imresize(img, (4, 4, 3))  # rescale the image
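If scipy.misc.imresize is not available in your SciPy version, one NumPy-only alternative is to downsample by averaging non-overlapping pixel blocks. This is a different technique from the interpolation imresize performs, so the resulting pixel values will not match it exactly. The helper name block_average is hypothetical, and the image below is random stand-in data.

```python
import numpy as np


def block_average(img, factor):
    """Downsample an (H, W, C) image by averaging non-overlapping factor x factor blocks."""
    h, w, c = img.shape
    assert h % factor == 0 and w % factor == 0, "image size must be divisible by factor"
    # Split each spatial axis into (number of blocks, block size), then average within blocks.
    blocks = img.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))


# Stand-in for one flattened training image (random values, NOT project data).
rng = np.random.default_rng(0)
flat = rng.integers(0, 256, size=32 * 32 * 3).astype(np.float64)

img = flat.reshape(32, 32, 3)        # "unflatten" the image
small_img = block_average(img, 8)    # 32 / 8 = 4 pixels per side
print(small_img.shape)               # (4, 4, 3)
```

Reducing each image to 4x4x3 leaves 48 features instead of 3,072, which can make slower algorithms practical at the cost of discarding fine detail.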