CS444: Classification Project

Classification Assignment

Project Description

The goal for this project is to build a classifier that can distinguish between pictures of birds and pictures of non-birds. The training and testing data for this task is adapted from CIFAR-10 and CIFAR-100. These are widely used computer vision data sets that together contain 120,000 labeled images drawn from 110 different categories.

The subset of images that we will be working with contains 10,000 labeled training images. Half of these are images of birds while the other half have been randomly selected from the remaining 109 image categories.

Here are some examples of images that contain birds:

Here are some examples of images that do not contain birds:

The data can be downloaded from the project Kaggle page. That page also describes the data format in more detail, and provides some sample Python code to get you started. If you prefer, here are all of the necessary files in a single .zip file: sample_solution.zip.

Requirements

For full credit you must apply at least three different learning algorithms to this problem and provide a comparison of the results. You do not need to implement all three algorithms from scratch. There are a number of mature machine learning libraries available for Python, including:

You do need to provide your own implementation of at least one learning algorithm for this problem. You are welcome to use the single-layer neural network that we worked on as an in-class exercise, or you may implement something else if you prefer.

For full credit, you must achieve a classification rate above 68%.

You must submit your completed Python code along with clear instructions for reproducing your results.

Along with your code, you must also submit a short (2-3 page) report describing your approach to the problem and your results. Your report will be graded on the basis of content as well as style. Your writing should be clear, concise, well-organized, and grammatically correct. Your report should include at least one figure illustrating your results.

Suggestions

Since you can only upload two Kaggle submissions per day, it will be critical that you use some sort of validation to tune the parameters of your algorithms. The scikit-learn library includes some tools to help with the process of building a validation set or performing cross-validation.
Learning directly on the 3072-dimensional image vectors will be very computationally expensive for some algorithms. It may be beneficial to perform some sort of feature extraction prior to learning. This could be something as simple as rescaling the images from 32x32 pixels (3072 dimensions) down to 4x4 pixels (48 dimensions).
Some algorithms may benefit from data augmentation. The idea behind data augmentation is to artificially increase the size of the training set by introducing modified versions of the training images. The simplest example of this would be to double the size of the training set by introducing a flipped version of each image.
State-of-the-art solutions for tasks like this are based on convolutional neural networks. If you want to get the best possible score (and you have lots of time to kill) you might want to try this approach. Here is a nice tutorial on convolutional neural networks that also provides some sample code. The necessary libraries are installed in the Linux lab, so it should be possible to adapt this code to the bird classification task.
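
As a rough illustration of the validation suggestion above, here is a minimal sketch that uses scikit-learn to build a hold-out validation set and to run 5-fold cross-validation. The random arrays are placeholders for the real training data, and the choice of logistic regression, the 80/20 split, and the variable names are all assumptions made for the example, not part of the assignment.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Placeholder data standing in for the real training set (assumption:
# one feature vector per row, labels 1 = bird, 0 = non-bird).
rng = np.random.default_rng(0)
X = rng.random((200, 48))
y = rng.integers(0, 2, size=200)

# Hold out 20% of the labeled data as a validation set, keeping the
# bird / non-bird ratio the same in both parts.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit on the training part only, then score on the held-out part.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
val_accuracy = model.score(X_val, y_val)

# Alternatively, 5-fold cross-validation returns one accuracy per fold.
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

print(f"hold-out accuracy: {val_accuracy:.3f}")
print(f"cross-validation accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

Because the placeholder labels are random, the printed accuracies will hover around chance. With the real data, the same pattern lets you compare parameter settings locally instead of spending Kaggle submissions on them.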
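
The feature extraction and data augmentation suggestions above could be sketched with NumPy along the following lines. This assumes the images have already been reshaped into an array of shape (n, 32, 32, 3); the actual storage order is described on the Kaggle page and may require a different reshape. The function names downscale and add_flipped are hypothetical, block averaging is only one possible way to rescale, and "flipped" is interpreted here as a left-right mirror.

```python
import numpy as np


def downscale(images, factor=8):
    """Shrink images by averaging non-overlapping factor x factor blocks.

    images has shape (n, height, width, channels); height and width
    must be divisible by factor. With factor=8, 32x32 becomes 4x4.
    """
    n, h, w, c = images.shape
    blocks = images.reshape(n, h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(2, 4))


def add_flipped(images, labels):
    """Double the data set by appending a left-right mirror of each image."""
    flipped = images[:, :, ::-1, :]
    return np.concatenate([images, flipped]), np.concatenate([labels, labels])


# Placeholder data standing in for real images (assumption).
rng = np.random.default_rng(0)
images = rng.random((10, 32, 32, 3))
labels = rng.integers(0, 2, size=10)

aug_images, aug_labels = add_flipped(images, labels)

# Downscale to 4x4x3 and flatten each image into a 48-dimensional vector.
features = downscale(aug_images).reshape(len(aug_images), -1)
print(features.shape)  # (20, 48)
```

If you combine augmentation with validation, augment only the training portion, so that mirrored copies of validation images do not leak into training.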