Decision Tree Programming Assignment

Learning Objectives

After completing this assignment, students should be able to:

Part 1: Implementation

Complete the following stubbed-out decision tree classifier so that all public methods and attributes behave as described in the provided docstrings:

This module contains a method for drawing 2D classification trees:

You may find it useful for confirming that your finished classifier is working correctly.
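The stubbed-out classifier is not reproduced in this section, so its method names and splitting criterion are unknown here. As a minimal sketch, assuming an entropy-based criterion (information gain) and axis-aligned splits of the form x <= threshold, the core of split selection might look like the following. The names entropy, information_gain and best_split are hypothetical and are not part of the provided stub.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Tuple

# Hypothetical helpers; assumes an entropy/information-gain split criterion.


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy, in bits, of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((c / n) * math.log2(c / n) for c in counts)


def information_gain(feature: Sequence[float], labels: Sequence[int],
                     threshold: float) -> float:
    """Entropy reduction from splitting samples on feature <= threshold."""
    n = len(labels)
    if n == 0:
        return 0.0
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    # Weighted average entropy of the two child nodes.
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children


def best_split(X: Sequence[Sequence[float]],
               y: Sequence[int]) -> Optional[Tuple[int, float, float]]:
    """Return (feature_index, threshold, gain) for the best split, or None."""
    if len(X) == 0:
        return None
    best = None
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        values = sorted(set(column))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            gain = information_gain(column, y, t)
            if best is None or gain > best[2]:
                best = (j, t, gain)
    return best
```

On the quiz data every attribute is 0 or 1, so 0.5 is the only candidate threshold for each question. Note that best_split returns the best available split even when its gain is 0; deciding when to stop splitting (for example, at a maximum depth or when a node is pure) is left to the surrounding tree-building code.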

Part 2: Analysis

For this part of the assignment you will apply your decision tree to the problem of predicting a respondent’s age group from their answers to an online quiz. The quiz consists of 30 Yes/No questions like “Have you ever broken a bone?”, “Have you ever been on the radio or television?”, etc.

This may seem like a trivial exercise, but keep in mind that companies spend a great deal of time and money attempting to infer demographic information based on seemingly innocuous data revealed online.
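Because the labels fall into four classes, a tree's test accuracy is easier to interpret next to a trivial baseline. A sketch of one possible sanity check is to compare against always predicting the most common training label; the functions accuracy and majority_baseline are hypothetical helpers, and the labels are assumed to be already loaded as sequences of integers.

```python
from collections import Counter
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy of zero predictions")
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train: Sequence[int], y_test: Sequence[int]) -> float:
    """Test accuracy of always predicting the most common training label."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return accuracy(y_test, [most_common] * len(y_test))
```

For example, majority_baseline([3, 3, 1], [3, 1, 3, 0]) predicts label 3 for every test respondent and returns 0.5. A tree that does not beat this number has learned little from the quiz answers.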

For this part of the assignment you should submit a Jupyter notebook that satisfies the following requirements:

Each step of the notebook must be accompanied by text explaining the purpose of the Python code and discussing the results.

Data Set

The following files contain the training and test data you must use for your analysis:

The class labels are integers from 0 to 3 representing age groups: 0 = 0-18 years, 1 = 19-24, 2 = 25-34, and 3 = 35 and older. Each attribute is 0 or 1, where 1 corresponds to an answer of “Yes” and 0 to an answer of “No”. The thirty questions are:

Have you ever…

broken a bone                       had your cell phone ring at an embarrassing moment
been on the radio or television     gotten stitches
gotten a speeding ticket            paid for a stranger’s meal
gone skydiving                      had a tooth pulled out
been to Europe                      held a butterfly in your hand
driven a boat                       been pooped on by a bird
been on water skis                  been stung by a jellyfish
been on a motorcycle                had braces
been horseback riding               had contacts or glasses
ridden on a bull                    been in a major natural disaster
ridden on a sheep                   fallen down the stairs
stayed awake for an entire night    faked being sick
been alpine skiing                  sung in the shower
run a marathon                      gotten locked out of your car or house
regifted a gift                     been shown on a JumboTron at a sporting event

This data is a processed version of the Have You Ever data set collected by Benjamin Soyka and distributed under the CC BY-SA 4.0 license.
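Before fitting a tree, it can be worth confirming that the loaded data match the encoding described above. The sketch below assumes the answers have already been loaded as a sequence of rows X (one row of 30 answers per respondent) and the labels as a sequence y; the file format is not specified here, and the names AGE_GROUPS, validate and describe are hypothetical.

```python
from typing import Sequence

# Age range for each class label, as described in the Data Set section.
AGE_GROUPS = {0: "0-18", 1: "19-24", 2: "25-34", 3: "35+"}
NUM_QUESTIONS = 30


def validate(X: Sequence[Sequence[int]], y: Sequence[int]) -> None:
    """Raise ValueError if (X, y) does not match the documented encoding."""
    if len(X) != len(y):
        raise ValueError(f"{len(X)} rows of answers but {len(y)} labels")
    for i, (row, label) in enumerate(zip(X, y)):
        if len(row) != NUM_QUESTIONS:
            raise ValueError(f"row {i} has {len(row)} answers, expected {NUM_QUESTIONS}")
        # Answers must be 0 ("No") or 1 ("Yes").
        if any(a not in (0, 1) for a in row):
            raise ValueError(f"row {i} contains a value other than 0 or 1")
        if label not in AGE_GROUPS:
            raise ValueError(f"row {i} has label {label}, expected 0-3")


def describe(row: Sequence[int], label: int) -> str:
    """Summarize one respondent in human-readable form."""
    return f"age group {AGE_GROUPS[label]}: answered Yes to {sum(row)} of {len(row)} questions"
```

Calling validate on both the training and test data right after loading them catches problems such as a missing column or a mislabeled row before they surface as puzzling tree behavior.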

Partners

This assignment may be completed individually or in pairs. My expectation for pairs is that both members are actively involved in, and take full responsibility for, all aspects of the project. In other words, I expect you to sit down and work together, not to split up tasks and complete them separately.

If you intend to work with a partner, you must inform me no later than Tuesday 9/14.

Grading

Grades will be calculated according to the following distribution.