After completing this assignment, students should be able to:
Complete the following stubbed-out decision tree classifier so that all public methods and attributes behave as described in the provided docstrings:
This module contains a method for drawing 2D classification trees:
You may find it useful for confirming that your finished classifier is working correctly.
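The provided docstrings define the classifier's actual interface and splitting criterion. This is a hedged illustration only. Assuming the tree chooses splits by Gini impurity, the impurity calculation at the core of split selection might look like the NumPy sketch below. `gini` and `split_impurity` are hypothetical helper names and are not part of the stub.

```python
import numpy as np


# Hypothetical helpers (not part of the provided stub); assumes Gini impurity.
def gini(y):
    """Return the Gini impurity of an array of integer class labels."""
    if y.size == 0:
        return 0.0
    # Fraction of samples belonging to each class present in y.
    _, counts = np.unique(y, return_counts=True)
    p = counts / y.size
    return 1.0 - np.sum(p ** 2)


def split_impurity(y_left, y_right):
    """Return the size-weighted average Gini impurity of a two-way split."""
    n = y_left.size + y_right.size
    return (y_left.size * gini(y_left) + y_right.size * gini(y_right)) / n


y = np.array([0, 0, 1, 1])
print(gini(y))                       # 0.5
print(split_impurity(y[:2], y[2:]))  # 0.0
```

A split that separates the classes perfectly has weighted impurity 0, so the tree prefers it to any split that leaves the classes mixed.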
For this part of the assignment you will apply your decision tree to the problem of determining a respondent’s age based on their answers to an online quiz. The quiz consists of 30 Yes/No questions like “Have you ever broken a bone?”, “Have you ever been on the radio or television?”, etc.
This may seem like a trivial exercise, but keep in mind that companies spend a great deal of time and money attempting to infer demographic information based on seemingly innocuous data revealed online.
For this part of the assignment you should submit a Jupyter notebook that satisfies the following requirements:
Each step of the notebook must be accompanied by text explaining the point of the provided Python code and discussing the results.
The following files contain the training and test data you must use for your analysis:
The class labels are integers in the range 0-3, each corresponding to an age range: 0 = 0-18 years, 1 = 19-24 years, 2 = 25-34 years, 3 = 35+ years. Each attribute is 0 or 1, where 1 corresponds to an answer of “Yes” and 0 to an answer of “No”. The thirty questions are:
Have you ever…
broken a bone | had your cell phone ring at an embarrassing moment |
---|---|
been on the radio or television | gotten stitches |
gotten a speeding ticket | paid for a stranger’s meal |
gone skydiving | had a tooth pulled out |
been to Europe | held a butterfly in your hand |
driven a boat | been pooped on by a bird |
been on water skis | been stung by a jellyfish |
been on a motorcycle | had braces |
been horseback riding | had contacts or glasses |
ridden on a bull | been in a major natural disaster |
ridden on a sheep | fallen down the stairs |
stayed awake for an entire night | faked being sick |
been alpine skiing | sung in the shower |
run a marathon | gotten locked out of your car or house |
regifted a gift | been shown on a JumboTron at a sporting event |
This data is a processed version of the Have You Ever data set collected by Benjamin Soyka and distributed under the CC BY-SA 4.0 license.
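The age-group labels described above are small non-negative integers, which makes it easy to check the class balance. The sketch below assumes the labels have already been loaded into an integer NumPy array; how the data files are read is not shown here. `AGE_RANGES` and `class_distribution` are hypothetical names, and the labels in the example are toy values, not the real data.

```python
import numpy as np

# Hypothetical mapping from integer class label to age range (from the text).
AGE_RANGES = {0: "0-18", 1: "19-24", 2: "25-34", 3: "35+"}


def class_distribution(labels):
    """Return the fraction of respondents in each of the four age classes."""
    # minlength ensures classes absent from `labels` still get a 0 count.
    counts = np.bincount(labels, minlength=len(AGE_RANGES))
    return counts / counts.sum()


# Toy labels for illustration only; these are not from the real data set.
toy_labels = np.array([0, 1, 1, 3, 3, 3])
for label, frac in enumerate(class_distribution(toy_labels)):
    print(f"{AGE_RANGES[label]:>6}: {frac:.2f}")
```

The largest fraction is the accuracy you would get by always predicting the most common age group. This makes it a useful baseline for judging the tree's test accuracy.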
This assignment may be completed individually or in pairs. My expectation for pairs is that both members are actively involved in, and take full responsibility for, all aspects of the project. In other words, I expect you to sit down together to work, not to split up tasks to be completed separately.
If you intend to work with a partner, you must inform me no later than Tuesday 9/14.
Grades will be calculated according to the following distribution.
Readability/Style 10%
Your code should follow PEP 8 conventions. It should be well documented and well organized.
Efficiency 10%
Our main concern with this assignment is clarity and correctness. That said, your implementation must be efficient enough to execute the provided testing code in no more than a second or two. This means you should avoid explicit Python loops where possible, for example by using vectorized array operations instead.
Part 2 Submission 20%
Details TBA.
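Split selection is a good place to see the efficiency requirement in action. A naive version loops over every candidate threshold and recounts the labels on each side, which costs time quadratic in the number of samples for each feature. The hedged sketch below takes a different approach: it sorts a feature once and derives the class counts for every candidate split from a cumulative sum, so all thresholds are scored with array operations. It assumes NumPy, Gini impurity, integer labels 0 to `n_classes - 1`, and a convention that samples with `x <= threshold` go left. `best_threshold` is a hypothetical name, and the stub's docstrings may specify something different.

```python
import numpy as np


def best_threshold(x, y, n_classes):
    """Hypothetical helper: find the threshold on feature x that minimizes
    weighted Gini impurity. Samples with x <= threshold go left.

    Returns (threshold, impurity), or (None, inf) if x has no valid split.
    """
    order = np.argsort(x, kind="stable")
    x_sorted, y_sorted = x[order], y[order]
    n = x.size

    # One-hot encode labels; a cumulative sum then gives the class counts
    # left of every candidate split (row i = split after sorted sample i).
    onehot = np.eye(n_classes)[y_sorted]
    total = onehot.sum(axis=0)
    left_counts = np.cumsum(onehot, axis=0)[:-1]
    right_counts = total - left_counts
    n_left = np.arange(1, n)[:, None]
    n_right = n - n_left

    # Gini impurity of both sides for all candidate splits at once.
    gini_left = 1.0 - np.sum((left_counts / n_left) ** 2, axis=1)
    gini_right = 1.0 - np.sum((right_counts / n_right) ** 2, axis=1)
    weighted = (n_left[:, 0] * gini_left + n_right[:, 0] * gini_right) / n

    # Only split between distinct values; equal values must stay together.
    valid = x_sorted[:-1] < x_sorted[1:]
    if not np.any(valid):
        return None, np.inf
    weighted = np.where(valid, weighted, np.inf)
    i = np.argmin(weighted)

    # Place the threshold midway between the two neighboring values.
    threshold = (x_sorted[i] + x_sorted[i + 1]) / 2
    return float(threshold), float(weighted[i])


x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])
print(best_threshold(x, y, n_classes=2))  # (2.5, 0.0)
```

Placing the threshold midway between adjacent distinct values is one common convention. The provided stub may expect a different choice.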