Oct 15: Parsing, Analyzing Text
Learning Objectives
After today's class, you should be able to:
- Loop through a string, one character at a time, and build a new string.
- Apply string methods (
isalpha,isspace,split, andsplitlines). - Break down a problem by writing and calling multiple helper functions.
Announcements
- Read: Week 9 (due Oct 21)
- Code: Project 1 (due Oct 21)
- Mid-semester grades in MyMadison
- Withdraw deadline is Tue 10/21
- VS Code workshop – 10/16 at 7pm
Before Class
Complete steps 3–6 of the Visual Studio Code Setup on your lab machine. That way, you can use VS Code both on your laptop and in the computer lab.
Haiku Lab
Complete the following steps in order:
- [10 min] Background, Getting Started
- [10 min] Implement remove_symbols
- [15 min] Implement check_syllables
- [15 min] Implement validate_haiku
Background
Haiku is a form of Japanese poetry. A haiku consists of 3 lines of text. The first line must have 5 syllables. The second line must have 7 syllables. And the third line must, again, have 5 syllables.
Example haiku
An old silent pond
A frog jumps into the pond—
Splash! Silence again.
by Matsuo Basho
Additional haiku examples can be found online. Not all English translations follow the 5-7-5 rule.
Getting Started
Create a folder for today's lab. Download the following files:
Install pysyllables via the terminal: pip install pysyllables.
Run pytest to make sure everything is set up right.
remove_symbols
Later in the lab, you will use the get_syllable_count() function in the pysyllables module.
This function will tell you how many syllables an English word has.
However, the word must not contain punctuation or other non-alphabetic characters; otherwise, the function returns None.
For example:
- get_syllable_count("pond—") returns None
- get_syllable_count("pond") returns 1
- get_syllable_count("again.") returns None
- get_syllable_count("again") returns 2
The remove_symbols() function should remove all characters from a string except for letters and spaces.
Start by creating an empty result string.
Then loop through the original line, one character at a time.
If the character is alphabetic or space, append the character to the result.
(Use the isalpha() and isspace() string methods.)
Then return the result.
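The steps above can be sketched as follows. This is one possible version, not the official solution; the variable names are only illustrative. Note that isspace() is true for tabs and newlines as well as ordinary spaces.

```python
def remove_symbols(line):
    """Return line with everything except letters and whitespace removed."""
    result = ""  # start with an empty result string
    for char in line:
        # keep letters and whitespace; skip punctuation, digits, and dashes
        if char.isalpha() or char.isspace():
            result += char
    return result


print(remove_symbols("Splash! Silence again."))
```

Each cleaned word can then be passed to get_syllable_count() without the trailing punctuation getting in the way.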
check_syllables
This helper function will be called from validate_haiku() three times.
The first parameter (str) represents one line of the haiku.
The second parameter (int) is the required number of syllables.
The return value is:
- "too short" if the number of syllables is less than the number required
- "valid" if the number of syllables is equal to the number required
- "too long" if the number of syllables is greater than the number required
Hint: Start by calling remove_symbols() on the line.
Then use the split() method to split the string into a list of words.
Use a loop to add the syllable count of each word.
Then decide which result string to return.
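A minimal sketch of the hint, under some loud assumptions: so that it runs without installing pysyllables, it uses a small hand-written dictionary and a hypothetical count_syllables() helper in place of get_syllable_count(), and it cleans the line inline rather than calling remove_symbols(). Treating an unknown word as zero syllables is also an assumption, not something the lab specifies.

```python
# Stand-in for pysyllables.get_syllable_count (assumption, so the sketch
# runs on its own); the counts are hand-entered for one example line only.
SYLLABLES = {"an": 1, "old": 1, "silent": 2, "pond": 1}


def count_syllables(word):
    """Hypothetical helper: look up a word, returning None if unknown."""
    return SYLLABLES.get(word.lower())


def check_syllables(line, required):
    """Compare the syllable total of line against the required number."""
    # Inline cleanup; in the lab this step is a call to remove_symbols().
    cleaned = "".join(c for c in line if c.isalpha() or c.isspace())
    total = 0
    for word in cleaned.split():  # split() breaks the line on whitespace
        count = count_syllables(word)
        if count is not None:  # assumption: unknown words add nothing
            total += count
    if total < required:
        return "too short"
    if total > required:
        return "too long"
    return "valid"


print(check_syllables("An old silent pond", 5))
```

In the lab itself, the loop body would call get_syllable_count() from pysyllables instead of the stand-in.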
validate_haiku
This is the main function of the lab.
The parameter is a multiline string representing the haiku.
If the string is a valid haiku (based on the definition above), the function should return "Valid haiku." (with a period at the end).
If the string does not have exactly three lines, the function should return "Invalid haiku, not 3 lines." (with a period at the end).
If the string has three lines but is not a valid haiku, the function should return a detailed message about the invalid line(s).
- The format should be "Line X too Y. ", where X is the line number, and Y is either long or short.
- Each line should end with a single space after the period. Ex: "Line 1 too short. Line 3 too long. "
Hint: Use the splitlines() method to split the string into a list of lines.
Then call the check_syllables() function on each line and store the return value in a variable.
Use these variables to build the detailed return message.
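The two pieces of this hint can be tried separately before combining them. The sketch below first shows what splitlines() returns for the example haiku, then uses a hypothetical build_message() helper (not part of the lab's required functions) to turn three status strings into the return message. It assumes the statuses are exactly the strings returned by check_syllables().

```python
def build_message(statuses):
    """Hypothetical helper: turn three status strings into the message."""
    message = ""
    for number, status in enumerate(statuses, start=1):
        if status != "valid":
            # each sentence ends with a period and a single space
            message += f"Line {number} {status}. "
    if message == "":
        return "Valid haiku."
    return message


haiku = "An old silent pond\nA frog jumps into the pond—\nSplash! Silence again."
lines = haiku.splitlines()  # one list item per line, newlines removed
print(len(lines))
print(build_message(["too short", "valid", "too long"]))
```

A loop is used here instead of three separate variables; either approach produces the same message, so pick whichever is easier to read.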