Oct 28: Parsing, Analyzing Text
Learning Objectives
After today's class, you should be able to:
- Loop through a string, one character at a time, and build a new string.
- Apply string methods (isalpha, isspace, split, and splitlines).
- Break down a problem by writing and calling multiple helper functions.
Reminders
- Due tomorrow: Ch 9 on zyBooks
- Quiz 4 on Wednesday
  - Based on HW 7 and PA 1 (modules)
  - Review questions from Practice Quiz 4
  - Study the challenge activities in zyBooks
- Due Friday: PA 1 on Gradescope
PA1 Debrief
- utilities.py
- test_utilities.py
Haiku Lab
Complete the following steps in order:
- [5 min] Read all the instructions
- [5 min] Download files, run pytest
- [10 min] Implement remove_symbols
- [10 min] Implement check_syllables
- [10 min] Implement validate_haiku
Background
Haiku is a form of Japanese poetry. A haiku consists of 3 lines of text. The first line must have 5 syllables. The second line must have 7 syllables. And the third line must, again, have 5 syllables.
Example haiku
An old silent pond
A frog jumps into the pond—
Splash! Silence again.
by Matsuo Basho
Additional haiku examples can be found online. Not all English translations follow the 5-7-5 rule.
Getting Started
Create a folder for today's lab. Download the following files:
Install pysyllables via the Thonny package manager or from the terminal: pip install pysyllables
Run pytest to make sure everything is set up right.
remove_symbols
Later in the lab, you will use the get_syllable_count() function in the pysyllables module.
This function will tell you how many syllables an English word has.
However, the word you pass in must not include any punctuation or other non-alphabetic characters.
For example:
- get_syllable_count("pond—") returns None
- get_syllable_count("pond") returns 1
- get_syllable_count("again.") returns None
- get_syllable_count("again") returns 2
The remove_symbols() function should remove all characters from a string except for letters and spaces.
Start by creating an empty result string.
Then loop through the original line, one character at a time.
If the character is alphabetic or a space, append the character to the result.
(Use the isalpha() and isspace() string methods.)
Then return the result.
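The steps above can be sketched as follows. This is one possible version, not the official solution; the names result and char are illustrative choices.

```python
def remove_symbols(line):
    """Return a copy of line that keeps only letters and whitespace."""
    result = ""  # start with an empty result string
    for char in line:  # visit the line one character at a time
        if char.isalpha() or char.isspace():
            result += char  # keep letters and spaces, drop everything else
    return result


print(remove_symbols("Splash! Silence again."))
```

Note that isspace() is also true for tabs and newlines, so this sketch keeps any whitespace character, not only the space character.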
check_syllables
This helper function will be called from validate_haiku() three times.
The first parameter (str) represents one line of the haiku.
The second parameter (int) is the required number of syllables.
The return value is:
- "too short" if the number of syllables is less than the number required
- "valid" if the number of syllables is equal to the number required
- "too long" if the number of syllables is greater than the number required
Hint: Start by calling remove_symbols() on the line.
Then use the split() method to split the string into a list of words.
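A minimal sketch of this helper, under two assumptions that are not part of the lab: get_syllable_count() is replaced by a small hand-made stand-in (the SYLLABLES dictionary) so the example runs without pysyllables installed, and words with no known count are skipped. In the lab itself, the function from the pysyllables module would be used instead.

```python
# Hypothetical stand-in for the pysyllables function, with hand-entered
# counts for a few words only. It returns None for unknown words.
SYLLABLES = {"an": 1, "old": 1, "silent": 2, "pond": 1, "a": 1,
             "frog": 1, "jumps": 1, "into": 2, "the": 1}


def get_syllable_count(word):
    return SYLLABLES.get(word.lower())


def remove_symbols(line):
    result = ""
    for char in line:
        if char.isalpha() or char.isspace():
            result += char
    return result


def check_syllables(line, required):
    """Compare the syllable count of line with the required count."""
    words = remove_symbols(line).split()  # list of cleaned-up words
    total = 0
    for word in words:
        count = get_syllable_count(word)
        if count is not None:  # assumption: ignore words with no count
            total += count
    if total < required:
        return "too short"
    if total > required:
        return "too long"
    return "valid"


print(check_syllables("An old silent pond", 5))
print(check_syllables("An old pond", 5))
```

How to treat a word whose count is None is a design decision the lab text does not spell out; skipping it is only one option.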
validate_haiku
This is the main function of the lab.
The parameter is a multiline string representing the haiku.
If the string is a valid haiku (based on the definition above), the function should return "Valid haiku." (with a period at the end).
If the string has more or fewer than three lines, the function should return "Invalid haiku, not 3 lines."
If the string has three lines but is not a valid haiku, the function should return a detailed message about the invalid line(s).
- The format should be "Line X too Y. ", where X is the line number, and Y is either long or short.
- Each message should end with a single space after the period. Ex: "Line 1 too short. Line 3 too long. "
Hint: Use the splitlines() method to split the string into a list of lines.
Then call the check_syllables() function on each line.
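Putting the pieces together, the main function might look like the sketch below. It is self-contained only because it repeats compact versions of the helpers and uses a hypothetical stand-in for get_syllable_count() (the SYLLABLES dictionary, with hand-entered counts); neither is part of the lab files.

```python
# Hypothetical stand-in for the pysyllables function (hand-entered counts).
SYLLABLES = {"an": 1, "old": 1, "silent": 2, "pond": 1, "a": 1, "frog": 1,
             "jumps": 1, "into": 2, "the": 1, "splash": 1, "silence": 2,
             "again": 2}


def get_syllable_count(word):
    return SYLLABLES.get(word.lower())  # None for unknown words


def remove_symbols(line):
    return "".join(c for c in line if c.isalpha() or c.isspace())


def check_syllables(line, required):
    counts = [get_syllable_count(w) for w in remove_symbols(line).split()]
    total = sum(c for c in counts if c is not None)  # assumption: skip None
    if total < required:
        return "too short"
    if total > required:
        return "too long"
    return "valid"


def validate_haiku(haiku):
    """Return a message describing whether haiku follows the 5-7-5 rule."""
    lines = haiku.splitlines()
    if len(lines) != 3:
        return "Invalid haiku, not 3 lines."
    message = ""
    # Pair each line with its required count and number lines from 1.
    for number, (line, required) in enumerate(zip(lines, [5, 7, 5]), start=1):
        status = check_syllables(line, required)
        if status != "valid":
            message += f"Line {number} {status}. "
    if message == "":
        return "Valid haiku."
    return message


print(validate_haiku("An old silent pond\nA frog jumps into the pond\nSplash! Silence again."))
```

Because check_syllables() already returns "too short" or "too long", the sketch can drop that string straight into the "Line X too Y. " message.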