Oct 15: Parsing, Analyzing Text

Learning Objectives

After today's class, you should be able to:

  • Loop through a string, one character at a time, and build a new string.
  • Apply string methods (isalpha, isspace, split, and splitlines).
  • Break down a problem by writing and calling multiple helper functions.

Announcements

Before Class

Complete steps 3–6 of the Visual Studio Code Setup on your lab machine. That way, you can use VS Code both on your laptop and in the computer lab.

Haiku Lab

Complete the following steps in order:

  1. [10 min] Background, Getting Started
  2. [10 min] Implement remove_symbols
  3. [15 min] Implement check_syllables
  4. [15 min] Implement validate_haiku

Background

Haiku is a form of Japanese poetry. A haiku consists of 3 lines of text. The first line must have 5 syllables. The second line must have 7 syllables. And the third line must, again, have 5 syllables.

Example haiku

An old silent pond
A frog jumps into the pond—
Splash! Silence again.

by Matsuo Basho

Additional haiku examples can be found online. Not all English translations follow the 5-7-5 rule.

Getting Started

Create a folder for today's lab. Download the following files:

Install pysyllables via the terminal: pip install pysyllables.

Run pytest to make sure everything is set up right.

remove_symbols

Later in the lab, you will use the get_syllable_count() function in the pysyllables module. This function tells you how many syllables an English word has. However, the word you pass in must not include any punctuation or other non-alphabetic characters; otherwise, the function returns None. For example:

  • get_syllable_count("pond—") returns None
  • get_syllable_count("pond") returns 1
  • get_syllable_count("again.") returns None
  • get_syllable_count("again") returns 2

The remove_symbols() function should remove all characters from a string except for letters and spaces. Start by creating an empty result string. Then loop through the original line, one character at a time. If the character is a letter or a space, append it to the result. (Use the isalpha() and isspace() string methods.) Finally, return the result.
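The steps above can be sketched as follows. This is one possible shape for the function, not the only acceptable solution; the variable names are illustrative.

```python
def remove_symbols(line):
    """Return a copy of line that keeps only letters and whitespace."""
    result = ""  # start with an empty result string
    for ch in line:  # visit one character at a time
        if ch.isalpha() or ch.isspace():
            result += ch  # keep letters and spaces, skip everything else
    return result


print(remove_symbols("Splash! Silence again."))  # Splash Silence again
```

Building the result by concatenation keeps the loop easy to trace by hand, which fits the "build a new string" objective for today.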

check_syllables

This helper function will be called from validate_haiku() three times. The first parameter (str) represents one line of the haiku. The second parameter (int) is the required number of syllables. The return value is:

  • "too short" if the number of syllables is less than the number required
  • "valid" if the number of syllables is equal to the number required
  • "too long" if the number of syllables is greater than the number required

Hint: Start by calling remove_symbols() on the line. Then use the split() method to split the string into a list of words. Use a loop to add the syllable count of each word. Then decide which result string to return.
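The hint might translate into something like the sketch below. So that it runs on its own, it uses a hypothetical stand-in for get_syllable_count() backed by a tiny hand-written dictionary; in the lab, you would call the real function from pysyllables instead. Skipping words whose count is None is also an assumption of this sketch, not a stated requirement.

```python
# Hypothetical stand-in for pysyllables' get_syllable_count(), so this
# sketch runs without the package. Only a few words are covered.
SYLLABLES = {"an": 1, "old": 1, "silent": 2, "pond": 1, "a": 1, "frog": 1,
             "jumps": 1, "into": 2, "the": 1}


def get_syllable_count(word):
    return SYLLABLES.get(word.lower())  # None for unknown words


def remove_symbols(line):
    result = ""
    for ch in line:
        if ch.isalpha() or ch.isspace():
            result += ch
    return result


def check_syllables(line, required):
    words = remove_symbols(line).split()  # list of clean words
    total = 0
    for word in words:
        count = get_syllable_count(word)
        if count is not None:  # assumption: unknown words add nothing
            total += count
    if total < required:
        return "too short"
    if total > required:
        return "too long"
    return "valid"


print(check_syllables("An old silent pond", 5))  # valid
print(check_syllables("An old silent pond", 7))  # too short
```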

validate_haiku

This is the main function of the lab. The parameter is a multiline string representing the haiku. If the string is a valid haiku (based on the definition above), the function should return "Valid haiku." (with a period at the end).

If the string has more or fewer than three lines, the function should return "Invalid haiku, not 3 lines.". If the string has three lines but is not a valid haiku, the function should return a detailed message about the invalid line(s).

  • The format should be "Line X too Y. ", where X is the line number, and Y is either long or short.
  • The message for each invalid line should end with a single space after the period. Ex: "Line 1 too short. Line 3 too long. "

Hint: Use the splitlines() method to split the string into a list of lines. Then call the check_syllables() function on each line and store the return value in a variable. Use these variables to build the detailed return message.
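A minimal sketch of how the pieces could fit together is shown below. To stay self-contained, it includes a compact, hypothetical stand-in for check_syllables() that looks up words in a small hand-written dictionary; your own check_syllables() from the previous step would take its place.

```python
# Hypothetical stand-in for check_syllables(), using a tiny word table
# instead of pysyllables so the sketch runs on its own.
SYLLABLES = {"an": 1, "old": 1, "silent": 2, "pond": 1, "a": 1, "frog": 1,
             "jumps": 1, "into": 2, "the": 1, "splash": 1, "silence": 2,
             "again": 2, "and": 1}


def check_syllables(line, required):
    letters = "".join(ch for ch in line if ch.isalpha() or ch.isspace())
    total = sum(SYLLABLES.get(word.lower(), 0) for word in letters.split())
    if total < required:
        return "too short"
    if total > required:
        return "too long"
    return "valid"


def validate_haiku(haiku):
    lines = haiku.splitlines()  # one list item per line of the poem
    if len(lines) != 3:
        return "Invalid haiku, not 3 lines."
    # Store each result in its own variable, as the hint suggests.
    result1 = check_syllables(lines[0], 5)
    result2 = check_syllables(lines[1], 7)
    result3 = check_syllables(lines[2], 5)
    message = ""
    if result1 != "valid":
        message += "Line 1 " + result1 + ". "
    if result2 != "valid":
        message += "Line 2 " + result2 + ". "
    if result3 != "valid":
        message += "Line 3 " + result3 + ". "
    if message == "":
        return "Valid haiku."
    return message


poem = "An old pond\nA frog jumps into the pond\nSplash! Silence again and again."
print(validate_haiku(poem))  # Line 1 too short. Line 3 too long. 
```

Because each piece of the message already ends with ". ", the final string automatically has the trailing space the tests expect.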