Nov 06: Parsing, Analyzing Text

Learning Objectives

After today's class, you should be able to:

  • Loop through a string, one character at a time, and build a new string.
  • Apply string methods (isalpha, isspace, split, and splitlines).
  • Break down a problem by writing and calling multiple helper functions.

Announcements

Haiku Lab

Complete the following steps in order:

  1. [5 min] Read all the instructions
  2. [5 min] Download files, run pytest
  3. [10 min] Implement remove_symbols
  4. [10 min] Implement check_syllables
  5. [10 min] Implement validate_haiku

Background

Haiku is a form of Japanese poetry. A haiku consists of 3 lines of text. The first line must have 5 syllables. The second line must have 7 syllables. And the third line must, again, have 5 syllables.

Example haiku

An old silent pond
A frog jumps into the pond—
Splash! Silence again.

by Matsuo Basho

Additional haiku examples can be found online. Not all English translations follow the 5-7-5 rule.

Getting Started

Create a folder for today's lab. Download the following files:

Install pysyllables via the Thonny package manager. (Or from the terminal: pip install pysyllables.)

Run pytest to make sure everything is set up right.

remove_symbols

Later in the lab, you will use the get_syllable_count() function in the pysyllables module. This function tells you how many syllables an English word has. However, the word you pass in must not include any punctuation or other non-alphabetic characters; otherwise the function returns None. For example:

  • get_syllable_count("pond—") returns None
  • get_syllable_count("pond") returns 1
  • get_syllable_count("again.") returns None
  • get_syllable_count("again") returns 2

The remove_symbols() function should remove all characters from a string except for letters and spaces. Start by creating an empty result string. Then loop through the original line, one character at a time. If the character is alphabetic or space, append the character to the result. (Use the isalpha() and isspace() string methods.) Then return the result.
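The steps above can be sketched as follows. This is one possible shape for the loop, not the only acceptable solution; the variable names are illustrative.

```python
def remove_symbols(line):
    """Return a copy of line that keeps only letters and whitespace."""
    result = ""  # start with an empty result string
    for char in line:  # visit one character at a time
        if char.isalpha() or char.isspace():
            result += char  # keep letters and spaces, skip everything else
    return result


print(remove_symbols("Splash! Silence again."))  # Splash Silence again
```

Note that isspace() is also true for tabs and newline characters, so those are kept as well.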

check_syllables

This helper function will be called from validate_haiku() three times. The first parameter (str) represents one line of the haiku. The second parameter (int) is the required number of syllables. The return value is one of:

  • "too short" if the number of syllables is less than the number required
  • "valid" if the number of syllables is equal to the number required
  • "too long" if the number of syllables is greater than the number required

Hint: Start by calling remove_symbols() on the line. Then use the split() method to split the string into a list of words.
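A rough sketch of how the hint might play out is shown below. So that it runs without installing anything, it replaces pysyllables with a hypothetical stand-in get_syllable_count() backed by a small hand-written dictionary; in the lab you would import the real function instead. Skipping words whose count is None is also an assumption, since the lab does not say how unknown words should be handled.

```python
# Hypothetical stand-in for pysyllables.get_syllable_count so this sketch
# runs on its own. The counts cover only the words of the example haiku.
DEMO_COUNTS = {
    "an": 1, "old": 1, "silent": 2, "pond": 1,
    "a": 1, "frog": 1, "jumps": 1, "into": 2, "the": 1,
    "splash": 1, "silence": 2, "again": 2,
}


def get_syllable_count(word):
    """Return the syllable count for word, or None if it is unknown."""
    return DEMO_COUNTS.get(word.lower())


def remove_symbols(line):
    """Compact version of the earlier helper: keep letters and whitespace."""
    return "".join(ch for ch in line if ch.isalpha() or ch.isspace())


def check_syllables(line, required):
    """Compare the syllables in line against the required number."""
    words = remove_symbols(line).split()  # split() with no argument splits on whitespace
    total = 0
    for word in words:
        count = get_syllable_count(word)
        if count is not None:  # assumption: unknown words add nothing
            total += count
    if total < required:
        return "too short"
    if total > required:
        return "too long"
    return "valid"


print(check_syllables("Splash! Silence again.", 5))  # valid
```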

validate_haiku

This is the main function of the lab. The parameter is a multiline string representing the haiku. If the string is a valid haiku (based on the definition above), the function should return "Valid haiku." (with a period at the end).

If the string does not have exactly three lines, the function should return "Invalid haiku, not 3 lines." (also with a period at the end). If the string has three lines but is not a valid haiku, the function should return a detailed message about the invalid line(s).

  • The format should be "Line X too Y. ", where X is the line number, and Y is either long or short.
  • Each message should end with a single space after the period, including the last one. Ex: "Line 1 too short. Line 3 too long. "

Hint: Use the splitlines() method to split the string into a list of lines. Then call the check_syllables() function on each line.
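One way the pieces might fit together is sketched below. To keep the example runnable by itself, check_syllables() here is a simplified stand-in that looks up syllable counts in a small hypothetical dictionary rather than calling pysyllables; only validate_haiku() is the point of the sketch.

```python
# Hypothetical syllable counts, covering only the example haiku.
DEMO_COUNTS = {
    "an": 1, "old": 1, "silent": 2, "pond": 1,
    "a": 1, "frog": 1, "jumps": 1, "into": 2, "the": 1,
    "splash": 1, "silence": 2, "again": 2,
}


def check_syllables(line, required):
    """Simplified stand-in for the lab's helper function."""
    letters = "".join(ch for ch in line if ch.isalpha() or ch.isspace())
    total = sum(DEMO_COUNTS.get(word.lower(), 0) for word in letters.split())
    if total < required:
        return "too short"
    if total > required:
        return "too long"
    return "valid"


def validate_haiku(haiku):
    """Return a message describing whether haiku follows the 5-7-5 rule."""
    lines = haiku.splitlines()  # one list item per line of text
    if len(lines) != 3:
        return "Invalid haiku, not 3 lines."
    message = ""
    required = [5, 7, 5]
    for number, (line, count) in enumerate(zip(lines, required), start=1):
        status = check_syllables(line, count)
        if status != "valid":
            message += f"Line {number} {status}. "  # trailing space is intended
    if message == "":
        return "Valid haiku."
    return message


print(validate_haiku("An old silent pond\nA frog jumps into the pond\nSplash! Silence again."))
```

Be careful with triple-quoted strings: if the text starts on the line after the opening quotes, splitlines() returns an empty first item, and the haiku is reported as not having 3 lines.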