Nov 01: Quiz 4b, Strings and Lists

Learning Objectives

After today's class, you should be able to:

  • Describe methods for manipulating strings and analyzing English text.

Announcements

PA1-B Debrief

  • score_dice.py
  • test_score_dice.py

Quiz 4b

  • Log in as student
  • Only two windows:
    1. Thonny
    2. Web browser
  • Web traffic monitored
  • Log out when finished

Ch10 Preview

Example

The following program counts how often each word appears in the book Frankenstein, using the Natural Language Toolkit (NLTK).

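# Requires the nltk package and its tokenizer and stop word data.
# If a LookupError appears, run the nltk.download(...) call that
# the error message suggests, then run this program again.
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords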

# Read the book into a string
with open("pg84.txt", encoding="utf-8-sig") as file:
    text = file.read()

# Count how many times each word occurs
words = {}
for word in word_tokenize(text):
    if word.isalpha():       # Ignore non-words
        word = word.lower()  # Ignore case
        # Update dictionary
        if word in words:
            words[word] += 1
        else:
            words[word] = 1

# Remove stop words (like "the" and "you")
stop_words = set(stopwords.words("english"))
for word in sorted(words.keys()):
    if word in stop_words:
        count = words.pop(word)
        print(f"Removed {count} instances of {word}")

# Display other frequently used words
print()
for word in sorted(words.keys()):
    count = words[word]
    if count >= 50:
        print(f"Found {count} instances of {word}")
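
The same counting pattern can be tried without NLTK or the book file. The following is a minimal sketch, not part of the original program: the sample sentence, the two-word stop list, and the count_words function are made up for illustration, and str.split with str.strip is only a rough stand-in for word_tokenize.

```python
from collections import Counter

# Hypothetical sample text (not taken from the book)
SAMPLE = "The cat saw the dog. The dog saw the bird, and the bird flew!"

# Tiny hand-picked stop word list, standing in for stopwords.words("english")
STOP_WORDS = {"the", "and"}


def count_words(text, stop_words):
    """Count lowercase alphabetic words in text, skipping stop words."""
    counts = Counter()
    for token in text.split():
        # Trim surrounding punctuation and ignore case
        word = token.strip(".,;:!?\"'()").lower()
        if word.isalpha() and word not in stop_words:
            counts[word] += 1  # A Counter starts missing keys at 0
    return counts


counts = count_words(SAMPLE, STOP_WORDS)
for word in sorted(counts):
    print(f"Found {counts[word]} instances of {word}")
```

A Counter removes the need for the if/else that checks whether a word is already in the dictionary, and filtering stop words while counting avoids the separate removal loop. The trade-off is that this version cannot report how many stop words were removed.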