Nov 01: Quiz 4b, Strings and Lists
Learning Objectives
After today's class, you should be able to:
- Describe methods for manipulating strings and analyzing English text.
Announcements
- Chapter 10: Sequences
- Project 2 (due Nov 14)
- PA2Quiz due Sunday!
PA1-B Debrief
- score_dice.py
- test_score_dice.py
Quiz 4b
- Log in as student
- Only two windows:
  - Thonny
  - Web browser
- Web traffic monitored
- Log out when finished
Ch10 Preview
- String Manipulation
  - Python String Methods
  - Example: splitting and joining
- Natural Language Processing
  - Python's NLTK Package
  - Example: parts of speech
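As a quick illustration of splitting and joining, here is a minimal sketch using the built-in str.split and str.join methods. The sample sentence and variable names are made up for this example.

```python
# A made-up sample sentence for illustration
sentence = "It was on a dreary night of November"

# split() with no argument breaks the string on runs of whitespace
words = sentence.split()
print(words)  # ['It', 'was', 'on', 'a', 'dreary', 'night', 'of', 'November']

# join() is called on the separator string and glues a list of strings together
hyphenated = "-".join(words)
print(hyphenated)  # It-was-on-a-dreary-night-of-November

# Splitting and then joining with a single space rebuilds the original here
rebuilt = " ".join(words)
print(rebuilt == sentence)  # True

# split() can also take an explicit separator
fields = "red,green,blue".split(",")
print(fields)  # ['red', 'green', 'blue']

# Other string methods return new strings; the original is unchanged
print(sentence.lower().count("o"))  # 3
```

Because strings are immutable, methods such as lower() and join() always return a new string rather than modifying the one they are called on.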
Example
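The following program analyzes the book Frankenstein using the Natural Language Toolkit (NLTK). It counts how often each word occurs, removes common stop words, and prints the remaining words that occur at least 50 times.

```python
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# NLTK's tokenizer and stop-word data must be downloaded once, e.g.
# nltk.download("punkt") and nltk.download("stopwords")
# (newer NLTK releases may ask for "punkt_tab" instead of "punkt").

# Read the book into a string (the with block closes the file afterward)
with open("pg84.txt", encoding="utf-8-sig") as file:
    text = file.read()

# Count how many times each word occurs
words = {}
for word in word_tokenize(text):
    if word.isalpha():  # Ignore non-words (punctuation, numbers)
        word = word.lower()  # Ignore case
        # Update dictionary
        if word in words:
            words[word] += 1
        else:
            words[word] = 1

# Remove stop words (like "the" and "you")
stop_words = set(stopwords.words("english"))
for word in sorted(words.keys()):  # sorted() copies the keys, so popping is safe
    if word in stop_words:
        count = words.pop(word)
        print(f"Removed {count} instances of {word}")

# Display other frequently used words
print()
for word in sorted(words.keys()):
    count = words[word]
    if count >= 50:
        print(f"Found {count} instances of {word}")
```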
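The same counting and filtering steps can be sketched with the standard library alone. This is a swapped-in technique, not what the program above does: it assumes collections.Counter in place of the hand-built dictionary, a simple str.split-based tokenizer in place of NLTK's word_tokenize, and a small hypothetical stop-word set in place of NLTK's English list. The helper name count_words and the sample text are illustrative only.

```python
from collections import Counter


def count_words(text, stop_words, min_count=1):
    """Count lowercase alphabetic words in text, skipping stop words.

    Hypothetical helper for illustration: it tokenizes with str.split()
    and strips surrounding punctuation, which is cruder than NLTK's
    word_tokenize.
    """
    counts = Counter()
    for token in text.split():
        # Remove leading/trailing punctuation such as commas and periods
        word = token.strip('.,;:!?"\'()').lower()
        if word.isalpha() and word not in stop_words:
            counts[word] += 1
    # Keep only words that occur at least min_count times
    return {word: n for word, n in counts.items() if n >= min_count}


# Small hypothetical stop-word set (NLTK's English list is much longer)
STOP_WORDS = {"the", "and", "a", "of", "i", "my"}

sample = "The monster and the creator. The monster, I said, fled; my creator wept."
frequent = count_words(sample, STOP_WORDS, min_count=2)
for word in sorted(frequent):
    print(f"Found {frequent[word]} instances of {word}")
# Found 2 instances of creator
# Found 2 instances of monster
```

Counter removes the need for the if/else update, because a missing key starts at 0. Filtering stop words during counting, rather than popping them afterward, also avoids building entries that are later discarded.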