Wednesday 11/1
Slicing
What will be printed?
a = "Halloween"
print(a[2:3])
print(a[4:])
print(a[:4])
print(a[1:5:2])
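You can try the slicing rules on a different word, which leaves the exercise answers for you to work out. Below is a small sketch. The string "pumpkin" and the variable names are arbitrary choices for illustration.

```python
# An arbitrary word, different from the exercise (indices: p0 u1 m2 p3 k4 i5 n6)
word = "pumpkin"

# s[start:stop] includes index start but stops before index stop
middle = word[1:4]          # "ump"
# Omitting stop runs to the end of the string
from_index_3 = word[3:]     # "pkin"
# Omitting start begins at index 0
up_to_index_3 = word[:3]    # "pum"
# A third value is the step: every second character from index 0 up to (not including) 6
every_other = word[0:6:2]   # indices 0, 2, 4 -> "pmk"
# Negative indices count backward from the end
last_two = word[-2:]        # "in"

print(middle, from_index_3, up_to_index_3, every_other, last_two)
```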
Split and Join
What will be printed?
b = "These are the times that try men's souls"
print(b.split())
print(b.split("a"))
print("4".join(b.split("a")))
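The behavior of split() and join() can also be checked on a different sentence. Below is a short sketch. The sentence "one fish two fish" and the separator "!" are arbitrary examples.

```python
# An arbitrary sentence, different from the exercise
text = "one fish two fish"

# With no argument, split() breaks on runs of whitespace
words = text.split()
print(words)                # ['one', 'fish', 'two', 'fish']

# With a separator, split() breaks at every occurrence of that exact string,
# and the separator itself does not appear in the pieces
pieces = text.split("i")
print(pieces)               # ['one f', 'sh two f', 'sh']

# join() is called on the "glue" string and places it between the pieces
joined = "!".join(pieces)
print(joined)               # one f!sh two f!sh

# split(" ") differs from split(): consecutive spaces produce empty strings
print("a  b".split(" "))    # ['a', '', 'b']
print("a  b".split())       # ['a', 'b']
```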
NLTK Package
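- NLTK is a useful package for Natural Language Processing (NLP) tasks. For example, nltk.pos_tag() labels each word with a part-of-speech tag and takes context into account. In the session below, "tree" is tagged as a noun (NN) in "That squirrel is in a tree" but as a verb (VB) in "My dog will tree a squirrel":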
>>> import nltk
>>> nltk.pos_tag(["tree"])
[('tree', 'NN')]
>>> sentence = "That squirrel is in a tree"
>>> nltk.pos_tag(sentence.split())
[('That', 'DT'), ('squirrel', 'NN'), ('is', 'VBZ'), ('in', 'IN'), ('a', 'DT'), ('tree', 'NN')]
>>> sentence1 = "My dog will tree a squirrel"
>>> nltk.pos_tag(sentence1.split())
[('My', 'PRP$'), ('dog', 'NN'), ('will', 'MD'), ('tree', 'VB'), ('a', 'DT'), ('squirrel', 'NN')]
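- An interesting example of processing Mary Shelley's Frankenstein (provided by Dr. Mayfield). It counts how often each word occurs, removes common stop words, and then lists the words that appear at least 50 times: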
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# Assumes the NLTK tokenizer data and the stopwords corpus have already
# been downloaded (for example, with nltk.download())

# Read the book into a string (the with block closes the file afterward)
with open("pg84.txt", encoding="utf-8-sig") as file:
    text = file.read()

# Count how many times each word occurs
words = {}
for word in word_tokenize(text):
    if word.isalpha():  # Ignore non-words
        word = word.lower()  # Ignore case
        # Update dictionary
        if word in words:
            words[word] += 1
        else:
            words[word] = 1

# Remove stop words (like "the" and "you")
# sorted() builds a separate list of keys, so popping from words inside the loop is safe
stop_words = set(stopwords.words("english"))
for word in sorted(words.keys()):
    if word in stop_words:
        count = words.pop(word)
        print(f"Removed {count} instances of {word}")

# Display other frequently used words
print()
for word in sorted(words.keys()):
    count = words[word]
    if count >= 50:
        print(f"Found {count} instances of {word}")
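The counting loop in the Frankenstein script is a common pattern, and Python's standard library provides collections.Counter for it. The sketch below is a swapped-in, NLTK-free simplification, not Dr. Mayfield's code. It uses str.split() and str.strip() in place of word_tokenize, and a tiny hypothetical STOP_WORDS set in place of NLTK's English stop word list. It also runs on a short made-up sample string instead of the book. The names word_counts, STOP_WORDS, and sample are illustrative only.

```python
from collections import Counter

# Hypothetical, tiny stop word set (NLTK's English list is much longer)
STOP_WORDS = frozenset({"the", "a", "and", "of", "i", "my", "was"})


def word_counts(text, stop_words=STOP_WORDS, minimum=1):
    """Count lowercase alphabetic words that are not stop words.

    str.split() and str.strip() stand in for nltk's word_tokenize here,
    so this is only an approximation of real tokenization.
    """
    counts = Counter()
    for token in text.split():
        # Remove punctuation attached to either end of the token, then ignore case
        word = token.strip(".,;:!?\"'()").lower()
        if word.isalpha() and word not in stop_words:
            counts[word] += 1  # Counter starts missing keys at 0
    # Keep only words that occur at least `minimum` times
    return {word: n for word, n in counts.items() if n >= minimum}


# Made-up sample text, not a quotation from the book
sample = "The creature spoke. The creature wept, and Victor wept too."
print(word_counts(sample, minimum=2))  # {'creature': 2, 'wept': 2}
```

Here the minimum parameter plays the role of the script's count >= 50 threshold. Counter removes the need for the explicit "if word in words" check, because a missing key counts as 0.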
Quiz Instructions
- Log into the desktop as "student" with no password.
- Log into Canvas.
- Wait until the instructor says to start before accessing Part 1.
- The quiz can be found in the "Modules" section of Canvas.
- The entire quiz has a 25-minute time limit.
- Part 1 (Conceptual):
  - You may not use Thonny (or any other resources) for Part 1.
- Part 2 (Coding):
  - You have unlimited submissions.
  - You do NOT need to provide docstrings.
  - The autograder will NOT check for PEP8 violations (though you should still use good style).
  - You must use Thonny as your editor.
  - You may not access any external web pages or other resources.