
Wednesday 11/1

Slicing

What will be printed?

a = "Halloween"
print(a[2:3])
print(a[4:])
print(a[:4])
print(a[1:5:2])
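As a refresher on the rules behind the exercise, here is a sketch using a different string (`"Pumpkin"` is an arbitrary choice, not part of the original exercise). In `s[start:stop:step]`, the slice includes `start`, excludes `stop`, and moves by `step`. An omitted `start` or `stop` defaults to the beginning or end of the string.

```python
s = "Pumpkin"   # indices: P=0 u=1 m=2 p=3 k=4 i=5 n=6

first = s[0:1]          # start 0, stop 1 (excluded) -> "P"
tail = s[3:]            # omitted stop means "to the end" -> "pkin"
head = s[:3]            # omitted start means "from index 0" -> "Pum"
every_other = s[0:6:2]  # indices 0, 2, 4 -> "Pmk"
last3 = s[-3:]          # negative indices count from the end -> "kin"
backwards = s[::-1]     # a negative step walks right to left -> "nikpmuP"

print(first, tail, head, every_other, last3, backwards)
```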

Split and Join

What will be printed?

b = "These are the times that try men's souls"
print(b.split())
print(b.split("a"))
print("4".join(b.split("a")))
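The sketch below illustrates how `split` and `join` behave, using made-up strings so the exercise above is not answered. With an explicit separator, `split` keeps empty fields. With no argument, it splits on runs of whitespace and drops leading and trailing whitespace. `join` inserts the separator only between items, so joining with the same separator reverses a split.

```python
t = "red,green,,blue"
parts = t.split(",")              # ['red', 'green', '', 'blue'] (the empty field is kept)
rejoined = ",".join(parts)        # joining with the same separator restores t

words = "  one   two  ".split()   # ['one', 'two'] (runs of whitespace act as one separator)
dashed = "-".join(["x", "y", "z"])  # 'x-y-z' (separator goes between items only)

print(parts, rejoined, words, dashed)
```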

NLTK Package

  • NLTK is a useful package for Natural Language Processing (NLP) tasks. For example:

    >>> import nltk
    >>> nltk.pos_tag(["tree"])
    [('tree', 'NN')]
    >>> sentence = "That squirrel is in a tree"
    >>> nltk.pos_tag(sentence.split())
    [('That', 'DT'),
     ('squirrel', 'NN'),
     ('is', 'VBZ'),
     ('in', 'IN'),
     ('a', 'DT'),
     ('tree', 'NN')]
    >>> sentence1 = "My dog will tree a squirrel"
    >>> nltk.pos_tag(sentence1.split())
    [('My', 'PRP$'),
     ('dog', 'NN'),
     ('will', 'MD'),
     ('tree', 'VB'),
     ('a', 'DT'),
     ('squirrel', 'NN')]
    
  • An interesting example of processing Mary Shelley's Frankenstein (provided by Dr. Mayfield):

    from nltk.tokenize import word_tokenize
    from nltk.corpus import stopwords
    
    # Read the book into a string (the with block closes the file afterward)
    with open("pg84.txt", encoding="utf-8-sig") as file:
        text = file.read()
    
    # Count how many times each word occurs
    words = {}
    for word in word_tokenize(text):
        if word.isalpha():       # Ignore non-words
            word = word.lower()  # Ignore case
            # Update dictionary
            if word in words:
                words[word] += 1
            else:
                words[word] = 1
    
    # Remove stop words (like "the" and "you")
    stop_words = set(stopwords.words("english"))
    for word in sorted(words.keys()):
        if word in stop_words:
            count = words.pop(word)
            print(f"Removed {count} instances of {word}")
    
    # Display other frequently used words
    print()
    for word in sorted(words.keys()):
        count = words[word]
        if count >= 50:
            print(f"Found {count} instances of {word}")
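The counting loop above can also be written with `collections.Counter`, which builds the word-to-count dictionary in one step. The sketch below is a stdlib-only approximation, not Dr. Mayfield's code. It assumes a tiny made-up sample text instead of `pg84.txt`, a hypothetical two-word stop list instead of NLTK's English list, and `str.split` in place of `word_tokenize`. Unlike `word_tokenize`, `str.split` leaves punctuation attached, so a token such as `fled.` fails `isalpha()` and is skipped.

```python
from collections import Counter

# Hypothetical sample text standing in for the book
text = "The monster saw the light and the monster fled."

# Hypothetical mini stop-word set (the original uses NLTK's English list)
stop_words = {"the", "and"}

# Count alphabetic tokens, ignoring case; str.split stands in for word_tokenize,
# so "fled." keeps its period, fails isalpha(), and is not counted
words = Counter(w.lower() for w in text.split() if w.isalpha())

# Remove stop words; sorted() makes a copy of the keys, so popping is safe
for word in sorted(words):
    if word in stop_words:
        count = words.pop(word)
        print(f"Removed {count} instances of {word}")

# Display remaining words at or above a threshold (2 here instead of 50)
print()
for word, count in sorted(words.items()):
    if count >= 2:
        print(f"Found {count} instances of {word}")
```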
    

Quiz Instructions

  • Log into the desktop as "student" with no password.
  • Log into Canvas.
  • Wait until the instructor says to start before accessing Part 1.
  • The quiz can be found in the "Modules" section of Canvas.
  • The entire quiz has a 25-minute time limit.
  • Part 1 (Conceptual):
    • You may not use Thonny (or any other resources) for Part 1.
  • Part 2 (Coding):
    • You have unlimited submissions.
    • You do NOT need to provide docstrings.
    • The autograder will NOT check for PEP8 violations (though you should still use good style).
    • You must use Thonny as your editor.
    • You may not access any external web pages or other resources.