Project 1: Text Simplifier

[Image: word cloud with "text" and "simplifier", generated by wordart.com]

Motivation

Around the world, millions of people struggle to read complex written materials due to differences in literacy level, language background, and cognitive load. Simplifying text, while keeping the meaning the same, can make information easier to understand and more inclusive.

In this project, you will build tools that analyze and simplify English text using loops and string methods. By the end, your program will take a paragraph of text and produce an easier-to-read version, along with readability statistics before and after simplification.

Learning Objectives

After completing this project, you will be able to:

Use string methods: strip, replace, split, join, etc.
Apply string slicing and loops to modify and analyze text.
Write test functions that confirm correct program behavior.
Structure a program into multiple modules that work together.
Appreciate how software can support accessibility and clarity.
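
As a quick refresher on the string methods named in the first objective, here is a minimal sketch. The sample string and variable names are made up for illustration and are not taken from the project files.

```python
# Hypothetical sample text (not from the project).
raw = "  Hello,   world!  "

trimmed = raw.strip()                         # drop leading/trailing whitespace
words = trimmed.split()                       # split on runs of whitespace
joined = " ".join(words)                      # rebuild with single spaces
swapped = joined.replace("world", "reader")   # substitute one substring
first_five = joined[:5]                       # slicing: first five characters

# A simple loop that adds up the length of every word.
total_chars = 0
for word in words:
    total_chars += len(word)

print(joined)       # Hello, world!
print(swapped)      # Hello, reader!
print(first_five)   # Hello
print(total_chars)  # 12
```

Note that split() with no arguments treats any run of spaces as one separator, which is why the extra spaces disappear after join.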

Project Overview

The project consists of five Python modules:

analysis.py – Functions for analyzing text (count sentences, words, average word length).
simplify.py – Functions for simplifying text (normalize spacing, remove ALL CAPS, replace complex words, shorten sentences).
main.py (complete) – Example program that shows how the functions work together.
test_analysis.py – Unit tests for analysis functions.
test_simplify.py – Unit tests for simplify functions.

Stubs and docstrings are provided for all functions. Download and extract PA1.zip into your CS149 folder. Your goal is to write code according to the docstrings and instructions below.

Info

The stubs include type hints, which show the types of values a function expects and returns. For example:

def count_sentences(text: str) -> int:

A colon (:) specifies a parameter's type, and an arrow (->) specifies the return type. When types are specified in the def statement, they are omitted from the docstring.

Type hints help readers and tools check types before running the code, but Python does not enforce these types at runtime.
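
To see that type hints are not enforced at runtime, consider this small sketch. The function count_chars is a hypothetical example and is not part of the project.

```python
def count_chars(text: str) -> int:
    """Return the number of characters in text (hypothetical example)."""
    return len(text)


print(count_chars("hello"))    # 5

# A list is not a str, but Python does not check the hint at runtime.
# The call still runs, because len() also works on lists.
print(count_chars([1, 2, 3]))  # 3
```

A static type checker would likely flag the second call, but the interpreter itself runs it without complaint.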

Part A: Write Unit Tests

Before writing the main functions, you will write unit tests using pytest. Writing tests first helps clarify what each function should do. The unit tests from Part A will give you confidence that your code in Part B works.

Implement each function in test_analysis.py and test_simplify.py by writing one or more assert statements. When you submit these two files for Part A, Gradescope will run your tests against the sample solution. That way, you will know if your tests are correct and complete.

Note

Docstrings are generally not required for test functions. However, we included docstrings in the stubs to give you hints for writing each test.
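
If you have not written a pytest test before, the general shape looks like the sketch below. The function count_vowels is a made-up example so that no project solution is given away. In the project, the function under test would come from analysis.py or simplify.py rather than being defined next to the test.

```python
def count_vowels(text: str) -> int:
    """Return the number of vowels (a, e, i, o, u) in text, ignoring case."""
    count = 0
    for ch in text.lower():
        if ch in "aeiou":
            count += 1
    return count


def test_count_vowels():
    """Check a typical word, ALL CAPS input, mixed case, and the empty string."""
    assert count_vowels("banana") == 3
    assert count_vowels("SKY") == 0
    assert count_vowels("Audio") == 4
    assert count_vowels("") == 0
```

Each assert compares the actual return value with the value you expect. Try to include edge cases (such as an empty string) in addition to typical inputs.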

Part B: Solve Problems

After your tests are ready and running, implement the functions defined in analysis.py and simplify.py. Test each function as you go using the unit tests you wrote for Part A.

You are given a completed main.py program that shows how everything fits together. When you are finished with Part B, this program should output:

Original text:
This IS AN EXAMPLE paragraph that will commence our analysis to demonstrate the potential benefits of the system. The process will UTILIZE advanced methods to provide assistance to users; approximately across several departments.

Original text statistics:
  Sentences: 2
  Words: 33
  Average word length: 5.9

Simplified text:
This is an example paragraph that will begin our analysis to show the potential benefits of the system. The process will use advanced methods to provide help to users. About across several departments.

Simplified text statistics:
  Sentences: 3
  Words: 33
  Average word length: 5.0

Warning

Each function should use basic string methods, iteration, and/or slicing to transform the text. Do not use advanced Python features that we have not yet learned in class. This includes, but is not limited to:

Submissions that contain any of these language features will be rejected. Please stick to the basics so we can assess whether you have met the learning objectives.

Submission

You will be limited to 10 submissions for each part. The purpose of this limit is to encourage you to do your own testing. Set a goal to get full credit on Part B on your first submission!

Part A (40 pts)

Submit test_analysis.py and test_simplify.py to Gradescope.

Your tests will be run against the instructor's sample solution. The autograder verifies that:

Each test uses an assert to check the result of the function call.
Each test fails if the function being tested returns the wrong value.
Each test passes if the function being tested returns the correct value.
Every statement in the sample solution is run at least once during testing.

The last item is known as code coverage. You should get 100% coverage by implementing all tests provided in the stubs. Gradescope will report the line numbers of any statements your tests did not run. Because those numbers refer to the sample solution, which you don't have, they won't be meaningful on their own. If you get stuck, you can ask your instructor to explain a line number.
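
To illustrate what coverage means, here is a minimal sketch with a hypothetical helper named describe_length (not part of the project). A test that checks only one branch leaves the other statement unexecuted.

```python
def describe_length(word: str) -> str:
    """Label a word as "short" or "long" (hypothetical helper)."""
    if len(word) <= 4:
        return "short"
    return "long"


def test_describe_length():
    # This assert alone never reaches the final return statement.
    assert describe_length("cat") == "short"
    # A second case runs the remaining line, so every statement is executed.
    assert describe_length("elephant") == "long"
```

If Gradescope reports an untested line, think about which kind of input (for example, an empty string or a word with punctuation) might take a path your tests have not yet exercised.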

Part B (60 pts)

Submit analysis.py and simplify.py to Gradescope.

Your code will be run against many tests written by the faculty. If any of these tests fail, we recommend you add a similar test to your own code. Make sure your test fails in the same way that the Gradescope test fails. Then use the debugger to find and fix the problem until your own test passes. Doing this will help ensure your solution will also pass on Gradescope.

Note

This is a two-week assignment. The Gradescope submission for Part B will not open until the second week. In the meantime, you can (and should) test your code using your own tests.

Reflection

Submit the graded survey on Canvas after completing the assignment.

Clarification

Updated 10/16/2025

The original docstrings had two minor defects. Please replace the following in your code:

simplify_case()
- OLD: Returns: A string with uppercase words simplified.
- NEW: Returns: A string with ALL CAPS words made lowercase.
average_word_length()
- OLD: while "U.S.A." counts as 3 (because the periods are stripped).
- NEW: while "U.S.A." counts as 5 (because the end period is stripped).
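
The difference between the old and new wording for "U.S.A." can be checked with a short sketch. It assumes that "stripped" means removing trailing periods only. This is one reading of the docstring, not the sample solution.

```python
word = "U.S.A."

# Assumption: only the period at the end of the word is removed.
without_end = word.rstrip(".")
print(without_end, len(without_end))  # U.S.A 5

# The old wording implied removing every period, which would give 3.
without_any = word.replace(".", "")
print(without_any, len(without_any))  # USA 3
```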

Updated 10/17/2025

In main.py, you can change the first word of the sample paragraph from THIS to This. The first word should not be in ALL CAPS, according to the docstring of simplify_case().