CS 240: Algorithms and Data Structures
James Madison University, Spring 2024

Introduction

The Java collections framework provides interfaces corresponding to some of the most commonly used abstract data types: List, Map, Set, etc. One abstract data type that is not included is the Multiset.

The Multiset ADT is a generalization of the Set ADT that allows duplicate elements. Multisets are useful in cases where we don’t care about the order of the elements but we do need to maintain a count of how many times each element appears. For example, in a document analysis domain we may need to count the number of occurrences of each word in the document. Such distributions of word frequencies can be useful for problems like authorship detection and topic classification.

For this project, you will be making several modifications to the following Multiset implementation. These modifications will improve the usability and efficiency of this collection. Make sure you understand the provided code before attempting to implement the rest of the project.

Multiset.java

The project stages below should be completed in order. Each part builds on the work completed in the earlier stages.

Part 1 - Generics

Your first task is to modify the provided Multiset class so that it makes appropriate use of generics. The functionality should be unchanged, except that your improved class should accept a type parameter specifying an element type.

Once this phase of the project it complete, code like the following should compile and execute:

    Multiset<Character> chars = new Multiset<>();
    
    chars.add('a');
    chars.add('b', 3);
    chars.add('a');
    
    System.out.println(chars); // prints "[a x 2, b x 3]"

You should test your updated class by creating a test driver or some simple JUnit tests.

There is nothing to submit for Part 1.

Part 2 - Multiset Application

For this part of the project you will build a simple utility class that makes use of your generic multiset. Complete the following dice rolling class so that the methods conform to the provided Javadoc comments:

The submission tests will make use of a seed value to create predictable results, so be careful not to make any extraneous calls to the random number generator.

Update 1/28 There are two reasonable ways to use java.util.Random to simulate rolling a die. For example, to generate a random number in the range 1-6, we might do either of the following:

int result = rand.nextInt(6) + 1; // Do this!
int result = rand.nextInt(1, 7);  // Not this.

Both are correct, but they will not give the same sequence of values given the same seed. The autograder will test your code under the assumption that you have implemented dice rolling using the single-argument version of the nextInt method.

The following example illustrates the required format for the output of the plotRolls method. In this example the call was roller.plotRolls(1000, 3, 6, 10) and the seed value was 10.

Rolling 3d6 1000 Times
3 (0.40%)   :
4 (1.60%)   :#
5 (3.10%)   :###
6 (4.70%)   :####
7 (7.50%)   :#######
8 (8.00%)   :########
9 (12.10%)  :############
10 (12.60%) :############
11 (13.90%) :#############
12 (11.50%) :###########
13 (9.10%)  :#########
14 (7.20%)  :#######
15 (4.30%)  :####
16 (2.30%)  :##
17 (1.40%)  :#
18 (0.30%)  :

Notice that the percentages are reported to two decimal places and that the labels for each row are left-justified within a field of length 12.

Once you have completed Part 2, submit Roller.java and your updated Multiset.java to Gradescope.

Part 3 - Interfaces

ADT vs Data Structure

At this point, your Multiset collection is type-safe, but it violates the principle of separating implementation from interface. The Java Collections Framework provides some nice examples of putting this principle into action. The UML below shows a small subset of the classes and interfaces included in the collections framework. In this diagram the List interface expresses the methods that are associated with the List ADT. LinkedList and ArrayList are concrete classes that implement the List interface, each using a different underlying data structure. The various abstract classes provide default implementations for some methods to simplify the implementation of the concrete classes.

List UML

In this diagram, interfaces are colored gray, abstract classes are yellow, and concrete classes are white.

Multiset Interface

Your goal for this stage of the project is to split the existing Multiset class into an interface named Multiset and a concrete implementation named ArrayListMultiset. Your finished code should conform to the UML diagram below. The entities you need to create are enclosed by the dashed red line. The remaining classes are already part of the Java Collections Framework. You should not change the way any of the existing classes are implemented.

ArrayListMultiset UML

Notice that your completed Multiset interface must extend the java.util.Collection interface. The advantage of this is that any method anywhere that expects a Collection object will now work correctly when passed a Multiset. The challenge is that the Collection interface requires several methods that are not included in the provided Multiset class.

The optional methods removeAll and retainAll must return an UnsupportedOperationException.

You may use the following implementation of the hashCode method:

      /**
       * The role of the hashCode method will be addressed later in the semester.
       * 
       * @return 1, always 1.
       */
      @Override
      public int hashCode() {
        return 1;
      }

All other Collection methods must be implemented, including those that are described as optional. Note that the java.util.AbstractCollection class provides default implementations for some of these methods. You should take advantage of that fact.

You may use the following JUnit tests to confirm that your code matches the specification.

Part 4 - Building a Better Data Structure

The modifications so far have improved the usability of our original Multiset class, but they have not improved the efficiency. The current implementation is terribly inefficient in terms of both space and time.

Consider the following code snippet:

    Multiset<String> set = new ArrayListMultiset<>();
    for (int i = 0; i < 10000; i) {
       set.add("Harrisonburg");
    }

After this code executes the ArrayList inside set will contain 10,000 copies of the string "Harrisonburg".

It would be much more efficient to store a single copy of "Harrisonburg" along with a counter. Subsequent calls to add would then increment the counter without actually increasing the number of objects stored in the ArrayList.

For this stage of the project you will create a new implementation of the Multiset interface named CounterMultiset. Instead of storing duplicate objects, this class should store Pair objects representing counters. The resulting class hierarchy should match the following UML:

Multiset UML

Once again, you must create the files inside the dashed red line.

You may test your finished implementation using the following JUnit tests:

Implementation Tips

  1. Don’t be fooled by the fact that most of the methods in Multiset.java are simple one-liners. This will not be the case for CounterMultiset.java. For example, you will need to create a custom Iterator class for CounterMultiset. You won’t be able to use the existing iterator of the ArrayList class.
  2. Use the @Override annotation where appropriate! This has several benefits. It will prevent you from making some sneaky coding errors (overloading/duplicating methods instead of overriding them). It will also allow you to avoid writing Javadoc comments for overridden methods: it is generally not necessary to provide Javadoc comments for overridden methods, as long as the description in the superclass or interface provides an adequate description of the methods behavior.
  3. Don’t forget everything you learned in CS 149 and CS 159!
    • Code duplication is bad: use helper methods where appropriate.
    • Code duplication is bad: use abstract superclasses to handle functionality that is the same across multiple sibling classes. For example, Do ArrayListMultiset and CounterMultiset both need identical hashCode methods?
    • Unnecessary code is bad: Do you need to implement the addAll method? Or can you inherit from AbstractCollection to take advantage of an existing implementation?

Ignoring the advice above will negatively impact your grade, even if your code passes all of the submission tests.

Part 5 Demonstration Workload

It was claimed above that the counter implementation is much more efficient than the ArrayList implementation for the Multiset ADT. Strictly speaking, this claim depends on the actual sequence of operations performed. Take a few minutes to think about scenarios where the CounterMultiset will have an advantage and scenarios where it will not, then complete the following driver class so that it conforms to the Javadoc comments.

WorkloadDriver.java

Submission and Grading

The project will be graded as follows:

Gradescope Functionality Tests for Parts 1 & 2 10%
Gradescope Functionality Tests for Parts 3 & 4 60%
Gradescope Style Checks for Parts 3 & 4 10%
Gradescope Timing Tests for Part 5 10%
Gradescope Instructor Style Points 10%

Submissions that fail any functionality tests will receive at most 75% of the available points for that component of the grade.

Looking Ahead

The counter-based Multiset that you developed is, in general, much more efficient than the terrible implementation we provided. This raises the question of whether it is possible to develop an even more efficient implementation. It is. We’ll see several data structures later in the semester that could be used to create much better implementations of this ADT.

Acknowledgments

The Multiset interface used for this project is based on a similar interface included with the the Guava Google Java libraries.