The Java collections framework provides interfaces corresponding to some of the most commonly used abstract data types: List, Map, Set, etc. One abstract data type that is not included is the Multiset.
The Multiset ADT is a generalization of the Set ADT that allows duplicate elements. Multisets are useful in cases where we don’t care about the order of the elements but we do need to maintain a count of how many times each element appears. For example, in a document analysis domain we may need to count the number of occurrences of each word in the document. Such distributions of word frequencies can be useful for problems like authorship detection and topic classification.
For this project, you will be making several modifications to the following Multiset implementation. These modifications will improve the usability and efficiency of this collection. Make sure you understand the provided code before attempting to implement the rest of the project.
The project stages below should be completed in order. Each part builds on the work completed in the earlier stages.
Your first task is to modify the provided Multiset
class so that it makes
appropriate use of generics. The functionality should be unchanged,
except that your improved class should accept a type parameter
specifying an element type.
Once this phase of the project it complete, code like the following should compile and execute:
Multiset<Character> chars = new Multiset<>();
chars.add('a');
chars.add('b', 3);
chars.add('a');
System.out.println(chars); // prints "[a x 2, b x 3]"
You should test your updated class by creating a test driver or some simple JUnit tests.
There is nothing to submit for Part 1.
At this point, your Multiset collection is type-safe, but it violates the
principle of separating implementation from interface.
The
Java Collections Framework
provides some nice examples of putting this principle into action.
The UML below shows a small subset of the classes and interfaces
included in the collections framework. In this diagram the List
interface expresses the methods that are associated with the List ADT.
LinkedList
and ArrayList
are concrete classes that implement the
List
interface, each using a different underlying data structure. The
various abstract classes provide default implementations for some
methods to simplify the implementation of the concrete classes.
In this diagram, interfaces are colored gray, abstract classes are yellow, and concrete classes are white.
Your goal for this stage of the project is to split the existing
Multiset
class into an interface named Multiset
and a concrete
implementation named ArrayListMultiset
. Your finished code should
conform to the UML diagram below. The entities you need to create are
enclosed by the dashed red line. The remaining classes are already
part of the Java Collections Framework. You should not change the way
any of the existing classes are implemented.
Notice that your completed Multiset
interface must extend the
java.util.Collection
interface. The advantage of this is that any
method anywhere that expects a Collection
object will now work
correctly when passed a Multiset
. The challenge is that the
Collection
interface requires several methods that are not included
in the provided Multiset
class.
The optional methods removeAll
and retainAll
must return an
UnsupportedOperationException
.
You may use the following implementation of the hashCode
method:
/**
* The role of the hashCode method will be addressed later in the semester.
*
* @return 1, always 1.
*/
@Override
public int hashCode() {
return 1;
}
All other Collection
methods must be implemented, including those
that are described as optional. Note that the
java.util.AbstractCollection
class provides default implementations
for some of these methods. You should take advantage of that fact.
You may use the following JUnit tests to confirm that your code matches the specification.
Multiset
interface. Any correct implementation should pass these tests.MultisetTest
that can be used to test your ArrayListMultiset
implementation.The modifications so far have improved the usability of our original
Multiset
class, but they have not improved the efficiency. The
current implementation is terribly inefficient in terms of both space
and time.
Consider the following code snippet:
Multiset<String> set = new ArrayListMultiset<>();
for (int i = 0; i < 10000; i) {
set.add("Harrisonburg");
}
After this code executes the ArrayList
inside set
will contain
10,000 copies of the string "Harrisonburg"
.
It would be much more efficient to store a single copy of
"Harrisonburg"
along with a counter. Subsequent calls to add
would
then increment the counter without actually increasing the number of
objects stored in the ArrayList
.
For this stage of the project you will create a new implementation of
the Multiset
interface named CounterMultiset
. Instead of storing
duplicate objects, this class should store
Pair
objects representing counters. The resulting class hierarchy should match
the following UML:
Once again, you must create the files inside the dashed red line.
You may test your finished implementation using the following JUnit tests:
MultisetTest
.Multiset.java
are simple one-liners. This will not be the case
for CounterMultiset.java
. For example, you will need to create a
custom Iterator
class for CounterMultiset
. You won’t be able to
use the existing iterator of the ArrayList
class.@Override
annotation where appropriate! This has several
benefits. It will prevent you from making some sneaky coding errors
(overloading/duplicating methods instead of overriding them). It
will also allow you to avoid writing Javadoc comments for overridden
methods: it is generally not necessary to provide Javadoc
comments for overridden methods, as long as the description in the
superclass or interface provides an adequate description of the methods
behavior.ArrayListMultiset
and
CounterMultiset
both need identical hashCode
methods?addAll
method? Or can you inherit from AbstractCollection
to take
advantage of an existing implementation?Ignoring the advice above will negatively impact your grade, even if your code passes all of the submission tests.
It was claimed above that the counter implementation is much more
efficient than the ArrayList
implementation for the Multiset ADT.
Strictly speaking, this claim depends on the actual sequence of
operations performed. Take a few minutes to think about scenarios
where the CounterMultiset
will have an advantage and scenarios where
it will not, then complete the following driver class so that it
conforms to the Javadoc comments.
The project will be graded as follows:
Gradescope Functionality Tests for Parts 2 & 3 | 70% |
Gradescope Style Checks for Parts 2 & 3 | 10% |
Gradescope Timing Tests for Part 4 | 10% |
Gradescope Instructor Style Points | 10% |
Submissions that fail any functionality tests will receive at most 75% of the available points for that component of the grade.
The counter-based Multiset that you developed is, in general, much more efficient than the terrible implementation we provided. This raises the question of whether it is possible to develop an even more efficient implementation. It is. We’ll see several data structures later in the semester that could be used to create much better implementations of this ADT.
The Multiset
interface used for this project is based on a similar
interface included with the the
Guava Google Java libraries.