PA4: BatchGeo
Categories:
18 minute read
Programming Assignment 4
![BatchGeo](BatchGeo_Logo.png)
Learning Objectives
This assignment is designed to help you learn several things. First, it will help you learn how to use type-safe collections. Second, it will help you learn about the capabilities of different kinds of collections (and when each is appropriate). Third, it will help you learn more about file I/O. Finally, it well help you understand the importance of testing and debugging.
Overview
Nearby is a (fictitious) company that develops software in three closely related areas: personal navigation systems, en-route and mobile commerce, and location-based services. You have been contracted to develop the classes they will need to complete an application named BatchGeo.
BatchGeo is an application that can be used to geocode street addresses (i.e., convert street addresses to longitude/latitude coordinates).
Background
Before you can start implementing the system, you need to learn a little bit about the terminology and file formats used by Nearby.
Acronyms
- TIGER
- Topologically Integrated Geographic Encoding and Referencing
Definitions
- Address
- A street name and a house/lot number.
- Equator
- The imaginary line on the surface of and encircling the Earth that is equidistant to the North and South Poles.
- High End of the Segment
- The end of the segment with the largest house/lot number
- House/Lot Number
- A number (for our purposes, an integer) ascribed to a particular location (often a vacant lot, house, or business) on a segment.
- Interpolation Parameter for a House Number
- Given a low house number, L, a high house number, H, and a house number of interest, A, in [L,H], the interpolation parameter for A is defined as m=(A-L)/(H-L), where the house numbers are treated like real numbers (not integers).
- Latitude
- The angular distance (in degrees) north (positive) or south (negative) of the Equator.
- Line of Latitude (Parallel)
- An imaginary line on the surface of and encircling the Earth drawn parallel to the Equator.
- Line of Longitude (Meridian)
- An imaginary great circle on the surface of the Earth connecting the North and South Poles and perpendicular to the Equator.
- Linear Interpolation of Two Numbers
- Given two numbers, a and b, and an interpolation parameter, m ∈ [0,1], the linear interpolation of a and b is denoted by v and defined as v=(1-m) ⋅ a + m ⋅ b.
- Linear Interpolation of Two Points
- Given two points, a = < ax, ay > and b = < bx, by >, and an interpolation parameter, m ∈ [0,1], the linear interpolation of a and b is denoted by v and defined as v = < vx, vy > where vx=(1-m) ⋅ ax + m ⋅ bx and vy=(1-m) ⋅ ay + m ⋅ by .
- Location
- A point in which the coordinates are a longitude and latitude.
- Longitude
- The angular distance (in degrees) east (negative) or west (positive) of the Prime (Greenwich) Meridian.
- Low End of the Segment
- The end of the segment with the smallest house/lot number.
- Point
- A (2-dimensional) point is an ordered pair. The point a is typically written as the ordered pair < ax, ay >
- Prime (Greenwich) Meridian
- The line of longitude designated to be at 0 degrees.
- Segment
- A portion of a street. For example, Main Street might consist of several segments, one from Oak Street to Elm Street, one from Elm Street to Pine Street, and one from Pine Street to Broad Street. A segment has a low end house/lot number and high end house/lot number.
- Segment Base
- A “database” (i.e., group or collection) containing all of the segments of interest.
- Street
- A named roadway that consists of one or more segments (that define the street). For example, Main Street might consist of several segments, one from Oak Street to Elm Street, one from Elm Street to Pine Street, and one from Pine Street to Broad Street. A street has a name (e.g., “Main Street”).
- Street Base
- A “database” (i.e., group or collection) containing all of the streets of interest.
Data Files
The data files used in BatchGeo are “human readable” text files. Each line in a data file is called a record, and each record may contain one or more fields (which are, conceptually, columns or components of the record). Fields in a record are separated by a delimiter character.
BatchGeo currently uses two kinds of data files, .seg
files and .str
files, each of which is described below. A geographic region will consist
of two files that have the same name and these two extensions. So, for example,
the data files for the JMU area are named jmuarea.seg
and jmuarea.str
.
.seg
Files
.seg
files contain information about segments, as they are defined by
the U.S. Census Bureau’s TIGER system (though using a slightly
different format).
The Format of .seg
Files
Records in
.seg
files use the tab character (i.e.,'\t'
) as the delimiter.seg
files contain a variable number of records.seg
files are terminated by an end-of-stream (sometimes called end-of-file) characterEach record contains nine fields as follows:
- The segment ID (
String
) - The longitude at the low end of the segment (
double
) - The latitude at the low end of the segment (
double
) - The longitude at the high end of the segment (
double
) - The latitude at the high end of the segment (
double
) - The length in KM (
double
) - The TIGER road type code (
String
) - The house/lot number at the low end of the segment (
int
) - The house/lot number at the high end of the segment (
int
)
An Example .seg
File Containing Seven Segments
![Example .seg File](seg_Example.png)
A Description of One Record in a .seg
File
![Description of a .seg Record](seg_Description.png)
.str
Files
.str
files contain information about streets.
The Format of .str
Files
Records in
.str
files use the tab character (i.e.,'\t'
) as the delimiter (Note: A tab character is used because street names can contain commas).str
files contain a variable number of records.str
files are terminated by an end-of-stream (sometimes called an end-of-file) character.str
files contain two kinds of records, name records and component recordsEach name record contains two fields as follows:
- The street name (
String
) - The number of segments in the street (
int
)
- The street name (
Each component record contains one field, the segment ID (
String
)
An Example .str
File Containing Four Streets
![Example .str File](str_Example.png)
A Description of One Street in a .str
File
![Description of a .str Street](str_Description.png)
A Geocoding Example
Using the example data files above, we can geocode the address 1970
Creek Ct (which consists of the int
house number 1970
and the street
name "Creek Ct"
).
From the .str
file we see that Creek Ct consists of a single segment
with ID 75739596. From the .seg
file we see that the low end number
on this segment is 1900 and the high end number on this segment is
1999. So, treating the house numbers as double
values (not int
values),
the interpolation parameter for house/lot number 1970 is given
by: m = (1970 - 1900) / (1999 - 1900) = 70/99 ≈ 0.707070
From the .seg
file we see that the low end location is <-78.737162,
+38.380782> and the high end location is <-78.736349,
+38.381549>. Using m to interpolate between these two points yields:
(1.0 - 0.707070) · -78.73716 + 0.707070 · -78.736349
= 0.292930 · -78.73716 + 0.707070 · -78.736349
= -23.064476 + -55.672110
= -78.736586
and
(1.0 - 0.707070) · 38.380782 + 0.707070 · -38.381549
= 0.292930 · 38.380782 + 0.707070 · 38.381549
= 11.242882 + 27.138442
= 38.381324
So, 1970 Creek Ct is geocoded to <-78.736586, 38.381324>, where, using the Nearby convention, the longitude is listed before the latitude.
Note that, in this example, the interpolation parameter was truncated
at 0.707070. So, you may get slightly different answers if you use
more digits of accuracy. Your code must use the full accuracy of double
values.
The Components to be Written
The components to be written are illustrated in black in the following UML class diagram. (The components in jade green are part of the Java API.)
![Class Diagram](Design.png)
Note that attributes and methods that are inherited are not made explicit in this diagram. If a method appears in both a base class and a derived class it means that the method in the derived class overrides the method in the base class. Note also that methods that are in an interface are not made explicit in classes that realize/implement that interface.
In addition to the specifications in the UML class diagram, your classes must conform to the following specifications.
The Location
Class
The toString()
method must return the longitude and latitude formatted using
a format String
of "%+9.6f,%+9.6f"
.
The OnSegmentLocation
Class
An OnSegmentLocation
is a Location
that knows the ID of the Segment
it is on.
This class implements the Comparable
interface so that the sort()
method
in the Collections
class can be used to sort collections of such objects based
on their segment IDs (which are String
objects that, themselves, implement the
Comparable
interface).
The Segment
Class
The default constructor must create a Segment
object with default
values for all of the attributes.
The contains()
method must return true
if the owning Segment
contains the given house number, and must return false
otherwise.
The interpolate()
method must geocode the given house number
using linear interpolation. (Hint: Be careful to use the longitude/latitude
for the low end of the segment as a and the longitude/latitude of the
high end of the segemnt as b.) It must return null
if the Segment
does not
contain the house/lot number.
the fromTSV()
method must set the value of all of the attributes
from a tab-separated-value representation (as in .seg
files). It
may need to throw one or more exceptions (e.g., if the String
has the wrong
number of fields, if a numeric field is non-numeric, etc.). It must return the ID of the segment.
The Street
Class
The constructor is passed the name of the Street
and must allocate memory for the segments
attribute.
The add()
method must add the given Segment
object to the segments
attribute.
The geocode()
method must return a List
of OnSegmentLocation
objects that contain the house number of interest. This List
need
not be ordered in any particular way. The returned List
may be empty but this
method must not return null
. (Hint: This method should use the interpolate()
method
in the Segment
class.)
The MapReader
Class
The readSegments()
method must read a collection of Segment
objects from a .seg
file. Any records in the .seg
file that are
incorrectly formatted (e.g., contain the wrong number of fields,
or contain non-numeric values for fields that must be numeric) must
be ignored. The Map
that is returned must use the ID of the
Segment
as they key. (Hint: The readLine()
method in the BufferedReader
class will return null
when the end-of-stream/end-of-file character has been
reached. The Scanner
class has hasNext()
and hasNextLine()
methods that
can be used to determine if the end-of-stream/end-of-file character
has been reached.)
The readStreets()
method must read a collection of Street
objects from a .str
file, using the collection of Segment
objects that it is passed. The Map
that is returned must use the
name of the Street
as the key.
The Geocoder
Class
The Constructor
The constructor will be passed the name (with no extension) of the
.seg
and .str
files to use (with no path information, meaning the file
can be found in the working directory). So, for example, for the JMU
area this method will be passed the String
"jmuarea"
(not
"jmuarea.seg"
or "jmuarea.str"
, and not "/cs159/eclipse/pa4/src/testing/jmuarea"
).
The constructor must use the MapReader
class to read the
.seg
and .str
files (from the working directory) and store the
information they contain in the Map
attribute named streets
.
Your implementation must not read either data file more than
once per execution. If it does, you will receive a grade of 0. In
other words, your implementation must read the file once (essentially
at the start of the execution), store all of the information needed in
the Map
named streets
, and then use that Map
to geocode the
addresses entered by the user. (Hint: There is no reason to keep the segments
after the constructor returns. They are only needed to read the streets.)
The fromAddress(String name, int number)
Method
The fromAddress()
method is passed an address (i.e., a street name and a
house/lot number) to geocode. The String
parameter will contain the
street name. The int
parameter will contain the house/lot number on
the street.
The fromAddress()
method must return a List
of OnSegmentLocation
objects that contains
all of the longitude/latitude pairs that correspond to the given
address (and nothing else).
If the address can’t be geocoded, the
List
must contain no elements (but must be non-null
).If the address can be geocoded, the calculated longitude and latitude must be a linear interpolation of the longitudes and latitudes at the low and high end of the relevant segment. If the address can be geocoded using more than one segment (i.e., there is more than one segment with the given street name that contains the given house/lot number), the
List
must be sorted in ascending order by segment ID. (Hint: This method should use thegeocode()
method in theStreet
class.)
Testing
Nearby has provided information for two areas that you can use for testing, each of which consists of two files. They are available for download from the following URLs:
and
You must put these files in the project directory so that your Gecoder
will find them in the working directory when the code is executed. (See the
section on “Help with Data Files” below.)
Obviously, for all of your tests, you will need a way to calculate the expected results (so that you can compare them with the actual results). While you can use a calculator for this purpose, it’s probably better to use a spreadsheet. Nearby has provided one that is available for download at:
It contains one row for each segment in the tiny
map. If you enter an address
in the appropriate row of the “Address” column, it will calculate the interpolation
parameter for that address, as well as the longitude and latitude for that address.
For example, entering the address 1970 in cell L5 will yield the same results
(in cell N5, O5 and P5) as in the example above.
Unit Testing
You should by now realize that you should test individual components before testing the entire system. Hence, you should write JUnit tests for all of the classes your write as you complete them (or as you complete indiviual methods in them). However, you are not required to do so. In other words, you will not be submitting your tests, you should write them because you know that doing so is part of a good process.
Nearby has provided a few unit tests to help you get started. They are available for download at:
These tests may need to be modified/corrected to be useful, and certainly need to be expanded.
Note that it may not be possible to completely cover all of the statements in the classes your write because it may not be possible to generate all of the exceptions that might be thrown. Nonetheless, you should try to cover as much of the code as possible.
Integration Testing
After you are confident that all of the methods in all of your classes pass your unit
tests, you should perform integration testing (in which you ensure that they
all work correctly together). You should do this by creating JUnit tests for the Geocoder
class
(which, directly or indirectly, uses all of the other classes). Note that there is no
main class included in this assignment. So, if you do not want to use JUnit for
integration testing, you will need to design and write a main class on your own.
Regardless, make sure you use an appropriate tolerance when comparing double
values.
Submission
You must submit (using Gradescope) a .zip
file named pa4.zip
that contains:
- Your implementation of all of the classes
packaged appropriately. The analytics
and geog
directories/folders must be at the top of the .zip file.
(See the first programming assignment if you have forgotten how to do this.)
Do not submit your JUnit tests and do not submit any data files.
Your grade will be reduced by 5 points for each submission after the 10th submission. So, you should try to ensure that your code is correct before you submit it the first time.
Grading
Your code will first be graded by Gradescope and then by the Professor. The grade you receive from Gradescope is the maximum grade that you can receive on the assignment
Gradescope Grading
Your code must compile (in Gradescope, this will be indicated in the section on “Does your code compile?”) and all class names and method signatures must comply with the specifications (in Gradescope, this will be indicated in the section on “Do your class names, method signatures, etc. comply with the specifications?”) for you to receive any points on this assignment. Gradescope will then grade your submission as follows:
Criterion | Points | Details |
---|---|---|
Conformance to the Style Guide | 0 | All or Nothing; Success Required |
Correctness | 100 | Partial Credit Possible |
As discussed above, 5 points will be deducted for every submission after the 10th. Gradescope will provide you with hints, but may not completely identify the defects in your submission.
Manual Grading
After the due date, the Professor may manually review your code. At this time, points may be deducted for duplicate code, inelegant code, inappropriate variable names, bad comments, etc.
Recommended Process
At this point in the semester, you should be able to create a good process and use it.
You should probably think about which classes use which other classes and work from the bottom up. In other
words, start with the classes that stand alone (the bottom), then move up to the classes that use them,
and keep moving up until you get to the Geocoder
class.
Help
You might find the following discussions helpful.
Help with Data Files
You should download the data files into the downloads
directory/folder that you created for this course.
After you have downloaded the data files you must open a file explorer or finder, select all of the files, and drag them
into Eclipse. Specifically, you must drag them into the project (not the src
directory/folder or anything underneath it). When you do so, make sure you select “Copy files” not “Link to files”.
(It is possible to put the data files elsewhere, but then, when you submit your solution, your code
will not be able to find them on the Gradescope server. In other words, for your code to work both on your computer and on Gradescope,
the data files must be dragged into the Eclipse project.)
Then, in your code that needs to use these files, you must use only the file name and extension (i.e., do not include a path). For example,
you might construct a BufferedReader
named in
as follows:
BufferedReader in;
in = new BufferedReader(new FileReader("tiny.seg"));
All of the data files will be available on Gradescope for you to use in your tests (assuming you follow the instructions above).
Help with Testing
The jmuarea
files are about 3Mb in total, which means that it will
be very difficult for you to debug your code using them. Hence, you
should use the smaller tiny
files for initial testing and debugging.
These files contain four streets, two with one segment each and two
with multiple segments.
Even with the small files, you will need to be systematic in the tests you use. Think carefully about the kinds of faults your implementation might contain and the kinds of trigger conditions that might allow you to detect them. Some important addresses to geocode are addresses at the low and high ends of the segment which need not be interpolated (i.e., will have interpolation parameters of 0 and 1 respectively), and addresses that are in the interior of the segment and must be interpolated (i.e., will have an interpolation parameter between 0 and 1).
For tiny.str
and tiny.seg
you should certainly start with the
following test inputs. Of course, you’ll first need to calculate the
expected answer for each of these inputs.
Addresses at Ends of Segments (i.e., No Interpolation Required)
- 748 Milk Spring Rd
- 798 Milk Spring Rd
- 674 Milk Spring Rd
- 746 Milk Spring Rd
- 1900 Creek Ct
- 1999 Creek Ct
- 1 Creek Loop
- 246 Creek Loop
- 387 Hill Dr
- 150 Hill Dr
Addresses in the Interior of Segments (i.e., Must Be Interpolated)
- 750 Milk Spring Rd
- 701 Milk Spring Rd
- 1935 Creek Ct
- 200 Creek Loop
- 155 Hill Dr
- 380 Hill Dr
No Such Street
- 747 Buttermilk Spring Rd
- 1 Creekside Ct
- 700 Spring Rd
- -1 Hilliard Dr
No Such Number
- 800 Milk Spring Rd
- 1750 Creek Ct
- -1 Hill Dr
Multiple “Hits” (i.e., Segments Containing the Address) Involving Addresses at the End and in the Interior
- 165 Hill Dr
- 303 Hill Dr
- 305 Hill Dr
- 359 Hill Dr
Multiple “Hits” Involving Addresses in the Interior
- 177 Hill Dr
- 351 Hill Dr
Questions to Think About
You don’t have to submit your answers to these questions, but you should try to answer them before you start writing the code because they will help you understand how to do so.
- Why is the
streets
attribute in theGeocoder
class aMap<String,Street>
rather than some other kind of collection?
- Why is the
segments
attribute in theStreet
class aList<Segment>
rather than some other kind of collection?
- Why does the
readSegments()
method in theMapReader
class return aMap<String,Segment>
rather than some other kind of collection?
Can a defect in a test (e.g., an incorrect expected value, an inappropriate tolerance) make you think code is correct when it isnt?
What should you do when your code fails an integration tests after passing your unit tests?