PA4: DataLayer¶

Learning Objectives¶

This assignment is designed to help you learn several things. First, it will help you learn how to use type-safe collections. Second, it will help you learn about the capabilities of different kinds of collections (and when each is appropriate). Third, it will help you learn more about file I/O. Finally, it will help you understand the importance of testing and debugging.

Danger

Your grade will be reduced by 5 points for each submission after the 10th submission. So, you should try to ensure that your code is correct before you submit it the first time. In other words, you should not use Gradescope to check your style, to test your code, or to ensure that your tests cover your code - you should do all of that on your computer before you make any submissions to Gradescope.

Overview¶

Nearby is a (fictitious) company that develops software in three closely related areas: personal navigation systems, en-route and mobile commerce, and location-based services. You have been contracted to develop the classes they will need to complete a new service named DataLayer that they will use internally and market to other companies that need geographic data handling capabilities.

They want you to develop components that can be used to process county-level data provided by the U.S. Census Bureau. The components they are asking you to write might be used in a mapping application to produce a map like the following (of the percent of the population 25 years and older completing high school in 1990):

Screenshot

Background Information¶

Before you can start implementing the system, you need to learn a little bit about the file formats used by DataLayer.

Acronyms¶

FIPS: Federal Information Processing Standards

Definitions¶

County: An administrative jurisdiction that has geographic extent and may contain other administrative jurisdictions. Note that the term "county" is used by some states and not others (e.g., Louisiana uses the term "parish") and that cities are administratively contained in counties in some states (e.g., the city of Boston is both administratively and geographically in the county of Suffolk in Massachusetts) but not in others (e.g., the city of Harrisonburg is not administratively contained in the county of Rockingham in Virginia, even though it is geographically contained in it). When cities are not administratively contained in counties, the cities are treated by the Census Bureau as counties (e.g., Harrisonburg and Rockingham are treated as different counties with one geographically contained in the other).
County FIPS Code: A 5-character code that can be used to uniquely identify a U.S. County. (Note: The code may contain significant leading 0s.)
State FIPS Code: A 2-character code that can be used to uniquely identify a U.S. State/Commonwealth. (Note: The code may contain a significant leading 0.)

Data Files¶

The data files used in DataLayer are "human readable" text files. Each line in a data file is called a record, and each record may contain one or more fields (which are, conceptually, columns or components of the record). Fields in a record are separated by a delimiter character.

You will be working with two kinds of data files, counties files and countydata files, each of which is described below.

`counties` Files¶

counties files contain information that identifies counties. They are formatted as follows:

Records in counties files use the tab character (i.e., '\t') as the delimiter.
counties files contain a variable number of records, in no particular order.
Each record contains the following fields, in this order: County FIPS Code, County Name, State FIPS Code, State Abbreviation.

Fields in counties Files

An example of a counties file with three records follows:

02068\tDenali Borough\t02\tAK
01001\tAutauga County\t01\tAL
51660\tHarrisonburg City\t51\tVA

As discussed above, the records are not in a particular order, though the fields are. Note that the leading 0s in FIPS codes are significant and must be preserved.
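One way to split a counties record into its four fields is sketched below. The class name `CountyRecord` and its `parse()` method are hypothetical (they are not part of the required UML), and the sketch assumes that the file contains real tab characters where the example above shows `\t`. The FIPS codes are kept as `String` values so that leading 0s survive.

```java
// Hypothetical helper; not part of the required design.
final class CountyRecord {
    final String countyFips;        // e.g., "01001" (kept as text to preserve leading 0s)
    final String countyName;        // e.g., "Autauga County"
    final String stateFips;         // e.g., "01"
    final String stateAbbreviation; // e.g., "AL"

    private CountyRecord(String[] fields) {
        this.countyFips = fields[0];
        this.countyName = fields[1];
        this.stateFips = fields[2];
        this.stateAbbreviation = fields[3];
    }

    // Splits one tab-delimited record into its four fields.
    static CountyRecord parse(String line) {
        // The limit of -1 keeps trailing empty fields instead of dropping them.
        String[] fields = line.split("\t", -1);
        if (fields.length != 4) {
            throw new IllegalArgumentException(
                "Expected 4 fields but found " + fields.length);
        }
        return new CountyRecord(fields);
    }
}
```

Storing a FIPS code in an `int` would turn "01001" into 1001, which is why the sketch never converts the codes to numbers.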

`countydata` Files¶

countydata files contain socio-economic information about counties. They are formatted as follows:

Records in countydata files use the comma character (i.e., ',') as the delimiter.
countydata files contain one header record and a variable number of data records.
The header record consists of a variable number of fields, none of which are missing.

Field	Contents
First Field	The key `"FIPSTXT"`
Subsequent Fields	The key used to identify the corresponding field in the data records

Each data record (which corresponds to a single "county") has as many fields as there are fields in the header record.

Field	Contents
First Field	County FIPS Code
Subsequent Fields	A `double` value for the corresponding field in the header record

A data record may contain missing fields (in which case there will be no value). A missing field is empty (e.g., there are two delimiters in a row with nothing between them).
Data records are in no particular order.

An example of a countydata file with four fields and two data records follows:

FIPSTXT,AVGCT_1990,AVGCT_2000,BASEPOP_2000
01001,22.34,26.45,43671.00
02068,,16.20,1893.00

The header record indicates that the fields in each of the data records are FIPSTXT (i.e., the county FIPS code), AVGCT_1990 (i.e., the average commuting time in 1990), AVGCT_2000 (i.e., the average commuting time in 2000), and BASEPOP_2000 (i.e., the base population for estimates in 2000). The first data record is for county 01001 (i.e., Autauga County in AL) and the second data record is for county 02068 (i.e., Denali Borough in AK). Note that the data for the average commuting time in 1990 for Denali is missing.

As discussed above, neither the records nor the fields are in a particular order, though the first record is always the header record, the first field always contains the county FIPS code, and the fields in the header record correspond with the fields in the data records.
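The correspondence between the header record and a data record can be sketched as follows. This is only one possible approach: the class name `CountyDataParser` and the method `parseRecord()` are hypothetical, and the sketch uses `String.split()` with a limit of -1 (so that empty fields are kept) rather than a tokenizer. A missing field is simply left out of the resulting map, so looking it up yields `null`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of the required design.
class CountyDataParser {

    // Pairs each value in a data record with the key in the same
    // position of the header record. The first field (the county
    // FIPS code) is skipped because it is not a double value.
    static Map<String, Double> parseRecord(String[] keys, String record) {
        String[] fields = record.split(",", -1); // -1 keeps empty fields
        Map<String, Double> values = new HashMap<>();
        for (int i = 1; i < keys.length && i < fields.length; i++) {
            if (!fields[i].isEmpty()) {
                values.put(keys[i], Double.valueOf(fields[i]));
            }
        }
        return values;
    }
}
```

For the example file above, the header would first be split into keys with `"FIPSTXT,AVGCT_1990,AVGCT_2000,BASEPOP_2000".split(",")`, and each data record would then be passed to `parseRecord()` along with those keys.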

Existing Classes¶

There are no existing classes.

Classes to be Written¶

At a minimum, you must write the class in the following UML diagram:

UML Class Diagram

You may add private attributes and private methods to this class. You may also create other classes, as needed.

Specifications for the `CensusDatabase` Class¶

In addition to the specifications in the UML class diagram, the CensusDatabase class must satisfy the following specifications.

Both data files must be read before the constructor returns and must be read exactly once. The data files can be read in the constructor itself or in methods invoked by the constructor, but the data files must not be read (in any class or method) after the constructor returns.
The getData() method must return the value identified by the given key (or null if there is no such value) for the given county. In the example above, this method must return 22.34 (as a Double) if it is passed "AVGCT_1990" and "01001".
The getDataByStateAbbreviation() method must return the value identified by the given key (or null if there is no such value) for the given county name in the given state (identified by abbreviation). In the example above, this method must return 43671.00 (as a Double) if it is passed "BASEPOP_2000", "AL", and "Autauga County".
The getDataByStateFIPS() method must return the value identified by the given key (or null if there is no such value) for the given county name in the given state (identified by State FIPS). In the example above, this method must return null if it is passed "AVGCT_1990", "02", and "Denali Borough".
All collections must be type-safe.
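To illustrate what type-safe collections might look like for lookups of this kind, here is a minimal sketch. It is not the required `CensusDatabase` class: the class name `LookupSketch`, its method names, and the choice of nested `HashMap` objects are all assumptions made for illustration, and other collections may be more appropriate.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of type-safe (parameterized) collections.
class LookupSketch {
    // County FIPS code -> (key -> value)
    private final Map<String, Map<String, Double>> valuesByFips = new HashMap<>();
    // "state abbreviation <tab> county name" -> county FIPS code
    private final Map<String, String> fipsByStateAndName = new HashMap<>();

    void addCounty(String countyFips, String stateAbbreviation, String countyName) {
        fipsByStateAndName.put(stateAbbreviation + "\t" + countyName, countyFips);
    }

    void addValue(String countyFips, String key, Double value) {
        valuesByFips.computeIfAbsent(countyFips, f -> new HashMap<>()).put(key, value);
    }

    // Returns null if either the county or the key is unknown.
    Double get(String key, String countyFips) {
        Map<String, Double> values = valuesByFips.get(countyFips);
        return (values == null) ? null : values.get(key);
    }

    // Finds the county FIPS code first, then delegates to get().
    Double getByState(String key, String stateAbbreviation, String countyName) {
        String fips = fipsByStateAndName.get(stateAbbreviation + "\t" + countyName);
        return (fips == null) ? null : get(key, fips);
    }
}
```

Because every collection declares its element types, no casts are needed and the compiler rejects attempts to store the wrong kind of object. A tab is used to join the two parts of the composite key in this sketch because it is the counties delimiter and so cannot appear inside a field.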

Testing¶

Obviously, you should test your code. However, at this point in the semester, whether or not you do so is completely up to you.

You can test your code with the following data files that were created after the Census that was conducted in 2000.

As always, after you download these files, you must put them in your CS159 directory/folder, and when you open them you must not include a path (i.e., you must just use the file name).

Your heuristic tests should include things like:

A data value in the first record.
A data value in the last record.
A data value in the first field.
A data value in the last field.
A positive data value.
A negative data value.
A missing data value.
Two missing data values in a row.
A 0 data value.

Your random tests should include data values chosen randomly (by you, not your code) from throughout the table.

Submission¶

You must submit (using Gradescope) a .zip file named pa4.zip that contains:

Your implementation of all of the classes you wrote

packaged appropriately. Do not submit any tests and do not submit any data files.

At this point in the semester, you should be able to test your code and ensure that it is correct before submitting it. Hence, 5 points will be deducted for every submission after the 10th. Ten submissions should be more than enough for you to fix any "silly" mistakes that you make. The deductions will encourage you to test your code yourself and not rely too heavily on the submission system.

Grading¶

Your code will first be graded by Gradescope and then by the Professor. The grade you receive from Gradescope is the maximum grade that you can receive on the assignment.

Gradescope Grading¶

Your code must compile (in Gradescope, this will be indicated in the section on "Does your code compile?") and all class names and method signatures must comply with the specifications (in Gradescope, this will be indicated in the section on "Do your class names, method signatures, etc. comply with the specifications?") for you to receive any points on this assignment. Gradescope will then grade your submission as follows:

Criterion	Points	Details
Conformance to the Style Guide	0	All or Nothing; Success Required
Correctness	100	Partial Credit Possible

Gradescope will provide you with hints, but may not completely identify the defects in your submission.

Manual Grading¶

After the due date, the Professor may manually review your code. As always, at this time, points may be deducted for inelegant code, inappropriate variable names, bad comments, etc. In addition, at this time, points may be deducted for the use of inappropriate/inefficient collections/algorithms and for using collections that are not type-safe. So, you should give considerable thought to the collections you choose to use.

Help¶

Remember that the StringTokenizer class has a three-parameter constructor that you can use to instruct the nextToken() method to return the delimiter. This can be very useful when trying to determine if a field is missing (i.e., if there are two delimiters in a row).
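As a rough illustration of this idea, the sketch below uses `new StringTokenizer(record, delimiter, true)` so that each delimiter comes back as its own token. The class name `FieldSplitter`, the method `split()`, and the decision to represent a missing field as `null` are assumptions made for this example, and the sketch assumes a one-character delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper; not part of the required design.
class FieldSplitter {

    // Returns the fields of a record, with null in place of each
    // missing field. Assumes delimiter is a single character.
    static List<String> split(String record, String delimiter) {
        List<String> fields = new ArrayList<>();
        // The third argument (true) makes nextToken() return delimiters too.
        StringTokenizer tokenizer = new StringTokenizer(record, delimiter, true);
        boolean expectingField = true; // true at the start and after a delimiter
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            if (token.equals(delimiter)) {
                if (expectingField) {
                    fields.add(null); // a delimiter arrived where a field should be
                }
                expectingField = true;
            } else {
                fields.add(token);
                expectingField = false;
            }
        }
        if (expectingField) {
            fields.add(null); // the record ended with a delimiter (or was empty)
        }
        return fields;
    }
}
```

With the Denali record from the example above, `FieldSplitter.split("02068,,16.20,1893.00", ",")` produces four entries, the second of which is `null`.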

Also remember that the nextDouble() method in the Scanner class will throw an InputMismatchException if the next token can't be converted into a double. This, too, can be useful when trying to determine if a field is missing (i.e., if there are two delimiters in a row).

Relevant Programming Patterns¶

An understanding of the following programming patterns will help you complete this assignment:

Missing Values

Keys¶

In case you are interested, the following keys are used in the countydata file:

Key	Description
ESTPOP_2003	Estimated Total Population 2003
BASEPOP_2000	Base Population For Estimates 2000
POPCHG_2000-2003	Population Change 2000-2003
POPCHGRTE_2000-2003	Population Change Rate 2000-2003
NETMIG_2000-2003	Net Migration 2000-2003
NATCHNG_2000-2003	Natural Change (Births Minus Deaths) 2000-2003
NETMIGRTE_2000-2003	Net Migration Rate 2000-2003
NATCHNGRTE_2000-2003	Natural Change Rate 2000-2003
TOTPOP_2000	Total Population 2000
TOTPOP_1990	Total Population 1990
POPCHG_1990-2000	Population Change 1990-2000
POPCHGRTE_1990-2000	Population Change Rate 1990-2000
SQUAREMILES_2000	Square Miles 2000
POPPERSQMILE_2000	Persons Per Square Mile 2000
NETMIG_1990-2000	Net Migration 1990-2000
NATCHNG_1990-2000	Natural Change (Births Minus Deaths) 1990-2000
NETMIGRTE_1990-2000	Net Migration Rate 1990-2000
NATCHNGRTE_1990-2000	Natural Change Rate 1990-2000
FORBORN_2000	Foreign Born Population 2000
FORBORNPCT_2000	Percent Foreign Born 2000
POPU18_2000	Population Under 18 Years 2000
POP18TO34_2000	Population 18 to 34 Years 2000
POP35TO64_2000	Population 35 to 64 Years 2000
POP65UP_2000	Population 65 Years and Older 2000
POPU18PCT_2000	Percent Population Under 18 Years 2000
POP18TO34PCT_2000	Percent Population 18 to 34 Years 2000
POP35TO64PCT_2000	Percent Population 35 to 64 Years 2000
POP65UPPCT_2000	Percent Population 65 Years and Older 2000
POPMALE_2000	Male Population 2000
POPFEMALE_2000	Female Population 2000
SEXRATIO_2000	Sex Ratio: Males Per 100 Females 2000
NHWHITEPOP_2000	Non-Hispanic White Population 2000
NHBLACKPOP_2000	Non-Hispanic Black Population 2000
NHAMINDPOP_2000	Non-Hispanic American Indian Population 2000
NHASIANPOP_2000	Non-Hispanic Asian Population 2000
NHMULTIPOP_2000	Non-Hispanic Population, Two or More Races 2000
NHWHITEPCT_2000	Percent Non-Hispanic White 2000
NHBLACKPCT_2000	Percent Non-Hispanic Black 2000
NHAMINDPCT_2000	Percent Non-Hispanic American Indian 2000
NHASIANPCT_2000	Percent Non-Hispanic Asian 2000
NHMULTIPCT_2000	Percent Non-Hispanic Two or More Races 2000
HPOP_2000	Hispanic Population 2000
HPCT_2000	Percent Hispanic 2000
HPOP_1990	Hispanic Population 1990
HPCT_1990	Percent Hispanic 1990
HPOPCHG_9000	Hispanic Population Change 1990-2000
HPOPCHGRTE_9000	Hispanic Population Change Rate 1990-2000
HOUSEHOLDS_2000	Number of Households 2000
HOUSEHOLDS_1990	Number of Households 1990
HHCHG_1990-2000	Household Change 1990-2000
HHCHGRTE_1990-2000	Household Change Rate 1990-2000
FAMILIES_2000	Number of Families 2000
SPF_2000	Number of Families Headed by a Single Parent 2000
SPFPCT_2000	Percent of Families Headed by a Single Parent 2000
POP25UP_2000	Population 25 Years and Older 2000
HSGRADS_2000	High School Graduates 25 Years and Older 2000
HSCOMPRATE_2000	Percent of Population 25 Years and Older Completing High School 2000
COLLGRADS_2000	College Graduates 25 Years and Older 2000
COLLCOMPRATE_2000	Percent of Population 25 Years and Older Completing College 2000
POP25UP_1990	Population 25 Years and Older 1990
HSGRADS_1990	High School Graduates 25 Years and Older 1990
HSCOMPRATE_1990	Percent of Population 25 Years and Older Completing High School 1990
COLLGRADS_1990	College Graduates 25 Years and Older 1990
COLLCOMPRATE_1990	Percent of Population 25 Years and Older Completing College 1990
WORKERS_2000	Total Workers 2000
WORKOUTSIDECOUNTY_2000	Workers Commuting Out of County of Residence 2000
WORKOUTSIDECOUNTYRATE_2000	Percent of Workers Commuting Out of County of Residence 2000
WORKERS_1990	Total Workers 1990
WORKOUTSIDECOUNTY_1990	Workers Commuting Out of County of Residence 1990
WORKOUTSIDECOUNTYRATE_1990	Percent of Workers Commuting Out of County of Residence 1990
WORKERSNOTHOME_2000	Workers Not Working at Home 2000
COMMUTETIME_2000	Aggregate Commuting Time 2000
AVGCT_2000	Average Commuting Time 2000
WORKERSNOTHOME_1990	Workers Not Working at Home 1990
COMMUTETIME_1990	Aggregate Commuting Time 1990
AVGCT_1990	Average Commuting Time 1990
EMP_2003	Total Employment 2003
EMP_2002	Total Employment 2002
EMP_2001	Total Employment 2001
EMP_2000	Total Employment 2000
UNEMP_2003	Unemployment 2003
UNEMP_2002	Unemployment 2002
UNEMP_2001	Unemployment 2001
UNEMP_2000	Unemployment 2000
UNEMPR_2003	Unemployment Rate 2003
UNEMPR_2002	Unemployment Rate 2002
UNEMPR_2001	Unemployment Rate 2001
UNEMPR_2000	Unemployment Rate 2000
CLF_2000	Civilian Labor Force 2000
WORKPOP_2000	Civilian Working-Age Population 2000
CLFPR_2000	Civilian Labor Force Participation Rate 2000
MALECLF_2000	Male Civilian Labor Force 2000
MALEWORKPOP_2000	Male Civilian Working-Age Population 2000
MALECLFPR_2000	Male Civilian Labor Force Participation Rate 2000
FEMALECLF_2000	Female Civilian Labor Force 2000
FEMALEWORKPOP_2000	Female Civilian Working-Age Population 2000
FEMALECLFPR_2000	Female Civilian Labor Force Participation Rate 2000
JOBS_2002	Total Number of Jobs 2002
JOBS_2001	Total Number of Jobs 2001
JOBS_2000	Total Number of Jobs 2000
JOBS_1990	Total Number of Jobs 1990
JOBCHNG_2000-2002	Change in Number of Jobs 2000-2002
JOBCHNG_1990-2000	Change in Number of Jobs 1990-2000
JOBCHNGR_1990-2000	Job Change Rate 1990-2000
JOBCHNGR_2000-2002	Job Change Rate 2000-2002
TLPI_2002	Total Labor and Proprietor Income 2002
TLPI_2001	Total Labor and Proprietor Income 2001
EPJ_2002	Earnings Per Job 2002
EPJ_2001	Earnings Per Job 2001
PI_2002	Personal Income 2002
PI_2001	Personal Income 2001
BEAPOP_2002	BEA Population 2002
BEAPOP_2001	BEA Population 2001
PCI_2002	Per Capita Income 2002
PCI_2001	Per Capita Income 2001
POV_1999	Number of Persons In Poverty 1999
POVPOP_1999	Population Used to Compute Poverty 1999
POVRATE_1999	Poverty Rate 1999
POV_1989	Number of Persons In Poverty 1989
POVPOP_1989	Population Used to Compute Poverty 1989
POVRATE_1989	Poverty Rate 1989