PA4: DataLayer¶
Learning Objectives¶
This assignment is designed to help you learn several things. First, it will help you learn how to use type-safe collections. Second, it will help you learn about the capabilities of different kinds of collections (and when each is appropriate). Third, it will help you learn more about file I/O. Finally, it will help you understand the importance of testing and debugging.
Danger
Your grade will be reduced by 5 points for each submission after the 10th submission. So, you should try to ensure that your code is correct before you submit it the first time. In other words, you should not use Gradescope to check your style, to test your code, or to ensure that your tests cover your code - you should do all of that on your computer before you make any submissions to Gradescope.
Overview¶
Nearby is a (fictitious) company that develops software in three closely related areas: personal navigation systems, en-route and mobile commerce, and location-based services. You have been contracted to develop the classes they will need to complete a new service named DataLayer that they will use internally and market to other companies that need geographic data handling capabilities.
They want you to develop components that can be used to process county-level data provided by the U.S. Census Bureau. The components they are asking you to write might be used in a mapping application to produce a map like the following (of the percent of the population 25 years and older completing high school in 1990):
Background Information¶
Before you can start implementing the system, you need to learn a little bit about the file formats used by DataLayer.
Acronyms¶
- FIPS
- Federal Information Processing Standards
Definitions¶
- County
- An administrative jurisdiction that has geographic extent and may contain other administrative jurisdictions. Note that the term "county" is used by some states and not others (e.g., Louisiana uses the term "parish") and that cities are administratively contained in counties in some states (e.g., the city of Boston is both administratively and geographically in the county of Suffolk in Massachusetts) but not in others (e.g., the city of Harrisonburg is not administratively contained in the county of Rockingham in Virginia, even though it is geographically contained in it). When cities are not administratively contained in counties, the cities are treated by the Census Bureau as counties (e.g., Harrisonburg and Rockingham are treated as different counties with one geographically contained in the other).
- County FIPS Code
- A 5-character code that can be used to uniquely identify a U.S. County. (Note: The code may contain significant leading 0s.)
- State FIPS Code
- A 2-character code that can be used to uniquely identify a U.S. State/Commonwealth. (Note: The code may contain a significant leading 0.)
Data Files¶
The data files used in DataLayer are "human readable" text files. Each line in a data file is called a record, and each record may contain one or more fields (which are, conceptually, columns or components of the record). Fields in a record are separated by a delimiter character.
You will be working with two kinds of data files, counties files and countydata files, each of which is described below.
counties Files¶
counties files contain information that identifies counties. They are formatted as follows:
- Records in counties files use the tab character (i.e., '\t') as the delimiter.
- counties files contain a variable number of records, in no particular order.
- Each record contains the following fields, in this order: County FIPS Code, County Name, State FIPS Code, State Abbreviation.
An example of a counties file with three records follows:
02068\tDenali Borough\t02\tAK
01001\tAutauga County\t01\tAL
51660\tHarrisonburg City\t51\tVA
As discussed above, the records are not in a particular order, though the fields are. Note that the leading 0s in FIPS codes are significant and must be retained.
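The record layout above can be illustrated with a minimal sketch. The class name CountiesRecordSketch and the method parseRecord() are hypothetical (they are not part of the required design), and the field order is assumed from the example records. The sketch keeps every field as a String so that leading 0s in FIPS codes survive.

```java
import java.util.Arrays;

/**
 * A hypothetical helper (not part of the required design) that splits
 * one record from a counties file into its fields.
 */
public class CountiesRecordSketch
{
    /**
     * Split a tab-delimited record into fields.
     *
     * Assumed field order (from the example records): County FIPS Code,
     * County Name, State FIPS Code, State Abbreviation.
     *
     * @param record One line from a counties file
     * @return The fields, all kept as Strings
     */
    public static String[] parseRecord(String record)
    {
        // A limit of -1 keeps trailing empty strings, so a record with
        // an empty last field is not silently shortened.
        return record.split("\t", -1);
    }

    public static void main(String[] args)
    {
        String[] fields = parseRecord("01001\tAutauga County\t01\tAL");

        // FIPS codes remain Strings, so "01001" keeps its leading 0
        // (converting it to an int would yield 1001).
        System.out.println(Arrays.toString(fields));
    }
}
```

Keeping the codes as String objects, rather than converting them to int values, is one simple way to preserve the significant leading 0s.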
countydata Files¶
countydata files contain socio-economic information about counties. They are formatted as follows:
- Records in countydata files use the comma character (i.e., ',') as the delimiter.
- countydata files contain one header record and a variable number of data records.
- The header record consists of a variable number of fields, none of which are missing.
Field | Contents |
---|---|
First Field | The key "FIPSTXT" |
Subsequent Fields | The key used to identify the corresponding field in the data records |
- Each data record (which corresponds to a single "county") has as many fields as there are fields in the header record.
Field | Contents |
---|---|
First Field | County FIPS Code |
Subsequent Fields | A double value for the corresponding field in the header record |
- A data record may contain missing fields (in which case there will be no value between the delimiters).
- Data records are in no particular order.
An example of a countydata file with four fields and two data records follows:
FIPSTXT,AVGCT_1990,AVGCT_2000,BASEPOP_2000
01001,22.34,26.45,43671.00
02068,,16.20,1893.00
The four fields in this example are FIPSTXT (i.e., the county FIPS code), AVGCT_1990 (i.e., the average commuting time in 1990), AVGCT_2000 (i.e., the average commuting time in 2000), and BASEPOP_2000 (i.e., the base population for estimates in 2000). The first data record is for county 01001 (i.e., Autauga County in AL) and the second data record is for county 02068 (i.e., Denali Borough in AK). Note that the data for the average commuting time in 1990 for Denali is missing.
As discussed above, neither the records nor the fields are in a particular order, though the first record is always the header record, the first field always contains the county FIPS code, and the fields in the header record correspond with the fields in the data records.
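To make the relationship between the header record and a data record concrete, here is a minimal sketch. The class name CountyDataRecordSketch and the method parseDataRecord() are hypothetical, and the choice to leave missing fields out of the map (so that a later get() returns null) is an assumption made for illustration, not a requirement.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * A hypothetical helper (not part of the required design) that pairs
 * the keys in a header record with the values in one data record.
 */
public class CountyDataRecordSketch
{
    /**
     * Parse the data fields (everything after the county FIPS code).
     *
     * @param header The fields of the header record
     * @param record One data record (a comma-delimited line)
     * @return A type-safe map from key to value, omitting missing fields
     */
    public static Map<String, Double> parseDataRecord(String[] header,
                                                      String record)
    {
        // A limit of -1 keeps empty strings, so a missing field still
        // occupies its position in the array.
        String[] fields = record.split(",", -1);

        Map<String, Double> values = new LinkedHashMap<>();
        // Index 0 holds the county FIPS code, so start at index 1.
        for (int i = 1; i < header.length && i < fields.length; i++)
        {
            if (!fields[i].isEmpty())
            {
                values.put(header[i], Double.parseDouble(fields[i]));
            }
        }
        return values;
    }

    public static void main(String[] args)
    {
        String[] header = "FIPSTXT,AVGCT_1990,AVGCT_2000,BASEPOP_2000".split(",");
        Map<String, Double> denali =
            parseDataRecord(header, "02068,,16.20,1893.00");

        System.out.println(denali.get("AVGCT_1990")); // missing, so null
        System.out.println(denali.get("AVGCT_2000"));
    }
}
```

Because the fields of a data record correspond positionally to the fields of the header record, the same index can be used for both arrays.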
Existing Classes¶
There are no existing classes.
Classes to be Written¶
At a minimum, you must write the class in the following UML diagram:
You may add private attributes and private methods to this class. You may also create other classes, as needed.
Specifications for the CensusDatabase Class¶
In addition to the specifications in the UML class diagram, the CensusDatabase class must satisfy the following specifications.
- Both data files must be read before the constructor returns and must be read exactly once. The data files can be read in the constructor itself or in methods invoked by the constructor, but the data files must not be read (in any class or method) after the constructor returns.
- The getData() method must return the value identified by the given key (or null if there is no such value) for the given county. In the example above, this method must return 22.34 (as a Double) if it is passed "AVGCT_1990" and "01001".
- The getDataByStateAbbreviation() method must return the value identified by the given key (or null if there is no such value) for the given county name in the given state (identified by abbreviation). In the example above, this method must return 43671.00 (as a Double) if it is passed "BASEPOP_2000", "AL", and "Autauga County".
- The getDataByStateFIPS() method must return the value identified by the given key (or null if there is no such value) for the given county name in the given state (identified by State FIPS). In the example above, this method must return null if it is passed "AVGCT_1990", "02", and "Denali Borough".
- All collections must be type-safe.
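As an illustration of what "type-safe" means here, the following minimal sketch uses parameterized collections and a lookup that returns null when there is no value. The class name TypeSafeLookupSketch, the method lookup(), and the nested-map layout are all hypothetical; they are one possible arrangement, not the required design of CensusDatabase.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * A hypothetical illustration (not the required design) of a lookup
 * in type-safe (i.e., parameterized) collections.
 */
public class TypeSafeLookupSketch
{
    /**
     * Look up a value in a table keyed first by county FIPS code and
     * then by data key.
     *
     * @param table      The (assumed) nested map of values
     * @param key        The data key (e.g., "AVGCT_1990")
     * @param countyFips The county FIPS code (e.g., "01001")
     * @return The value, or null if there is no such value
     */
    public static Double lookup(Map<String, Map<String, Double>> table,
                                String key, String countyFips)
    {
        Map<String, Double> row = table.get(countyFips);
        if (row == null)
        {
            return null; // Unknown county
        }
        return row.get(key); // null if the key is unknown or missing
    }

    public static void main(String[] args)
    {
        // Because the collections are parameterized, the compiler
        // rejects any attempt to store a non-Double value.
        Map<String, Map<String, Double>> table = new HashMap<>();
        Map<String, Double> autauga = new HashMap<>();
        autauga.put("AVGCT_1990", 22.34);
        table.put("01001", autauga);

        System.out.println(lookup(table, "AVGCT_1990", "01001")); // prints 22.34
        System.out.println(lookup(table, "AVGCT_1990", "02068")); // prints null
    }
}
```

No casts are needed anywhere in the sketch, which is the practical benefit of declaring the element types of each collection.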
Testing¶
Obviously, you should test your code. However, at this point in the semester, whether or not you do so is completely up to you.
You can test your code with the following data files that were created after the Census that was conducted in 2000.
As always, after you download these files, you must put them in your CS159 directory/folder, and when you open them you must not include a path (i.e., you must just use the file name).
Your heuristic tests should include things like:
- A data value in the first record.
- A data value in the last record.
- A data value in the first field.
- A data value in the last field.
- A positive data value.
- A negative data value.
- A missing data value.
- Two missing data values in a row.
- A 0 data value.
Your random tests should include data values chosen randomly (by you, not your code) from throughout the table.
Submission¶
You must submit (using Gradescope) a .zip file named pa4.zip that contains:
- Your implementation of all of the classes you wrote, packaged appropriately.
Do not submit any tests and do not submit any data files.
At this point in the semester, you should be able to test your code and ensure that it is correct before submitting it. Hence, 5 points will be deducted for every submission after the 10th. Ten submissions should be more than enough for you to fix any "silly" mistakes that you make. The deductions will encourage you to test your code yourself and not rely too heavily on the submission system.
Grading¶
Your code will first be graded by Gradescope and then by the Professor. The grade you receive from Gradescope is the maximum grade that you can receive on the assignment.
Gradescope Grading¶
Your code must compile (in Gradescope, this will be indicated in the section on "Does your code compile?") and all class names and method signatures must comply with the specifications (in Gradescope, this will be indicated in the section on "Do your class names, method signatures, etc. comply with the specifications?") for you to receive any points on this assignment. Gradescope will then grade your submission as follows:
Criterion | Points | Details |
---|---|---|
Conformance to the Style Guide | 0 | All or Nothing; Success Required |
Correctness | 100 | Partial Credit Possible |
Gradescope will provide you with hints, but may not completely identify the defects in your submission.
Manual Grading¶
After the due date, the Professor may manually review your code. As always, at this time, points may be deducted for inelegant code, inappropriate variable names, bad comments, etc. In addition, at this time, points may be deducted for the use of inappropriate/inefficient collections/algorithms and for using collections that are not type-safe. So, you should give considerable thought to the collections you choose to use.
Help¶
Remember that the StringTokenizer class has a three-parameter constructor that you can use to instruct the nextToken() method to return the delimiter. This can be very useful when trying to determine if a field is missing (i.e., if there are two delimiters in a row).
Also remember that the nextDouble() method in the Scanner class will throw an InputMismatchException if the next token can't be converted into a double. This, too, can be useful when trying to determine if a field is missing (i.e., if there are two delimiters in a row).
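The StringTokenizer hint can be sketched as follows. The class name MissingFieldSketch and the method fields() are hypothetical, and the sketch assumes a single-character delimiter and represents a missing field with null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/**
 * A hypothetical helper (not part of the required design) that uses
 * the three-parameter StringTokenizer constructor to find missing fields.
 */
public class MissingFieldSketch
{
    /**
     * Split a record into fields, using null for each missing field.
     *
     * @param record    The record to split
     * @param delimiter The delimiter (assumed to be a single character)
     * @return A type-safe list of fields
     */
    public static List<String> fields(String record, String delimiter)
    {
        List<String> result = new ArrayList<>();

        // The third argument (true) makes nextToken() return delimiters.
        StringTokenizer st = new StringTokenizer(record, delimiter, true);

        // Treat the start of the record as if it followed a delimiter so
        // that a leading delimiter is reported as a missing first field.
        boolean lastWasDelimiter = true;
        while (st.hasMoreTokens())
        {
            String token = st.nextToken();
            if (token.equals(delimiter))
            {
                // Two delimiters in a row mean a field is missing.
                if (lastWasDelimiter)
                {
                    result.add(null);
                }
                lastWasDelimiter = true;
            }
            else
            {
                result.add(token);
                lastWasDelimiter = false;
            }
        }

        // A trailing delimiter means the last field is missing.
        if (lastWasDelimiter)
        {
            result.add(null);
        }
        return result;
    }

    public static void main(String[] args)
    {
        // prints [02068, null, 16.20, 1893.00]
        System.out.println(fields("02068,,16.20,1893.00", ","));
    }
}
```

Without the third argument, the tokenizer would silently skip the empty field and the remaining values would shift one position to the left.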
Relevant Programming Patterns¶
An understanding of the following programming patterns will help you complete this assignment:
Keys¶
In case you are interested, the following keys are used in the countydata file:
Key | Description |
---|---|
ESTPOP_2003 | Estimated Total Population 2003 |
BASEPOP_2000 | Base Population For Estimates 2000 |
POPCHG_2000-2003 | Population Change 2000-2003 |
POPCHGRTE_2000-2003 | Population Change Rate 2000-2003 |
NETMIG_2000-2003 | Net Migration 2000-2003 |
NATCHNG_2000-2003 | Natural Change (Births Minus Deaths) 2000-2003 |
NETMIGRTE_2000-2003 | Net Migration Rate 2000-2003 |
NATCHNGRTE_2000-2003 | Natural Change Rate 2000-2003 |
TOTPOP_2000 | Total Population 2000 |
TOTPOP_1990 | Total Population 1990 |
POPCHG_1990-2000 | Population Change 1990-2000 |
POPCHGRTE_1990-2000 | Population Change Rate 1990-2000 |
SQUAREMILES_2000 | Square Miles 2000 |
POPPERSQMILE_2000 | Persons Per Square Mile 2000 |
NETMIG_1990-2000 | Net Migration 1990-2000 |
NATCHNG_1990-2000 | Natural Change (Births Minus Deaths) 1990-2000 |
NETMIGRTE_1990-2000 | Net Migration Rate 1990-2000 |
NATCHNGRTE_1990-2000 | Natural Change Rate 1990-2000 |
FORBORN_2000 | Foreign Born Population 2000 |
FORBORNPCT_2000 | Percent Foreign Born 2000 |
POPU18_2000 | Population Under 18 Years 2000 |
POP18TO34_2000 | Population 18 to 34 Years 2000 |
POP35TO64_2000 | Population 35 to 64 Years 2000 |
POP65UP_2000 | Population 65 Years and Older 2000 |
POPU18PCT_2000 | Percent Population Under 18 Years 2000 |
POP18TO34PCT_2000 | Percent Population 18 to 34 Years 2000 |
POP35TO64PCT_2000 | Percent Population 35 to 64 Years 2000 |
POP65UPPCT_2000 | Percent Population 65 Years and Older 2000 |
POPMALE_2000 | Male Population 2000 |
POPFEMALE_2000 | Female Population 2000 |
SEXRATIO_2000 | Sex Ratio: Males Per 100 Females 2000 |
NHWHITEPOP_2000 | Non-Hispanic White Population 2000 |
NHBLACKPOP_2000 | Non-Hispanic Black Population 2000 |
NHAMINDPOP_2000 | Non-Hispanic American Indian Population 2000 |
NHASIANPOP_2000 | Non-Hispanic Asian Population 2000 |
NHMULTIPOP_2000 | Non-Hispanic Population, Two or More Races 2000 |
NHWHITEPCT_2000 | Percent Non-Hispanic White 2000 |
NHBLACKPCT_2000 | Percent Non-Hispanic Black 2000 |
NHAMINDPCT_2000 | Percent Non-Hispanic American Indian 2000 |
NHASIANPCT_2000 | Percent Non-Hispanic Asian 2000 |
NHMULTIPCT_2000 | Percent Non-Hispanic Two or More Races 2000 |
HPOP_2000 | Hispanic Population 2000 |
HPCT_2000 | Percent Hispanic 2000 |
HPOP_1990 | Hispanic Population 1990 |
HPCT_1990 | Percent Hispanic 1990 |
HPOPCHG_9000 | Hispanic Population Change 1990-2000 |
HPOPCHGRTE_9000 | Hispanic Population Change Rate 1990-2000 |
HOUSEHOLDS_2000 | Number of Households 2000 |
HOUSEHOLDS_1990 | Number of Households 1990 |
HHCHG_1990-2000 | Household Change 1990-2000 |
HHCHGRTE_1990-2000 | Household Change Rate 1990-2000 |
FAMILIES_2000 | Number of Families 2000 |
SPF_2000 | Number of Families Headed by a Single Parent 2000 |
SPFPCT_2000 | Percent of Families Headed by a Single Parent 2000 |
POP25UP_2000 | Population 25 Years and Older 2000 |
HSGRADS_2000 | High School Graduates 25 Years and Older 2000 |
HSCOMPRATE_2000 | Percent of Population 25 Years and Older Completing High School 2000 |
COLLGRADS_2000 | College Graduates 25 Years and Older 2000 |
COLLCOMPRATE_2000 | Percent of Population 25 Years and Older Completing College 2000 |
POP25UP_1990 | Population 25 Years and Older 1990 |
HSGRADS_1990 | High School Graduates 25 Years and Older 1990 |
HSCOMPRATE_1990 | Percent of Population 25 Years and Older Completing High School 1990 |
COLLGRADS_1990 | College Graduates 25 Years and Older 1990 |
COLLCOMPRATE_1990 | Percent of Population 25 Years and Older Completing College 1990 |
WORKERS_2000 | Total Workers 2000 |
WORKOUTSIDECOUNTY_2000 | Workers Commuting Out of County of Residence 2000 |
WORKOUTSIDECOUNTYRATE_2000 | Percent of Workers Commuting Out of County of Residence 2000 |
WORKERS_1990 | Total Workers 1990 |
WORKOUTSIDECOUNTY_1990 | Workers Commuting Out of County of Residence 1990 |
WORKOUTSIDECOUNTYRATE_1990 | Percent of Workers Commuting Out of County of Residence 1990 |
WORKERSNOTHOME_2000 | Workers Not Working at Home 2000 |
COMMUTETIME_2000 | Aggregate Commuting Time 2000 |
AVGCT_2000 | Average Commuting Time 2000 |
WORKERSNOTHOME_1990 | Workers Not Working at Home 1990 |
COMMUTETIME_1990 | Aggregate Commuting Time 1990 |
AVGCT_1990 | Average Commuting Time 1990 |
EMP_2003 | Total Employment 2003 |
EMP_2002 | Total Employment 2002 |
EMP_2001 | Total Employment 2001 |
EMP_2000 | Total Employment 2000 |
UNEMP_2003 | Unemployment 2003 |
UNEMP_2002 | Unemployment 2002 |
UNEMP_2001 | Unemployment 2001 |
UNEMP_2000 | Unemployment 2000 |
UNEMPR_2003 | Unemployment Rate 2003 |
UNEMPR_2002 | Unemployment Rate 2002 |
UNEMPR_2001 | Unemployment Rate 2001 |
UNEMPR_2000 | Unemployment Rate 2000 |
CLF_2000 | Civilian Labor Force 2000 |
WORKPOP_2000 | Civilian Working-Age Population 2000 |
CLFPR_2000 | Civilian Labor Force Participation Rate 2000 |
MALECLF_2000 | Male Civilian Labor Force 2000 |
MALEWORKPOP_2000 | Male Civilian Working-Age Population 2000 |
MALECLFPR_2000 | Male Civilian Labor Force Participation Rate 2000 |
FEMALECLF_2000 | Female Civilian Labor Force 2000 |
FEMALEWORKPOP_2000 | Female Civilian Working-Age Population 2000 |
FEMALECLFPR_2000 | Female Civilian Labor Force Participation Rate 2000 |
JOBS_2002 | Total Number of Jobs 2002 |
JOBS_2001 | Total Number of Jobs 2001 |
JOBS_2000 | Total Number of Jobs 2000 |
JOBS_1990 | Total Number of Jobs 1990 |
JOBCHNG_2000-2002 | Change in Number of Jobs 2000-2002 |
JOBCHNG_1990-2000 | Change in Number of Jobs 1990-2000 |
JOBCHNGR_1990-2000 | Job Change Rate 1990-2000 |
JOBCHNGR_2000-2002 | Job Change Rate 2000-2002 |
TLPI_2002 | Total Labor and Proprietor Income 2002 |
TLPI_2001 | Total Labor and Proprietor Income 2001 |
EPJ_2002 | Earnings Per Job 2002 |
EPJ_2001 | Earnings Per Job 2001 |
PI_2002 | Personal Income 2002 |
PI_2001 | Personal Income 2001 |
BEAPOP_2002 | BEA Population 2002 |
BEAPOP_2001 | BEA Population 2001 |
PCI_2002 | Per Capita Income 2002 |
PCI_2001 | Per Capita Income 2001 |
POV_1999 | Number of Persons In Poverty 1999 |
POVPOP_1999 | Population Used to Compute Poverty 1999 |
POVRATE_1999 | Poverty Rate 1999 |
POV_1989 | Number of Persons In Poverty 1989 |
POVPOP_1989 | Population Used to Compute Poverty 1989 |
POVRATE_1989 | Poverty Rate 1989 |