PA - CSV Parser


Introduction

Comma-Separated Value (CSV) files are text files representing a collection of records, each with zero or more fields. Each line is a record. The fields are separated by a single-character delimiter, typically a comma (hence the name of the format). CSV files are common; for example, you can generate them from Excel spreadsheets.

Your assignment is to write a Ruby script containing a method that takes a delimiter character and a path to a CSV file, and returns an array whose elements are themselves arrays. These inner arrays are the records from each line of the CSV file. Each inner array contains strings, and each string is a field from a line of the CSV file.

The CSV File Format

As noted, a CSV file has one record per line, and the fields of the record are separated by a delimiter character. The delimiter can be any character except a double-quote ("). An unquoted field (the text between two delimiters, or between a delimiter and the beginning or end of the line) may not contain the delimiter or a double-quote, and any whitespace at the beginning and end of such a field is removed.
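For a line that contains no quoted fields, the rule above amounts to "split on the delimiter, then trim each field." The sketch below shows that idea; split_unquoted_line is a hypothetical helper name, and it does not handle quoted fields. Two Ruby details matter here: a limit of -1 keeps trailing empty fields, which String#split otherwise drops, and passing the delimiter as an escaped Regexp avoids the special awk-style behavior String#split has for a single-space string pattern.

```ruby
# Hypothetical helper for a line with NO quoted fields: split on the
# delimiter, then strip leading/trailing whitespace from each field.
# The -1 limit keeps trailing empty fields (e.g. after a final delimiter).
# Regexp.escape lets any delimiter character (such as "|" or " ") be
# matched literally instead of being treated as regex syntax or awk-style.
def split_unquoted_line(delim, line)
  line.split(Regexp.new(Regexp.escape(delim)), -1).map(&:strip)
end

p split_unquoted_line(",", " a , b ,,c,")   # prints ["a", "b", "", "c", ""]
```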

However, if a field needs to contain double-quotes, leading or trailing whitespace, or delimiters, the field may begin and end with double-quotes. Between the double-quotes, spaces, delimiters, and pairs of double-quotes may appear freely (in this context, a pair of double-quotes represents a single double-quote character). For example, if we wanted a field to contain the string

˽Say˽"Who,˽me?"

where "˽" marks a space character, the field can appear in the CSV file as

" Say ""Who, me?"""

An empty field appears as nothing, or only whitespace, between two delimiters (or between a delimiter and the beginning or end of the line).
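The quoting rule for the " Say ""Who, me?""" example above can be sketched in both directions. quote_field and unquote_field are hypothetical helper names; unquote_field assumes it is given exactly the raw field text, starting and ending with a double-quote and with no surrounding whitespace.

```ruby
# Hypothetical helpers illustrating the quoting rule.
# quote_field wraps a value in double-quotes and doubles every quote in it.
def quote_field(value)
  '"' + value.gsub('"', '""') + '"'
end

# unquote_field reverses quote_field: it drops the outer quotes
# (assumed to be the first and last characters) and turns each
# doubled quote back into a single quote.
def unquote_field(raw)
  raw[1...-1].gsub('""', '"')
end

value = ' Say "Who, me?"'
puts quote_field(value)                           # prints " Say ""Who, me?"""
puts unquote_field(quote_field(value)) == value   # prints true
```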

You can read more about the CSV file format in the Wikipedia article on comma-separated values.

The following example shows a short CSV file and the Ruby array that should result when it is processed with a comma as the delimiter (note that the array formatting has been modified slightly from the default Ruby output).

    Student, H1, H2, Total, Notes
    "Faraday, Michael", 10, 10, 20,
    "Watt, James", 9, 10, 20, Frequently absent
    "Newton, Isaac", 3, 6, 9,

    [
      ["Student", "H1", "H2", "Total", "Notes"],
      ["Faraday, Michael", "10", "10", "20", ""],
      ["Watt, James", "9", "10", "20", "Frequently absent"],
      ["Newton, Isaac", "3", "6", "9", ""]
    ]
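The rules for unquoted, quoted, and empty fields can be combined into a single pass over each line. The following is a minimal sketch of one way to do that, not a reference solution: it scans a line character by character and tracks whether it is inside a quoted field. parse_line and finish_field are hypothetical names, and the code assumes well-formed input in which a closing double-quote is followed only by whitespace before the next delimiter or the end of the line. Applied to each line of the example CSV file, it yields the rows of the array shown above.

```ruby
# Minimal sketch of a character-by-character CSV line parser.
# Hypothetical names; assumes well-formed input (after a closing quote,
# only whitespace appears before the next delimiter or end of line).
def parse_line(delim, line)
  fields = []
  field = +""        # characters collected for the current field
  quoted = false     # did the current field begin with a double-quote?
  in_quotes = false  # is the scanner between the opening and closing quotes?
  i = 0
  while i < line.length
    ch = line[i]
    if in_quotes
      if ch == '"' && line[i + 1] == '"'
        field << '"'            # a doubled quote stands for one quote
        i += 1                  # skip the second quote of the pair
      elsif ch == '"'
        in_quotes = false       # closing quote ends the quoted text
      else
        field << ch             # delimiters and spaces are kept as-is
      end
    elsif ch == delim
      fields << finish_field(field, quoted)
      field = +""
      quoted = false
    elsif ch == '"' && field.strip.empty?
      quoted = true             # opening quote; leading whitespace is dropped
      in_quotes = true
      field = +""
    elsif !quoted
      field << ch               # ordinary character of an unquoted field
    end                         # characters after a closing quote are ignored
    i += 1
  end
  fields << finish_field(field, quoted)
  fields
end

# Unquoted fields are trimmed; quoted fields keep their inner whitespace.
def finish_field(field, quoted)
  quoted ? field : field.strip
end

p parse_line(",", '"Faraday, Michael", 10, 10, 20,')
# prints ["Faraday, Michael", "10", "10", "20", ""]
```

Tracking quoted separately from in_quotes is what lets the sketch keep the whitespace inside a quoted field while still trimming unquoted ones.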

Ruby File I/O

As with everything, there are "101 ways" to read files in Ruby. Here is one way that is sufficient for this assignment:

    file = File.open(file_path, "r")
    file.each_line do |line|
      line = line.chomp   # each_line keeps the trailing "\n"; chomp removes it
      # process one line
    end
    file.close
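Another of those many ways is File.foreach, which opens the file, yields one line at a time, and closes the file when it finishes, so no explicit close is needed. The sketch below collects the lines with their line endings removed; read_lines is a hypothetical helper name.

```ruby
# Hypothetical helper: returns the lines of a file as an array of strings,
# with trailing line endings ("\n" or "\r\n") removed by chomp.
# File.foreach closes the file automatically after the last line.
def read_lines(file_path)
  lines = []
  File.foreach(file_path) { |line| lines << line.chomp }
  lines
end
```

Each string in the returned array can then be handed to whatever line-parsing method you write.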

Deliverable Requirements

Your program must contain a method parse_CSV_file(delim, file_name) that takes a delimiter character and a path to a CSV file, and returns an array whose elements are themselves arrays.

Please name your Ruby script file "csv_parser.rb". Be sure to put your name in a comment at the top of the file.

You may not use the CSV class that is part of the Ruby standard library; you must write the parsing routines yourself.

As always, you should decompose long methods to make them more readable, use good variable and function names, indent properly (indentation is conventionally two spaces in Ruby), use consistent spacing, and so on. You will partly be graded on the readability of your code and partly on whether it works.

For this assignment you do not need to worry about handling errors. If the file does not exist, it is ok for the script to fail. If the input file does not contain legitimate CSV-formatted data, it is ok for the method to fail or to return garbage.

Submit your Ruby script file on Canvas by 11:59 pm on Friday, January 30, 2015.

Acknowledgement

This PA was originally designed by Dr. Chris Fox; much of the wording on this page was originally his.