PA - JSON Parser


Introduction

JavaScript Object Notation (JSON) is a standard format for data interchange and a popular alternative to Extensible Markup Language (XML). JSON is more complex than CSV (the format you used in the previous programming assignment), but not as complex as full XML. A JSON document is based on a few simple data types, arrays, and key-value pairs, supporting a wide variety of applications with a minimal amount of syntax.

Your assignment is to write a Ruby script to parse JSON text, returning Ruby objects representing the same data. See the section below for a description of each JSON construct and the corresponding Ruby construct. Please read the entire specification carefully and study the diagrams closely to avoid misunderstandings. If any part of the specification seems unclear to you, please post on Piazza or come to office hours.

You may find it helpful to begin by reviewing your completed code for the CSV parsing assignment and thinking about how it could be modified to accommodate the various JSON constructs. However, once you have a plan for tackling this assignment, I recommend that you begin the implementation from scratch to avoid errors introduced by copying and modifying old code.

The JSON Format

JSON files are stored in plain Unicode text using special formatting to delimit various data constructs. Each of these constructs is described below. Each construct is accompanied by an image (from the official JSON website) of a state machine describing the structure as it appears in JSON code.

Here is an example JSON file for illustration purposes:

    {
      "firstName": "John",
      "lastName": "Smith",
      "isAlive": true,
      "age": 25,
      "height_cm": 167.6,
      "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021-3100"
      },
      "phoneNumbers": [
        {
          "type": "home",
          "number": "212 555-1234"
        },
        {
          "type": "office",
          "number": "646 555-4567"
        }
      ],
      "children": [],
      "spouse": null
    }

Object

A JSON object is an unordered set of key-value pairs, enclosed by braces and delimited using commas. Within each pair, the key is separated from its value by a colon. Keys must be strings, but values can be one of several possibilities (see the "Value" section below).

[Syntax diagram: JSON object]

When parsed, the object should be represented as a Ruby Hash object, with each JSON key-value pair stored as key-value entries in the Hash. In the example above, there are four objects: the top-level object representing the personal information for John Smith, a subobject representing an address, and two other subobjects representing phone number entries.

For the purposes of this assignment, every JSON file should contain exactly one top-level object, although that object may contain many other objects as values in the key-value set.
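
To make the Hash representation concrete, here is a hand-written Ruby equivalent of part of the sample document above. It is only an illustration of the target data structure, not parser output; it assumes JSON keys become Ruby String keys (rather than symbols), which seems the most direct reading of "keys must be strings".

```ruby
# Hand-written Ruby equivalent of part of the sample JSON document.
# Assumption: JSON keys are stored as String keys, not symbols.
person = {
  'firstName' => 'John',
  'isAlive'   => true,
  'age'       => 25,
  'address'   => {
    'city'  => 'New York',
    'state' => 'NY'
  },
  'phoneNumbers' => [
    { 'type' => 'home',   'number' => '212 555-1234' },
    { 'type' => 'office', 'number' => '646 555-4567' }
  ],
  'children' => [],
  'spouse'   => nil
}

# A nested JSON object is a nested Hash.
puts person['address']['city']
# An object inside a JSON array is a Hash inside a Ruby Array.
puts person['phoneNumbers'][1]['type']
```

Note that a key whose value is null (such as "spouse") is still present in the Hash; use `key?` rather than a plain lookup if you need to tell a null value apart from a missing key.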

Array

A JSON array is an ordered collection of values, enclosed by brackets and delimited using commas. The elements of the array may be of any valid JSON value type (see below).

[Syntax diagram: JSON array]

When parsed, the array should be represented as a Ruby Array object, with the values stored using their appropriate representations in the array. In the example above, there are two arrays: the collection of phone numbers and the (empty) collection of children.

Value

A JSON value can be a variety of different things: a string, a number, an object, an array, or one of three literals ("true", "false", or "null").

[Syntax diagram: JSON value]

When parsed, the value should be represented using a Ruby object as appropriate for the JSON entity. The first four possibilities (strings, numbers, objects, and arrays) should be represented using their Ruby equivalents as described in other sections. The remaining three possibilities should be represented using their corresponding Ruby singleton objects: "true" (of type TrueClass), "false" (FalseClass), or "nil" (NilClass).
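
One possible way to handle the three literals is a small lookup table. The sketch below is only a suggestion; the constant name JSON_LITERALS is hypothetical and not part of the required interface.

```ruby
# Hypothetical lookup table: JSON literal text => Ruby singleton object.
JSON_LITERALS = {
  'true'  => true,
  'false' => false,
  'null'  => nil
}.freeze

JSON_LITERALS.each do |text, value|
  puts "#{text} -> #{value.inspect} (#{value.class})"
end
```

Be careful with plain `[]` lookups on a table like this: the result for 'null' is nil, which is also what a missing key returns, so check `key?` first (or use `fetch`) before trusting the result.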

String

A JSON string consists of a sequence of Unicode characters enclosed in double-quote marks ("). JSON supports the embedding of special characters using escape codes, similar to C, Java, and Ruby. Notable escape sequences are "\"" (double-quote mark), "\\" (backslash), and "\n" (newline).

[Syntax diagram: JSON string]

When parsed, the string should be represented using a Ruby String object, with all escaped characters expanded. You may wish to use the following function to convert the hexadecimal digits of a parsed Unicode escape (the four digits following "\u") to the corresponding character:

    # generate a one-character string from the hex digits of a Unicode escape
    # (for example, "00e9" becomes "é")
    def convert_unicode(str)
      [str.hex].pack('U')
    end
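
As a rough sketch of how escape expansion might use that helper, the code below expands escapes in the raw text found between the quote marks. The names SIMPLE_ESCAPES and expand_escapes are hypothetical, and the sketch assumes the text has already been validated as a JSON string. It also does not combine surrogate pairs, so treat it as a starting point rather than a complete solution.

```ruby
# Helper from the assignment text: hex digits -> one-character string.
def convert_unicode(str)
  [str.hex].pack('U')
end

# Hypothetical table of the two-character escapes (character after the backslash).
SIMPLE_ESCAPES = {
  '"' => '"', '\\' => '\\', '/' => '/',
  'b' => "\b", 'f' => "\f", 'n' => "\n", 'r' => "\r", 't' => "\t"
}.freeze

# Expands escape sequences in the raw text between the quotes.
# Assumes the input is already known to be valid; an unknown escape
# raises KeyError from fetch.
def expand_escapes(raw)
  raw.gsub(/\\(?:u(\h{4})|(.))/) do
    hex = Regexp.last_match(1)
    hex ? convert_unicode(hex) : SIMPLE_ESCAPES.fetch(Regexp.last_match(2))
  end
end

# Single-quoted Ruby strings keep the backslashes, like raw JSON text.
puts expand_escapes('caf\u00e9')
puts expand_escapes('say \"hi\"')
```

Doing the expansion in a single left-to-right pass matters: an escaped backslash followed by "u0041" must stay as the literal text \u0041 rather than being expanded a second time.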

Number

JSON supports both integer and real-valued numbers (including exponential notation for the latter), but it does not support octal or hexadecimal formats.

[Syntax diagram: JSON number]

When parsed, the number should be represented using a Ruby Integer object (for integers; this class appears as Fixnum in Ruby versions before 2.4) or Float object (for real numbers). In the example above, the age is an integer and the height is a real number.
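
Here is one possible regular expression for JSON numbers, together with a conversion helper. Both NUMBER_PATTERN and convert_number are hypothetical names, and the pattern is my reading of the number diagram (optional minus sign, no leading zeros, optional fraction, optional exponent); check it against the diagram before relying on it.

```ruby
# Hypothetical pattern for a complete JSON number:
# optional minus, integer part without leading zeros,
# optional fraction (group 1), optional exponent (group 2).
NUMBER_PATTERN = /\A-?(?:0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?\z/

# Returns an Integer or Float, or nil if the text is not a JSON number.
def convert_number(text)
  match = NUMBER_PATTERN.match(text)
  return nil unless match

  # A fraction or an exponent makes the number real-valued.
  (match[1] || match[2]) ? text.to_f : text.to_i
end

p convert_number('25')
p convert_number('167.6')
p convert_number('0x1F')
```

The anchors \A and \z make the pattern match only a whole string; inside a scanner you would drop them and match at the current position instead.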

Note about Grammars (Added Feb 2)

The syntax diagrams above are a visual form of a context-free grammar. For example, the diagram for a JSON object depicts the following grammar rule:

    object -> '{' STRING ':' value ( ',' STRING ':' value )* '}'
            | '{' '}'

There are similar rules for JSON arrays and values:

    array -> '[' value ( ',' value )* ']'
           | '[' ']'

    value -> STRING
           | NUMBER
           | object
           | array
           | 'true'
           | 'false'
           | 'null'

One way to implement a parser for these rules is to write a method for each one. For example, we might have a parse_object method that recognizes a JSON object and returns a Ruby hash representing the JSON object. In doing its job, it would call a parse_value method to parse JSON values (which might itself need to recursively call parse_object). If we do this for all the rules in the grammar, we will have implemented a recursive descent (LL) parser.
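
To illustrate the one-method-per-rule idea without giving away the assignment, here is a minimal sketch for a deliberately reduced grammar in which a value is either an integer or a nested array. The class name TinyArrayParser and its helper methods are hypothetical, it uses StringScanner from the standard library as one possible way to walk the text, and it raises ArgumentError rather than the exception your parser is required to raise.

```ruby
require 'strscan'

# Reduced grammar, for illustration only:
#   array -> '[' value ( ',' value )* ']' | '[' ']'
#   value -> INTEGER | array
class TinyArrayParser
  def initialize(text)
    @scanner = StringScanner.new(text)
  end

  def parse
    result = parse_array
    skip_space
    raise ArgumentError, 'unexpected trailing text' unless @scanner.eos?
    result
  end

  private

  def skip_space
    @scanner.skip(/\s*/)
  end

  # Consumes a token if it is next; returns nil otherwise.
  def accept(pattern)
    skip_space
    @scanner.scan(pattern)
  end

  # Consumes a token that must be next, or fails.
  def expect(pattern)
    accept(pattern) or
      raise ArgumentError, "expected #{pattern.inspect} at #{@scanner.pos}"
  end

  # One method per grammar rule: this one handles "array".
  def parse_array
    expect(/\[/)
    items = []
    return items if accept(/\]/)

    loop do
      items << parse_value
      break unless accept(/,/)
    end
    expect(/\]/)
    items
  end

  # The "value" rule calls back into parse_array: this is the recursion.
  def parse_value
    skip_space
    if @scanner.check(/\[/)
      parse_array
    else
      expect(/-?(?:0|[1-9]\d*)/).to_i
    end
  end
end

p TinyArrayParser.new('[1, [2, 3], []]').parse
```

The full JSON grammar follows the same shape: parse_object would loop over key-value pairs the way parse_array loops over values, and parse_value would gain a branch for each remaining alternative.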

Implementation Details

You must implement your JSON parser in a single Ruby script file named "json_parser.rb".

Your program must contain a JSONParser class with implementations for the following methods:

    # constructor: parses the given JSON text
    def initialize(text)

    # returns the parsed Ruby object
    def get_object

Your implementation should NOT re-parse the JSON every time get_object is called. Your implementation should also NOT use the JSON module in the Ruby standard library, nor should it rely on any third-party library.
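
A minimal sketch of one way to satisfy the "parse once" requirement is to do all the work in the constructor and store the result in an instance variable. The parse_document method below is a hypothetical placeholder that only understands the text "{}"; in your submission it would be replaced by your real parsing code.

```ruby
class JSONParser
  # Parses once, in the constructor, and keeps the result.
  def initialize(text)
    @object = parse_document(text)
  end

  # Returns the stored result; no parsing happens here.
  def get_object
    @object
  end

  private

  # Hypothetical placeholder: a real implementation would run the
  # recursive-descent parser here. This stub only handles "{}".
  def parse_document(text)
    raise ArgumentError, 'stub only handles {}' unless text.strip == '{}'

    {}
  end
end

parser = JSONParser.new(' {} ')
# Both calls return the very same stored object.
puts parser.get_object.equal?(parser.get_object)
```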

Your implementation should reject any invalid JSON text; i.e., any text that does not conform to the specifications above. If you have any questions about whether specific text is valid JSON, please post on Piazza. If parsing fails because of invalid JSON, your code should raise an InvalidJSONError exception. You may use the following empty definition of that exception class:

    class InvalidJSONError < IOError; end
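
For reference, here is a small sketch of raising and rescuing that exception. The check_top_level helper is hypothetical and performs only a crude check that the text is wrapped in braces; it is not a substitute for actually parsing the text.

```ruby
class InvalidJSONError < IOError; end

# Hypothetical helper: rejects text that is not wrapped in braces.
# This is only a quick illustration, not a real validity check.
def check_top_level(text)
  stripped = text.strip
  unless stripped.start_with?('{') && stripped.end_with?('}')
    raise InvalidJSONError, 'top-level value must be an object'
  end

  stripped
end

begin
  check_top_level('[1, 2, 3]')
rescue InvalidJSONError => e
  puts "rejected: #{e.message}"
end
```

Because InvalidJSONError is a subclass of IOError, a `rescue IOError` or `rescue StandardError` clause will also catch it.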

As mentioned above, I recommend writing three major recursive-descent parsing methods to recognize JSON objects, arrays, and values, respectively; I also recommend writing Ruby regular expressions to recognize JSON strings and numbers. These regular expressions may be used in multiple parsing methods.

As always, you should decompose long methods into helpers to make them more readable, use good variable and function names, indent properly (indentation is conventionally two spaces in Ruby), use consistent line spacing, and so on. Part of your grade will depend on the readability of your code. Please see the resources page for an example style guide.

You should test your own code thoroughly. Your submission will be tested with an automated test suite, so double-check that all of your class and method names are exactly as specified above. Here is a sample unit test file that you can use as a basis for writing your own tests.

Deliverable Requirements

Please name your Ruby script file "json_parser.rb". Be sure to put your name in a comment at the top of the file. Submit your Ruby script file on Canvas by 11:59 pm on Friday, February 6, 2015.

Part of your grade will be determined by the results of an automated test suite, and part of your grade will be determined by a manual inspection of your code for correctness and style.

Acknowledgement

This PA was designed in collaboration with Dr. Chris Fox.