Skip to content

Project 3: Music Data

last.fm logo
credit: last.fm

last.fm is a social network based on audio scrobbling: tracking your music plays across any streaming service that users allow their listening to be tracked. According to last.fm:

We bring together your favourite music services and join up listening, watching and sharing to connect your musical world.

In this project, you will manipulate real data from last.fm.

Data content warning

The data files provided for this project contain real data accessed from last.fm APIs for popular songs and tags. Like the songs themselves, the data, which includes song titles and song descriptions, may contain profanity or other language and themes that do not represent this course or JMU.

Objectives

  • Read and write json files.
  • Search and edit complex nested lists and dictionaries.
  • Use recursion to search data.
  • Use an API to retrieve live data from a website.
  • Format output in HTML.

Background

An Application Programming Interface (API) defines the ways one program can interact with another. A Web API is an API that is accessed over the Web, using Web protocols (https). Web APIs give programs access to a huge amount of data and services. last.fm is one of the many social networks that give programs access to their public data: last.fm API. Their API methods give access to a wealth of data, including album, artist, track and user data.

In part A of this project, you will write tests for part B of the project.

In part B, you will develop functions that manipulate and display last.fm data that has been stored in json files. These functions will power a small application that allows its users to create playlists that match their needs.

In part C, you will use recursion to solve a problem. You will also use the last.fm APIs to get live data and display some of that data in HTML instead of simple plain text output.

Part A

For this assignment, instead of completing a "readiness quiz" on Canvas, you will write unit tests. In other words, you are "ready" to begin Part B if you know how to call each function and what the expected return values are.

The following files are provided:

The test_playlist.py module includes examples of how to call the functions in Part B. Read the requirements (in the next section of the spec) and the docstrings for more details. You must write at least two more assert statements for each test function. In the end, your test code must cover 100% of the sample solution for Part B.

When you submit test_playlist.py to Gradescope, you will get feedback about your tests. If any tests fail or don't fully cover the solution, you should revise and resubmit. That way, you will have a good set of tests when working on Part B. If you don't complete Part A on time, you will have a disadvantage on Part B.

Part B

A JSON file is composed of key-value pair data (like a Python dictionary) and lists: a combination of dictionaries and lists of dictionaries. There may be a dictionary or a list at the top level. In part B, you will use the following json files, where the top-level item is a list:

  • fulltracks.json – contains a list of dictionaries: 1 for each of the top 50 tracks at the time the API was last accessed.
  • toptags.json – contains a list of dictionaries: 1 for each of the top 50 tags at the time the API was last accessed.

Structure of the data

In order to understand the structure of the data in these files, you can either open the files in VS Code and format them, you can open them in Chrome and Chrome can format them, or you can go to one of many JSON pretty print websites such as jsonformatter.org and paste the data there for formatting.

A track dictionary contains all the information about a song, including: its name, length, its artist, the album(s) it appears on, available links and artwork, the tags it is tagged with, and wiki text information. A tag dictionary contains the tag name, its count (how many times it has been used) and its reach (how many users have used the tag).

Cleaning the data

When a program uses real data, sometimes that data is not perfect for the uses that the program wants it for, or the data may contain mistakes or inconsistences. When a program handles issues in real data like this, it is known as cleaning the data. In this project, you will have to write some special case code to handle issues in the real data.

Software Requirements

You will write several functions that allow a user to generate a playlist from the top 50 tracks according to their interests. You are given the function docstrings and a prewritten main() function which uses all the other functions.

process_args()

The program expects the names of the tags file and the track file as command-line arguments. The process_args() function should assure that the program was run with the correct number of arguments. If not, it should print the following error message and exit:

Usage: python playlist.py tagfile trackfile

The function should check for the existence of each of the 2 input files and print the following message and exit if either file does not exist:

Input file passed-in-filename does not exist

If there are no problems, the function should return the 2 filenames in a tuple: the tagfile first, and the trackfile second.

load_json(filename)

This function loads a json file into a list and returns the list. Assume the file exists. Both of the json files used in this project have a list at the top level.

clean_tags(tags, exclude, limit)

This function should take a list of tag dictionaries and return a list of tag names (strings). While compiling the list of names, it should make each tag name lowercase and it should exclude any tags that appear in the exclude parameter (a list of tag names (strings)). The limit is an optional argument with the default value None. If the limit argument is provided, then the function should also exclude tags whose count value is less or equal to the given limit.

format_tags(tags)

This function takes a list of strings (tag names) and returns a string which formats them into 3 columns. The width of each column should be exactly 25 characters. For example, given the list of tags:

['rock', 'electronic', 'alternative', 'indie', 'pop']

The string returned by the function, when printed, should be:

rock                     electronic               alternative
indie                    pop

add_tag_to_set(name, tags, tagset)

This function takes 3 arguments: a tag name, a list of tag names, and a set of tag names. If the given name is in the list of tag names, it should be added to the tag set.

This function also has the responsibility to handle an inconsistency in the last.fm data. There are 2 tags - "hip hop" and "hip-hop" and some songs are tagged with one and some with the other. To be sure that if a user chooses one of these tags, they also get the other one, this function must contain special code to handle this. If the given tag is either one of the hip-hop tags, then both should be added to the set.

clean_durations(tracks, default_duration)

This function is responsible for cleaning the data in the given tracks list. Tracks have an attribute called duration (song length). That attribute is a string, but it should be an integer. Also, for some tracks, the value is in milliseconds, for others it is in seconds, and for others it is empty. This function must update the duration attribute in each track in the given tracks list. It should convert them all to integers. Any duration attributes in milliseconds should be converted to seconds. A value in milliseconds can be recognized by being evenly divisible by 1000.

The default_duration argument is optional. Any duration attributes that are 0 should be replaced with the default duration. If no default duration argument is provided, the value of 300 seconds should be used.

tags_match(track, tagset)

This function takes a single track dictionary and a set of tag names. If the given track is tagged with any of the tag names in the tag set, the function should return True (otherwise False). The function must ignore the case of the tag names associated with the track.

create_playlist(tracks, tagset, minutes)

This function takes a list of track dictionaries, a set of tag names, and a value, in minutes, for the length of the playlist to be created. It should return a new list of track dictionaries which represent a playlist that meets the given criteria. This function should examine each track in the tracks list in order. If a track is tagged with one of the tag names in the tag set, it should be added to the playlist, but only if the playlist length (minutes) would not be exceeded.

This function should return a list of tracks that match the tags and whose combined length gets as close to as possible but does not exceed the minutes value. This function should make use of other function(s) in the file.

format_playlist(playlist)

This function takes a list of track dictionaries and return a string that represents the playlist. The first line of the playlist string should be a header:

TRACK NAME ARTIST NAME LENGTH (min)

The width of the track name column should be 50, the artist name column's width should be 30, and the length column should finish the line. The tracks should be displayed one per line underneath, with the same widths, and the format of the track length (duration) should be float with 1 digit after the decimal point, for example:

TRACK NAME                                        ARTIST NAME                   LENGTH (min)
BIRDS OF A FEATHER                                Billie Eilish                 3.5
tAsTe                                             Sabrina Carpenter             5.0
Bed Chem                                          Sabrina Carpenter             2.9
Following the list of tracks should be a blank line and then the header of the track details section, beginning with the title line. Each following line should contain the summary field from the wiki dictionary in the track dictionary followed by 2 newline characters. For example:
DETAILS
“BIRDS OF A FEATHER” is one of the standout tracks from Billie Eilish’s third album HIT ME HARD AND SOFT. The second single became a massive success, gaining widespread acclaim for its emotional depth and resonating with a global audience. It eventually became Billie Eilish’s first song to top the Billboard Global 200 chart, especially after her performance during the 2024 Summer Olympics closing ceremony.

Thematically, the song delves into the longing for deep love and lasting connection <a href="http://www.last.fm/music/Billie+Eilish/_/BIRDS+OF+A+FEATHER">Read more on Last.fm</a>.

“Taste” is the first track off of Sabrina Carpenter’s sixth studio album, Short n' Sweet, and was released as the highly anticipated third single with a music video featuring female actress Jenna Ortega on August 23, 2024, following hits “Espresso” & “Please Please Please”.

Disco-infused & irresistibly danceable to, “Taste” serves as the album’s opener, layering risqué innuendos with revenge and betrayal, stating that even if he’s cheating, she’ll allow it as he’s always got the taste of her on his lips. <a href="http://www.last.fm/music/Sabrina+Carpenter/_/tAsTe">Read more on Last.fm</a>.

"Bed Chem" is a song by Sabrina Carpenter, from her sixth album Short n' Sweet. It was written by Sabrina Carpenter, Julia Michaels, John Ryan, Ian Kirkpatrick and Amy Allen. <a href="http://www.last.fm/music/Sabrina+Carpenter/_/Bed+Chem">Read more on Last.fm</a>.

Part C

Coming soon …

Submission Instructions

Part A (10 points)

Submit test_playlist.py on Gradescope. You have unlimited submissions for Part A.

Part B (50 points)

Submit playlist.py on Gradescope. You are limited to 10 submissions for Part B.

Part C (40 points)

Coming soon …