Project 3: Music Data¶

last.fm logo
credit: last.fm

last.fm is a social network based on audio scrobbling: tracking the music you play on any streaming service where you allow your listening to be tracked. According to last.fm:

We bring together your favourite music services and join up listening, watching and sharing to connect your musical world.

In this project, you will manipulate real data from last.fm.

Data content warning

The data files provided for this project contain real data accessed from last.fm APIs for popular songs and tags. Like the songs themselves, the data, which includes song titles and song descriptions, may contain profanity or other language and themes that do not represent this course or JMU.

Objectives¶

Read and write json files.
Search and edit complex nested lists and dictionaries.
Use recursion to search data.
Use an API to retrieve live data from a website.

Background¶

An Application Programming Interface (API) defines the ways one program can interact with another. A Web API is an API that is accessed over the Web, using Web protocols such as HTTPS. Web APIs give programs access to a huge amount of data and services. last.fm is one of the many social networks that give programs access to their public data: last.fm API. Their API methods give access to a wealth of data, including album, artist, track and user data.

In part A of this project, you will write tests for part B of the project.

In part B, you will develop functions that manipulate and display last.fm data that has been stored in json files. These functions will power a small application that allows its users to create playlists that match their needs.

In part C, you will use recursion to analyze the social network.

In part D, you will use the last.fm API to explore data of interest.

Part A: Unit Testing¶

For this assignment, instead of completing a "readiness quiz" on Canvas, you will write unit tests. In other words, you are "ready" to begin Part B if you know how to call each function and what the expected return values are.

The following files are provided:

playlist.py – what you will submit for Part B
sample_data.py – subset of the data files for Part B
test_playlist.py – what you will submit for Part A

The test_playlist.py module includes examples of how to call the functions in Part B. Read the requirements (in the next section of the spec) and the docstrings for more details. You must write at least two more assert statements for each test function. In the end, your test code must cover 100% of the sample solution for Part B.

When you submit test_playlist.py to Gradescope, you will get feedback about your tests. If any tests fail or don't fully cover the solution, you should revise and resubmit. That way, you will have a good set of tests when working on Part B. If you don't complete Part A on time, you will have a disadvantage on Part B.
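As an illustration of what a test function with several assert statements might look like, here is a minimal sketch. The function `to_minutes()` is a hypothetical stand-in and is not one of the Part B functions; the point is that each branch of the code under test is exercised by at least one assert, which is what full coverage requires.

```python
def to_minutes(seconds):
    """Hypothetical stand-in for a function under test (not part of Part B)."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return round(seconds / 60, 1)


def test_to_minutes():
    # Typical values
    assert to_minutes(210) == 3.5
    assert to_minutes(90) == 1.5
    # Edge case: zero
    assert to_minutes(0) == 0.0
    # Error branch: without this check, the raise line is never covered
    try:
        to_minutes(-1)
        assert False, "expected ValueError"
    except ValueError:
        pass
```

Each assert compares one call against a value worked out by hand, so a failing assert points directly at the case that went wrong.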

Part B: Nested Data¶

A JSON file is composed of key-value data (like a Python dictionary) and lists, which can be nested: dictionaries may contain lists, and lists may contain dictionaries. Either a dictionary or a list may be at the top level. In part B, you will use the following json files, where the top-level item is a list:

fulltracks.json – contains a list of dictionaries: 1 for each of the top 50 tracks at the time the API was last accessed.
toptags.json – contains a list of dictionaries: 1 for each of the top 50 tags at the time the API was last accessed.

Structure of the data¶

To understand the structure of the data in these files, you can open the files in VS Code and format them, open them in Chrome (which can format them), or paste the data into one of the many JSON pretty print websites, such as jsonformatter.org.

A track dictionary contains all the information about a song, including: its name, length, its artist, the album(s) it appears on, available links and artwork, the tags it is tagged with, and wiki text information. A tag dictionary contains the tag name, its count (how many times it has been used) and its reach (how many users have used the tag).

Cleaning the data¶

When a program uses real data, that data is sometimes not well suited to the program's purposes, or it may contain mistakes or inconsistencies. Handling issues like these is known as cleaning the data. In this project, you will have to write some special-case code to handle issues in the real data.

Software Requirements¶

You will write several functions that allow a user to generate a playlist from the top 50 tracks according to their interests. You are given the function docstrings and a prewritten main() function which uses all the other functions.

`process_args()`¶

The program expects the names of the tags file and the track file as command-line arguments. The process_args() function should ensure that the program was run with the correct number of arguments. If not, it should print the following error message and exit:

Usage: python playlist.py tagfile trackfile

The function should check for the existence of each of the 2 input files and print the following message and exit if either file does not exist:

Input file passed-in-filename does not exist

Replace "passed-in-filename" with the actual filename. If there are no problems, the function should return the 2 filenames in a tuple: the tagfile first, and the trackfile second.
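The checks described above could be sketched roughly as follows. The optional `argv` parameter is an assumption added here so the function can be tried without a real command line; the specification itself describes `process_args()` with no parameters.

```python
import os
import sys


def process_args(argv=None):
    # Assumption: argv defaults to the real command line (sys.argv)
    if argv is None:
        argv = sys.argv

    # Script name plus two filenames makes three items
    if len(argv) != 3:
        print("Usage: python playlist.py tagfile trackfile")
        sys.exit()

    tagfile, trackfile = argv[1], argv[2]
    for filename in (tagfile, trackfile):
        if not os.path.exists(filename):
            print(f"Input file {filename} does not exist")
            sys.exit()

    # Tag file first, track file second
    return tagfile, trackfile
```

Calling `sys.exit()` raises `SystemExit`, which is how the program stops after printing a message.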

`load_json(filename)`¶

This function loads a json file into a list and returns the list. Assume the file exists. Both of the json files used in this project have a list at the top level.
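A minimal sketch of reading a JSON file with the standard `json` module is shown here. The `save_json()` helper is hypothetical and only included to show the matching write operation; the UTF-8 encoding is also an assumption about the data files.

```python
import json


def load_json(filename):
    # json.load returns whatever is at the top level (here, a list)
    with open(filename, encoding="utf-8") as file:
        return json.load(file)


def save_json(data, filename):
    # Hypothetical helper: writes data back out in a readable, indented form
    with open(filename, "w", encoding="utf-8") as file:
        json.dump(data, file, indent=2)
```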

`clean_tags(tags, exclude, limit)`¶

This function should take a list of tag dictionaries and return a list of tag names (strings). While compiling the list of names, it should make each tag name lowercase and it should exclude any tags that appear in the exclude parameter (a list of tag name strings). The limit is an optional argument with the default value None. If the limit argument is provided, then the function should also exclude tags whose count value is less than or equal to the given limit.
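One way this filtering might look is sketched below. The dictionary keys `"name"` and `"count"` are assumptions based on the description of a tag dictionary; check the actual data file for the real key names and for whether counts are stored as numbers or strings.

```python
def clean_tags(tags, exclude, limit=None):
    names = []
    for tag in tags:
        name = tag["name"].lower()  # assumed key: "name"
        if name in exclude:
            continue
        # int() accepts both 100 and "100" (assumed key: "count")
        if limit is not None and int(tag["count"]) <= limit:
            continue
        names.append(name)
    return names
```

For example, with a limit of 50, a tag whose count is exactly 50 is excluded, because the comparison is "less than or equal to".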

`format_tags(tags)`¶

This function takes a list of strings (tag names) and returns a string which formats them into 3 columns. The width of each column should be exactly 25 characters. For example, given the list of tags:

['rock', 'electronic', 'alternative', 'indie', 'pop']

The string returned by the function, when printed, should be:

rock                     electronic               alternative
indie                    pop
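The column layout can be sketched with f-string width specifiers. This is only one possible approach; whether the last column of a row is padded with trailing spaces, and whether the string ends with a newline, are assumptions that the docstring and tests should settle.

```python
def format_tags(tags):
    lines = []
    # Take the tags three at a time, one slice per output line
    for start in range(0, len(tags), 3):
        row = tags[start:start + 3]
        # {name:<25} left-aligns the name in a field of 25 characters
        lines.append("".join(f"{name:<25}" for name in row))
    return "\n".join(lines)
```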

`add_tag_to_set(name, tags, tagset)`¶

This function takes 3 arguments: a tag name, a list of tag names, and a set of tag names. If the given name is in the list of tag names, it should be added to the tag set.

This function is also responsible for handling an inconsistency in the last.fm data. There are 2 tags, "hip hop" and "hip-hop"; some songs are tagged with one and some with the other. So that a user who chooses one of these tags also gets the other, this function must contain special-case code: if the given tag is either one of the hip-hop tags, then both should be added to the set.
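The special case could be handled along these lines. The sketch assumes the function modifies the set in place and returns nothing, and that the hip-hop rule applies only when the given name is in the list; the docstring is the authority on both points. The constant `HIP_HOP_TAGS` is a name introduced here for illustration.

```python
HIP_HOP_TAGS = {"hip hop", "hip-hop"}  # hypothetical constant name


def add_tag_to_set(name, tags, tagset):
    # Ignore names that are not known tags
    if name not in tags:
        return
    if name in HIP_HOP_TAGS:
        # Either spelling selects both spellings
        tagset.update(HIP_HOP_TAGS)
    else:
        tagset.add(name)
```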

`clean_durations(tracks, default_duration)`¶

This function is responsible for cleaning the data in the given tracks list. Tracks have an attribute called duration (song length). That attribute is a string, but it should be an integer. Also, for some tracks, the value is in milliseconds, for others it is in seconds, and for others it is empty. This function must update the duration attribute in each track in the given tracks list. It should convert them all to integers. Any duration attributes in milliseconds should be converted to seconds. A value in milliseconds can be recognized by being evenly divisible by 1000.

The default_duration argument is optional. Any duration attributes that are 0 should be replaced with the default duration. If no default duration argument is provided, the value of 300 seconds should be used.
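The conversion rules can be sketched as below, assuming each track stores its length under the key `"duration"` and that the list is updated in place. Note that 0 is also evenly divisible by 1000, so an empty or zero value stays 0 after the division and is then replaced by the default.

```python
def clean_durations(tracks, default_duration=300):
    for track in tracks:
        raw = track["duration"]  # assumed key: "duration"
        # An empty string counts as 0
        seconds = int(raw) if raw else 0
        # Values evenly divisible by 1000 are treated as milliseconds
        if seconds % 1000 == 0:
            seconds //= 1000
        if seconds == 0:
            seconds = default_duration
        track["duration"] = seconds
```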

`tags_match(track, tagset)`¶

This function takes a single track dictionary and a set of tag names. If the given track is tagged with any of the tag names in the tag set, the function should return True (otherwise False). The function must ignore the case of the tag names associated with the track.
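A rough sketch of the case-insensitive check follows. The nested layout `track["toptags"]["tag"]`, a list of dictionaries each with a `"name"` key, is an assumption about how the track data is organized; inspect a formatted copy of the data file to confirm where the tags actually live. The sketch also assumes the names in the tag set are already lowercase.

```python
def tags_match(track, tagset):
    # Assumed layout: track["toptags"]["tag"] is a list of {"name": ...} dicts
    for tag in track["toptags"]["tag"]:
        if tag["name"].lower() in tagset:
            return True
    return False
```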

`create_playlist(tracks, tagset, minutes)`¶

This function takes a list of track dictionaries, a set of tag names, and a value, in minutes, for the length of the playlist to be created. It should return a new list of track dictionaries which represent a playlist that meets the given criteria. This function should examine each track in the tracks list in order. If a track is tagged with one of the tag names in the tag set, it should be added to the playlist, but only if the playlist length (minutes) would not be exceeded.

This function should return a list of tracks that match the tags and whose combined length gets as close to as possible but does not exceed the minutes value. This function should make use of other function(s) in the file.
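The in-order selection might be sketched like this. It assumes durations have already been cleaned to integer seconds, and it uses a simplified `tags_match()` with an assumed data layout so that the block runs on its own. Examining tracks in order is a greedy strategy: it does not search every combination, so read "as close as possible" here as "as close as this single pass can get".

```python
def tags_match(track, tagset):
    # Simplified helper; the nested layout is an assumption about the data
    return any(tag["name"].lower() in tagset for tag in track["toptags"]["tag"])


def create_playlist(tracks, tagset, minutes):
    remaining = minutes * 60  # seconds still available
    playlist = []
    for track in tracks:
        # Add a matching track only if it still fits
        if tags_match(track, tagset) and track["duration"] <= remaining:
            playlist.append(track)
            remaining -= track["duration"]
    return playlist
```

A track that is too long is skipped, but later, shorter tracks can still be added.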

`format_playlist(playlist)`¶

This function takes a list of track dictionaries and returns a string that represents the playlist. The first line of the playlist string should be a header:

TRACK NAME                                        ARTIST NAME                   LENGTH (min)

The width of the track name column should be 50, the artist name column's width should be 30, and the length column should finish the line. The tracks should be displayed one per line underneath, with the same widths, and the format of the track length (duration) should be float with 1 digit after the decimal point, for example:

TRACK NAME                                        ARTIST NAME                   LENGTH (min)
BIRDS OF A FEATHER                                Billie Eilish                 3.5
tAsTe                                             Sabrina Carpenter             5.0
Bed Chem                                          Sabrina Carpenter             2.9
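The fixed-width rows above can be produced with f-string width specifiers, as in this partial sketch that covers only the track table and not the details section. The helper names `format_row()` and `format_track()` are hypothetical, and the keys `track["artist"]["name"]` and `track["duration"]` (in seconds) are assumptions about the data layout.

```python
def format_row(name, artist, length):
    # 50 characters for the track name, 30 for the artist, then the length
    return f"{name:<50}{artist:<30}{length}"


def format_track(track):
    # Assumes duration is already an integer number of seconds
    minutes = track["duration"] / 60
    return format_row(track["name"], track["artist"]["name"], f"{minutes:.1f}")


header = format_row("TRACK NAME", "ARTIST NAME", "LENGTH (min)")
```

The `.1f` format specifier rounds to one digit after the decimal point, so 210 seconds is displayed as 3.5.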

Following the list of tracks should be a blank line and then the track details section, which begins with the title line DETAILS. After the title line, the section should contain the summary field from the wiki dictionary of each track dictionary, each followed by 2 newline characters. For example:

DETAILS
“BIRDS OF A FEATHER” is one of the standout tracks from Billie Eilish’s third album HIT ME HARD AND SOFT. The second single became a massive success, gaining widespread acclaim for its emotional depth and resonating with a global audience. It eventually became Billie Eilish’s first song to top the Billboard Global 200 chart, especially after her performance during the 2024 Summer Olympics closing ceremony.

Thematically, the song delves into the longing for deep love and lasting connection <a href="http://www.last.fm/music/Billie+Eilish/_/BIRDS+OF+A+FEATHER">Read more on Last.fm</a>.

“Taste” is the first track off of Sabrina Carpenter’s sixth studio album, Short n' Sweet, and was released as the highly anticipated third single with a music video featuring female actress Jenna Ortega on August 23, 2024, following hits “Espresso” & “Please Please Please”.

Disco-infused & irresistibly danceable to, “Taste” serves as the album’s opener, layering risqué innuendos with revenge and betrayal, stating that even if he’s cheating, she’ll allow it as he’s always got the taste of her on his lips. <a href="http://www.last.fm/music/Sabrina+Carpenter/_/tAsTe">Read more on Last.fm</a>.

"Bed Chem" is a song by Sabrina Carpenter, from her sixth album Short n' Sweet. It was written by Sabrina Carpenter, Julia Michaels, John Ryan, Ian Kirkpatrick and Amy Allen. <a href="http://www.last.fm/music/Sabrina+Carpenter/_/Bed+Chem">Read more on Last.fm</a>.

Part C: Recursion¶

The next part of this assignment is about recursive functions. The following files are provided:

users.json – data file for Part C
friends.py – what you will submit for Part C

On last.fm, users can "friend" other users to share playlists, discover new music, and interact in the social network. The "friends of friends" pattern is often analyzed using recursion.

Structure of the data¶

The users.json file is a sample of 31,774 public users. Most users have a username, url, country, real name, user type, and list of friends' names. However, some of these attributes are optional. The strings in the "friends_names" list correspond to the "name" attribute of other users.

Software Requirements¶

You need to write three functions: one to gather the data you need, and two to analyze the data. The last two functions must be recursive. You may (and will need to) use for loops. But do not use any while loops in your code.

`load_data(path)`¶

This function reads the JSON file from the given path (Ex: data/users.json). The file contains some extra data, so this function returns only the data that is needed. Specifically, load_data() returns a dictionary that maps each user to their friends. The key of the dictionary is the user's name, and the value is the list of their friends. If the user doesn't have any friends, then the value should be None.

Tip

Use a dict comprehension to build the result.
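A dict comprehension for this mapping might look like the sketch below. It assumes the top level of the file is a list of user dictionaries and that a user without friends either lacks the `"friends_names"` key or has an empty list. The helper `friend_map()` is a hypothetical name, split out only so the comprehension can be tried without a file.

```python
import json


def friend_map(users):
    # "or None" turns a missing key or an empty list into None
    return {user["name"]: user.get("friends_names") or None for user in users}


def load_data(path):
    with open(path, encoding="utf-8") as file:
        users = json.load(file)
    return friend_map(users)
```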

`influence(data, user)`¶

This function counts the number of unique friends reachable from a user. For example, if a user has 5 friends, and each of those users has 3 friends, and none of those users has any friends, then the first user's influence is 5 + 15 = 20. This count is an estimate of how much influence a user has in the social network.

Notice in the provided friends.py that the function has a default parameter visited=None. This parameter is used during recursion to avoid counting the same user twice. Each time the function is called, the current user is added to the visited set. If the user is already in the set, then the function returns immediately, because that user has already been counted.

Hints for implementing influence()

Here is an outline of how to write the function:

Setup
- Initialize visited if first function call
- Return zero if the user was already visited
- Otherwise add the user to the visited set
Base case
- If the user has no friends, influence is zero
Recursion
- Count the user's direct friends
- Recursively count friends of friends
- Add the counts and return the total

Here are two examples for testing your solution:

assert influence(data, "Babs_05") == 1494
assert influence(data, "alistair3408") == 0

Tip

Copy the above outline into your code as comments.
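To show how the outline's steps map onto code, here is a sketch that follows it literally, with the outline copied in as comments. It is not the sample solution: on a network where two users share a friend, this version may count that friend more than once, so treat it as a starting point and compare against the two test examples.

```python
def influence(data, user, visited=None):
    # Setup
    # - Initialize visited if first function call
    if visited is None:
        visited = set()
    # - Return zero if the user was already visited
    if user in visited:
        return 0
    # - Otherwise add the user to the visited set
    visited.add(user)

    # Base case
    # - If the user has no friends, influence is zero
    friends = data.get(user)
    if not friends:
        return 0

    # Recursion
    # - Count the user's direct friends
    total = len(friends)
    # - Recursively count friends of friends
    for friend in friends:
        total += influence(data, friend, visited)
    # - Add the counts and return the total
    return total
```

The same `visited` set is passed to every recursive call, which is what stops the recursion from looping forever when two users list each other as friends.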

`separation(data, user1, user2)`¶

This function measures degrees of separation or how far apart one user is from another in the social network. For example, if user A is friends with user B, and user B is friends with user C, then the separation of A and B is 1, and the separation of A and C is 2.

Notice in the provided friends.py that the function has two default parameters: visited=None and depth=0. The visited parameter has the same purpose as before. The depth parameter keeps track of how many times the function has been called recursively—in other words, depth is the current degree of separation.

Hints for implementing separation()

Here is an outline of how to write the function:

Setup
- Initialize visited if first function call
- Return zero if the user was already visited
- Otherwise add the user to the visited set
Base cases
- If both users are the same, return the result
- If the user has no friends, return not found
Recursion
- Recursively search friends of friends
- If the result is found, return the result

Here are two examples for testing your solution:

assert separation(data, "Babs_05", "Malour") == 2
assert separation(data, "Babs_05", "alistair3408") == -1

Tip

Copy the above outline into your code as comments.
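The outline can be turned into a depth-first search along the following lines. Two assumptions are made: -1 is used as the "not found" value (as in the second test example), and an already-visited user is also treated as not found. Because the search is depth-first, it returns the length of the first path it finds, which is not guaranteed to be the shortest one.

```python
NOT_FOUND = -1  # hypothetical constant name


def separation(data, user1, user2, visited=None, depth=0):
    # Setup
    if visited is None:
        visited = set()
    if user1 in visited:
        return NOT_FOUND  # assumption: visited users count as not found
    visited.add(user1)

    # Base cases
    if user1 == user2:
        return depth
    friends = data.get(user1)
    if not friends:
        return NOT_FOUND

    # Recursion: search each friend, one level deeper
    for friend in friends:
        result = separation(data, friend, user2, visited, depth + 1)
        if result != NOT_FOUND:
            return result
    return NOT_FOUND
```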

Part D: API Usage¶

This last part of the assignment is completely open-ended. There are a few tests to pass on Gradescope, but you get to choose your own adventure! An Application Programming Interface (API) is a way for different programs to interact, often by calling functions over the internet. You will use the Last.fm Music Discovery API to explore music data of your choice.

Example Program¶

The faculty used the Last.fm API to create the data files provided in Part A. Here is one of the programs that we wrote:

fulltracks.py – example program that creates fulltracks.json

This example first calls the chart.getTopTracks method to retrieve the top 50 tracks. The code then calls track.getInfo for each track to get the full information. The resulting data is stored in the fulltracks.json file.

Notice the following details about the example:

The program uses the requests module. You will need to install this module either via the terminal (pip install requests) or the Thonny package manager.
In order to run the program, you will need an API key (see instructions below).
All data sent to the API and received from the API is in JSON format.
The code is commented neatly, making the code easier to understand.
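For orientation, a single API call with the requests module might be sketched as follows. This is not the provided fulltracks.py. The root URL and the `limit` parameter are taken to match the Last.fm API documentation and should be confirmed there; `build_params()` and `call_api()` are hypothetical helper names, and `YOUR_API_KEY` is a placeholder.

```python
API_URL = "https://ws.audioscrobbler.com/2.0/"  # assumed root URL; verify in the docs


def build_params(method, api_key, **extra):
    # Every call names a method, carries the API key, and asks for JSON
    params = {"method": method, "api_key": api_key, "format": "json"}
    params.update(extra)
    return params


def call_api(method, api_key, **extra):
    # Imported here so the rest of the sketch runs without requests installed
    import requests

    response = requests.get(API_URL, params=build_params(method, api_key, **extra), timeout=10)
    response.raise_for_status()  # stop early on HTTP errors
    return response.json()
```

A call such as `call_api("chart.getTopTracks", "YOUR_API_KEY", limit=50)` would then return the parsed JSON response as a Python dictionary.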

Getting an API Key¶

To get started, you will need to create an account on Last.fm. After logging in for the first time, verify your email address by clicking the link sent to your email. Next, you will need to create an API account. Enter the following details on the form:


Contact email:	your email
Application name:	JMU CS 149 PA3
Application description:	School assignment
Callback URL:	leave blank
Application homepage:	https://w3.cs.jmu.edu/cs149/f24/pa/pa3/

After submitting the form, you will be shown your API key and Shared secret. Save these details in a text file or print the page to PDF for future reference.

Software Requirements¶

Now it's your turn to use the API. Read the Last.fm API Documentation to learn what methods you can call and what parameters to pass. Complete the following steps:

Create a file named explore.py. Write a docstring at the top with a one-line description, your name, and the date.
Write a program that calls at least two API methods and combines the data into one result. Your program should be about as complex as the provided fulltracks.py example.
In the docstring at the top of the file, write a paragraph (at least 100 words) that summarizes what your program does. Explain what you discovered or what you learned from the data.
Be sure to include comments in your code. Do not comment every line; write about as many comments as you see in fulltracks.py.

We hope you will be creative and find data you are genuinely interested in, such as your favorite artists, albums, or songs.

Submission Instructions¶

Part A (15 points)¶

Submit test_playlist.py on Gradescope. You have unlimited submissions for Part A.

Part B (50 points)¶

Submit playlist.py on Gradescope. You are limited to 10 submissions for Part B.

Part C (15 points)¶

Submit friends.py on Gradescope. You are limited to 10 submissions for Part C.

Part D (20 points)¶

Submit explore.py on Gradescope. You have unlimited submissions for Part D.
