@Hadryan
Forked from dbogdanov/smc_tasks_acousticbrainz.md
Created December 24, 2018 10:08
AMP Lab task 2017.01

Task 1: correlation between descriptors and music categories

You are given a dataset of descriptors for music recordings, together with genre and mood annotations.

  • Analyze global correlations (across all music recordings in the dataset)
    • between descriptors within Group 1
    • between descriptors within Group 2
  • Analyze the correlation of descriptors in Group 1 and Group 2 with
    • music genres associated with the recordings
    • music moods (mood_happy, mood_sad, mood_agressive, mood_relaxed, mood_party)

Use Pearson correlation (scipy) and the maximal information coefficient (MIC; pip install minepy) to measure the correlation between each pair of descriptors and between each descriptor and each annotation (genre or mood). Treat genre and mood labels as binary variables with values 0 or 1 so that the same correlation measures can be applied to them.
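As a minimal sketch of the two measures, the block below computes Pearson's r with `scipy.stats.pearsonr` and, when `minepy` is installed, the MIC with `minepy.MINE`. The data is synthetic, and the helper name `correlate` and the variable names are hypothetical; `alpha=0.6, c=15` are simply minepy's default parameters, not values prescribed by the task.

```python
import numpy as np
from scipy.stats import pearsonr

try:
    from minepy import MINE  # optional dependency: pip install minepy
except ImportError:
    MINE = None


def correlate(x, y):
    """Return (pearson_r, mic) for two equal-length 1-D sequences.

    mic is None when minepy is not installed.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r, _p_value = pearsonr(x, y)
    mic = None
    if MINE is not None:
        mine = MINE(alpha=0.6, c=15)
        mine.compute_score(x, y)
        mic = mine.mic()
    return float(r), mic


# Synthetic stand-ins for real data (assumption, not taken from the dataset).
rng = np.random.default_rng(0)
is_rock = rng.integers(0, 2, size=500)  # binary genre label (0 or 1)
average_loudness = 0.5 + 0.2 * is_rock + rng.normal(0, 0.1, size=500)

r, mic = correlate(average_loudness, is_rock)
print(f"Pearson r = {r:.3f}, MIC = {mic}")
```

Pearson's r between a continuous descriptor and a 0/1 label is the point-biserial correlation, which is why the binary encoding lets both measures run unchanged.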

As there are many mood labels, you can experiment with grouping them by meaning (e.g., relaxed and chill = 0 versus energetic and aggressive = 1) to reduce the number of variables.

For genres, use only the top 20 genres in your analysis (this amounts to 20 variables to test correlation with).
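One possible way to turn genre annotations into 0/1 variables is sketched below. It assumes that "top" means most frequent and that each recording carries a list of genre tags; both are assumptions, as are the helper name `top_genre_indicators` and the toy annotations.

```python
from collections import Counter


def top_genre_indicators(genres_per_recording, n_top=20):
    """Return (top_genres, rows), where rows[i][g] is 1 if recording i has genre g."""
    # Count each genre at most once per recording.
    counts = Counter(g for genres in genres_per_recording for g in set(genres))
    top_genres = [g for g, _count in counts.most_common(n_top)]
    rows = [
        {g: int(g in genres) for g in top_genres}
        for genres in genres_per_recording
    ]
    return top_genres, rows


# Toy annotations (assumption: one list of genre tags per recording).
annotations = [
    ["rock", "pop"],
    ["rock"],
    ["jazz"],
    ["rock", "jazz"],
    ["electronic"],
]
top, indicators = top_genre_indicators(annotations, n_top=2)
print(top)
print(indicators[0])
```

With the real dataset, `n_top=20` would yield the 20 binary variables mentioned above, each of which can be correlated against a descriptor.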

Group 1 (loudness/energy related descriptors):

  • average_loudness
  • replay_gain
  • spectral_rms (mean and median values)
  • spectral_energy (mean and median values)
  • MFCC 0th coefficient (mean value)

Group 2 (tempo/rhythm related descriptors):

  • bpm
  • bpm_histogram_first_peak_bpm (mean and median)
  • bpm_histogram_second_peak_bpm (mean and median)
  • onset_rate
  • danceability

Write a report in the form of a Python notebook:

  • load the dataset of descriptors for music recordings
  • compute the correlation between each pair of descriptors within each group
  • rank correlations in descending order and discuss them
  • generate four heatmaps (using the seaborn package) of correlations, one for each combination of descriptor group and correlation measure
  • compute the correlation of each descriptor in each group with music genres and moods
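The ranking and heatmap steps could look roughly like the sketch below, shown for one group and one measure only (Pearson, Group 1). The data is synthetic, the column `spectral_rms_mean` and the output file name are hypothetical, and ranking by absolute value is one reading of "descending order" rather than a requirement of the task.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in for Group 1 (assumption: one column per descriptor).
rng = np.random.default_rng(0)
base = rng.normal(size=300)
group1 = pd.DataFrame({
    "average_loudness": base + rng.normal(scale=0.3, size=300),
    "replay_gain": -base + rng.normal(scale=0.3, size=300),
    "spectral_rms_mean": base + rng.normal(scale=1.0, size=300),
})

# Pearson correlation matrix between every pair of columns.
corr = group1.corr(method="pearson")

# Keep each unordered pair once (upper triangle, no diagonal).
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Rank pairs by absolute correlation, strongest first.
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)

# Heatmap of the full matrix with a colour scale fixed to [-1, 1].
fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, fmt=".2f", vmin=-1, vmax=1, cmap="coolwarm", ax=ax)
ax.set_title("Group 1, Pearson correlation")
fig.tight_layout()
fig.savefig("group1_pearson_heatmap.png")
```

Repeating the same steps for Group 2 and for a MIC matrix would give the four heatmaps. For MIC, which ranges from 0 to 1, `vmin=0` is probably the more sensible lower bound.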

Deliverable

A Python notebook with code and a report

Download links:

Dataset: TODO

Info

Task 2: agreement between AcousticBrainz and MSD data

You are given two datasets of descriptors for the same music recordings: one from AcousticBrainz (AB) and one from the Million Song Dataset (MSD). Study the correlations between descriptors in AcousticBrainz and their counterparts in MSD.

Compare track durations to double check that the tracks analyzed in AB and MSD are identical. Slight differences of a few seconds are normal.
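A duration check along these lines could be sketched as follows. The column names (`track_id`, `ab_duration`, `msd_duration`), the toy values, and the 5-second tolerance are all assumptions for illustration; the real datasets may use different identifiers and units.

```python
import pandas as pd

# Toy tables (assumption: both share an identifier column, here "track_id",
# and store durations in seconds under the hypothetical names below).
ab = pd.DataFrame({
    "track_id": ["a", "b", "c"],
    "ab_duration": [210.0, 185.4, 300.2],
})
msd = pd.DataFrame({
    "track_id": ["a", "b", "c"],
    "msd_duration": [211.3, 185.0, 262.9],
})

TOLERANCE_S = 5.0  # "a few seconds"; the exact threshold is a judgment call

# Pair up recordings present in both datasets and compare their durations.
merged = ab.merge(msd, on="track_id", how="inner")
merged["abs_diff"] = (merged["ab_duration"] - merged["msd_duration"]).abs()
merged["same_track"] = merged["abs_diff"] <= TOLERANCE_S

print(merged)

# Recordings that pass the check can be kept for the descriptor comparison.
matching = merged[merged["same_track"]]
```

Plotting a histogram of `abs_diff` before fixing the tolerance may help to see where "slight differences" end and genuine mismatches begin.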

Deliverable

A Python notebook with code and a report

Download links:

Dataset: TODO

Info
