Task 1: Emotion regression

Description

Train and evaluate two Support Vector Regression (SVR) models, one predicting arousal and one predicting valence. Use Essentia's MusicExtractor to compute summarized descriptor values (no frames). Use scikit-learn for Support Vector Regression. It will be useful to pre-process all features first, standardizing them to zero mean and unit variance.
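
A minimal sketch of the analysis step follows. It assumes the Essentia Python bindings are available, for example through the MIR Course Docker image. Restricting MusicExtractor's lowlevelStats, rhythmStats and tonalStats parameters to mean and stdev keeps only the statistics used later. The helper names json_path_for and analyze_file and the output layout (one JSON file per track, named after the audio file) are illustrative assumptions, not part of the task.

```python
from pathlib import Path

try:
    import essentia.standard as es  # Essentia Python bindings
except ImportError:  # lets the path helper below work without Essentia installed
    es = None

# Only the two statistics the task asks for.
STATS = ["mean", "stdev"]


def json_path_for(audio_path, out_dir):
    """Hypothetical naming scheme: <out_dir>/<audio file stem>.json."""
    return Path(out_dir) / (Path(audio_path).stem + ".json")


def analyze_file(audio_path, out_dir):
    """Run MusicExtractor on one file and store the aggregated descriptors as JSON."""
    if es is None:
        raise RuntimeError("Essentia is not installed")
    extractor = es.MusicExtractor(lowlevelStats=STATS,
                                  rhythmStats=STATS,
                                  tonalStats=STATS)
    # MusicExtractor returns a pool of summarized descriptors and a pool of
    # frame-wise values; only the summarized pool is stored ("no frames").
    features, _frames = extractor(str(audio_path))
    out_path = json_path_for(audio_path, out_dir)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    es.YamlOutput(filename=str(out_path), format="json")(features)
    return out_path
```

Calling analyze_file once per audio file in the dataset folder produces the JSON files. Analyzing the whole dataset can take a while, so skipping files whose JSON output already exists makes it easier to resume.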

Write a report in the form of a Python notebook. The notebook should:

Analyze all audio in the dataset using MusicExtractor and store the aggregated analysis results in JSON files
Load the descriptors computed for all files, as well as the arousal/valence annotations
- You can first convert JSON to CSV if you like
- Use lowlevel.*, rhythm.*, and tonal.* descriptors
- For summarized descriptors only consider *.mean and *.stdev
- Ignore non-numerical descriptors
Standardize all features
Split the dataset into two subsets:
- tracks 1-1700 for training
- tracks 1701-... for testing
Train the models on the training subset
Predict arousal and valence for the testing subset and evaluate the predictions using regression metrics
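
Selecting descriptors from one of these JSON files can be sketched as follows. The sketch assumes the nested layout that YamlOutput writes (for example a "lowlevel" dictionary containing "spectral_centroid", which contains "mean" and "stdev"). Vector-valued statistics such as the MFCC mean are expanded into one feature per coefficient, while strings and non-summarized lists (such as beat positions) are skipped. The function names are hypothetical, and skipping every non-summarized list, including fixed-length histograms, is a simplifying assumption.

```python
import json
from numbers import Real

# Descriptor namespaces and statistics the task asks for.
NAMESPACES = ("lowlevel", "rhythm", "tonal")
STATS = ("mean", "stdev")


def is_number(value):
    """True for ints and floats, but not for booleans."""
    return isinstance(value, Real) and not isinstance(value, bool)


def select_features(descriptors):
    """Flatten MusicExtractor output into {"namespace.descriptor[.stat[.i]]": float}."""
    features = {}
    for namespace in NAMESPACES:
        for name, value in descriptors.get(namespace, {}).items():
            key = f"{namespace}.{name}"
            if isinstance(value, dict):
                # Summarized descriptor: keep only the mean and stdev statistics.
                for stat in STATS:
                    stat_value = value.get(stat)
                    if is_number(stat_value):
                        features[f"{key}.{stat}"] = float(stat_value)
                    elif (isinstance(stat_value, list) and stat_value
                          and all(is_number(v) for v in stat_value)):
                        # Vector descriptor (e.g. MFCC): one feature per coefficient.
                        for i, v in enumerate(stat_value):
                            features[f"{key}.{stat}.{i}"] = float(v)
            elif is_number(value):
                # Single numeric value, e.g. rhythm.bpm.
                features[key] = float(value)
            # Anything else (strings, lists such as beat positions) is ignored.
    return features


def load_features(json_path):
    """Read one JSON file written by the analysis step and select its features."""
    with open(json_path) as f:
        return select_features(json.load(f))
```

Applying load_features to every track and stacking the resulting dictionaries (for instance into a pandas DataFrame) gives the feature matrix. It is worth checking that every track yields the same set of keys.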

Report and discuss evaluation results.
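
The split, standardization, training and evaluation steps for one target (arousal or valence) can be sketched as below. It assumes X is a feature matrix with one row per track, track_ids holds the numeric track IDs from the dataset, and y holds the matching annotation values; the function name is hypothetical and the SVR hyperparameters are scikit-learn defaults, not tuned values. The scaler is fitted on the training tracks only and then applied to both subsets, which is one way to standardize all features without letting test statistics leak into training.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def train_and_evaluate(X, y, track_ids, last_train_id=1700):
    """Train an SVR on tracks 1..last_train_id and evaluate it on the remaining tracks."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    track_ids = np.asarray(track_ids)
    train = track_ids <= last_train_id
    test = ~train

    # Zero mean / unit variance, using statistics from the training tracks.
    scaler = StandardScaler().fit(X[train])

    model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
    model.fit(scaler.transform(X[train]), y[train])
    pred = model.predict(scaler.transform(X[test]))

    # Common regression metrics on the held-out tracks.
    return {
        "MAE": mean_absolute_error(y[test], pred),
        "RMSE": float(np.sqrt(mean_squared_error(y[test], pred))),
        "R2": r2_score(y[test], pred),
    }
```

Calling it once with the arousal annotations and once with the valence annotations trains the two required models. R² is convenient to discuss because it compares each model with always predicting the mean of the test annotations.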

Deliverable

A Python notebook with code and a report. Include text explaining all your steps and decisions, and a discussion of the results (submissions with code only and no explanation are not enough!)

Download links

Dataset: https://www.dropbox.com/s/tlxihautrx2hqd0/Essentia_mood_task.zip?dl=0

Alternative link for download: http://essentia.upf.edu/documentation/tmp/Essentia_mood_task.zip

MIR Course Docker image including Essentia python wrapper: https://github.com/MTG/MIRCourse

Info

Task 2: Genre classification

Description

Train and evaluate a Support Vector Machine (SVM) classifier to predict genre. Use Essentia's MusicExtractor to compute summarized descriptor values (no frames). Use scikit-learn for the SVM classifier. It will be useful to pre-process all features first, standardizing them to zero mean and unit variance.

Write a report in the form of a Python notebook. The notebook should:

Analyze all audio in the dataset using MusicExtractor and store the aggregated analysis results in JSON files
Load the descriptors computed for all files, as well as the genre annotations (all audio files are inside folders whose names correspond to genres)
- You can first convert JSON to CSV if you like
- Use lowlevel.*, rhythm.*, and tonal.* descriptors
- For summarized descriptors only consider *.mean and *.stdev
- Ignore non-numerical descriptors
Standardize all features
Split the dataset into two balanced subsets: 80% for training and 20% for testing (make sure that the testing subset contains an equal number of tracks from each genre)
Train the model on the training subset
Predict genre for the testing subset and evaluate the predictions: compute the classification accuracy (% of correct predictions) and the confusion matrix.
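
A sketch of reading genre labels from folder names and building the balanced split in plain Python follows. It assumes each file path has the form dataset/genre/file and draws the same number of test tracks from every genre (about 20% of the whole dataset in total). The function names and the fixed random seed are illustrative choices. If the genres differ in size, the training subset will not contain exactly 80% of each genre.

```python
import random
from collections import defaultdict
from pathlib import Path


def genre_of(path):
    """Genre label = name of the folder that contains the file."""
    return Path(path).parent.name


def balanced_split(paths, test_fraction=0.2, seed=42):
    """Return (train, test) lists with an equal number of test tracks per genre."""
    by_genre = defaultdict(list)
    for p in sorted(paths):  # sort first so the seeded shuffle is reproducible
        by_genre[genre_of(p)].append(p)

    # Same number of test tracks for every genre, ~test_fraction of the dataset.
    n_test = round(test_fraction * len(paths) / len(by_genre))

    rng = random.Random(seed)
    train, test = [], []
    for genre, files in sorted(by_genre.items()):
        if len(files) <= n_test:
            raise ValueError(f"genre {genre!r} has too few tracks for {n_test} test tracks")
        rng.shuffle(files)
        test.extend(files[:n_test])
        train.extend(files[n_test:])
    return train, test
```

scikit-learn's train_test_split with the stratify argument preserves class proportions rather than equal counts, so it only gives an equal number of test tracks per genre when the genres are equally sized.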

Report and discuss evaluation results.
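
Training the classifier and computing the requested metrics could look like the sketch below. It assumes the feature matrices and genre labels for the two subsets have already been built; the function name is hypothetical. The RBF kernel and C=1.0 are scikit-learn defaults rather than tuned values. Passing labels explicitly fixes the row and column order of the confusion matrix (rows are true genres, columns are predicted genres).

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def train_and_evaluate_genre(X_train, y_train, X_test, y_test):
    """Fit a standardized SVM on the training subset and score it on the test subset."""
    # Standardization statistics come from the training subset only.
    scaler = StandardScaler().fit(X_train)

    clf = SVC(kernel="rbf", C=1.0, gamma="scale")
    clf.fit(scaler.transform(X_train), y_train)
    pred = clf.predict(scaler.transform(X_test))

    labels = sorted(set(y_train) | set(y_test))
    accuracy = accuracy_score(y_test, pred)  # fraction of correct predictions
    cm = confusion_matrix(y_test, pred, labels=labels)
    return accuracy, cm, labels
```

Multiplying the accuracy by 100 gives the percentage of correct predictions. Looking at which off-diagonal cells of the confusion matrix are largest shows which genres the model confuses most often, which is a good starting point for the discussion.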

Deliverable

A Python notebook with code and a report. Include text explaining all your steps and decisions, and a discussion of the results (submissions with code only and no explanation are not enough!)

Download links

Dataset: https://www.dropbox.com/s/gljckh3nff55r9c/Essentia_genre_task.zip?dl=0

Alternative link for download: http://essentia.upf.edu/documentation/tmp/Essentia_genre_task.zip

MIR Course Docker image including Essentia python wrapper: https://github.com/MTG/MIRCourse

Info

Task 3: Music segments playlist

Analyze 20 different music tracks from your collection. Split the tracks into segments spanning a fixed number of beats (segments of 1, 2, 4, or 8 beats). For each segment, compute the mean and stdev of its MFCC frames using Essentia, and cluster the segments using scikit-learn (discard the 0th MFCC coefficient, as it is correlated with loudness, not timbre). Decide which number of clusters is appropriate. For each cluster, generate an audio file containing the corresponding segments. The number of beats per segment and the number of clusters should be configurable in the script.
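
Turning beat positions into segment boundaries is plain list arithmetic. The sketch below assumes beats is a list of beat times in seconds (such as the beats output of Essentia's RhythmExtractor2013) and drops trailing beats that do not fill a whole segment; the function name is hypothetical. The returned start and end lists have the form expected by the startTimes and endTimes parameters of Essentia's Slicer.

```python
def beat_segments(beats, beats_per_segment=4):
    """Group consecutive beat times into (start, end) segments spanning N beats each.

    Returns two lists, start times and end times in seconds. Trailing beats
    that do not fill a whole segment are dropped.
    """
    if beats_per_segment < 1:
        raise ValueError("beats_per_segment must be >= 1")
    beats = list(beats)
    starts, ends = [], []
    # Each segment runs from beat i to beat i + N (the first beat of the next segment).
    for i in range(0, len(beats) - beats_per_segment, beats_per_segment):
        starts.append(float(beats[i]))
        ends.append(float(beats[i + beats_per_segment]))
    return starts, ends
```

Keeping beats_per_segment as a parameter makes the segment length configurable, as the task requires.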

Write a Python script using Essentia that:

Loads audio
Estimates beats positions
Cuts audio into segments (Slicer algorithm)
Computes MFCC frames in each segment and summarizes them into mean/stdev values (using PoolAggregator)
Clusters segments using Scikit-learn (k-means clustering)
Writes audio files with segments from each cluster (using AudioWriter)
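
A sketch of the clustering step follows. It assumes the per-segment MFCC means and standard deviations have been collected into two arrays with one row per segment and one column per coefficient. Column 0 is dropped from both before running k-means, as the task requires. n_clusters is the configurable number of clusters, and the function name is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_segments(mfcc_means, mfcc_stdevs, n_clusters=5, seed=0):
    """Cluster segments on MFCC mean/stdev, discarding the 0th (loudness-related) coefficient.

    Returns the cluster label of each segment and, for each cluster, the indices
    of its segments in their original order.
    """
    means = np.asarray(mfcc_means, dtype=float)[:, 1:]
    stdevs = np.asarray(mfcc_stdevs, dtype=float)[:, 1:]
    X = np.hstack([means, stdevs])  # one feature vector per segment

    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(X)

    members = {c: np.flatnonzero(labels == c).tolist() for c in range(n_clusters)}
    return labels, members
```

The index lists can be used to concatenate the matching Slicer outputs into one signal per cluster before writing it to an audio file. MFCC means and standard deviations have different ranges, so standardizing the feature matrix before k-means (as in the other tasks) may change the clusters. Comparing both setups, and several values of n_clusters, is one way to justify the final choice.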

Deliverable

A Python notebook with code and a report, plus audio files with the segments of each cluster. Include text explaining all your steps and decisions, and a discussion of the results (submissions with code only and no explanation are not enough!)

Download links

MIR Course Docker image including Essentia python wrapper: https://github.com/MTG/MIRCourse
