@dbogdanov
Last active January 29, 2018 23:21
SCM tasks 2017.01 (Essentia)

Task 1: Emotion regression

Description

Train and evaluate two Support Vector Regression models for prediction of arousal and valence. Use Essentia's MusicExtractor to compute summarized descriptor values (no frames). Use Scikit-learn for Support Vector Regression. It will be useful to pre-process all features first, standardizing them to zero mean and unit variance.
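As a rough illustration of the setup described above, the sketch below standardizes features and fits one SVR per target. The feature matrix and annotations are synthetic stand-ins (not the task dataset), and the names `targets`, `N_TRAIN`, and `fit_and_score`, as well as the SVR hyperparameters, are assumptions chosen for illustration. The scaler sits inside a pipeline so that mean and variance are estimated on the training rows only, which is one reasonable way to apply the standardization step.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data: 200 "tracks" with 10 descriptors each.
rng = np.random.RandomState(0)
X = rng.normal(loc=5.0, scale=3.0, size=(200, 10))
targets = {
    "arousal": 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.1, size=200),
    "valence": 0.3 * X[:, 2] + rng.normal(scale=0.1, size=200),
}

N_TRAIN = 150  # hypothetical split point; the task itself uses tracks 1-1700


def fit_and_score(y):
    """Fit a standardize-then-SVR pipeline and score it on the held-out rows."""
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
    model.fit(X[:N_TRAIN], y[:N_TRAIN])
    pred = model.predict(X[N_TRAIN:])
    return {
        "mse": mean_squared_error(y[N_TRAIN:], pred),
        "r2": r2_score(y[N_TRAIN:], pred),
    }


# One independent model per target.
scores = {name: fit_and_score(y) for name, y in targets.items()}
for name, result in scores.items():
    print(name, result)
```

Mean squared error and R² are only two of the regression metrics Scikit-learn offers; which ones to report is left to the notebook.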

Write a report in the form of a Python notebook. The notebook should:

  • Analyze all audio in the dataset using MusicExtractor and store aggregated analysis results in JSON files
  • Load descriptors computed for all files, and load arousal/valence annotations
    • You can first convert JSON to CSV if you like
    • Use lowlevel.*, rhythm.*, and tonal.* descriptors
    • For summarized descriptors only consider *.mean and *.stdev
    • Ignore non-numerical descriptors
  • Standardize all features
  • Split dataset into two subsets:
    • tracks 1-1700 for training
    • tracks 1701-... for testing
  • Train model on training subset
  • Predict arousal/valence for testing subset and evaluate predictions using regression metrics
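The descriptor selection rules in the list above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the JSON document is a small made-up stand-in for MusicExtractor output, the helper names (`flatten`, `select_features`) are hypothetical, and `DROP_STATS` is an assumed list of statistic names to discard. Vector-valued descriptors are expanded into one column per index; variable-length lists would need separate handling.

```python
import json

NAMESPACES = ("lowlevel", "rhythm", "tonal")
# Assumed names of summary statistics to discard (everything except mean/stdev).
DROP_STATS = {"min", "max", "median", "var", "dmean", "dmean2", "dvar", "dvar2"}


def is_number(value):
    """True for ints and floats, but not for booleans or strings."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def flatten(node, prefix):
    """Yield (dotted_name, value) for every numeric leaf below node."""
    if isinstance(node, dict):
        for key, child in node.items():
            if key in DROP_STATS:
                continue  # keep only *.mean and *.stdev summaries
            yield from flatten(child, f"{prefix}.{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from flatten(child, f"{prefix}.{index}")
    elif is_number(node):
        yield prefix, float(node)
    # Anything else (e.g. strings such as a key name) is non-numerical: skipped.


def select_features(doc):
    """Collect numeric descriptors from the lowlevel, rhythm and tonal namespaces."""
    features = {}
    for namespace in NAMESPACES:
        features.update(flatten(doc.get(namespace, {}), namespace))
    return features


# Made-up example document, loosely shaped like an aggregated analysis file.
example = json.loads("""
{
  "lowlevel": {
    "average_loudness": 0.9,
    "spectral_centroid": {"mean": 1200.5, "stdev": 300.0, "max": 5000.0},
    "mfcc": {"mean": [-700.0, 120.0], "stdev": [50.0, 20.0]}
  },
  "rhythm": {"bpm": 120.0},
  "tonal": {"key_key": "C", "key_strength": 0.8},
  "metadata": {"audio_properties": {"length": 30.0}}
}
""")

features = select_features(example)
for name in sorted(features):
    print(name, features[name])
```

Each track then becomes one row of a feature table, with the dotted names as column headers (convenient for the optional CSV conversion).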

Report and discuss evaluation results.

Deliverable

A Python notebook with code and report. Include text explaining all your steps and decisions, and a discussion of the results (submissions with code only and no explanation are not enough!)

Download links

Dataset: https://www.dropbox.com/s/tlxihautrx2hqd0/Essentia_mood_task.zip?dl=0

Alternative link for download: http://essentia.upf.edu/documentation/tmp/Essentia_mood_task.zip

MIR Course Docker image including Essentia python wrapper: https://github.com/MTG/MIRCourse


Task 2: Genre classification

Description

Train and evaluate a Support Vector Machine (SVM) classifier for genre prediction. Use Essentia's MusicExtractor to compute summarized descriptor values (no frames). Use Scikit-learn for the SVM classifier. It will be useful to pre-process all features first, standardizing them to zero mean and unit variance.
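A minimal sketch of this classification setup follows, run on synthetic data rather than the task dataset. The genre names, feature dimensions, and SVC hyperparameters are assumptions for illustration only. `train_test_split` with `stratify` keeps the genre proportions equal in both subsets, and the scaler is fitted inside the pipeline on the training rows.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

genres = ["blues", "jazz", "rock"]  # hypothetical labels

# Synthetic stand-in data: 50 "tracks" per genre, 8 descriptors each,
# with a different mean per genre so the classes are separable.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(loc=3.0 * i, scale=1.0, size=(50, 8)) for i in range(3)])
y = np.repeat(genres, 50)

# 80/20 split that preserves the per-genre proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, pred)
# Rows are true genres, columns are predicted genres, in the order of `genres`.
cm = confusion_matrix(y_test, pred, labels=genres)
print("accuracy:", accuracy)
print(cm)
```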

Write a report in the form of a Python notebook. The notebook should:

  • Analyze all audio in the dataset using MusicExtractor and store aggregated analysis results in JSON files
  • Load descriptors computed for all files, and load genre annotations (each audio file is stored in a folder named after its genre)
    • You can first convert JSON to CSV if you like
    • Use lowlevel.*, rhythm.*, and tonal.* descriptors
    • For summarized descriptors only consider *.mean and *.stdev
    • Ignore non-numerical descriptors
  • Standardize all features
  • Split dataset into two balanced subsets: 80% for training, 20% for testing (make sure that the testing subset contains the same number of tracks for each genre)
  • Train model on training subset
  • Predict genre for the testing subset and evaluate the predictions: compute the classification accuracy (% of correct predictions) and the confusion matrix.
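The folder-based annotations and the balanced split from the list above could be handled along these lines. This is a sketch using only the standard library; the file layout `dataset/<genre>/<track>.mp3`, the helper names, and the rule for sizing the test subset (the same count for every genre, chosen so that the test set is about 20% of all tracks) are assumptions, not part of the task statement.

```python
import random
from collections import Counter, defaultdict
from pathlib import PurePosixPath


def genre_from_path(path):
    """Use the name of the containing folder as the genre label."""
    # PurePosixPath keeps the example platform-independent;
    # pathlib.Path would be the usual choice for real files.
    return PurePosixPath(path).parent.name


def balanced_split(paths, test_fraction=0.2, seed=0):
    """Split paths into (train, test) with the same number of test tracks per genre."""
    by_genre = defaultdict(list)
    for path in paths:
        by_genre[genre_from_path(path)].append(path)

    # Same test count for every genre, sized so the test set is ~test_fraction overall.
    n_test = round(test_fraction * len(paths) / len(by_genre))
    if any(len(tracks) <= n_test for tracks in by_genre.values()):
        raise ValueError("a genre has too few tracks for a balanced test subset")

    rng = random.Random(seed)
    train, test = [], []
    for genre in sorted(by_genre):
        tracks = sorted(by_genre[genre])
        rng.shuffle(tracks)
        test.extend(tracks[:n_test])
        train.extend(tracks[n_test:])
    return train, test


# Hypothetical file layout: dataset/<genre>/<track>.mp3
paths = [
    f"dataset/{genre}/track_{i:02d}.mp3"
    for genre in ("blues", "jazz", "rock")
    for i in range(10)
]
train, test = balanced_split(paths)
print(len(train), len(test))
print(Counter(genre_from_path(p) for p in test))
```

If the genres differ in size, this rule keeps the test subset balanced and leaves the imbalance in the training subset, which is one possible reading of the requirement.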

Report and discuss evaluation results.

Deliverable

A Python notebook with code and report. Include text explaining all your steps and decisions, and a discussion of the results (submissions with code only and no explanation are not enough!)

Download links

Dataset: https://www.dropbox.com/s/gljckh3nff55r9c/Essentia_genre_task.zip?dl=0

Alternative link for download: http://essentia.upf.edu/documentation/tmp/Essentia_genre_task.zip

MIR Course Docker image including Essentia python wrapper: https://github.com/MTG/MIRCourse


Task 3: Music segments playlist

Analyze 20 different music tracks from your collection. Split the tracks into segments spanning a fixed number of beats (segments of 1 beat, 2 beats, 4 beats, 8 beats). Compute MFCC mean/stdev values for each segment using Essentia and cluster the segments using Scikit-learn (discard the 0th MFCC coefficient, as it is correlated with loudness rather than timbre). Decide which number of segments is appropriate. For each cluster, generate an audio file containing the corresponding segments. The number of beats in a segment and the number of clusters to consider should be configurable in the script.
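To make the beat-based segmentation concrete, here is a small sketch that turns a list of beat times into segment boundaries. The function name `beats_to_segments` and the beat times are made up for illustration; a trailing group with fewer beats than requested is dropped, which is an assumption rather than a requirement of the task.

```python
def beats_to_segments(beat_times, beats_per_segment):
    """Return (start_times, end_times) for consecutive segments of N beats."""
    if beats_per_segment < 1:
        raise ValueError("beats_per_segment must be at least 1")
    # Every N-th beat is a segment boundary; each segment runs from one
    # boundary to the next, so an incomplete trailing group is dropped.
    boundaries = list(beat_times)[::beats_per_segment]
    return boundaries[:-1], boundaries[1:]


# Made-up beat positions in seconds (120 BPM): 0.5, 1.0, ..., 4.5
beats = [0.5 * i for i in range(1, 10)]

for n in (1, 2, 4, 8):
    starts, ends = beats_to_segments(beats, n)
    print(n, list(zip(starts, ends)))
```

The resulting start and end times are in the form a slicing step would need, for example as start/end time lists in seconds.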

Write a Python script using Essentia that:

  • Loads audio
  • Estimates beat positions
  • Cuts audio into segments (Slicer algorithm)
  • Computes MFCC frames in each segment and summarizes frames to mean/stdev values (using PoolAggregator)
  • Clusters segments using Scikit-learn (k-means clustering)
  • Writes audio files with segments from each cluster (using AudioWriter)
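The clustering step in the list above might look roughly like the following. The per-segment MFCC summaries here are random stand-ins for values that would come from Essentia, and the array shapes (13 coefficients), the standardization before k-means, and the names `N_CLUSTERS` and `segments_by_cluster` are assumptions. The 0th coefficient is dropped from both the mean and stdev vectors before clustering.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

N_CLUSTERS = 3  # configurable number of clusters

# Random stand-in for per-segment summaries: 60 segments, 13 MFCC means
# and 13 MFCC stdevs each (real values would come from the analysis step).
rng = np.random.RandomState(0)
mfcc_mean = rng.normal(size=(60, 13))
mfcc_stdev = np.abs(rng.normal(size=(60, 13)))

# Discard the 0th coefficient (loudness-related) and join mean and stdev.
features = np.hstack([mfcc_mean[:, 1:], mfcc_stdev[:, 1:]])
features = StandardScaler().fit_transform(features)

labels = KMeans(n_clusters=N_CLUSTERS, n_init=10, random_state=0).fit_predict(features)

# Segment indices grouped by cluster; each group would become one audio file.
segments_by_cluster = {
    cluster: np.flatnonzero(labels == cluster).tolist()
    for cluster in range(N_CLUSTERS)
}
for cluster, indices in segments_by_cluster.items():
    print(cluster, len(indices))
```

Concatenating the audio of the segments listed under each cluster index, in order, gives the content of the per-cluster output file.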

Deliverable

A Python notebook with code and report. Audio files with segments for each cluster. Include text explaining all your steps and decisions, and a discussion of the results (submissions with code only and no explanation are not enough!)

Download links

MIR Course Docker image including Essentia python wrapper: https://github.com/MTG/MIRCourse

