@bzamecnik
Created July 17, 2017 11:12
Extract MusicNet labels
#!/usr/bin/env python3
# Extracts just a dataset of labels (ignoring audio) from the full
# MusicNet dataset in order to shrink it from 11 GB.
# Note: I was not able to save objects (not numpy arrays) back to npz file,
# so we save to pickle.
import pickle
import numpy as np
from tqdm import tqdm
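# the npz file was pickled in Python 2, we're opening it in Python 3
# (allow_pickle=True is required by NumPy >= 1.16.3 to load object arrays)
musicnet = np.load('musicnet.npz', encoding='latin1', fix_imports=True, mmap_mode='r', allow_pickle=True)
# each entry is an (audio, labels) pair; keep only the labels at index 1
musicnet_labels = {rec_id: musicnet[rec_id][1] for rec_id in tqdm(musicnet.keys())}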
with open('musicnet_labels.pickle', 'wb') as f:
    pickle.dump(musicnet_labels, f)
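
A minimal sketch of reading the extracted labels back, assuming the pickle holds a dict keyed by recording id as written by the script above. The `load_labels` helper and the stand-in data are hypothetical and only illustrate the round trip. In the real file the values are, as far as I know, `intervaltree.IntervalTree` objects, so the `intervaltree` package would need to be installed before unpickling.

```python
import os
import pickle
import tempfile


def load_labels(path):
    """Load a {recording_id: labels} dict from a pickle file (hypothetical helper)."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Stand-in data for illustration only: plain tuples instead of the real
# label objects, so the example runs without the MusicNet dataset.
demo_labels = {'1727': [(0, 100, 'note A')], '1728': []}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'musicnet_labels.pickle')
    # Write the stand-in dict the same way the extraction script does.
    with open(demo_path, 'wb') as f:
        pickle.dump(demo_labels, f)
    restored = load_labels(demo_path)

# The recording ids survive the round trip.
print(sorted(restored))
```

With the real file, `load_labels('musicnet_labels.pickle')` would be the only call needed.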