
@egorsmkv
Created March 21, 2025 13:32
Upload MP3 to HF
import json
from glob import glob
from os.path import basename

# Collect every MP3 file in the data directory.
files_all = glob("data/*.mp3")

results = []
for filename in files_all:
    duration = 0  # placeholder; the real duration is not computed here
    results.append({'file_name': basename(filename), 'duration': duration, 'transcription': '-'})

# Write one JSON object per line; the "audiofolder" loader reads this file as metadata.
with open('data/metadata.jsonl', 'w') as f:
    for result in results:
        f.write(json.dumps(result) + '\n')
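Before loading the folder, it can help to confirm that every `file_name` in `data/metadata.jsonl` points at a file that actually exists. The sketch below is one possible check, not part of the original gist; the helper name `check_metadata` is hypothetical, and it assumes the metadata file sits in the same directory as the audio files.

```python
import json
from pathlib import Path


def check_metadata(data_dir):
    """Return file names listed in metadata.jsonl that are missing from data_dir.

    Hypothetical helper: assumes metadata.jsonl lives next to the audio files.
    """
    data_path = Path(data_dir)
    missing = []
    with open(data_path / "metadata.jsonl", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            row = json.loads(line)
            if not (data_path / row["file_name"]).is_file():
                missing.append(row["file_name"])
    return missing
```

Calling `check_metadata("data")` should return an empty list when the metadata and the folder contents agree.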
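from datasets import load_dataset

# Load the MP3 files together with data/metadata.jsonl.
af_ds = load_dataset("audiofolder", data_dir="./data")
print(af_ds)

# Upload to the Hub; replace the token placeholder with your own access token.
af_ds.push_to_hub("speech-uk/test-dataset", token='hf_....')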
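Hardcoding the access token in the script makes it easy to leak by accident. A minimal sketch of reading it from the environment instead is shown here; the helper name `get_hf_token` is hypothetical, and the use of an `HF_TOKEN` variable is an assumption about how the token is stored locally.

```python
import os


def get_hf_token(env_var="HF_TOKEN"):
    """Return the token from the environment, or raise if it is not set.

    Hypothetical helper: the variable name is an assumption, adjust as needed.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return token
```

The result could then be passed as `token=get_hf_token()` in the `push_to_hub` call.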