@GokulNC
Last active June 25, 2020 05:41
LibriSpeech Annotation Corrector
from glob import glob
from multiprocessing import Process
import os
import sys
import time

from playsound import playsound


def open_text_editor(txt_file):
    # Only Windows 😎
    # Probably use gedit for Ubuntu?
    # Quote the path so file names containing spaces still open correctly.
    os.system('notepad++ "%s"' % txt_file)


if __name__ == '__main__':
    audio_files = sorted(glob('wav/*.wav'))
    if not audio_files:
        sys.exit('No audio files found.')

    # i = 0
    i = int(input('Start annotating from index: '))
    while i < len(audio_files):
        audio_file = audio_files[i]
        txt_file = 'txt/' + os.path.basename(audio_file).replace('.wav', '.txt')
        if not os.path.isfile(txt_file):
            print('No txt for:', audio_file)
            i += 1  # Skip this sample; without this the loop never advances
            continue

        print('>> Current:', i)
        text_editor = Process(target=open_text_editor, args=(txt_file,))
        text_editor.start()
        time.sleep(0.5)  # Give the editor a moment to open before playback starts
        audio_player = Process(target=playsound, args=(audio_file,))
        audio_player.start()

        while True:
            command = input('Command: ')
            audio_player.terminate()  # Stop playback before handling the command
            if command == 'n':  # Next audio
                i += 1
                break
            if command == 'r':  # Replay audio
                audio_player = Process(target=playsound, args=(audio_file,))
                audio_player.start()
                continue
            if command == 'p':  # Previous audio (stays on the first sample at index 0)
                if i > 0:
                    i -= 1
                break
            if command == 'q':  # Quit annotating
                print('Quitting... Remember: Last Audio Index:', i)
                sys.exit()
        # text_editor.terminate()

    print('SUCCESS: Completed Annotating %d samples of data!!!' % len(audio_files))

Steps:

  1. Place this script inside your librispeech dataset folder.
  2. This should be the directory structure of the dataset folder:
libri_dataset
|- libri_annotator.py
|- txt
|  |- sample1.txt
|  |- sample2.txt
|- wav
   |- sample1.wav
   |- sample2.wav
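  3. Ensure Notepad++ is installed and its install folder is on your PATH, so that the notepad++ command works from the command line.
  4. pip install playsound
  5. Run python libri_annotator.py from inside the libri_dataset folder, then enter the index of the sample to start from.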
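
Before starting a long annotation session, it can help to confirm that every audio file has a matching transcript, since the script only reports a missing one when it reaches that sample. The following is a minimal sketch under the directory layout described above; `find_missing_transcripts` is a hypothetical helper, not part of the original script.

```python
import os
from glob import glob


def find_missing_transcripts(wav_dir='wav', txt_dir='txt'):
    """Return the .wav paths that have no same-named .txt file in txt_dir."""
    missing = []
    for wav_path in sorted(glob(os.path.join(wav_dir, '*.wav'))):
        # sample1.wav is expected to pair with sample1.txt
        stem = os.path.splitext(os.path.basename(wav_path))[0]
        if not os.path.isfile(os.path.join(txt_dir, stem + '.txt')):
            missing.append(wav_path)
    return missing


if __name__ == '__main__':
    for wav_path in find_missing_transcripts():
        print('No txt for:', wav_path)
```

Run it from inside the libri_dataset folder; no output means every sample has a transcript.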

How to use:

  • Press n for next sample
  • Press p for previous sample
  • Press r for replay
  • Press q to quit
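
The way these commands move through the dataset can be summarized as a small function. This is only an illustrative sketch of the index logic, with `next_index` as a hypothetical name; `q` is left out because it exits the program instead of changing the index.

```python
def next_index(command, i):
    """Return (new_index, leave_sample) for a command typed at the prompt."""
    if command == 'n':
        return i + 1, True          # move on to the next sample
    if command == 'p':
        return max(i - 1, 0), True  # step back, but never before the first sample
    # 'r' replays the audio and any other input is ignored:
    # both stay on the current sample.
    return i, False
```

Note that each command must be followed by Enter, since the script reads it with `input()`.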

Tips:

  • Dock the Notepad++ window to the right side of the screen and the command line to the left side (for ease of annotation)
  • For Linux, you might have to use something like gedit instead of notepad++
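
One possible way to handle the Linux case is to pick the editor from the platform. The sketch below is a drop-in replacement for `open_text_editor`, assuming gedit is installed and on PATH on non-Windows systems; `editor_command` is a hypothetical helper introduced here for illustration.

```python
import subprocess
import sys


def editor_command(txt_file, platform=sys.platform):
    """Build the argument list used to open txt_file in a text editor."""
    if platform.startswith('win'):
        return ['notepad++', txt_file]
    # Assumption: gedit is installed and on PATH on other platforms.
    return ['gedit', txt_file]


def open_text_editor(txt_file):
    # Passing a list (no shell) avoids quoting problems with spaces in file names.
    subprocess.call(editor_command(txt_file))
```

Any other editor that accepts a file path as its argument could be substituted in `editor_command`.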