@Vaibhavs10
Last active May 17, 2022 10:12
Robust ASR: An applied survey of current SoTA ASR architectures

Motivation

While the current ASR landscape is promising, much of it is benchmarked on rather "clean" datasets. This can create a false sense of confidence in an architecture whose performance might not translate to the real world.

Types of Noises

  1. Gaussian White Noise
  2. Real World Noise
  3. Choppy audio (a random 1-2 s span removed from the audio snippet)
  4. Speed-up (random 10 s snippets played faster than the rest)
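
As a rough illustration of how the synthetic corruptions above could be generated, here is a minimal NumPy sketch. The function names, the default speed-up factor of 1.5, and the choice to remove exactly one span per clip are assumptions made for this example, not part of the plan. The speed-up uses naive linear-interpolation resampling, which also raises pitch; a pitch-preserving time stretch would be a different technique. Real-world noise is not covered here, since it requires recorded noise clips to mix in.

```python
import numpy as np


def add_white_noise(signal, snr_db, rng):
    """Add Gaussian white noise scaled so the result has roughly snr_db dB SNR."""
    signal_power = np.mean(signal ** 2)
    # SNR(dB) = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR/10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def drop_random_chunk(signal, sample_rate, rng, min_s=1.0, max_s=2.0):
    """Delete one random span of min_s..max_s seconds (assumed meaning of 'choppy')."""
    chunk = int(rng.uniform(min_s, max_s) * sample_rate)
    chunk = min(chunk, len(signal) - 1)  # always keep at least one sample
    start = rng.integers(0, len(signal) - chunk + 1)
    return np.concatenate([signal[:start], signal[start + chunk:]])


def speed_up_segment(signal, sample_rate, rng, seg_s=10.0, factor=1.5):
    """Play one random seg_s-second span `factor` times faster (pitch shifts too)."""
    seg = min(int(seg_s * sample_rate), len(signal))
    start = rng.integers(0, len(signal) - seg + 1)
    segment = signal[start:start + seg]
    new_len = max(1, int(round(seg / factor)))
    # Resample the segment onto fewer points by linear interpolation.
    positions = np.linspace(0, seg - 1, new_len)
    fast = np.interp(positions, np.arange(seg), segment)
    return np.concatenate([signal[:start], fast, signal[start + seg:]])
```

Passing an explicit `rng` (for example `np.random.default_rng(0)`) keeps the corrupted test sets reproducible across model comparisons.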

Evaluation

  1. WER
  2. CER
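
For reference, both metrics are edit distances normalised by the reference length: WER counts word-level substitutions, deletions, and insertions, while CER does the same over characters. The sketch below is a plain-Python illustration; the helper names are made up for this example, and it applies no text normalisation (casing, punctuation), which a real evaluation would need to decide on.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Note that both values can exceed 1.0 when the hypothesis contains many insertions.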

Languages (open to more)

  1. English
  2. German

Dataset

  1. Robust Speech Dataset
  2. Common Voice

Experiments (Across noise types)

  1. Tracking evaluation metrics across noise types with decreasing signal-to-noise ratio (SNR) (w/ & w/o LM)
  2. Tracking evaluation metrics across noise types with an explicit Speech Enhancement preprocessing step (w/ & w/o LM)
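
One possible shape for such an experiment loop is sketched below. Everything here is hypothetical scaffolding: `transcribe`, `corrupt`, and `score` are caller-supplied callables standing in for a model (with or without an LM, optionally preceded by speech enhancement), a noise function, and a metric such as WER. The sketch averages per-utterance scores, which is a simplification; corpus-level WER is usually computed from total edits over total reference words.

```python
def run_snr_sweep(transcribe, corrupt, score, samples, snr_levels_db):
    """Return {snr_db: mean score} for each SNR level, from cleanest to noisiest.

    transcribe(audio) -> str            hypothetical model wrapper
    corrupt(audio, snr_db) -> audio     hypothetical noise function
    score(reference, hypothesis) -> float   e.g. WER or CER
    samples: list of (audio, reference_text) pairs
    """
    results = {}
    for snr_db in sorted(snr_levels_db, reverse=True):
        scores = [
            score(reference, transcribe(corrupt(audio, snr_db)))
            for audio, reference in samples
        ]
        results[snr_db] = sum(scores) / len(scores)
    return results
```

Running the same sweep once per noise type and per model configuration would give comparable degradation curves for each candidate.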

Candidate Architectures

  1. Wav2Vec2
  2. HuBERT
  3. Data2Vec
  4. UniSpeech