While the current ASR landscape is very promising, much of it is benchmarked on rather "clean" datasets. This often creates a false sense of confidence in an architecture, one that might not translate to the real world. Robustness can instead be probed by degrading the audio in ways such as:
- Gaussian White Noise
- Real World Noise
- Choppy audio (random 1-2 s segments removed from the audio snippet)
- Speed-up (random 10 s snippets played faster than the rest of the audio)
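
The perturbations listed above could be sketched with NumPy roughly as follows. This is a minimal illustration, not a reference implementation: the function names, the SNR-based noise level, the 1.5x speed factor, and the use of linear interpolation for the speed-up are all assumptions, since the text does not specify them. Real-world noise is assumed to be a separately recorded clip passed to `mix_at_snr`.

```python
import numpy as np


def mix_at_snr(audio, noise, snr_db):
    """Scale `noise` so the mix has the requested signal-to-noise ratio (dB)."""
    noise = np.resize(noise, audio.shape)  # loop or trim the noise to fit
    signal_power = np.mean(audio ** 2)
    noise_power = np.mean(noise ** 2)  # assumed non-zero
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return audio + scale * noise


def add_white_noise(audio, snr_db, rng):
    """Gaussian white noise is just a special case of mixing at a given SNR."""
    return mix_at_snr(audio, rng.standard_normal(len(audio)), snr_db)


def drop_random_chunk(audio, sample_rate, rng, min_s=1.0, max_s=2.0):
    """Cut one random chunk of min_s..max_s seconds out of the clip."""
    chunk = min(int(rng.uniform(min_s, max_s) * sample_rate), len(audio) - 1)
    start = rng.integers(0, len(audio) - chunk + 1)
    return np.concatenate([audio[:start], audio[start + chunk:]])


def speed_up_segment(audio, sample_rate, rng, segment_s=10.0, factor=1.5):
    """Play one random segment `factor` times faster.

    Assumption: naive linear-interpolation resampling, which also raises pitch.
    """
    seg = min(int(segment_s * sample_rate), len(audio))
    start = rng.integers(0, len(audio) - seg + 1)
    new_len = max(1, int(round(seg / factor)))
    positions = np.linspace(0, seg - 1, num=new_len)
    fast = np.interp(positions, np.arange(seg), audio[start:start + seg])
    return np.concatenate([audio[:start], fast, audio[start + seg:]])


# Usage on a 30 s synthetic tone standing in for a speech clip.
rng = np.random.default_rng(0)
sample_rate = 16_000
t = np.arange(30 * sample_rate) / sample_rate
clean = 0.1 * np.sin(2 * np.pi * 440.0 * t)

noisy = add_white_noise(clean, snr_db=10, rng=rng)
choppy = drop_random_chunk(clean, sample_rate, rng)
sped_up = speed_up_segment(clean, sample_rate, rng)
```

Routing both noise types through one SNR-based mixer keeps the white-noise and real-world-noise conditions comparable at the same nominal noise level. A pitch-preserving time stretch would be a reasonable alternative to the naive resampling used here.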