@ljnmedium
ljnmedium / step.md
Last active July 17, 2023 07:56
step.md

| Step | pyannote | NeMo |
| --- | --- | --- |
| Voice Activity Detection (VAD) | PyanNet (derived from SincNet) | MarbleNet |
| Audio embedding | ECAPA-TDNN | TitaNet |
| Clustering | Hidden Markov Model clustering | Multi-scale clustering (MSDD) |

ljnmedium / compare.md
Last active July 17, 2023 08:57
compare.md

| Feature | pyannote | NeMo |
| --- | --- | --- |
| Pre-trained models available | | |
| Good overlapping speakers detection (multilabel segmentation) | | |
| Easy integration with ASR task and downstream NLP tasks | | |
| Possibility to specify the number of speakers as a parameter for inference | | |
| Automatic detection of the number of speakers | | |
| Models available for specific use cases (phone call, outdoor conversation, high quality, …) | | |
| Highly customizable pipeline | | |

ljnmedium / perform1.md
Created July 12, 2023 13:02
perform1.md
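
| Model | DER | CDER | BER | MD | FA | SC |
| --- | --- | --- | --- | --- | --- | --- |
| pyannote | 0.10 | 0.14 | 0.18 | 0.01 | 0.06 | 0.03 |
| NeMo - default parameters | 0.37 | 0.32 | 0.44 | 0.36 | 0.01 | 0.01 |
| NeMo - optimized VAD parameters | 0.11 | 0.16 | 0.15 | 0.04 | 0.06 | 0.01 |
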
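
In the rows above, the reported DER appears to be the sum of the missed detection (MD), false alarm (FA) and speaker confusion (SC) rates, up to rounding. The sketch below checks this on the table values. The names `rows` and `component_sum` are illustrative, and reading MD, FA and SC as those three rates is an assumption based on the numbers.

```python
# Values copied from the table above (two-decimal rounding as published).
rows = {
    "pyannote": {"DER": 0.10, "MD": 0.01, "FA": 0.06, "SC": 0.03},
    "NeMo - default parameters": {"DER": 0.37, "MD": 0.36, "FA": 0.01, "SC": 0.01},
    "NeMo - optimized VAD parameters": {"DER": 0.11, "MD": 0.04, "FA": 0.06, "SC": 0.01},
}


def component_sum(row):
    """Sum the three error components of one table row."""
    return row["MD"] + row["FA"] + row["SC"]


# Each of the four published numbers is rounded to two decimals,
# so the sum and the DER can differ by up to 0.02.
TOLERANCE = 0.02

for name, row in rows.items():
    total = component_sum(row)
    gap = abs(total - row["DER"])
    print(f"{name}: MD + FA + SC = {total:.2f}, DER = {row['DER']:.2f}, gap = {gap:.2f}")
```

Read this way, the default NeMo configuration loses almost all of its score to missed detections, which fits the improvement seen once the VAD parameters are tuned.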
ljnmedium / performe2.md
Created July 12, 2023 13:01
performe2.md

| Model | DER | CDER | BER | MS | FA | SC |
| --- | --- | --- | --- | --- | --- | --- |
| pyannote - 7 clusters specified | 0.49 | 0.80 | 0.74 | 0.07 | 0.23 | 0.19 |
| NeMo - no cluster number specified | 0.17 | 0.84 | 0.24 | 0.08 | 0.07 | 0.02 |

NeMo - manual parameter tuning 0.12 0.15
ljnmedium / metric.md
Created July 12, 2023 12:37
metric.md

| Error | Definition |
| --- | --- |
| False Alarm | Speech segment predicted where there is no speaker (false positive from the VAD model) |
| Missed Detection | No speech detected where there is a speaker (false negative from the VAD model) |
| Confusion | Speech is in the wrong cluster (error from the clustering model) |

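
To make the three error types concrete, here is a minimal frame-level sketch. It assumes one label per frame (`None` for non-speech), no overlapping speech, and hypothesis labels already mapped to the reference speaker names. The helpers `frame_errors` and `diarization_error_rate` are hypothetical and do not come from pyannote or NeMo. The rate uses the usual definition: the sum of the three errors divided by the amount of reference speech.

```python
def frame_errors(reference, hypothesis):
    """Count false alarm, missed detection and confusion frames."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    counts = {"false_alarm": 0, "missed_detection": 0, "confusion": 0, "speech": 0}
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            counts["speech"] += 1  # frame contains reference speech
        if ref is None and hyp is not None:
            counts["false_alarm"] += 1  # speech predicted, nobody speaking
        elif ref is not None and hyp is None:
            counts["missed_detection"] += 1  # speaker present, nothing predicted
        elif ref is not None and hyp is not None and ref != hyp:
            counts["confusion"] += 1  # speech found, wrong speaker
    return counts


def diarization_error_rate(counts):
    """(false alarm + missed detection + confusion) / reference speech."""
    errors = counts["false_alarm"] + counts["missed_detection"] + counts["confusion"]
    return errors / counts["speech"]


# Toy example: ten frames, two speakers "A" and "B".
reference = ["A", "A", "A", None, "B", "B", "B", "B", None, None]
hypothesis = ["A", "A", None, "A", "B", "A", "B", "B", None, "B"]

counts = frame_errors(reference, hypothesis)
der = diarization_error_rate(counts)
print(counts)
print(f"DER = {der:.3f}")
```

Real scoring tools work on timed segments, find the best mapping between hypothesis and reference speakers, and handle overlapping speech, which this sketch leaves out on purpose.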
ljnmedium / ex.md
Created July 12, 2023 12:35
exemple.md

| Error | Definition |
| --- | --- |
| False Alarm | Speech segment predicted where there is no speaker (false positive from the VAD model) |
| Missed Detection | No speech detected where there is a speaker (false negative from the VAD model) |
| Confusion | Speech is in the wrong cluster (error from the clustering model) |