Here we use the ffmpeg command, which can be installed on Ubuntu / Debian with apt-get install ffmpeg.
The following commands can be combined into one, though I prefer to keep each step separate.
- Convert video (mp4 with aac audio) to audio.
ffmpeg -i video.mp4 -vn -acodec copy video.aac
- Convert aac to wav.
ffmpeg -i video.aac audio.wav
- Split the wav into parts 60 seconds long.
ffmpeg -i audio.wav -f segment -segment_time 60 -c copy part%03d.wav

For the speech-to-text conversion we'll use the pretrained model jonatasgrosman/wav2vec2-xls-r-1b-russian.
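As an aside, the three ffmpeg steps above can also be driven from Python, which keeps each step separate while making the whole sequence repeatable. The following is only a minimal sketch: the helper names `build_ffmpeg_commands` and `run_all` are made up for illustration, and the segments are written as `part%03d.wav` so that the file names match what the transcription step reads.

```python
import subprocess


def build_ffmpeg_commands(video="video.mp4", segment_seconds=60):
    """Return the three ffmpeg invocations as argument lists (hypothetical helper)."""
    return [
        # 1. Extract the aac audio stream from the video without re-encoding.
        ["ffmpeg", "-i", video, "-vn", "-acodec", "copy", "video.aac"],
        # 2. Decode aac to wav.
        ["ffmpeg", "-i", "video.aac", "audio.wav"],
        # 3. Split the wav into fixed-length segments: part000.wav, part001.wav, ...
        ["ffmpeg", "-i", "audio.wav", "-f", "segment",
         "-segment_time", str(segment_seconds), "-c", "copy", "part%03d.wav"],
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(build_ffmpeg_commands())` in the directory that contains video.mp4 should produce the same files as the shell commands, assuming ffmpeg is on the PATH.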
Install the HuggingSound package and start the Python interpreter.
pip install huggingsound
python

from huggingsound import SpeechRecognitionModel
n = 165  # index of the last segment, i.e. files part000.wav ... part165.wav
model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-xls-r-1b-russian")
audio_paths = ["part%03d.wav" % i for i in range(n + 1)]
transcriptions = model.transcribe(audio_paths)
transcriptions
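
The hard-coded `n` has to match the number of segments ffmpeg actually produced, and the result is a list rather than one text. Here is a minimal sketch of one way to handle both. The helper names `find_parts` and `join_transcriptions` are hypothetical, and the sketch assumes each item returned by `model.transcribe` is a dict with a `"transcription"` key, as shown in the HuggingSound README.

```python
import glob
import os


def find_parts(directory=".", pattern="part*.wav"):
    """Return segment files in order; zero-padded names sort correctly as strings."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


def join_transcriptions(results):
    """Join per-segment results into one text, one segment per line.

    Assumes each result is a dict with a "transcription" key.
    """
    return "\n".join(r["transcription"] for r in results)
```

With these, `audio_paths = find_parts()` could replace the list built from `n`, and `join_transcriptions(transcriptions)` would give a single string that can be written to a file.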