Docker image containing several tools for tinkering with audio and video files.
The Dockerfile is an edit of the one from kevinhughes27/audiogrep-docker that adds a patch, some additional utilities, and a shared folder. The other files from that repository are required to build the image.
Original audiogrep docs here: antiboredom/audiogrep. See also these examples of what audiogrep can do.
audiogrep also makes use of:
- cmusphinx/pocketsphinx for automatic transcription
- pydub to splice files together
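For a flavour of what the pydub side does, splicing looks roughly like this (a minimal sketch with made-up filenames and timings, not audiogrep's actual code):

```python
# Minimal sketch of pydub splicing: cut two segments out of an audio
# file and join them. Filenames and millisecond offsets are made up.
from pydub import AudioSegment

audio = AudioSegment.from_mp3("files/MYFILE.mp3")

# Slices are in milliseconds; segments concatenate with "+".
clip_a = audio[5000:7500]     # 5.0s to 7.5s
clip_b = audio[60000:61000]   # 60.0s to 61.0s
supercut = clip_a + clip_b

supercut.export("files/supercut.mp3", format="mp3")
```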
#Create the shared folder on the host
mkdir -p files
#The Dockerfile needs to be built in the context of the other files from: https://github.com/kevinhughes27/audiogrep-docker
docker build -t psychemedia/avgrep .
#Transcribe an audio file
docker run --volume "${PWD}/files":/avgrepfiles --tty --interactive --rm psychemedia/avgrep audiogrep --input /avgrepfiles/MYFILE.mp3 --transcribe
#The transcription seems to chunk the audio file and produce a separate transcript file for each chunk
#The audiogrep search seems to want a single transcript with a different filename
#Create the single transcript file in the shared folder, so the container can see it
cat files/MYFILE*.txt > files/MYFILE.mp3.transcription.txt
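One caveat with the `cat` approach: the shell expands the glob in lexicographic order, so a chunk numbered 10 would sort before chunk 2, and on a rerun the merged file itself matches `files/MYFILE*.txt`. Here's a minimal Python sketch that guards against both (the numeric chunk naming is an assumption; check what the transcribe step actually writes out):

```python
# Minimal sketch: merge chunked transcripts in numeric chunk order.
# Assumes each chunk filename contains a chunk number (an assumption;
# inspect the files that the transcribe step actually produces).
import glob
import re

def chunk_key(path):
    # Sort on the first run of digits in the filename; no digits sorts first.
    m = re.search(r"(\d+)", path)
    return int(m.group(1)) if m else -1

chunks = sorted(
    (p for p in glob.glob("files/MYFILE*.txt")
     if not p.endswith(".transcription.txt")),  # skip a previous merge
    key=chunk_key,
)

with open("files/MYFILE.mp3.transcription.txt", "w") as out:
    for chunk in chunks:
        with open(chunk) as f:
            out.write(f.read())
```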
#Generate a supercut
#(with --regex, spaces in the pattern are matched literally, so don't pad the | alternatives)
docker run --volume "${PWD}/files":/avgrepfiles --tty --interactive --rm psychemedia/avgrep audiogrep --input /avgrepfiles/MYFILE.mp3 --search 'transparency|honest|health' --output /avgrepfiles/supercut.mp3 --regex --output-mode word
videogrep is also included in the container, but untested. Original videogrep docs here: antiboredom/videogrep. See also this example of what videogrep can do.
To help grab files from YouTube, youtube_dl is also included in the container.
Usage is along the lines of:
docker run --volume "${PWD}/files":/avgrepfiles --tty --interactive --rm psychemedia/avgrep youtube-dl --extract-audio --audio-format mp3 -o '/avgrepfiles/%(id)s.mp3' https://www.youtube.com/watch?v=YOUTUBE_ID
Using a couple of test audio files with UK English speakers, I couldn't replicate anything like the original demos. Transcription was poor, the timing seemed really off (and didn't match the searched-for words), and some of the splices were minutes long. In the transcript, only single words seemed to be identified, so I'm not sure how phrase identification is supposed to work.
I haven't looked at the code, but it might be worth generating a few reports over the extracted words to help identify sensible phrases. Something like nltk concordancing relative to a single word or multiple words would add another dimension to the reporting, and help the user spot keyword-keyed phrases in the text, rather than the audio. (Adding the ability for the concordancer to act on OR'd words is a feature we can perhaps take away from audiogrep - I'll add it to my to do list! ;-)
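As a rough illustration, here's the sort of concordance report I have in mind, using nltk (the transcript path and search words are placeholders, and I'm assuming the merged transcript is plain text; if it carries timing data you'd strip that first):

```python
# Minimal sketch: concordance a merged transcript around several OR'd
# keywords with nltk. Path and keywords are placeholders.
import nltk
from nltk.text import Text
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)  # tokeniser models, first run only

with open("files/MYFILE.mp3.transcription.txt") as f:
    raw = f.read()

text = Text(word_tokenize(raw.lower()))

# One concordance listing per OR'd search word.
for word in ["transparency", "honest", "health"]:
    print(f"=== {word} ===")
    text.concordance(word, width=60, lines=10)
```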