How to transcribe Thai speech in videos into text.
- Google Cloud or Firebase project with billing enabled.
- ffmpeg or Docker.
- youtube-dl to download YouTube videos.
- Cost: 30 Baht per hour of input.
For example, to download the audio track of a YouTube video with youtube-dl:
youtube-dl -f bestaudio 'https://www.youtube.com/watch?v=..........'
We need to convert the audio into a format that is supported by the Google Cloud APIs. We will use OGG Opus.
docker run -v "$PWD:/data" jrottenberg/ffmpeg -i "/data/<FILENAME>.m4a" -c:a libopus -ar 16000 -ac 1 "/data/<FILENAME>.ogg"
To cut out a portion of the audio, put -ss <START TIME> -t <DURATION> before -i. For example, -ss 01:38:23 -t 00:30:00 extracts 30 minutes starting at 01:38:23.
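Note that -t takes a duration, not an end time. If you only know where the portion starts and ends, the duration has to be computed first. The following is a minimal Python sketch of that step, not part of the original workflow. The helper names (to_seconds, to_timestamp, cut_args, ffmpeg_command) are made up for illustration, it assumes whole-second HH:MM:SS timestamps, and it builds the command for a locally installed ffmpeg rather than the Docker image.

```python
def to_seconds(timestamp):
    """Parse "HH:MM:SS" (or "MM:SS", or plain seconds) into whole seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def to_timestamp(seconds):
    """Format whole seconds as "HH:MM:SS"."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def cut_args(start, end):
    """Return the -ss/-t arguments selecting the span from start to end."""
    duration = to_seconds(end) - to_seconds(start)
    if duration <= 0:
        raise ValueError("end must be after start")
    return ["-ss", start, "-t", to_timestamp(duration)]


def ffmpeg_command(src, dst, start=None, end=None):
    """Build the conversion command; the cut options go before -i."""
    cmd = ["ffmpeg"]
    if start and end:
        cmd += cut_args(start, end)
    # Same encoding options as the Docker command: Opus, 16 kHz, mono.
    cmd += ["-i", src, "-c:a", "libopus", "-ar", "16000", "-ac", "1", dst]
    return cmd


# Hypothetical file names, for illustration only.
print(" ".join(ffmpeg_command("input.m4a", "output.ogg", "01:38:23", "02:08:23")))
# prints: ffmpeg -ss 01:38:23 -t 00:30:00 -i input.m4a -c:a libopus -ar 16000 -ac 1 output.ogg
```

To actually run the conversion, pass the returned list to subprocess.run(..., check=True).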
- Upload the ogg file to Google/Firebase Cloud Storage. After uploading, you will get a <STORAGE LOCATION> such as gs://<PROJECT>.appspot.com/transcribe/<FILENAME>.ogg.
- Start the transcription:
gcloud ml speech recognize-long-running "<STORAGE LOCATION>" --language-code=th --encoding=ogg-opus --include-word-time-offsets --sample-rate=16000 --async
It will print out:
{ "name": "5766027198115285298" }
This is your <OPERATION ID>.
- Wait for the operation to finish and write the results to a file:
gcloud ml speech operations wait "<OPERATION ID>" > "<FILENAME>.json"
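View the JSON file. It contains the transcript and, because of --include-word-time-offsets, the time offsets of each word.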
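The raw JSON is hard to read directly, so a small script can help. The following is a rough Python sketch for printing the transcript and the word timings. It assumes the file has a top-level "results" list (or the same list nested under a "response" key), that each result has an "alternatives" list, and that each word entry has "word", "startTime" and "endTime" fields with values like "1.200s". Check these names against your own output, since they may differ between API versions. The function names are made up for illustration.

```python
import json
import sys


def parse_offset(value):
    """Convert an offset string such as "12.300s" into float seconds."""
    return float(value.rstrip("s"))


def load_results(data):
    """Return the list of results, whether or not it is nested under "response"."""
    return data.get("response", data).get("results", [])


def full_transcript(data):
    """Join the top alternative of every result, one result per line."""
    lines = []
    for result in load_results(data):
        alternatives = result.get("alternatives", [])
        if alternatives:
            lines.append(alternatives[0].get("transcript", ""))
    return "\n".join(lines)


def iter_words(data):
    """Yield (start_seconds, end_seconds, word) for every timed word."""
    for result in load_results(data):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        for w in alternatives[0].get("words", []):
            start = parse_offset(w.get("startTime", "0s"))
            end = parse_offset(w.get("endTime", "0s"))
            yield start, end, w.get("word", "")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python view_transcript.py "<FILENAME>.json"
    with open(sys.argv[1], encoding="utf-8") as f:
        data = json.load(f)
    print(full_transcript(data))
    for start, end, word in iter_words(data):
        print(f"{start:8.2f} {end:8.2f} {word}")
```

Save it under any name (view_transcript.py here is only an example) and pass the JSON file as the first argument.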