Transcribing Thai YouTube videos using Google Cloud

How to transcribe Thai speech in videos into text.

Requirements

Google Cloud or Firebase project with billing enabled.
gcloud command line tool installed.
ffmpeg or Docker.
youtube-dl to download YouTube videos.
Cost: about 30 Baht per hour of input audio.

Step 1: Grab the audio track

For example, to download the audio from YouTube using youtube-dl:

youtube-dl -f bestaudio 'https://www.youtube.com/watch?v=..........'

Step 2: Convert

We need to convert the audio into a format that is supported by the Google Cloud APIs. We will use OGG Opus.

docker run -v "$PWD:/data" jrottenberg/ffmpeg -i "/data/<FILENAME>.m4a" -c:a libopus -ar 16000 -ac 1 "/data/<FILENAME>.ogg"

To cut a portion of audio, put -ss <START TIME> -t <DURATION> before -i. For example, -ss 01:38:23 -t 00:30:00.
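
As a rough sketch of how the cut and the conversion combine, the helper below assembles the same docker command with -ss and -t placed before -i. The name convert_clip and the DRY_RUN switch are made up for illustration; they are not part of ffmpeg or Docker.

```shell
# convert_clip INPUT START DURATION OUTPUT
# Hypothetical helper: converts a portion of INPUT (in the current directory)
# to 16 kHz mono OGG Opus. Set DRY_RUN=1 to print the command instead of
# running it.
convert_clip() {
  input=$1 start=$2 duration=$3 output=$4
  # Build the command as positional parameters so quoting is preserved.
  set -- docker run -v "$PWD:/data" jrottenberg/ffmpeg \
    -ss "$start" -t "$duration" -i "/data/$input" \
    -c:a libopus -ar 16000 -ac 1 "/data/$output"
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, DRY_RUN=1 convert_clip talk.m4a 01:38:23 00:30:00 talk.ogg prints the command for a 30-minute clip without starting Docker.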

Step 3: Recognize

Upload the ogg file to Google/Firebase Cloud Storage. After uploading, you will get a <STORAGE LOCATION> such as gs://<PROJECT>.appspot.com/transcribe/<FILENAME>.ogg.
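
One way to do the upload from the command line is gsutil cp, which ships with the Google Cloud SDK. The sketch below assumes the default <PROJECT>.appspot.com bucket and the transcribe/ prefix from the example location above; upload_audio and DRY_RUN are hypothetical names used only for illustration.

```shell
# upload_audio FILE PROJECT
# Hypothetical helper: copies FILE to the project's default bucket and prints
# the resulting storage location. Set DRY_RUN=1 to skip the actual upload.
upload_audio() {
  file=$1 project=$2
  location="gs://${project}.appspot.com/transcribe/$(basename "$file")"
  if [ -z "${DRY_RUN:-}" ]; then
    # Send gsutil's own output to stderr so stdout carries only the location.
    gsutil cp "$file" "$location" >&2 || return 1
  fi
  printf '%s\n' "$location"
}
```

The printed value can be used directly as the <STORAGE LOCATION> in the next command.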

Start the transcription:

gcloud ml speech recognize-long-running "<STORAGE LOCATION>" --language-code=th --encoding=ogg-opus --include-word-time-offsets --sample-rate=16000 --async

It will print out:

{
  "name": "5766027198115285298"
}

This is your <OPERATION ID>.
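
If you are scripting these steps, the ID can be pulled out of that output instead of being copied by hand. This is a minimal sketch that assumes the output looks exactly like the sample above; extract_operation_id is a made-up helper name.

```shell
# Hypothetical helper: reads the JSON printed by the --async command on stdin
# and prints the value of its "name" field.
extract_operation_id() {
  sed -n 's/.*"name": *"\([^"]*\)".*/\1/p'
}
```

Pipe the output of the recognize-long-running command into extract_operation_id to capture the ID in a variable. Alternatively, gcloud's global --format flag (for example --format='value(name)') should print only the name, which avoids parsing JSON in the shell.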

Wait for the operation to finish and write the results to a file:

gcloud ml speech operations wait "<OPERATION ID>" > "<FILENAME>.json"

View the JSON file. It contains the recognized text along with the time offsets of each word.
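
To skim only the recognized text, something like the following may help. It assumes the file contains "transcript": "..." fields, as Speech-to-Text results normally do, and that the text has no escaped double quotes; print_transcripts is a hypothetical helper name.

```shell
# print_transcripts FILE
# Hypothetical helper: prints every "transcript" string found in FILE,
# one per line, without parsing the surrounding JSON structure.
print_transcripts() {
  grep -o '"transcript": *"[^"]*"' "$1" |
    sed 's/^"transcript": *"//; s/"$//'
}
```

Run it as print_transcripts "<FILENAME>.json". If the Thai text shows up as \u escape sequences, a JSON-aware tool such as jq will decode it properly.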
