

@dtinth
Created August 13, 2019 12:59
Transcribing Thai YouTube video using Google Cloud

How to transcribe the Thai speech in a video into text.

Requirements

  • A Google Cloud or Firebase project with billing and the Cloud Speech-to-Text API enabled.

  • gcloud command line tool installed.

  • ffmpeg or Docker.

  • youtube-dl to download YouTube videos.

  • A budget of about 30 Baht per hour of input audio.

Step 1: Grab the audio track

For example, to download the audio track of a YouTube video with youtube-dl:

youtube-dl -f bestaudio 'https://www.youtube.com/watch?v=..........'
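By default youtube-dl names the file after the video title, and bestaudio may select a WebM stream rather than the M4A file that Step 2 expects. A minimal sketch, assuming you want a predictable file name for the next step; download_audio is a hypothetical helper name. It prefers an M4A stream and falls back to the best available audio (ffmpeg in Step 2 can read either):

```shell
# Hypothetical helper: download the audio-only stream of a video.
# 'bestaudio[ext=m4a]/bestaudio' prefers M4A, else any best audio stream.
# '-o %(id)s.%(ext)s' names the file after the video ID, e.g. <VIDEO ID>.m4a.
download_audio() {
  youtube-dl -f 'bestaudio[ext=m4a]/bestaudio' -o '%(id)s.%(ext)s' "$1"
}

# Usage (URL elided, as above):
# download_audio 'https://www.youtube.com/watch?v=..........'
```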

Step 2: Convert

We need to convert the audio into a format that the Google Cloud Speech-to-Text API supports. We will use Ogg Opus, downmixed to mono (-ac 1) at a 16 kHz sample rate (-ar 16000).

docker run -v "$PWD:/data" jrottenberg/ffmpeg -i "/data/<FILENAME>.m4a" -c:a libopus -ar 16000 -ac 1 "/data/<FILENAME>.ogg"

To convert only a portion of the audio, put -ss <START TIME> -t <DURATION> before -i. For example, -ss 01:38:23 -t 00:30:00 keeps the 30 minutes starting at 1:38:23.
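The requirements list either ffmpeg or Docker, but the command above only covers Docker. A minimal sketch of a wrapper that uses a local ffmpeg when one is installed and falls back to the same Docker image otherwise, with the optional trim flags placed before -i; to_ogg is a hypothetical helper name, and the Docker branch assumes the files sit in the current directory:

```shell
# Hypothetical helper: convert INPUT to 16 kHz mono Ogg Opus as OUTPUT,
# optionally keeping only DURATION of audio starting at START.
# Usage: to_ogg INPUT OUTPUT [START DURATION]
# Uses a local ffmpeg when one is on PATH, otherwise the Docker image.
# The Docker branch mounts $PWD at /data, so INPUT and OUTPUT must be
# paths relative to the current directory.
to_ogg() {
  input=$1
  output=$2
  if [ $# -ge 4 ]; then
    # Reuse the positional parameters to hold the optional trim flags,
    # which must come before -i.
    set -- -ss "$3" -t "$4"
  else
    set --
  fi
  if command -v ffmpeg >/dev/null 2>&1; then
    ffmpeg "$@" -i "$input" -c:a libopus -ar 16000 -ac 1 "$output"
  else
    docker run -v "$PWD:/data" jrottenberg/ffmpeg \
      "$@" -i "/data/$input" -c:a libopus -ar 16000 -ac 1 "/data/$output"
  fi
}

# Usage:
# to_ogg talk.m4a talk.ogg
# to_ogg talk.m4a part.ogg 01:38:23 00:30:00
```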

Step 3: Recognize

  1. Upload the .ogg file to Google Cloud Storage (a Firebase project's Storage bucket works too). Its <STORAGE LOCATION> will look like gs://<PROJECT>.appspot.com/transcribe/<FILENAME>.ogg.

  2. Start the transcription:

    gcloud ml speech recognize-long-running "<STORAGE LOCATION>" --language-code=th --encoding=ogg-opus --include-word-time-offsets --sample-rate=16000 --async

    It will print out:

    {
      "name": "5766027198115285298"
    }

    This is your <OPERATION ID>.

  3. Wait for the operation to finish and save the result to a JSON file:

    gcloud ml speech operations wait "<OPERATION ID>" > "<FILENAME>.json"
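The three recognition steps can also be chained in one script. A minimal sketch, assuming gsutil (shipped with the Cloud SDK) handles the upload and that gcloud's global --format flag applies to this command's output as it does for most gcloud commands, so that only the <OPERATION ID> is printed; transcribe_ogg is a hypothetical helper name and the bucket path is a placeholder you supply:

```shell
# Hypothetical helper: upload FILE, start Thai recognition, wait for it,
# and save the result next to FILE with .ogg replaced by .json.
# Usage: transcribe_ogg FILE.ogg gs://<BUCKET>/<PREFIX>
transcribe_ogg() {
  file=$1
  dest="$2/$(basename "$file")"
  # Step 3.1: upload to Cloud Storage.
  gsutil cp "$file" "$dest" || return 1
  # Step 3.2: start recognition; --format='value(name)' is assumed to
  # print only the operation ID instead of the JSON object.
  op_id=$(gcloud ml speech recognize-long-running "$dest" \
    --language-code=th --encoding=ogg-opus --sample-rate=16000 \
    --include-word-time-offsets --async --format='value(name)') || return 1
  # Step 3.3: block until the operation finishes and save its output.
  gcloud ml speech operations wait "$op_id" > "${file%.ogg}.json"
}

# Usage:
# transcribe_ogg talk.ogg gs://<PROJECT>.appspot.com/transcribe
```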

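Finally, open <FILENAME>.json to view the transcript. Because of --include-word-time-offsets, each recognized word also comes with its start and end time.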
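To read just the text out of the JSON file, jq is convenient. A minimal sketch, assuming jq is installed and that the file follows the Speech-to-Text response layout (a results array whose entries each hold alternatives with a transcript and, with word offsets on, a words list carrying startTime and word). It accepts either the bare response or an operation object that nests it under response; print_transcript and print_words are hypothetical helper names:

```shell
# Hypothetical helper: print the top transcript of each result, one per line.
# (.response // .) handles both a bare response and a wrapped operation;
# .results[]? yields nothing instead of failing when no speech was found.
print_transcript() {
  jq -r '(.response // .) | .results[]? | .alternatives[0].transcript // empty' "$1"
}

# Hypothetical helper: print "<start time><TAB><word>" for every word.
print_words() {
  jq -r '(.response // .) | .results[]? | .alternatives[0].words[]? | "\(.startTime)\t\(.word)"' "$1"
}

# Usage:
# print_transcript <FILENAME>.json
# print_words <FILENAME>.json
```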
