@dannguyen
Last active February 5, 2016 02:12

How to write a transcription of a video describing a UNIX spell checker

In progress: an attempt at a set of command-line steps that use speech-to-text output from IBM Watson and Microsoft's Project Oxford to transcribe an old video about UNIX.

UNIX: Making Computers Easier To Use -- AT&T Archives film from 1982, Bell Laboratories

via this HN discussion: https://news.ycombinator.com/item?id=10789019

The discussion of the spell checker starts at about 5:15, with Brian Kernighan: https://www.youtube.com/watch?v=XvDZLjaCJuw&t=5m15s

It continues at about 13:47, with Lorinda Cherry: https://youtu.be/XvDZLjaCJuw?t=13m47s
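To use those YouTube offsets with `avconv -ss`, they need to be in `HH:MM:SS` form. A minimal bash sketch; `yt_offset_to_hms` is a hypothetical helper, and it assumes offsets shaped like the ones in these URLs (`5m15s`, `13m47s`) or given as plain seconds:

```shell
# Hypothetical helper: convert a YouTube "t=" offset such as 13m47s
# (optionally with hours, e.g. 1h2m8s, or plain seconds, e.g. 915)
# into HH:MM:SS for avconv -ss.
yt_offset_to_hms() {
  local offset="$1" h=0 m=0 s=0 total
  if [[ $offset =~ ^[0-9]+$ ]]; then
    # A bare number is already a count of seconds
    s=$offset
  else
    [[ $offset =~ ([0-9]+)h ]] && h=${BASH_REMATCH[1]}
    [[ $offset =~ ([0-9]+)m ]] && m=${BASH_REMATCH[1]}
    [[ $offset =~ ([0-9]+)s ]] && s=${BASH_REMATCH[1]}
  fi
  # 10# forces base 10 so parts like "08" are not read as octal
  total=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
  printf '%02d:%02d:%02d\n' $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}
```

For example, `yt_offset_to_hms 13m47s` prints `00:13:47`, and `yt_offset_to_hms 915` prints `00:15:15`, the start point of the sample cut below.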

Download the video and extract its audio as a WAV file:

$ youtube-dl https://www.youtube.com/watch?v=XvDZLjaCJuw \
      --keep-video \
      --extract-audio \
      --audio-format wav \
      --audio-quality 0 \
      --restrict-filenames 

Output:

[youtube] Setting language
[youtube] Confirming age
[youtube] XvDZLjaCJuw: Downloading webpage
[youtube] XvDZLjaCJuw: Downloading video info webpage
[youtube] XvDZLjaCJuw: Extracting video information
[download] Destination: UNIX_-_Making_Computers_Easier_To_Use_--_AT_T_Archives_film_from_1982_Bell_Laboratories-XvDZLjaCJuw.mp4
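youtube-dl names its files after the video title plus the video ID (see the `Destination:` line above), so the WAV written by `--extract-audio` should end in `-XvDZLjaCJuw.wav`. Rather than typing that long name, a small sketch can look it up by ID; `wav_for_id` is a hypothetical helper, and it assumes the WAV is in the current directory with youtube-dl's default `title-id.ext` naming:

```shell
# Hypothetical helper: print the name of the WAV file in the current
# directory whose name ends in -<video id>.wav (youtube-dl's default
# "title-id.ext" naming, as in the Destination line above).
wav_for_id() {
  local id="$1" f
  for f in *-"$id".wav; do
    # With no match the glob stays literal, so check the file exists
    [ -e "$f" ] && printf '%s\n' "$f"
  done
}
```

Usage: `AUDIO=$(wav_for_id XvDZLjaCJuw)`, then pass `-i "$AUDIO"` to avconv.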

Cut a 10-second sample, resampled to 16000 Hz (the maximum sample rate allowed by Microsoft's Project Oxford):

$ avconv -ss 00:15:15 \
         -t 00:00:10  \
         -i UNIX_-_Making_Computers_Easier_To_Use_--_AT_T_Archives_film_from_1982_Bell_Laboratories-XvDZLjaCJuw.wav \
         -ar 16000 \
         unixaudio-sample.wav
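Since Project Oxford accepts at most 10 seconds of audio per request (see the next section), a longer stretch of the film has to be cut into consecutive 10-second chunks. A minimal bash sketch, assuming the same avconv options as the command above and offsets given in plain seconds (which avconv accepts as well as `HH:MM:SS`); `chunk_offsets` and `cut_chunks` are hypothetical helpers:

```shell
# Hypothetical helper: print the start offset, in seconds, of each
# STEP-second chunk (default 10) covering LENGTH seconds from START.
chunk_offsets() {
  local start="$1" length="$2" step="${3:-10}" t
  for (( t = start; t < start + length; t += step )); do
    echo "$t"
  done
}

# Hypothetical helper: cut SOURCE into 10-second, 16 kHz WAV chunks named
# unixaudio-<offset>.wav, with the same avconv options as above.
cut_chunks() {
  local source="$1" start="$2" length="$3" t
  for t in $(chunk_offsets "$start" "$length"); do
    avconv -ss "$t" -t 10 -i "$source" -ar 16000 "unixaudio-$t.wav" || return 1
  done
}
```

For example, `cut_chunks "$SOURCE" 915 60`, where `$SOURCE` is the downloaded WAV, would write `unixaudio-915.wav` through `unixaudio-965.wav`.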

Project Oxford

Your application must endpoint the audio to determine start and end of speech. The endpoints specify to the service the start and end of the request. You may not upload more than 10 seconds of audio in any one request and the total request duration cannot exceed 14 seconds.
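Before sending a clip, it helps to check it against that 10-second limit. A rough bash sketch, assuming a canonical 44-byte WAV header (the byte rate sits at bytes 28 to 31) and a little-endian machine, since `od` reads in host byte order; any extra header chunks make the estimate slightly high. `wav_seconds` and `check_oxford_length` are hypothetical helpers:

```shell
# Hypothetical helper: estimate a WAV file's duration in seconds (to the
# millisecond) from its size and the byte rate stored in the header.
# Assumes a canonical 44-byte header and a little-endian host.
wav_seconds() {
  local file="$1" size byte_rate
  size=$(wc -c < "$file" | tr -d ' ')
  byte_rate=$(od -An -t u4 -j 28 -N 4 "$file" | tr -d ' ')
  awk -v s="$size" -v r="$byte_rate" 'BEGIN { printf "%.3f\n", (s - 44) / r }'
}

# Hypothetical helper: succeed only if the clip fits Oxford's 10-second limit
check_oxford_length() {
  local secs
  secs=$(wav_seconds "$1") || return 1
  awk -v t="$secs" 'BEGIN { exit !(t <= 10) }'
}
```

Usage: `check_oxford_length unixaudio-sample.wav && echo "ok to send"`.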

Get the access token

$ curl -X POST \
  -d 'grant_type=client_credentials' \
  -d 'client_id=dansfootest' \
  -d "client_secret=$OXFORD_CLIENT_SECRET" \
  -d "scope=https://speech.platform.bing.com" \
  https://oxford-speech.cloudapp.net/token/issueToken

Result

{
    "access_token": "XXXX.YYYY.ZZZZZ",
    "expires_in": "600",
    "scope": "https://speech.platform.bing.com",
    "token_type": "jwt"
}
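The recognize call below expects this token in `$OXFORD_ACCESS_TOKEN`, and the token is only good for 600 seconds (`expires_in`). A minimal way to capture it, assuming the token contains no double quotes (true of the `XXXX.YYYY.ZZZZZ`-style JWT shown); `extract_access_token` is a hypothetical helper, and a real JSON parser such as `jq` would be sturdier:

```shell
# Hypothetical helper: read a token response on stdin and print the
# access_token value. Works on the pretty-printed response above or on
# single-line JSON, as long as the token contains no double quotes.
extract_access_token() {
  sed -n 's/.*"access_token": *"\([^"]*\)".*/\1/p'
}

# Usage, with the same request as above:
# OXFORD_ACCESS_TOKEN=$(curl -s -X POST \
#   -d 'grant_type=client_credentials' \
#   -d 'client_id=dansfootest' \
#   -d "client_secret=$OXFORD_CLIENT_SECRET" \
#   -d "scope=https://speech.platform.bing.com" \
#   https://oxford-speech.cloudapp.net/token/issueToken | extract_access_token)
```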

Call the recognize endpoint
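The request below fills in `requestid` and `instanceid` with `$(uuid)`, a command (usually from the OSSP uuid package) that isn't installed everywhere. A fallback sketch, assuming at least one of `uuidgen`, Linux's `/proc/sys/kernel/random/uuid`, or `uuid` is available; `new_uuid` is a hypothetical helper:

```shell
# Hypothetical helper: print a random UUID with whichever generator exists:
# uuidgen (macOS, util-linux), the Linux kernel's /proc interface,
# or the `uuid` command used in the request below.
new_uuid() {
  if command -v uuidgen >/dev/null 2>&1; then
    uuidgen
  elif [ -r /proc/sys/kernel/random/uuid ]; then
    cat /proc/sys/kernel/random/uuid
  elif command -v uuid >/dev/null 2>&1; then
    uuid
  else
    echo "new_uuid: no UUID generator found" >&2
    return 1
  fi
}
```

With it defined, `requestid=$(uuid)&instanceid=$(uuid)` in the URL can become `requestid=$(new_uuid)&instanceid=$(new_uuid)`.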

$ FILENAME=unixaudio-sample.wav 
$ curl --request POST \
    -H "Authorization: Bearer $OXFORD_ACCESS_TOKEN" \
    -H 'Content-Type: audio/wav; samplerate=16000' \
    -H "Content-Length: $(wc -c < "$FILENAME" | tr -d ' ')" \
    --data-binary "@$FILENAME" \
    "https://speech.platform.bing.com/recognize?version=3.0&requestid=$(uuid)&instanceid=$(uuid)&device.os=osx&locale=en-US&format=json&appID=D4D52672-91D7-4C74-8AD8-42B1D98141A5&scenarios=ulm&result.profanitymarkup=0&maxnbest=3"

Result

{
    "header": {
        "lexical": "jack and i can check on my check in again i'll get up get out my spelling ass off i wanna show you another example i have a desktop",
        "name": "jack and I can check on my check in again I'll get up get out my spelling ass off I wanna show you another example I have a desktop",
        "properties": {
            "HIGHCONF": "1",
            "requestid": "abc-def-jkl-mno-xyz"
        },
        "scenario": "ulm",
        "status": "success"
    },
    "results": [
        {
            "confidence": "0.6608185",
            "lexical": "jack and i can check on my check in again i'll get up get out my spelling ass off i wanna show you another example i have a desktop",
            "name": "jack and I can check on my check in again I'll get up get out my spelling ass off I wanna show you another example I have a desktop",
            "properties": {
                "HIGHCONF": "1"
            },
            "scenario": "ulm"
        }
    ],
    "version": "3.0"
}
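With `maxnbest=3`, the `results` array presumably holds up to three hypotheses (only one came back here), and `header` repeats the top one. A line-oriented bash sketch that prints the display text (`name`) of each entry in `results`, assuming the pretty-printed layout above with one key per line and no escaped quotes in the text; `oxford_hypotheses` is a hypothetical helper:

```shell
# Hypothetical helper: print the "name" of each hypothesis in the "results"
# array of a pretty-printed Oxford response, one per line. Lines before
# "results" are skipped, so the copy in "header" is not printed twice.
oxford_hypotheses() {
  sed -n '/"results"/,$ s/^ *"name": *"\([^"]*\)".*/\1/p'
}
```

Usage: `curl ... | oxford_hypotheses`, with the same recognize request as above.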

IBM Watson's Speech to Text API

https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text/api/v1/#recognize
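The request below sends one file. To run several clips through Watson with the same parameters, a loop like this could save each response next to its audio as `<clip>.json`. A sketch, assuming `WATSON_USERNAME` and `WATSON_PASSWORD` are set as in the command below; `watson_transcribe_chunks` is a hypothetical helper:

```shell
# Hypothetical helper: POST each WAV to Watson's /recognize endpoint with the
# same query parameters as the command below, writing each response to
# <name>.json beside its WAV. Stops at the first failed request.
watson_transcribe_chunks() {
  local url="https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?continuous=true&max_alternatives=1&timestamps=true&word_confidence=true"
  local wav
  for wav in "$@"; do
    curl -s -u "$WATSON_USERNAME":"$WATSON_PASSWORD" \
      -H "Content-Type: audio/wav" \
      --data-binary @"$wav" \
      "$url" > "${wav%.wav}.json" || return 1
  done
}
```

Usage: `watson_transcribe_chunks unixaudio-*.wav`.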

$ curl -u "$WATSON_USERNAME":"$WATSON_PASSWORD" \
  -H  "content-type: audio/wav" \
  --data-binary @"$FILENAME" \
  "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?continuous=true&max_alternatives=1&timestamps=true&word_confidence=true"

Result

{
    "result_index": 0,
    "results": [
        {
            "alternatives": [
                {
                    "confidence": 0.0,
                    "timestamps": [],
                    "transcript": "yeah "
                }
            ],
            "final": true
        },
        {
            "alternatives": [
                {
                    "confidence": 0.83,
                    "timestamps": [
                        [
                            "and",
                            0.83,
                            1.0
                        ],
                        [
                            "I",
                            1.0,
                            1.08
                        ],
                        [
                            "can",
                            1.08,
                            1.26
                        ],
                        [
                            "run",
                            1.26,
                            1.45
                        ],
                        [
                            "a",
                            1.45,
                            1.48
                        ],
                        [
                            "check",
                            1.48,
                            1.78
                        ],
                        [
                            "on",
                            1.78,
                            1.89
                        ],
                        [
                            "my",
                            1.89,
                            2.07
                        ],
                        [
                            "tax",
                            2.07,
                            2.6
                        ],
                        [
                            "and",
                            2.74,
                            2.89
                        ],
                        [
                            "again",
                            2.89,
                            3.23
                        ],
                        [
                            "I'll",
                            3.23,
                            3.4
                        ],
                        [
                            "get",
                            3.4,
                            3.57
                        ],
                        [
                            "up",
                            3.57,
                            3.68
                        ],
                        [
                            "get",
                            3.77,
                            3.95
                        ],
                        [
                            "out",
                            3.95,
                            4.11
                        ],
                        [
                            "my",
                            4.11,
                            4.21
                        ],
                        [
                            "spelling",
                            4.21,
                            4.65
                        ],
                        [
                            "errors",
                            4.65,
                            5.08
                        ]
                    ],
                    "transcript": "and I can run a check on my tax and again I'll get up get out my spelling errors ",
                    "word_confidence": [
                        [
                            "and",
                            1.0
                        ],
                        [
                            "I",
                            1.0
                        ],
                        [
                            "can",
                            0.7779403827740854
                        ],
                        [
                            "run",
                            0.26502452550612693
                        ],
                        [
                            "a",
                            0.15611111025600333
                        ],
                        [
                            "check",
                            1.0
                        ],
                        [
                            "on",
                            1.0
                        ],
                        [
                            "my",
                            1.0
                        ],
                        [
                            "tax",
                            0.4133005116503248
                        ],
                        [
                            "and",
                            1.0
                        ],
                        [
                            "again",
                            0.8481161519123086
                        ],
                        [
                            "I'll",
                            0.7681370465998621
                        ],
                        [
                            "get",
                            1.0
                        ],
                        [
                            "up",
                            0.30116076025256117
                        ],
                        [
                            "get",
                            1.0
                        ],
                        [
                            "out",
                            1.0
                        ],
                        [
                            "my",
                            1.0
                        ],
                        [
                            "spelling",
                            1.0
                        ],
                        [
                            "errors",
                            0.9969654445124339
                        ]
                    ]
                }
            ],
            "final": true
        },
        {
            "alternatives": [
                {
                    "confidence": 0.875,
                    "timestamps": [
                        [
                            "%HESITATION",
                            6.16,
                            6.52
                        ],
                        [
                            "let",
                            7.1,
                            7.3
                        ],
                        [
                            "me",
                            7.3,
                            7.37
                        ],
                        [
                            "show",
                            7.37,
                            7.53
                        ],
                        [
                            "you",
                            7.53,
                            7.64
                        ],
                        [
                            "another",
                            7.64,
                            7.92
                        ],
                        [
                            "example",
                            7.92,
                            8.49
                        ]
                    ],
                    "transcript": "%HESITATION let me show you another example ",
                    "word_confidence": [
                        [
                            "%HESITATION",
                            0.5481795076045378
                        ],
                        [
                            "let",
                            0.771839641482555
                        ],
                        [
                            "me",
                            0.9999999999999625
                        ],
                        [
                            "show",
                            0.970320300917179
                        ],
                        [
                            "you",
                            0.9999999999999634
                        ],
                        [
                            "another",
                            0.9999999999999755
                        ],
                        [
                            "example",
                            0.9894367798458484
                        ]
                    ]
                }
            ],
            "final": true
        },
        {
            "alternatives": [
                {
                    "confidence": 0.973,
                    "timestamps": [
                        [
                            "I",
                            8.98,
                            9.14
                        ],
                        [
                            "have",
                            9.14,
                            9.27
                        ],
                        [
                            "a",
                            9.27,
                            9.31
                        ],
                        [
                            "desk",
                            9.31,
                            9.65
                        ],
                        [
                            "jockey",
                            9.65,
                            9.92
                        ]
                    ],
                    "transcript": "I have a desk jockey ",
                    "word_confidence": [
                        [
                            "I",
                            0.9999999999999918
                        ],
                        [
                            "have",
                            0.802175018932835
                        ],
                        [
                            "a",
                            0.9999999999999989
                        ],
                        [
                            "desk",
                            0.9999999999999988
                        ],
                        [
                            "jockey",
                            0.999999999999999
                        ]
                    ]
                }
            ],
            "final": true
        }
    ]
}
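Each entry in `results` covers one stretch of speech, and its word timestamps are relative to the start of the clip. To turn the response into timed transcript lines for the video, add the clip's start position (915 seconds, the `-ss 00:15:15` used when cutting the sample) to each result's first word timestamp. A line-oriented bash/awk sketch, assuming the pretty-printed layout above with one value per line; `watson_transcript` is a hypothetical helper, and `jq` would cope better with other layouts:

```shell
# Hypothetical helper: read a pretty-printed Watson /recognize response on
# stdin and print one "[HH:MM:SS.ss] text" line per result. OFFSET (default 0)
# is the clip start within the video, in seconds; results with an empty
# timestamp list get "[--:--:--.--]". %HESITATION tokens are dropped.
watson_transcript() {
  awk -v offset="${1:-0}" '
    # Empty timestamp list: this result has no start time
    /"timestamps": \[\]/ { start = ""; next }
    # Opening of a timestamp list: the first bare number after it
    # is the start time of the first word
    /"timestamps": \[$/ { start = ""; want = 1; next }
    want && /^ *[0-9.]+,$/ { v = $1; sub(/,$/, "", v); start = v; want = 0; next }
    /"transcript": "/ {
      text = $0
      sub(/^ *"transcript": "/, "", text)  # drop the key
      sub(/",?$/, "", text)                # drop the closing quote and comma
      gsub(/%HESITATION/, "", text)
      gsub(/  +/, " ", text); sub(/^ +/, "", text); sub(/ +$/, "", text)
      if (start == "") {
        stamp = "--:--:--.--"
      } else {
        # Work in whole centiseconds so the fields come out exact
        cs = int((offset + start) * 100 + 0.5)
        stamp = sprintf("%02d:%02d:%05.2f", int(cs / 360000), int((cs % 360000) / 6000), (cs % 6000) / 100)
      }
      printf "[%s] %s\n", stamp, text
      start = ""
    }'
}
```

For example, `watson_transcript 915 < response.json` (where `response.json` stands for the saved response above) would print `[--:--:--.--] yeah` followed by `[00:15:15.83] and I can run a check on my tax and again I'll get up get out my spelling errors`, and so on for the remaining results.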