How to write a transcription of a video describing a UNIX spell checker

In progress: an attempt at a set of command-line steps that send audio to speech-to-text services (Microsoft's Project Oxford and IBM Watson's Speech to Text) and process the resulting data into a transcript of an old video about UNIX.

UNIX: Making Computers Easier To Use -- AT&T Archives film from 1982, Bell Laboratories

via this HN discussion: https://news.ycombinator.com/item?id=10789019

The discussion of the spell checker starts at 5:15 with Brian Kernighan: https://www.youtube.com/watch?v=XvDZLjaCJuw&t=5m15s

It continues at 13:47 with Lorinda Cherry: https://youtu.be/XvDZLjaCJuw?t=13m47s
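
The two links give their offsets in YouTube's t= format (5m15s, 13m47s). The avconv -ss option used further down takes HH:MM:SS instead. A small bash helper can convert one to the other. The name yt_to_hms is hypothetical and not part of any tool; it also accepts a plain number of seconds.

```shell
# Convert a YouTube "t=" value (e.g. 5m15s, 1h2m3s, or plain seconds like 315)
# into the HH:MM:SS form accepted by avconv's -ss option.
yt_to_hms() {
  local t=$1 total=0
  if [[ $t =~ ^[0-9]+$ ]]; then
    total=$((10#$t))                      # already a number of seconds
  else
    # 10# forces base 10 so values like "08" are not read as octal
    [[ $t =~ ([0-9]+)h ]] && total=$((total + 10#${BASH_REMATCH[1]} * 3600))
    [[ $t =~ ([0-9]+)m ]] && total=$((total + 10#${BASH_REMATCH[1]} * 60))
    [[ $t =~ ([0-9]+)s ]] && total=$((total + 10#${BASH_REMATCH[1]}))
  fi
  printf '%02d:%02d:%02d\n' $((total / 3600)) $((total % 3600 / 60)) $((total % 60))
}
```

For example, passing "$(yt_to_hms 13m47s)" to avconv -ss starts the cut at Lorinda Cherry's section.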

Download the video, keep it, and extract its audio as a WAV file:

$ youtube-dl https://www.youtube.com/watch?v=XvDZLjaCJuw \
      --keep-video \
      --extract-audio \
      --audio-format wav \
      --audio-quality 0 \
      --restrict-filenames

Output:

[youtube] Setting language
[youtube] Confirming age
[youtube] XvDZLjaCJuw: Downloading webpage
[youtube] XvDZLjaCJuw: Downloading video info webpage
[youtube] XvDZLjaCJuw: Extracting video information
[download] Destination: UNIX_-_Making_Computers_Easier_To_Use_--_AT_T_Archives_film_from_1982_Bell_Laboratories-XvDZLjaCJuw.mp4
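
The audio file name in the avconv command below is long and easy to mistype. youtube-dl's default output template is %(title)s-%(id)s.%(ext)s, which matches the destination shown above. That means the WAV can be located by its video id instead of its title. The following is a bash sketch; audio_for_id is a hypothetical name.

```shell
# youtube-dl's default output template ends in "-<video id>.<ext>", so the
# extracted audio can be found by id. Prints the first matching .wav in the
# current directory and returns non-zero if there is none.
audio_for_id() {
  local id=$1 f
  for f in ./*-"$id".wav; do
    if [[ -e $f ]]; then               # an unmatched glob stays literal
      printf '%s\n' "${f#./}"
      return 0
    fi
  done
  return 1
}
```

Usage: AUDIO=$(audio_for_id XvDZLjaCJuw), then pass "$AUDIO" to avconv -i.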

Cut a 10-second sample, resampled to 16,000 Hz (16 kHz), the maximum sample rate allowed by Microsoft's Project Oxford:

$ avconv -ss 00:15:15 \
         -t 00:00:10  \
         -i UNIX_-_Making_Computers_Easier_To_Use_--_AT_T_Archives_film_from_1982_Bell_Laboratories-XvDZLjaCJuw.wav \
         -ar 16000 \
         unixaudio-sample.wav
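
Project Oxford accepts at most 10 seconds of audio per request (see the REST reference quoted below). Transcribing a longer stretch therefore means cutting it into consecutive 10-second pieces. The bash sketch below is minimal. It uses the avconv options from the command above plus -y (overwrite existing files) and -loglevel error (quieter output). The function names and the unixaudio-<offset>.wav naming scheme are hypothetical.

```shell
# Print the start offset, in seconds, of each chunk from START up to END.
chunk_offsets() {
  local start=$1 end=$2 step=${3:-10} t
  for (( t = start; t < end; t += step )); do
    echo "$t"
  done
}

# Cut INPUT into consecutive 10-second, 16 kHz WAV files covering START..END
# seconds, named by start offset (e.g. unixaudio-00915.wav).
cut_chunks() {
  local input=$1 start=$2 end=$3 t
  for t in $(chunk_offsets "$start" "$end" 10); do
    avconv -loglevel error -y -ss "$t" -t 10 -i "$input" -ar 16000 \
      "$(printf 'unixaudio-%05d.wav' "$t")"
  done
}
```

For example, running cut_chunks on the extracted WAV with START 915 and END 975 would produce six files, the first starting at 15:15 like the sample above. The last chunk may run a few seconds past END, which only costs a little extra audio.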

Project Oxford

REST Speech API reference

Your application must endpoint the audio to determine start and end of speech. The endpoints specify to the service the start and end of the request. You may not upload more than 10 seconds of audio in any one request and the total request duration cannot exceed 14 seconds.
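
Requests with more than 10 seconds of audio are not allowed, so it can help to check a chunk's length before uploading it. The bash sketch below estimates duration from the WAV header's byte rate and the file size; wav_seconds and check_oxford_length are hypothetical names. It makes three assumptions. The file is a plain PCM WAV as avconv writes it, with the byte rate at offset 28 and roughly 44 bytes of header. The machine is little-endian, since od reads integers in host byte order. Any extra header chunks only inflate the estimate slightly.

```shell
# Estimate a WAV file's duration in seconds from its header byte rate.
# Assumes a plain PCM WAV: byte rate (uint32) at offset 28, about 44 bytes
# of header, and a little-endian host, since od reads in host byte order.
wav_seconds() {
  local f=$1 rate size
  rate=$(od -An -t u4 -j 28 -N 4 "$f" | tr -d ' ')
  size=$(wc -c < "$f" | tr -d ' ')
  awk -v s="$size" -v r="$rate" 'BEGIN { printf "%.2f\n", (s - 44) / r }'
}

# Succeed if FILE is within the 10-second limit, otherwise warn and fail.
check_oxford_length() {
  local f=$1 secs
  secs=$(wav_seconds "$f")
  if awk -v t="$secs" 'BEGIN { exit !(t <= 10) }'; then
    return 0
  fi
  echo "$f: ${secs}s exceeds the 10-second limit" >&2
  return 1
}
```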

Get the access token

$ curl -X POST \
  -d 'grant_type=client_credentials' \
  -d 'client_id=dansfootest' \
  -d "client_secret=$OXFORD_CLIENT_SECRET" \
  -d "scope=https://speech.platform.bing.com" \
  https://oxford-speech.cloudapp.net/token/issueToken

Result

{
    "access_token": "XXXX.YYYY.ZZZZZ",
    "expires_in": "600",
    "scope": "https://speech.platform.bing.com",
    "token_type": "jwt"
}
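
The recognize call reads the token from $OXFORD_ACCESS_TOKEN. One way to fill that variable is to pipe the token response through jq, which is assumed to be installed. The function names are hypothetical; the request itself is the same one shown above. Per expires_in, a token is only good for 600 seconds (10 minutes), so a long batch of requests will need a fresh one.

```shell
# Print the access_token field from a token response read on stdin
# (nothing if the field is missing, e.g. on an error response).
extract_token() {
  jq -r '.access_token // empty'
}

# Request a token with the same parameters as above and print it.
oxford_token() {
  curl -s -X POST \
    -d 'grant_type=client_credentials' \
    -d 'client_id=dansfootest' \
    -d "client_secret=$OXFORD_CLIENT_SECRET" \
    -d "scope=https://speech.platform.bing.com" \
    https://oxford-speech.cloudapp.net/token/issueToken | extract_token
}
```

Usage: export OXFORD_ACCESS_TOKEN=$(oxford_token)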

Call the recognize endpoint

$ FILENAME=unixaudio-sample.wav
$ curl --request POST \
    -H "Authorization: Bearer $OXFORD_ACCESS_TOKEN" \
    -H 'Content-Type: audio/wav; samplerate=16000' \
    -H "Content-Length: $(wc -c < "$FILENAME" | tr -d ' ')" \
    --data-binary "@$FILENAME" \
    "https://speech.platform.bing.com/recognize?version=3.0&requestid=$(uuid)&instanceid=$(uuid)&device.os=osx&locale=en-US&format=json&appID=D4D52672-91D7-4C74-8AD8-42B1D98141A5&scenarios=ulm&result.profanitymarkup=0&maxnbest=3"

Result

{
    "header": {
        "lexical": "jack and i can check on my check in again i'll get up get out my spelling ass off i wanna show you another example i have a desktop",
        "name": "jack and I can check on my check in again I'll get up get out my spelling ass off I wanna show you another example I have a desktop",
        "properties": {
            "HIGHCONF": "1",
            "requestid": "abc-def-jkl-mno-xyz"
        },
        "scenario": "ulm",
        "status": "success"
    },
    "results": [
        {
            "confidence": "0.6608185",
            "lexical": "jack and i can check on my check in again i'll get up get out my spelling ass off i wanna show you another example i have a desktop",
            "name": "jack and I can check on my check in again I'll get up get out my spelling ass off I wanna show you another example I have a desktop",
            "properties": {
                "HIGHCONF": "1"
            },
            "scenario": "ulm"
        }
    ],
    "version": "3.0"
}
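
Two bash sketches help move past a single sample. Both are hypothetical helpers, not part of any official client.

- oxford_recognize_all repeats the request above for every unixaudio-*.wav file in the current directory, including unixaudio-sample.wav, and saves each response as <chunk>.oxford.json. It assumes $OXFORD_ACCESS_TOKEN is set and that the same uuid command and query parameters are available. It leaves out the Content-Length header, because curl fills that in itself for --data-binary.
- oxford_best uses jq (assumed installed) to print the confidence and display text (name) of the top result. It prints nothing when header.status is not success.

```shell
OXFORD_APP_ID=D4D52672-91D7-4C74-8AD8-42B1D98141A5   # appID from the URL above

# POST each unixaudio-*.wav chunk and save the JSON reply as <chunk>.oxford.json.
oxford_recognize_all() {
  local f
  for f in unixaudio-*.wav; do
    [[ -e $f ]] || continue          # no matches: the glob stays literal
    curl -s --request POST \
      -H "Authorization: Bearer $OXFORD_ACCESS_TOKEN" \
      -H 'Content-Type: audio/wav; samplerate=16000' \
      --data-binary "@$f" \
      "https://speech.platform.bing.com/recognize?version=3.0&requestid=$(uuid)&instanceid=$(uuid)&device.os=osx&locale=en-US&format=json&appID=$OXFORD_APP_ID&scenarios=ulm&result.profanitymarkup=0&maxnbest=3" \
      > "${f%.wav}.oxford.json"
  done
}

# Print "confidence<TAB>text" for the top result of one response,
# or nothing if the request did not succeed.
oxford_best() {
  jq -r 'if .header.status == "success"
         then .results[0] | "\(.confidence)\t\(.name)"
         else empty end' "$@"
}
```

The token expires after 600 seconds, so a long batch may need a fresh $OXFORD_ACCESS_TOKEN partway through. Afterwards, running oxford_best on each unixaudio-*.oxford.json file lists one best guess per chunk.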

IBM Watson's Speech to Text API

https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text/api/v1/#recognize

$ curl -u "$WATSON_USERNAME":"$WATSON_PASSWORD" \
  -H "content-type: audio/wav" \
  --data-binary @"$FILENAME" \
  "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?continuous=true&max_alternatives=1&timestamps=true&word_confidence=true"

Result

{
    "result_index": 0,
    "results": [
        {
            "alternatives": [
                {
                    "confidence": 0.0,
                    "timestamps": [],
                    "transcript": "yeah "
                }
            ],
            "final": true
        },
        {
            "alternatives": [
                {
                    "confidence": 0.83,
                    "timestamps": [
                        [
                            "and",
                            0.83,
                            1.0
                        ],
                        [
                            "I",
                            1.0,
                            1.08
                        ],
                        [
                            "can",
                            1.08,
                            1.26
                        ],
                        [
                            "run",
                            1.26,
                            1.45
                        ],
                        [
                            "a",
                            1.45,
                            1.48
                        ],
                        [
                            "check",
                            1.48,
                            1.78
                        ],
                        [
                            "on",
                            1.78,
                            1.89
                        ],
                        [
                            "my",
                            1.89,
                            2.07
                        ],
                        [
                            "tax",
                            2.07,
                            2.6
                        ],
                        [
                            "and",
                            2.74,
                            2.89
                        ],
                        [
                            "again",
                            2.89,
                            3.23
                        ],
                        [
                            "I'll",
                            3.23,
                            3.4
                        ],
                        [
                            "get",
                            3.4,
                            3.57
                        ],
                        [
                            "up",
                            3.57,
                            3.68
                        ],
                        [
                            "get",
                            3.77,
                            3.95
                        ],
                        [
                            "out",
                            3.95,
                            4.11
                        ],
                        [
                            "my",
                            4.11,
                            4.21
                        ],
                        [
                            "spelling",
                            4.21,
                            4.65
                        ],
                        [
                            "errors",
                            4.65,
                            5.08
                        ]
                    ],
                    "transcript": "and I can run a check on my tax and again I'll get up get out my spelling errors ",
                    "word_confidence": [
                        [
                            "and",
                            1.0
                        ],
                        [
                            "I",
                            1.0
                        ],
                        [
                            "can",
                            0.7779403827740854
                        ],
                        [
                            "run",
                            0.26502452550612693
                        ],
                        [
                            "a",
                            0.15611111025600333
                        ],
                        [
                            "check",
                            1.0
                        ],
                        [
                            "on",
                            1.0
                        ],
                        [
                            "my",
                            1.0
                        ],
                        [
                            "tax",
                            0.4133005116503248
                        ],
                        [
                            "and",
                            1.0
                        ],
                        [
                            "again",
                            0.8481161519123086
                        ],
                        [
                            "I'll",
                            0.7681370465998621
                        ],
                        [
                            "get",
                            1.0
                        ],
                        [
                            "up",
                            0.30116076025256117
                        ],
                        [
                            "get",
                            1.0
                        ],
                        [
                            "out",
                            1.0
                        ],
                        [
                            "my",
                            1.0
                        ],
                        [
                            "spelling",
                            1.0
                        ],
                        [
                            "errors",
                            0.9969654445124339
                        ]
                    ]
                }
            ],
            "final": true
        },
        {
            "alternatives": [
                {
                    "confidence": 0.875,
                    "timestamps": [
                        [
                            "%HESITATION",
                            6.16,
                            6.52
                        ],
                        [
                            "let",
                            7.1,
                            7.3
                        ],
                        [
                            "me",
                            7.3,
                            7.37
                        ],
                        [
                            "show",
                            7.37,
                            7.53
                        ],
                        [
                            "you",
                            7.53,
                            7.64
                        ],
                        [
                            "another",
                            7.64,
                            7.92
                        ],
                        [
                            "example",
                            7.92,
                            8.49
                        ]
                    ],
                    "transcript": "%HESITATION let me show you another example ",
                    "word_confidence": [
                        [
                            "%HESITATION",
                            0.5481795076045378
                        ],
                        [
                            "let",
                            0.771839641482555
                        ],
                        [
                            "me",
                            0.9999999999999625
                        ],
                        [
                            "show",
                            0.970320300917179
                        ],
                        [
                            "you",
                            0.9999999999999634
                        ],
                        [
                            "another",
                            0.9999999999999755
                        ],
                        [
                            "example",
                            0.9894367798458484
                        ]
                    ]
                }
            ],
            "final": true
        },
        {
            "alternatives": [
                {
                    "confidence": 0.973,
                    "timestamps": [
                        [
                            "I",
                            8.98,
                            9.14
                        ],
                        [
                            "have",
                            9.14,
                            9.27
                        ],
                        [
                            "a",
                            9.27,
                            9.31
                        ],
                        [
                            "desk",
                            9.31,
                            9.65
                        ],
                        [
                            "jockey",
                            9.65,
                            9.92
                        ]
                    ],
                    "transcript": "I have a desk jockey ",
                    "word_confidence": [
                        [
                            "I",
                            0.9999999999999918
                        ],
                        [
                            "have",
                            0.802175018932835
                        ],
                        [
                            "a",
                            0.9999999999999989
                        ],
                        [
                            "desk",
                            0.9999999999999988
                        ],
                        [
                            "jockey",
                            0.999999999999999
                        ]
                    ]
                }
            ],
            "final": true
        }
    ]
}
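
The Watson request can be repeated over chunk files the same way, and a few jq filters reduce each response to what a transcript needs. The bash sketch below uses hypothetical function names. It assumes jq 1.5 or later, the $WATSON_USERNAME and $WATSON_PASSWORD variables from above, and unixaudio-*.wav chunks in the current directory.

- watson_text joins the final transcripts and drops %HESITATION markers.
- watson_lowconf lists words below a confidence threshold.
- watson_segments prints one line per result, with start and end times shifted by the clip's offset into the video. The shift is needed because Watson's timestamps count from the start of the uploaded audio.

```shell
# POST each unixaudio-*.wav chunk and save the reply as <chunk>.watson.json.
watson_recognize_all() {
  local f
  for f in unixaudio-*.wav; do
    [[ -e $f ]] || continue          # no matches: the glob stays literal
    curl -s -u "$WATSON_USERNAME":"$WATSON_PASSWORD" \
      -H "content-type: audio/wav" \
      --data-binary @"$f" \
      "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?continuous=true&max_alternatives=1&timestamps=true&word_confidence=true" \
      > "${f%.wav}.watson.json"
  done
}

# Join the final transcripts into one line, dropping %HESITATION markers.
watson_text() {
  jq -r '[.results[] | select(.final) | .alternatives[0].transcript]
         | join("")
         | gsub("%HESITATION ?"; "")
         | sub(" +$"; "")' "$@"
}

# Print "word<TAB>confidence" for every word scored below MIN.
# Usage: watson_lowconf MIN [FILE...]
watson_lowconf() {
  local min=$1
  shift
  jq -r --argjson min "$min" '
    .results[].alternatives[0].word_confidence[]?
    | select(.[1] < $min)
    | "\(.[0])\t\(.[1])"' "$@"
}

# Print "start<TAB>end<TAB>text" per final result, adding OFFSET seconds so
# times refer to the full video rather than the uploaded clip.
# Usage: watson_segments OFFSET [FILE...]
watson_segments() {
  local offset=$1
  shift
  jq -r --argjson off "$offset" '
    .results[]
    | select(.final)
    | .alternatives[0]
    | select((.timestamps | length) > 0)
    | [.timestamps[0][1] + $off, .timestamps[-1][2] + $off,
       (.transcript | sub(" +$"; ""))]
    | @tsv' "$@"
}
```

On the response above, watson_lowconf 0.5 would flag run, a, tax and up, which are good candidates to check against the video by hand. watson_segments 915 would add 915 seconds (15:15) to every start and end time.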

dannguyen/unix-spell-checker-fun.md
