Use Whisper AI to transcribe any video or audio file stored on your computer.

Prerequisites

This script assumes you have an r2 alias set up as follows:

r2='s5cmd --credentials-file ~/.config/s3/r2-config.cfg --endpoint-url https://{your bucket id}.r2.cloudflarestorage.com'

This script uses s5cmd as an S3 client, which can be installed with brew install s5cmd or the equivalent on other systems.

Provide your Replicate API token

export REPLICATE_API_TOKEN=
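
Before running the steps below, it can help to confirm that the tools and the token are in place. This is a minimal sketch for a POSIX shell; missing_tools and have_replicate_token are hypothetical helper names, not part of s5cmd or Replicate.

```shell
# Hypothetical helper: prints the name of every command that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Hypothetical helper: succeeds only when the Replicate token is non-empty.
have_replicate_token() {
  [ -n "${REPLICATE_API_TOKEN:-}" ]
}
```

For example, `missing_tools s5cmd curl jq` prints nothing when all three are installed, and `have_replicate_token || echo "set REPLICATE_API_TOKEN first"` flags an empty token.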

Process

Upload the video to an S3-compatible Cloudflare R2 bucket. This gives us a web-accessible link, so we don't have to upload the file to Replicate directly. The bucket's files are not publicly browsable, so they remain private.

S3_PATH="s3://tmp/7d/{{destination_name}}"
r2 cp --sp "{{source_path}}" "$S3_PATH"

A couple of notes: our tmp bucket is configured with two lifecycle rules that delete files whose keys start with the 1d or 7d prefix, respectively. Each rule's retention period matches its prefix. As the bucket name implies, this bucket shouldn't be used for anything that needs to persist. Any file is removed once it is 30 days old. See the Object lifecycle rules section in your bucket settings.
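
The prefix convention above can be wrapped in a small helper so that a typo in the retention prefix is caught before uploading. This is a sketch only; tmp_s3_path is a hypothetical name, and it assumes the bucket is called tmp and has only the 1d and 7d rules described above.

```shell
# Hypothetical helper: builds the upload path for the tmp bucket.
# $1 = retention prefix (1d or 7d), $2 = local file path.
tmp_s3_path() {
  case "$1" in
    1d|7d) ;;
    *) echo "unknown retention prefix: $1 (expected 1d or 7d)" >&2; return 1 ;;
  esac
  # Keep only the file name so local directories don't leak into the key.
  printf 's3://tmp/%s/%s\n' "$1" "$(basename "$2")"
}
```

Usage: `S3_PATH=$(tmp_s3_path 1d ./meeting.mp4)` sets S3_PATH to s3://tmp/1d/meeting.mp4.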

After uploading, get a publicly accessible link by presigning the URL.

SIGNED_URL=$(r2 presign --expire 1h "$S3_PATH")
echo "$SIGNED_URL"

Submit to Replicate

REPLICATE_RES=$(curl --location 'https://api.replicate.com/v1/predictions' \
--header 'Content-Type: application/json' \
--header "Authorization: Token $REPLICATE_API_TOKEN" \
--data @- <<EOF
{
    "version": "4f41e90243af171da918f04da3e526b2c247065583ea9b757f2071f573965408",
    "input": {
        "url": "$SIGNED_URL",
        "task": "transcribe",
        "timestamp": "chunk",
        "batch_size": 64,
        "language": "en"
    }
}
EOF
)
echo "$REPLICATE_RES" | jq .
WHISPER_ID=$(echo "$REPLICATE_RES" | jq -r '.id')
echo "$WHISPER_ID"
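
If the request is rejected (for example because of a bad token), the response has no id field and jq -r prints the literal string null. A small guard can catch that before the next step; require_prediction_id is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: fails when the extracted id is empty or the literal
# string "null", which is what jq -r prints for a missing field.
require_prediction_id() {
  case "$1" in
    ''|null)
      echo "no prediction id in response; check the token and request body" >&2
      return 1
      ;;
    *) return 0 ;;
  esac
}
```

Usage: `require_prediction_id "$WHISPER_ID" || echo "$REPLICATE_RES"` prints the raw response when no id came back.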

The model runs on an endpoint that may need a cold start, so the prediction can take up to 3 minutes to finish.
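
Rather than waiting a fixed time, the prediction can be polled until it finishes. The sketch below assumes the prediction object has a status field with the values succeeded, failed, or canceled once it is done, which is my understanding of the Replicate API. fetch_status and wait_for_prediction are hypothetical helper names, and the defaults of 36 tries at 5 seconds each are chosen only to match the 3 minute estimate.

```shell
# Hypothetical helper: prints the current status of prediction $1.
fetch_status() {
  curl --silent --location "https://api.replicate.com/v1/predictions/$1" \
    --header "Authorization: Token $REPLICATE_API_TOKEN" | jq -r '.status'
}

# Hypothetical helper: polls until the prediction finishes.
# $1 = prediction id, $2 = max tries (default 36), $3 = delay in seconds (default 5).
# Returns 0 on success, 1 on failed/canceled, 2 on timeout.
wait_for_prediction() {
  _id=$1
  _max_tries=${2:-36}
  _delay=${3:-5}
  _i=0
  while [ "$_i" -lt "$_max_tries" ]; do
    _status=$(fetch_status "$_id")
    case "$_status" in
      succeeded) return 0 ;;
      failed|canceled)
        echo "prediction $_id ended with status: $_status" >&2
        return 1
        ;;
    esac
    _i=$((_i + 1))
    sleep "$_delay"
  done
  echo "timed out waiting for prediction $_id" >&2
  return 2
}
```

Usage: `wait_for_prediction "$WHISPER_ID" && echo "ready"` before fetching the results.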

Get the results using the id in the response

OUTPUT_FILE="{{output_filename}}.json"
echo "$OUTPUT_FILE"
curl --location "https://api.replicate.com/v1/predictions/$WHISPER_ID" \
--header "Authorization: Token $REPLICATE_API_TOKEN" | jq . > "$OUTPUT_FILE"

Get the complete transcript

jq -r '.output.text' "$OUTPUT_FILE"

Use AI to summarize

This is an example of how we could use AI to extract key points. I am currently using the Fabric repository by Daniel Miessler. In short, it provides a set of convenience prompts that are submitted together with the input text, which it accepts as piped input, so you never have to leave the terminal.

One of my favorites is extract_wisdom, which was designed for YouTube videos.

jq -r '.output.text' "$OUTPUT_FILE" | fabric -p extract_wisdom -s

flexchar/video-to-json.md
