@sasasin
Last active November 12, 2023 13:40
A shell script that bundles the whole workflow from mp3 files to summary text
#!/bin/bash -v
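# Base name for all output files (first argument)
BOOK_NAME="$1"
# Concatenate all mp3 files into a single file
# (\047 is a single quote, so each entry becomes: file 'name.mp3')
ls *.mp3 \
| sort -u \
| awk '{print "file \047" $0 "\047"}' \
> "${BOOK_NAME}.mp3-list.txt"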
ffmpeg \
-f concat \
-safe 0 \
-i "${BOOK_NAME}.mp3-list.txt" \
-c copy "${BOOK_NAME}.mp3"
# Split by duration (900-second segments)
ffmpeg -i "${BOOK_NAME}.mp3" \
-f segment -segment_time 900 -c copy \
"${BOOK_NAME}.%03d.mp3"
# Transcription
ls "${BOOK_NAME}".*.mp3 \
| sort -u \
| while read -r MP3_FILE_NAME; do
  # Transcribe one segment
  echo "transcribing ${MP3_FILE_NAME} to ${MP3_FILE_NAME}.transcribe.txt"
  curl \
  --request POST \
  --silent \
  --url https://api.openai.com/v1/audio/transcriptions \
  --header "Authorization: Bearer ${OPENAI_API_KEY}" \
  --header "Content-Type: multipart/form-data" \
  --form model=whisper-1 \
  --form "file=@${MP3_FILE_NAME}" \
  > "${MP3_FILE_NAME}.transcribe.txt"
  # Wait to stay within the API rate limit
  sleep 60
done
# Summarization
ls "${BOOK_NAME}".*.mp3 \
| sort -u \
| while read -r MP3_FILE_NAME; do
  # Summarize one transcript
  echo "summarizing ${MP3_FILE_NAME}.transcribe.txt to ${MP3_FILE_NAME}.summerize.txt"
  python3 ~/bin/summerize-stdin-by-bedrock.py \
  < "${MP3_FILE_NAME}.transcribe.txt" \
  > "${MP3_FILE_NAME}.summerize.txt"
  # Wait to stay within the API rate limit
  sleep 60
done
# Summarize all of the per-segment summaries once more
cat "${BOOK_NAME}".*.summerize.txt \
| python3 ~/bin/summerize-stdin-by-bedrock.py \
> "${BOOK_NAME}.summerize.txt"
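
The concat list written at the top of the script is parsed by ffmpeg's concat demuxer, so file names containing spaces or single quotes need quoting and escaping. The following is a minimal sketch of a list builder that handles both, assuming bash; `concat_line` and `write_concat_list` are hypothetical helper names that do not appear in the original script.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the original script.

# Print one concat-demuxer entry for a single file name.
# The name is wrapped in single quotes; a literal ' inside the name is
# written as '\'' (close the quote, escaped quote, reopen the quote).
concat_line() {
  local name=$1
  local quote="'"
  local escaped_quote="'\\''"
  local escaped
  escaped=${name//"$quote"/"$escaped_quote"}
  printf "file '%s'\n" "$escaped"
}

# Write one entry per remaining argument to the list file named by $1.
write_concat_list() {
  local list_file=$1
  shift
  local f
  for f in "$@"; do
    concat_line "$f"
  done > "$list_file"
}
```

One way to use it in place of the `ls | sort | awk` pipeline would be `write_concat_list "${BOOK_NAME}.mp3-list.txt" *.mp3`; the shell expands the glob in sorted order, so no separate `sort` step is needed.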