@winlinvip
Last active June 8, 2025 12:36
English 4/3/2 Spoken Coach
# Required.
OPENAI_API_KEY=your-key
# Optional. For O3 feedback, you need to verify your organization at https://platform.openai.com/settings/organization/general.
OPENAI_ORG_ID=your-verified-organization
# Optional.
OPENAI_PROXY=https://api.openai.com/v1

Usage

Set OPENAI_API_KEY in the .env file. To use O3 feedback, also set OPENAI_ORG_ID in the .env file after verifying your organization at https://platform.openai.com/settings/organization/general.
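
As a rough illustration of these settings, the sketch below checks a mapping of environment values before any request is made. The helper name check_env is hypothetical and not part of main.py; it only mirrors the rule that OPENAI_API_KEY is required while OPENAI_ORG_ID decides whether O3 feedback runs.

```python
import os


def check_env(env):
    """Return a list of human-readable problems with the coach settings.

    `env` is any mapping of variable names to values, e.g. os.environ.
    This helper is an illustration only; main.py does not define it.
    """
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is required")
    if not env.get("OPENAI_ORG_ID"):
        problems.append("OPENAI_ORG_ID is not set; O3 feedback will be skipped")
    return problems


if __name__ == "__main__":
    # Print one line per problem; prints nothing when everything is set.
    for problem in check_env(os.environ):
        print(problem)
```

OPENAI_PROXY is left out of the check because it is optional and the client falls back to its default endpoint when the value is missing.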

Create a directory and download all files there:

mkdir -p ~/git/coach432

Create a virtual environment:

cd ~/git/coach432 && python3 -m venv venv

Activate the virtual environment:

cd ~/git/coach432 && source venv/bin/activate

Install the required packages:

cd ~/git/coach432 && pip install -r requirements.txt

Record your 4/3/2 audio. For background, see "How to Use AI to improve Spoken English" or "How to Improve Fluency with the 4/3/2 Technique".

  1. Spend one minute jotting down key points for your chosen topic.
  2. Record a 4-minute audio clip explaining the topic in full.
  3. Take a one-minute break to note hesitations and excess wording.
  4. Record a 3-minute version, reusing the same vocabulary but speaking more concisely.
  5. Pause for another one-minute break and mentally streamline your outline.
  6. Record a final 2-minute version, aiming for smooth, fluent delivery.
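
The six steps above add up to a 12-minute session. A minimal sketch of that schedule as a console timer follows; the names SCHEDULE, total_seconds, and run are made up for illustration, and the recording itself still happens in whatever recorder you use.

```python
import time

# (step description, duration in seconds), taken from the six steps above.
SCHEDULE = [
    ("Jot down key points", 60),
    ("Record the 4-minute version", 240),
    ("Break: note hesitations and excess wording", 60),
    ("Record the 3-minute version", 180),
    ("Break: streamline your outline", 60),
    ("Record the 2-minute version", 120),
]


def total_seconds(schedule):
    """Sum the durations of every step."""
    return sum(seconds for _, seconds in schedule)


def run(schedule, sleep=time.sleep):
    """Announce each step, then wait for its duration.

    `sleep` is injectable so the loop can be exercised without waiting.
    """
    for step, seconds in schedule:
        print(f"{step} ({seconds // 60} min)")
        sleep(seconds)
    print("Done")


if __name__ == "__main__":
    # Only print the session length here; call run(SCHEDULE) to start timing.
    print(f"Total: {total_seconds(SCHEDULE) // 60} minutes")
```

Calling run(SCHEDULE) blocks for the full 12 minutes, so it is kept out of the default entry point.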

Convert your 2-minute audio file to mp3 format using ffmpeg and analyze it:

cd ~/git/coach432 && ffmpeg -i ~/Downloads/*.m4a -c:a mp3 -y ./input.mp3 >/dev/null 2>&1 && python main.py --audio ./input.mp3

You can run it again to get different feedback:

cd ~/git/coach432 && python main.py --audio ./input.mp3
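
The conversion command above relies on the shell glob ~/Downloads/*.m4a matching exactly one file. As a hedged alternative, the sketch below picks the most recently modified .m4a file and builds the same ffmpeg arguments for that single input. The helper names newest_m4a and build_ffmpeg_command are hypothetical and not part of the gist.

```python
from pathlib import Path


def newest_m4a(directory):
    """Return the most recently modified .m4a file in `directory`, or None."""
    directory = Path(directory)
    if not directory.is_dir():
        return None
    files = sorted(directory.glob("*.m4a"), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None


def build_ffmpeg_command(src, dst="./input.mp3"):
    """Build the argument list with the same flags as the README command."""
    # -c:a mp3 selects mp3 audio encoding; -y overwrites dst if it exists.
    return ["ffmpeg", "-i", str(src), "-c:a", "mp3", "-y", str(dst)]


if __name__ == "__main__":
    src = newest_m4a(Path.home() / "Downloads")
    if src is None:
        print("No .m4a file found in ~/Downloads")
    else:
        # Print the command instead of running it, so nothing is overwritten.
        print(" ".join(build_ffmpeg_command(src)))
```

To actually convert, the list can be passed to subprocess.run(cmd, check=True), assuming ffmpeg is on your PATH.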

Example output:

Input: ./input.mp3
Model: gpt-4o-audio-preview

Quick Stats
• Length (sec): 141
• Words per Minute (≈): 94
• Silent Pauses ≥0.5 s (count): 18
• Fillers (“uh/um/like”) (count): 3
• IELTS Score (4-9): 5.5
• CELPIP Score (4-12): 6

Most Impactful Fluency Issue
Issue Type: Fillers and Hesitations
Example: "Um... after the retreat event... uh, I and my family visited the Niagara Falls because... it is very near."
Better version: "After the retreat event, my family and I visited Niagara Falls because it was very close by."

O3 Feedback:
This feedback is reasonable, and making "hesitation + repetition" the top improvement target is also appropriate.
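
The Quick Stats in the example output can be turned into numbers for tracking progress across sessions. The sketch below is one possible way to do that, assuming the model keeps to the bullet template; parse_quick_stats is a hypothetical helper, and the figures the model reports are estimates rather than measurements.

```python
import re

# Quick Stats block copied from the example output above.
SAMPLE = """Quick Stats
• Length (sec): 141
• Words per Minute (≈): 94
• Silent Pauses ≥0.5 s (count): 18
• Fillers (“uh/um/like”) (count): 3
• IELTS Score (4-9): 5.5
• CELPIP Score (4-12): 6
"""


def parse_quick_stats(text):
    """Map each bullet label to its numeric value, skipping other lines."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"^•\s*(.+):\s*([0-9]+(?:\.[0-9]+)?)\s*$", line)
        if match:
            stats[match.group(1)] = float(match.group(2))
    return stats


stats = parse_quick_stats(SAMPLE)
minutes = stats["Length (sec)"] / 60
# Words spoken is implied by the rate and the clip length.
estimated_words = stats["Words per Minute (≈)"] * minutes
# Normalizing pauses by length makes clips of different durations comparable.
pauses_per_minute = stats["Silent Pauses ≥0.5 s (count)"] / minutes

if __name__ == "__main__":
    print(f"Estimated words: {estimated_words:.0f}")
    print(f"Pauses per minute: {pauses_per_minute:.2f}")
```

For the sample, 94 words per minute over 141 seconds implies roughly 221 words, and 18 pauses works out to about 7.7 per minute.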

You can find the gist here: https://gist.github.com/winlinvip/b37032baf06a00328b9490f14ddaac42

'''
Usage, see README.md
'''
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv(".env")) # read local .env file
import os, base64, sys, threading, argparse
import sounddevice as sd
import soundfile as sf
parser = argparse.ArgumentParser(description="User arguments for audio analysis")
parser.add_argument('--audio', type=str, required=True, help='Your audio clip path, must be in mp3 format.')
args = parser.parse_args()
from openai import OpenAI
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url=os.environ.get("OPENAI_PROXY"),
    organization=os.environ.get("OPENAI_ORG_ID"),
)
if os.environ.get("OPENAI_ORG_ID") is None:
    print('O3 feedback: skipped because OPENAI_ORG_ID is not set in the .env file.')
############################################################
print(f'Input: {args.audio}')
with open(args.audio, "rb") as audio_file:
    audio_bytes = audio_file.read()
audio_base64 = base64.b64encode(audio_bytes).decode("utf-8")
model = 'gpt-4o-audio-preview'
print(f'Model: {model}')
systemPrompt = '''
You are an IELTS English fluency coach using the 4/3/2 exercise method proposed by Paul Nation in "Teaching ESL/EFL Listening and Speaking."
You will analyze the uploaded English audio clips, examining the content to identify key factors affecting spoken fluency.
You should go through the audio clip from beginning to end, then find the most important issue that affects the fluency.
You should only output the Quick Stats, no other information.
OUTPUT TEMPLATE
Quick Stats
• Length (sec): X
• Words per Minute (≈): X
• Silent Pauses ≥0.5 s (count): X
• Fillers (“uh/um/like”) (count): X
• IELTS Score (4-9): X
• CELPIP Score (4-12): X
'''
response = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": systemPrompt},
        {"role": "user", "content": [
            {"type": "text", "text": "This is an audio clip for analysis. Please provide feedback on the spoken fluency."},
            {"type": "input_audio", "input_audio": { "data": audio_base64, "format": "mp3" }},
        ]}
    ],
    modalities=["text"],
    temperature=1,
    max_completion_tokens=2048,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
)
text = response.choices[0].message.content
print(f'\n{text}')
############################################################
model = 'gpt-4o-audio-preview'
systemPrompt = '''
You are an IELTS English fluency coach using the 4/3/2 exercise method proposed by Paul Nation in "Teaching ESL/EFL Listening and Speaking."
You will analyze the uploaded English audio clips, examining the content to identify key factors affecting spoken fluency.
You should go through the audio clip from beginning to end, then find the most important issue that affects the fluency.
Your feedback is specific and targeted, helping users improve their speaking ability in real communication. You provide only the single most important piece of feedback.
For each piece of feedback, you must quote what I said as an example, and never give feedback without an example. Then you should provide an example of how to improve it.
Your feedback should focus on fluency, not on accuracy, grammar, or vocabulary.
You should only output the Most Impactful Fluency Issue, no other information.
OUTPUT TEMPLATE
Most Impactful Fluency Issue
Issue Type: “…”
Example: “…”
Better version: “…”
'''
response = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": systemPrompt},
        {"role": "user", "content": [
            {"type": "text", "text": "This is an audio clip for analysis. Please provide feedback on the spoken fluency."},
            {"type": "input_audio", "input_audio": { "data": audio_base64, "format": "mp3" }},
        ]}
    ],
    modalities=["text", "audio"],
    audio= {
        "voice": "alloy",
        "format": "wav"
    },
    temperature=1,
    max_completion_tokens=2048,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
)
text = response.choices[0].message.audio.transcript
print(f'\n{text}')
audio_data = response.choices[0].message.audio.data
audio_bytes = base64.b64decode(audio_data)
with open("response.wav", "wb") as out_file:
    out_file.write(audio_bytes)
def play_audio(file_path):
    data, fs = sf.read(file_path, dtype='float32')
    sd.play(data, fs)
    sd.wait()
play_thread = threading.Thread(target=play_audio, args=('./response.wav',))
play_thread.start()
############################################################
def o3_feedback(feedback):
    if os.environ.get("OPENAI_ORG_ID") is None:
        return
    print(f'\nO3 Feedback:')
    # Use O3 to review the feedback produced by the audio analysis above,
    # instead of a hard-coded sample.
    model = 'o3'
    systemPrompt = '''
You are an IELTS English fluency coach using the 4/3/2 exercise method proposed by Paul Nation in "Teaching ESL/EFL Listening and Speaking."
You will help me with the 4/3/2 training.
You will respond in less than 300 words.
Please answer in Chinese.
'''
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": systemPrompt},
            {"role": "user", "content": [
                {"type": "text", "text": f"{feedback}"},
                {"type": "text", "text": "I received feedback from the audio analysis. Is this feedback reasonable, and should this issue be the main focus?"},
            ]}
        ],
        response_format={"type": "text"},
        reasoning_effort="medium",
        stream=True,
    )
    for chunk in response:
        # Some streamed chunks may carry no choices; skip them.
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            sys.stdout.write(content)
            sys.stdout.flush()
# Pass the transcript of the most impactful issue to O3.
o3_thread = threading.Thread(target=o3_feedback, args=(text,))
o3_thread.start()
############################################################
play_thread.join()
o3_thread.join()
print('')
annotated-types==0.7.0
anyio==4.9.0
certifi==2025.4.26
cffi==1.17.1
distro==1.9.0
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
idna==3.10
jiter==0.10.0
numpy==2.3.0
openai==1.84.0
pycparser==2.22
pydantic==2.11.5
pydantic_core==2.33.2
python-dotenv==1.1.0
sniffio==1.3.1
sounddevice==0.5.2
soundfile==0.13.1
tqdm==4.67.1
typing-inspection==0.4.1
typing_extensions==4.14.0