Download Microsoft Teams meeting transcripts to your laptop using the Microsoft Graph API and device-code authentication.
This walkthrough takes you from nothing to a downloaded transcript file. You will register an Azure app, authenticate once, and run the script. Do this in order.
You need an app registration so Microsoft knows who is requesting your transcript files. This is a one-time step.
- Go to portal.azure.com and sign in with your work account.
- Search for App registrations and click New registration.
- Give it any name (e.g. teams-transcripts-cli).
- Under Supported account types, select "Accounts in this organizational directory only."
- Under Redirect URI, choose Public client/native and enter http://localhost.
- Click Register.
You will land on the app's overview page. Copy the Application (client) ID and the Directory (tenant) ID. You need both.
Still in the app registration:
- Click Authentication in the left sidebar. Scroll down to Advanced settings and set "Allow public client flows" to Yes. Save.
- Click API permissions > Add a permission > Microsoft Graph > Delegated permissions.
- Search for and add: Files.Read and OnlineMeetings.Read.
- Click Add permissions.
You do not need to grant admin consent for these delegated permissions -- they only access your own files.

    pip install msal requests

    export TEAMS_CLIENT_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    export TEAMS_TENANT_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

Add these to your shell profile (~/.zshrc, ~/.bashrc) so you do not have to re-enter them.

    python teams-transcripts.py --days 14 --list

Your browser will open (or you will be given a code to enter at microsoft.com/devicelogin). Sign in with your work account. The script will then list transcripts from the last two weeks.
Remove --list to download them:

    python teams-transcripts.py --days 14

Files land in ~/teams-transcripts/ named {date}_{meeting-title}.vtt.
That is it. Subsequent runs skip the browser -- the token is cached.
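
    # Transcripts from the last 30 days
    python teams-transcripts.py --days 30

    # Download as DOCX instead of VTT
    python teams-transcripts.py --format docx

    # Save to a different directory
    python teams-transcripts.py --out ~/Documents/meetings

    # List every transcript without downloading
    python teams-transcripts.py --list

    # Combine options
    python teams-transcripts.py --days 7 --format docx --out ~/Desktop/this-week

    # Clear the cached token
    python teams-transcripts.py --logout

Use --logout if you need to authenticate as a different account or if your token stops working.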
Create ~/Library/LaunchAgents/com.local.teams-transcripts.plist:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
      "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
      <key>Label</key>
      <string>com.local.teams-transcripts</string>
      <key>ProgramArguments</key>
      <array>
        <string>/usr/bin/python3</string>
        <string>/path/to/teams-transcripts.py</string>
        <string>--days</string>
        <string>1</string>
        <string>--out</string>
        <string>/Users/you/teams-transcripts</string>
      </array>
      <key>EnvironmentVariables</key>
      <dict>
        <key>TEAMS_CLIENT_ID</key>
        <string>your-client-id</string>
        <key>TEAMS_TENANT_ID</key>
        <string>your-tenant-id</string>
      </dict>
      <key>StartCalendarInterval</key>
      <dict>
        <key>Hour</key>
        <integer>18</integer>
        <key>Minute</key>
        <integer>0</integer>
      </dict>
    </dict>
    </plist>

Then load it:

    launchctl load ~/Library/LaunchAgents/com.local.teams-transcripts.plist

This runs the script daily at 6pm and pulls any new transcripts from the last 24 hours.
| Option | Default | Description |
|---|---|---|
| --days N | None (all time) | Limit to transcripts modified in the last N days |
| --out PATH | ~/teams-transcripts | Directory to save downloaded files |
| --format vtt\|docx | vtt | File format to download |
| --list | off | Print transcript list, skip download |
| --logout | off | Clear the cached token and exit |
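
The options above map naturally onto argparse. The following is a minimal sketch of how such a parser could be declared. It is not the script's actual source, and the function name `build_parser` is hypothetical.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the script's real source)."""
    parser = argparse.ArgumentParser(prog="teams-transcripts.py")
    parser.add_argument("--days", type=int, default=None, metavar="N",
                        help="limit to transcripts modified in the last N days")
    parser.add_argument("--out", type=Path, default=Path("~/teams-transcripts"),
                        metavar="PATH", help="directory to save downloaded files")
    parser.add_argument("--format", choices=["vtt", "docx"], default="vtt",
                        help="file format to download")
    parser.add_argument("--list", action="store_true",
                        help="print transcript list, skip download")
    parser.add_argument("--logout", action="store_true",
                        help="clear the cached token and exit")
    return parser


# Parse an explicit argument list; call parse_args() with no arguments to read sys.argv.
args = build_parser().parse_args(["--days", "7", "--format", "docx"])
# expanduser() resolves the leading ~ in the default output directory.
output_dir = args.out.expanduser()
```

Leaving `--days` at `None` rather than a number keeps "all time" distinguishable from any real window.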
| Variable | Required | Description |
|---|---|---|
| TEAMS_CLIENT_ID | Yes | Application (client) ID from Azure app registration |
| TEAMS_TENANT_ID | No (default: common) | Directory (tenant) ID from Azure AD Overview |
Setting TEAMS_TENANT_ID to common allows any Microsoft account to authenticate. Use your specific tenant ID to restrict authentication to your organization.
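
As an illustration of the required/optional split described above, here is a minimal sketch of reading the two variables. The helper name `load_config` is hypothetical and the script's real handling may differ.

```python
import os
from typing import Mapping, Tuple


def load_config(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (client_id, tenant_id); hypothetical helper, not the script's real code."""
    client_id = env.get("TEAMS_CLIENT_ID", "").strip()
    if not client_id:
        # TEAMS_CLIENT_ID is required, so stop early with a readable message.
        raise SystemExit("TEAMS_CLIENT_ID is not set; export it before running.")
    # TEAMS_TENANT_ID is optional and falls back to the documented default.
    tenant_id = env.get("TEAMS_TENANT_ID", "").strip() or "common"
    return client_id, tenant_id
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dict.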
The script stores a serialized MSAL token cache at ~/.teams-transcripts-token-cache.json. It is created with chmod 600. The cache contains refresh tokens that allow silent re-authentication without a browser. Delete this file or run --logout to force re-authentication.
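
A minimal sketch of writing a file with owner-only permissions and removing it again, assuming a POSIX system. The helper names `write_private` and `clear_cache` are hypothetical. With MSAL, the string being written would typically be the output of `msal.SerializableTokenCache.serialize()`.

```python
import os
from pathlib import Path

CACHE_PATH = Path("~/.teams-transcripts-token-cache.json").expanduser()


def write_private(path: Path, data: str) -> None:
    """Write data to path so that only the owner can read or write it (mode 0600)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(data)
    # os.open applies the mode only when it creates the file, so tighten existing files too.
    os.chmod(path, 0o600)


def clear_cache(path: Path = CACHE_PATH) -> bool:
    """Delete the cache file, forcing re-authentication. Returns True if a file was removed."""
    try:
        path.unlink()
        return True
    except FileNotFoundError:
        return False
```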
Downloaded files follow the pattern {YYYY-MM-DD}_{original-name}.{ext}.
The date is the last-modified date of the file in OneDrive, which corresponds to when Teams finished generating the transcript after the meeting ended.
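
The naming rule can be sketched as follows, assuming the timestamp arrives as an ISO 8601 string with a trailing Z (the shape Graph uses for `lastModifiedDateTime`) and that the date is taken in UTC. The function name `local_filename` is hypothetical.

```python
from datetime import datetime


def local_filename(last_modified: str, original_name: str) -> str:
    """Build '{YYYY-MM-DD}_{original-name}.{ext}' from a last-modified timestamp."""
    # Replace the trailing 'Z' with an explicit offset so fromisoformat()
    # also accepts the string on Python versions older than 3.11.
    stamp = datetime.fromisoformat(last_modified.replace("Z", "+00:00"))
    # original_name is assumed to already include its extension.
    return f"{stamp.date().isoformat()}_{original_name}"


name = local_filename("2024-03-15T17:02:11Z", "architecture-review.vtt")
```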
| Permission | Type | Purpose |
|---|---|---|
| Files.Read | Delegated | Search and download files from the signed-in user's OneDrive |
| OnlineMeetings.Read | Delegated | Read meeting metadata |
Both are delegated permissions, meaning the script acts as you and can only access what you can access.
VTT (WebVTT) is a timestamped caption format. Each block contains a time range, the speaker's name, and the spoken text. Tools like ffmpeg and most subtitle editors read it. It is the better choice for programmatic processing.
DOCX is a Word document with the transcript formatted as a table: timestamp, speaker, and text in columns. It is easier to read and edit by hand.
The script uses the Graph API drive search endpoint (/me/drive/search) to find files by extension in your OneDrive. It then filters results by parent path, keeping only items under directories containing Recording, Transcripts, or Microsoft Teams in the path. Teams stores all transcript files in the meeting organizer's OneDrive under one of these path patterns.
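
The path filter can be sketched like this. It assumes each search result is a driveItem-style dict carrying `name` and `parentReference.path`. Whether every search result includes that path is an assumption here, and the function names are hypothetical.

```python
from typing import Iterable, List

# Substrings that mark a Teams transcript location, per the description above.
PATH_MARKERS = ("Recording", "Transcripts", "Microsoft Teams")


def is_transcript(item: dict, extension: str = ".vtt") -> bool:
    """True if the item has the wanted extension and sits under a Teams-style folder."""
    name = item.get("name", "")
    parent_path = item.get("parentReference", {}).get("path", "")
    has_extension = name.lower().endswith(extension)
    in_teams_folder = any(marker in parent_path for marker in PATH_MARKERS)
    return has_extension and in_teams_folder


def filter_transcripts(items: Iterable[dict], extension: str = ".vtt") -> List[dict]:
    """Keep only the search results that look like Teams transcripts."""
    return [item for item in items if is_transcript(item, extension)]
```

Matching on "Recording" also covers a folder named "Recordings", since the check is a substring test.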
The script handles HTTP 429 responses by reading the Retry-After header and sleeping for the specified duration before retrying.
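
A minimal sketch of that retry loop. It takes any zero-argument callable that returns a response object with `status_code` and `headers` (the shape `requests` responses have). The name `send_with_retry`, the attempt limit, and the one-second fallback are assumptions, not the script's actual values.

```python
import time
from typing import Callable


def send_with_retry(send: Callable[[], object], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep):
    """Call send() until it returns a non-429 response or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        # Return on success, on any non-throttling error, or on the final attempt.
        if response.status_code != 429 or attempt == max_attempts - 1:
            return response
        # Retry-After is expected in seconds; fall back to 1s if missing or malformed.
        try:
            delay = float(response.headers.get("Retry-After", 1))
        except ValueError:
            delay = 1.0
        sleep(delay)
```

With `requests`, a call might look like `send_with_retry(lambda: requests.get(url, headers=auth_headers))`, where `url` and `auth_headers` are your own values.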
| Package | Version | Purpose |
|---|---|---|
| msal | >= 1.20 | Microsoft Authentication Library for token acquisition and caching |
| requests | >= 2.28 | HTTP client for Graph API calls and file downloads |
Microsoft does not provide a shared public client for the Graph API the way Google does for some of its APIs. Every app that calls the Graph API needs its own registration. The registration is free and does not require admin approval for delegated (user-level) permissions. It exists only to identify your script to Microsoft's auth system.
The app you register here has no special power. It can only access what your signed-in account can already access.
The alternative auth flows (interactive browser or username/password) have practical problems for a command-line script. Browser-based flows open a window and require redirect URL handling. Username/password flow does not work with accounts that use MFA, which is most corporate accounts.
Device-code flow sidesteps both problems. The script gets a short code from Microsoft, prints a URL and the code, and waits. You visit the URL on any device, enter the code, and sign in normally (including MFA). The script then exchanges the completed authentication for tokens. After the first run, the token cache handles re-authentication silently.
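
The flow described above can be sketched with MSAL for Python roughly as follows. This is an illustration under stated assumptions, not the script's actual code: the function names are hypothetical, and the persistent token cache is left out for brevity (MSAL accepts one through the `token_cache` argument), so silent reuse here only works within a single process.

```python
from typing import List, Optional

GRAPH_SCOPES = ["Files.Read", "OnlineMeetings.Read"]


def authority_for(tenant_id: str = "common") -> str:
    """Build the sign-in authority URL for a tenant."""
    return f"https://login.microsoftonline.com/{tenant_id}"


def acquire_token(client_id: str, tenant_id: str = "common",
                  scopes: Optional[List[str]] = None) -> str:
    """Return an access token, using device-code sign-in when no cached token exists."""
    import msal  # third-party (pip install msal); imported here so the helper above has no dependency

    scopes = scopes or GRAPH_SCOPES
    app = msal.PublicClientApplication(client_id, authority=authority_for(tenant_id))

    # Try cached credentials first; this is what makes later runs silent.
    accounts = app.get_accounts()
    result = app.acquire_token_silent(scopes, account=accounts[0]) if accounts else None

    if not result:
        flow = app.initiate_device_flow(scopes=scopes)
        if "user_code" not in flow:
            raise RuntimeError(f"Could not start device flow: {flow.get('error_description')}")
        print(flow["message"])  # the URL and the short code to enter
        result = app.acquire_token_by_device_flow(flow)  # blocks until sign-in completes

    if "access_token" not in result:
        raise RuntimeError(result.get("error_description", "authentication failed"))
    return result["access_token"]
```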
The Graph API has a dedicated meeting transcript endpoint (/me/onlineMeetings/{id}/transcripts). It is the "right" way to access transcripts. In practice, it has two problems: it requires the OnlineMeetingTranscript.Read.All permission, which an admin must grant, and it requires you to know the meeting ID in advance.
The OneDrive search approach works with just your own delegated permissions and requires no admin involvement. The tradeoff is that it only finds transcripts for meetings you organized, since Teams stores transcripts in the organizer's OneDrive.
If you need transcripts from meetings you attended but did not organize, ask the organizer to share the transcript, or explore the OnlineMeetingTranscript.Read.All route with your IT admin.
Teams generates the transcript asynchronously after the meeting ends. The VTT file appears in OneDrive within a few minutes of the meeting ending, but this varies with meeting length and system load. Long meetings (2+ hours) can take 10-15 minutes. The --days filter works off the file's last-modified timestamp in OneDrive, not the meeting's scheduled time, so a meeting from yesterday evening will appear in --days 1 results today even if the transcript was written hours after the meeting ended.
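
A minimal sketch of a last-modified window check, assuming timestamps are ISO 8601 strings in UTC with a trailing Z. The function name `modified_within` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def modified_within(last_modified: str, days: Optional[int],
                    now: Optional[datetime] = None) -> bool:
    """True if the file was last modified within the past `days` days."""
    if days is None:
        return True  # no --days given: all time
    now = now or datetime.now(timezone.utc)
    # Replace the trailing 'Z' so fromisoformat() also works before Python 3.11.
    stamp = datetime.fromisoformat(last_modified.replace("Z", "+00:00"))
    return stamp >= now - timedelta(days=days)
```

Because the comparison uses the file's timestamp, a transcript written at 20:30 yesterday still falls inside a one-day window at 09:00 today.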
The cached token file contains OAuth refresh tokens. These allow silent re-authentication for as long as the refresh token remains valid (typically 90 days of inactivity, or until you change your password). Treat the cache file like a password. The script sets chmod 600 on it automatically, but be aware of it if you are working on a shared machine.
Once you have transcripts on disk, Claude can turn a folder of .vtt files into a structured daily report: decisions, action items, open questions, and a narrative summary per meeting. This section covers doing that from the command line using the claude CLI and from a script using the Anthropic API.
This assumes you have transcripts downloaded to ~/teams-transcripts/ and the Claude CLI installed.
1. Pull today's transcripts

    python teams-transcripts.py --days 1 --out ~/teams-transcripts

2. Concatenate them with meeting labels

    # Start from an empty file so a rerun does not append the same transcripts twice.
    : > /tmp/today.txt
    for f in ~/teams-transcripts/$(date +%Y-%m-%d)_*.vtt; do
      echo "=== $(basename "$f" .vtt) ===" >> /tmp/today.txt
      cat "$f" >> /tmp/today.txt
      echo >> /tmp/today.txt
    done

3. Ask Claude to synthesize

    cat /tmp/today.txt | claude -p "
    You are synthesizing meeting transcripts into a daily report.
    For each meeting (delimited by === lines), produce:
    - A one-sentence summary of what the meeting was about
    - Decisions made (bulleted, attributed to speaker where clear)
    - Action items (bulleted, with owner and deadline if mentioned)
    - Open questions that were raised but not resolved
    After all meetings, write a 2-3 sentence executive summary of the day.
    Do not invent details. If something is unclear from the transcript, omit it.
    "

The report prints to stdout. Redirect to a file or pipe to pbcopy to paste into Notion, email, or wherever your daily standup lives.

    DATE=$(date +%Y-%m-%d)
    mkdir -p ~/reports
    : > /tmp/today.txt   # start from an empty file
    for f in ~/teams-transcripts/${DATE}_*.vtt; do
      echo "=== $(basename "$f" .vtt) ===" >> /tmp/today.txt
      cat "$f" >> /tmp/today.txt
    done
    cat /tmp/today.txt | claude -p "$(cat prompt.txt)" > ~/reports/${DATE}-daily-report.md
    rm /tmp/today.txt

Keep your prompt in prompt.txt so you can tune it without editing the pipeline.

    cat ~/teams-transcripts/2024-03-15_architecture-review.vtt | claude -p "
    Summarize this meeting. Extract decisions, action items with owners, and any risks flagged.
    Format as Markdown.
    "

Add this to your shell profile or a wrapper script:

    function teams-daily() {
      DATE=$(date +%Y-%m-%d)
      mkdir -p ~/reports
      python ~/bin/teams-transcripts.py --days 1 --out ~/teams-transcripts
      : > /tmp/today.txt   # start from an empty file
      for f in ~/teams-transcripts/${DATE}_*.vtt; do
        echo "=== $(basename "$f" .vtt) ===" >> /tmp/today.txt
        cat "$f" >> /tmp/today.txt
      done
      cat /tmp/today.txt \
        | claude -p "$(cat ~/bin/teams-report-prompt.txt)" \
        > ~/reports/${DATE}-daily-report.md
      rm /tmp/today.txt
      echo "Report written to ~/reports/${DATE}-daily-report.md"
    }

Install the SDK:

    pip install anthropic

Save the following as synthesize-report.py:

    #!/usr/bin/env python3
    """synthesize-report.py: generate a daily report from VTT transcripts."""
    import sys
    from datetime import date
    from pathlib import Path

    import anthropic

    PROMPT = """
    You are synthesizing meeting transcripts into a daily report.
    For each meeting (delimited by === lines), produce:
    - A one-sentence summary
    - Decisions made (bulleted, attributed where clear)
    - Action items (owner and deadline if stated)
    - Open questions not resolved
    After all meetings, write a 2-3 sentence executive summary of the day.
    Do not invent details. Omit anything unclear.
    """


    def load_transcripts(directory: Path, date_prefix: str) -> str:
        """Concatenate the day's .vtt files, each under a '=== name ===' label."""
        parts = []
        for f in sorted(directory.glob(f"{date_prefix}_*.vtt")):
            parts.append(f"=== {f.stem} ===")
            parts.append(f.read_text(encoding="utf-8"))
        return "\n\n".join(parts)


    def main():
        transcript_dir = Path("~/teams-transcripts").expanduser()
        report_dir = Path("~/reports").expanduser()
        report_dir.mkdir(parents=True, exist_ok=True)

        today = date.today().isoformat()
        transcripts = load_transcripts(transcript_dir, today)
        if not transcripts:
            print(f"No transcripts found for {today}")
            sys.exit(0)

        # The client reads the API key from the ANTHROPIC_API_KEY environment variable.
        client = anthropic.Anthropic()
        message = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=2048,
            messages=[{"role": "user", "content": PROMPT + "\n\n" + transcripts}],
        )
        report = message.content[0].text

        output = report_dir / f"{today}-daily-report.md"
        output.write_text(report, encoding="utf-8")
        print(f"Report written to {output}")


    if __name__ == "__main__":
        main()

Run it:

    python synthesize-report.py

The prompt is the only thing that controls the report shape. A few tested variations:
Bullet-heavy (good for Slack)

    Summarize each meeting as:
    **{meeting name}** -- {one sentence what it was about}
    Decisions: {bullet per decision}
    Actions: {bullet per action item, @owner format}
    Keep each meeting under 10 lines total.

Narrative prose (good for email)

    Write a short narrative paragraph for each meeting describing what was decided
    and what happens next. Avoid bullet points. Write in past tense, third person.
    Conclude with a one-paragraph day summary.

Action items only

    Extract every action item across all meetings. Format as a Markdown task list:
    - [ ] {task} -- {owner if known} -- {deadline if stated} -- ({meeting name})
    Do not include summaries or decisions.
A .vtt file looks like this:

    WEBVTT

    00:00:05.000 --> 00:00:09.000
    Jane Smith: We need to decide on the deployment window before Friday.

    00:00:09.500 --> 00:00:14.000
    Bob Lee: I can have the runbook ready by Thursday EOD.

Each block has a timestamp range and the speaker name prefixed to the text. Claude reads this format without any preprocessing. The timestamps are useful context for the model (it can tell which items came late in the meeting versus early), so you do not need to strip them.
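
Claude does not need the file parsed, but if you want the same structure in your own code (for example, to count speaking turns), the block layout can be read with a few lines of Python. This is a minimal sketch that assumes the "Name: text" prefix shown above. `Cue` and `parse_vtt` are hypothetical names.

```python
import re
from typing import List, NamedTuple

# Matches a cue timing line such as "00:00:05.000 --> 00:00:09.000".
CUE_TIMING = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")


class Cue(NamedTuple):
    start: str
    end: str
    speaker: str
    text: str


def parse_vtt(content: str) -> List[Cue]:
    """Split VTT content into cues with start, end, speaker, and text."""
    cues = []
    lines = content.splitlines()
    i = 0
    while i < len(lines):
        match = CUE_TIMING.match(lines[i].strip())
        i += 1
        if not match:
            continue  # header, blank line, or anything that is not a timing line
        # Collect text lines until a blank line or the next timing line.
        body = []
        while i < len(lines) and lines[i].strip() and not CUE_TIMING.match(lines[i].strip()):
            body.append(lines[i].strip())
            i += 1
        speaker, separator, text = " ".join(body).partition(": ")
        if not separator:  # no "Name: " prefix on this cue
            speaker, text = "", speaker
        cues.append(Cue(match.group(1), match.group(2), speaker, text))
    return cues
```

A cue whose text contains ": " but has no speaker prefix would be split incorrectly by this simple rule.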
These placeholders are useful to parameterize if you template your prompt:
| Variable | Example | Use |
|---|---|---|
| {date} | 2024-03-15 | Anchor the report to a specific day |
| {team} | Platform Engineering | Focus extraction on team-relevant items |
| {output_format} | Markdown, plain text | Control rendering |
| {max_length} | 500 words | Cap verbose output |
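
One way to fill these placeholders is Python's `str.format`. The template wording and the name `render_prompt` below are illustrative assumptions, not a tested prompt.

```python
# Illustrative template; adjust the wording to match your own prompt.
PROMPT_TEMPLATE = (
    "You are synthesizing meeting transcripts for {date} into a daily report.\n"
    "Focus on items relevant to the {team} team.\n"
    "Format the report as {output_format} and keep it under {max_length}.\n"
    "Do not invent details. Omit anything unclear."
)


def render_prompt(date: str, team: str, output_format: str = "Markdown",
                  max_length: str = "500 words") -> str:
    """Substitute the four placeholders into the template."""
    return PROMPT_TEMPLATE.format(date=date, team=team,
                                  output_format=output_format, max_length=max_length)


prompt = render_prompt("2024-03-15", "Platform Engineering")
```

If your prompt text itself contains literal braces, double them (`{{` and `}}`) so `str.format` leaves them alone.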
| Model | Best for |
|---|---|
| claude-haiku-4-5 | Fast, cheap batch processing of many short meetings |
| claude-sonnet-4-5 | General use, good balance of quality and cost |
| claude-opus-4-6 | Long or complex transcripts where nuance matters |
For most daily report use, Sonnet is sufficient. Haiku works well if you are processing a full week of transcripts in a batch and cost is a concern.
A one-hour meeting transcript is typically 8,000-15,000 tokens in VTT format. Claude's context window comfortably handles a full day of meetings (4-6 hours of transcripts) in a single request. If you have a day with more than roughly 8 hours of recorded meetings, split into two requests or summarize each meeting individually and then synthesize the summaries.
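
Splitting a heavy day into several requests can be sketched as a simple greedy batcher. The four-characters-per-token estimate is a rough heuristic rather than a real tokenizer, the budget is whatever limit you choose, and both function names are hypothetical.

```python
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (a heuristic, not a tokenizer)."""
    return len(text) // 4


def batch_by_budget(transcripts: Dict[str, str], budget: int) -> List[List[str]]:
    """Group meeting names, in order, so each group's estimated tokens stay within budget.

    A single meeting larger than the budget still gets its own group.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, text in transcripts.items():
        cost = estimate_tokens(text)
        # Start a new batch when adding this meeting would exceed the budget.
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Each batch can then be sent as its own request, and the resulting summaries combined in a final synthesis pass.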
Summarizing each file separately loses cross-meeting context. If a decision in the 9am meeting creates an action item that someone references in the 3pm meeting, a per-file approach surfaces them as disconnected. Feeding the full day as a single prompt lets Claude identify these threads and surface them in the executive summary. The cost difference is negligible.
DOCX files require parsing before you can feed them to Claude. VTT is plain text with a predictable structure Claude reads directly. The speaker labels (Jane Smith:) in VTT are also cleaner than the DOCX table format for attribution. If you need the DOCX for other purposes, download both with two separate runs.
Teams Copilot summaries exist inside Teams and require a Copilot license. They are not accessible programmatically, they use a fixed output format you cannot control, and they are only available per-meeting. This pipeline gives you a cross-meeting daily digest, full control over the output structure, and the ability to feed results into downstream tools (reports, ticketing systems, your own memory infrastructure) without manual copy-paste.