@zircote
Last active March 20, 2026 16:05
teams-transcripts

Download Microsoft Teams meeting transcripts to your laptop using the Microsoft Graph API and device-code authentication.


Tutorial: Get your first transcript in 15 minutes

This walkthrough takes you from nothing to a downloaded transcript file. You will register an Azure app, authenticate once, and run the script. Do this in order.

1. Register an Azure application

You need an app registration so Microsoft knows who is requesting your transcript files. This is a one-time step.

  1. Go to portal.azure.com and sign in with your work account.
  2. Search for App registrations and click New registration.
  3. Give it any name (e.g. teams-transcripts-cli).
  4. Under Supported account types, select "Accounts in this organizational directory only."
  5. Under Redirect URI, choose Public client/native and enter http://localhost.
  6. Click Register.

You will land on the app's overview page. Copy the Application (client) ID and the Directory (tenant) ID. You need both.

2. Configure the app permissions

Still in the app registration:

  1. Click Authentication in the left sidebar. Scroll down to Advanced settings and set "Allow public client flows" to Yes. Save.
  2. Click API permissions > Add a permission > Microsoft Graph > Delegated permissions.
  3. Search for and add: Files.Read and OnlineMeetings.Read.
  4. Click Add permissions.

You normally do not need admin consent for these delegated permissions, because they only cover your own files and meetings. If your tenant has disabled user consent, an admin has to approve the app once.

3. Install the script dependencies

pip install msal requests

4. Set your credentials

export TEAMS_CLIENT_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export TEAMS_TENANT_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

Add these to your shell profile (~/.zshrc, ~/.bashrc) so you do not have to re-enter them.

5. Run it

python teams-transcripts.py --days 14 --list

The script prints a URL (microsoft.com/devicelogin) and a short code. Open the URL in any browser, enter the code, and sign in with your work account. The script then lists transcripts from the last two weeks.

Remove --list to download them:

python teams-transcripts.py --days 14

Files land in ~/teams-transcripts/ named {date}_{meeting-title}.vtt.

That is it. Subsequent runs skip the browser -- the token is cached.


How-to guides

Download transcripts from the last 30 days

python teams-transcripts.py --days 30

Download in Word format instead of VTT

python teams-transcripts.py --format docx

Save to a specific directory

python teams-transcripts.py --out ~/Documents/meetings

Preview what would be downloaded without downloading

python teams-transcripts.py --list

Combine filters

python teams-transcripts.py --days 7 --format docx --out ~/Desktop/this-week

Clear cached credentials (sign out)

python teams-transcripts.py --logout

Use this if you need to authenticate as a different account or if your token stops working.

Schedule automatic downloads on macOS with launchd

Create ~/Library/LaunchAgents/com.local.teams-transcripts.plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.local.teams-transcripts</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/bin/python3</string>
    <string>/path/to/teams-transcripts.py</string>
    <string>--days</string>
    <string>1</string>
    <string>--out</string>
    <string>/Users/you/teams-transcripts</string>
  </array>
  <key>EnvironmentVariables</key>
  <dict>
    <key>TEAMS_CLIENT_ID</key>
    <string>your-client-id</string>
    <key>TEAMS_TENANT_ID</key>
    <string>your-tenant-id</string>
  </dict>
  <key>StartCalendarInterval</key>
  <dict>
    <key>Hour</key>
    <integer>18</integer>
    <key>Minute</key>
    <integer>0</integer>
  </dict>
</dict>
</plist>

Then load the agent:

launchctl load ~/Library/LaunchAgents/com.local.teams-transcripts.plist

This runs the script daily at 6pm and pulls any new transcripts from the last 24 hours.
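
If you would rather generate the plist than hand-edit XML, the standard-library plistlib module can write the same structure. This is a minimal sketch: the helper name build_launch_agent and its parameters are made up for illustration, and the paths are the same placeholders used above.

```python
import plistlib

def build_launch_agent(script_path: str, out_dir: str, client_id: str,
                       tenant_id: str, hour: int = 18, minute: int = 0) -> bytes:
    """Return the launchd agent definition as XML plist bytes (hypothetical helper)."""
    agent = {
        "Label": "com.local.teams-transcripts",
        "ProgramArguments": [
            "/usr/bin/python3", script_path,
            "--days", "1",
            "--out", out_dir,
        ],
        "EnvironmentVariables": {
            "TEAMS_CLIENT_ID": client_id,
            "TEAMS_TENANT_ID": tenant_id,
        },
        # Run once a day at hour:minute local time.
        "StartCalendarInterval": {"Hour": hour, "Minute": minute},
    }
    return plistlib.dumps(agent)  # XML format is the default

xml = build_launch_agent("/path/to/teams-transcripts.py",
                         "/Users/you/teams-transcripts",
                         "your-client-id", "your-tenant-id")
print(xml.decode())
```

Write the returned bytes to ~/Library/LaunchAgents/com.local.teams-transcripts.plist and load it with launchctl as shown above.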


Reference

Command-line options

| Option | Default | Description |
| --- | --- | --- |
| --days N | None (all time) | Limit to transcripts modified in the last N days |
| --out PATH | ~/teams-transcripts | Directory to save downloaded files |
| --format FORMAT | vtt | File format to download: vtt or docx |
| --list | off | Print transcript list, skip download |
| --logout | off | Clear the cached token and exit |

Environment variables

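| Variable | Required | Description |
| --- | --- | --- |
| TEAMS_CLIENT_ID | Yes | Application (client) ID from the Azure app registration |
| TEAMS_TENANT_ID | No (default: common) | Directory (tenant) ID from the Azure AD Overview page |

Setting TEAMS_TENANT_ID to common lets accounts from any tenant authenticate, but only if the app is registered as multi-tenant. A single-tenant registration like the one in the tutorial needs your specific tenant ID, which also restricts authentication to your organization.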

Token cache

The script stores a serialized MSAL token cache at ~/.teams-transcripts-token-cache.json. It is created with chmod 600. The cache contains refresh tokens that allow silent re-authentication without a browser. Delete this file or run --logout to force re-authentication.
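
The script writes the cache file first and tightens its permissions immediately afterwards. If you want the file to be private from the moment it is created, one option is to open it with an explicit mode. This is a sketch, not what the script does; write_private is a hypothetical helper, and the permission bits only have this meaning on POSIX systems.

```python
import os
from pathlib import Path

def write_private(path: Path, data: str) -> None:
    """Write data to path, creating the file with mode 0600 (hypothetical helper)."""
    # The mode argument applies only when the file is newly created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(data)
    # Tighten permissions on a file that already existed with a looser mode.
    os.chmod(path, 0o600)
```

In the script this would stand in for the write_text and chmod pair in get_token, for example write_private(CACHE_FILE, cache.serialize()).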

File naming

Downloaded files follow the pattern {YYYY-MM-DD}_{original-name}.{ext}.

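The date is the last-modified date of the file in OneDrive, in UTC, which corresponds to when Teams finished generating the transcript after the meeting ended. For late-evening meetings in time zones behind UTC, the prefix can therefore be the following calendar day.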
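
The naming rule can be reproduced in a few lines. The sketch below mirrors the script's sanitize function and date slicing; local_filename is a name chosen here for illustration, and the sample item is invented. Graph returns lastModifiedDateTime as a UTC timestamp, so the first ten characters are the UTC date.

```python
def sanitize(name: str) -> str:
    """Keep letters, digits, space, dot, underscore and hyphen; replace the rest with '_'."""
    return "".join(c if c.isalnum() or c in " ._-" else "_" for c in name)

def local_filename(item: dict) -> str:
    """Build '{YYYY-MM-DD}_{original-name}.{ext}' from a Graph driveItem dict."""
    day = item["lastModifiedDateTime"][:10]  # date part of the ISO 8601 UTC timestamp
    return f"{day}_{sanitize(item['name'])}"

# Invented sample item, shaped like a Graph search result.
sample = {"name": "Design review: Q3/Q4.vtt",
          "lastModifiedDateTime": "2024-03-15T23:41:07Z"}
print(local_filename(sample))  # 2024-03-15_Design review_ Q3_Q4.vtt
```

Spaces survive sanitizing, which is why shell commands that touch these files should quote their paths.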

Graph API permissions used

| Permission | Type | Purpose |
| --- | --- | --- |
| Files.Read | Delegated | Search and download files from the signed-in user's OneDrive |
| OnlineMeetings.Read | Delegated | Read meeting metadata |

Both are delegated permissions, meaning the script acts as you and can only access what you can access.

Output formats

VTT (WebVTT) is a timestamped caption format. Each block contains a time range, the speaker's name, and the spoken text. Tools like ffmpeg and most subtitle editors read it. It is the better choice for programmatic processing.

DOCX is a Word document with the transcript formatted as a table: timestamp, speaker, and text in columns. It is easier to read and edit by hand.

Discovery method

The script uses the Graph API drive search endpoint (/me/drive/search) to find files by extension in your OneDrive. It then filters results by parent path, keeping only items under directories containing Recording, Transcripts, or Microsoft Teams in the path. Teams stores all transcript files in the meeting organizer's OneDrive under one of these path patterns.
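
The path filter is easy to try in isolation. This sketch applies the same extension and substring checks as the script to a few invented items; the function name and the sample paths are illustrative assumptions, not real OneDrive output.

```python
TEAMS_PATH_HINTS = ("Recording", "Transcripts", "Microsoft Teams")

def looks_like_teams_transcript(item: dict, ext: str = ".vtt") -> bool:
    """True if the item has the wanted extension and sits under a Teams-style folder."""
    if not item.get("name", "").endswith(ext):
        return False
    path = item.get("parentReference", {}).get("path", "")
    return any(hint in path for hint in TEAMS_PATH_HINTS)

# Invented search results for illustration.
items = [
    {"name": "Standup.vtt", "parentReference": {"path": "/drive/root:/Recordings"}},
    {"name": "captions.vtt", "parentReference": {"path": "/drive/root:/Videos"}},
    {"name": "Standup.docx", "parentReference": {"path": "/drive/root:/Recordings"}},
]
kept = [i["name"] for i in items if looks_like_teams_transcript(i)]
print(kept)  # ['Standup.vtt']
```

Because the match is a plain substring test, "Recording" also matches a folder named "Recordings", and any unrelated folder whose path happens to contain one of the hints will match too.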

Rate limiting

The script handles HTTP 429 responses by reading the Retry-After header and sleeping for the specified duration before retrying.
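
The script retries by calling itself again, with no upper limit on attempts. A bounded variant could look like the sketch below. Everything here is an assumption for illustration: the helper name get_with_retry, the max_retries default, and the idea of passing the request in as a callable so the logic can be exercised without a network.

```python
import time
from typing import Callable

def get_with_retry(send: Callable[[], object], max_retries: int = 5,
                   sleep: Callable[[float], None] = time.sleep):
    """Call send() until it returns a non-429 response or the retries run out."""
    for attempt in range(max_retries + 1):
        resp = send()
        if resp.status_code != 429:
            return resp
        if attempt == max_retries:
            break
        # Retry-After is given in seconds; 5 mirrors the script's fallback value.
        wait = int(resp.headers.get("Retry-After", 5))
        sleep(wait)
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```

With requests, a call would look like get_with_retry(lambda: requests.get(url, headers=headers)); the caller still decides whether to call raise_for_status() on the result.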

Dependencies

| Package | Version | Purpose |
| --- | --- | --- |
| msal | >= 1.20 | Microsoft Authentication Library for token acquisition and caching |
| requests | >= 2.28 | HTTP client for Graph API calls and file downloads |

Explanation

Why a personal Azure app registration?

Microsoft does not provide a shared public client for the Graph API the way Google does for some of its APIs. Every app that calls the Graph API needs its own registration. The registration is free and does not require admin approval for delegated (user-level) permissions. It exists only to identify your script to Microsoft's auth system.

The app you register here has no special power. It can only access what your signed-in account can already access.

Why device-code flow?

The alternative auth flows (interactive browser or username/password) have practical problems for a command-line script. Browser-based flows open a window and require redirect URL handling. Username/password flow does not work with accounts that use MFA, which is most corporate accounts.

Device-code flow sidesteps both problems. The script gets a short code from Microsoft, prints a URL and the code, and waits. You visit the URL on any device, enter the code, and sign in normally (including MFA). The script then exchanges the completed authentication for tokens. After the first run, the token cache handles re-authentication silently.

Why OneDrive, not the Meetings API?

The Graph API has a dedicated meetings transcript endpoint (/communications/onlineMeetings/{id}/transcripts). It is the "right" way to access transcripts. In practice, it has two problems: it requires the OnlineMeetings.Read.All application permission (admin-granted, org-wide access) for anything beyond a narrow set of meetings you organized, and it requires you to know the meeting ID in advance.

The OneDrive search approach works with just your own delegated permissions and requires no admin involvement. The tradeoff is that it only finds transcripts for meetings you organized, since Teams stores transcripts in the organizer's OneDrive.

If you need transcripts from meetings you attended but did not organize, ask the organizer to share the transcript, or explore the OnlineMeetings.Read.All route with your IT admin.

What "transcript available" actually means

Teams generates the transcript asynchronously after the meeting ends. The VTT file usually appears in OneDrive within a few minutes, but this varies with meeting length and system load; long meetings (2+ hours) can take 10-15 minutes. The --days filter works off the file's last-modified timestamp in OneDrive, not the meeting's scheduled time. A --days 1 run therefore picks up any transcript written in the last 24 hours, regardless of when the meeting itself was scheduled.
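
The cutoff logic comes down to one timestamp comparison. The sketch below uses the same parsing as the script but takes the current time as a parameter so the behaviour is reproducible; within_days is a name chosen for illustration and the timestamps are invented.

```python
from datetime import datetime, timedelta, timezone

def within_days(last_modified: str, days: int, now: datetime) -> bool:
    """True if the Graph timestamp falls inside the last `days` days before `now`."""
    modified = datetime.fromisoformat(last_modified.replace("Z", "+00:00"))
    return modified >= now - timedelta(days=days)

now = datetime(2024, 3, 16, 9, 0, tzinfo=timezone.utc)

# Transcript written yesterday evening: inside the 24-hour window.
print(within_days("2024-03-15T17:42:00Z", 1, now))  # True
# Transcript written two days ago: outside the window.
print(within_days("2024-03-14T08:00:00Z", 1, now))  # False
```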

Token security

The cached token file contains OAuth refresh tokens. These allow silent re-authentication for as long as the refresh token remains valid (typically 90 days of inactivity, or until you change your password). Treat the cache file like a password. The script sets chmod 600 on it automatically, but be aware of it if you are working on a shared machine.


Using Claude to synthesize daily reports

Once you have transcripts on disk, Claude can turn a folder of .vtt files into a structured daily report: decisions, action items, open questions, and a narrative summary per meeting. This section covers doing that from the command line using the claude CLI and from a script using the Anthropic API.

Tutorial: Your first daily report in 5 minutes

This assumes you have transcripts downloaded to ~/teams-transcripts/ and the Claude CLI installed.

1. Pull today's transcripts

python teams-transcripts.py --days 1 --out ~/teams-transcripts

2. Concatenate them with meeting labels

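# Start from an empty file so a rerun does not append to an earlier day's text.
: > /tmp/today.txt
for f in ~/teams-transcripts/"$(date +%Y-%m-%d)"_*.vtt; do
  [ -e "$f" ] || continue   # nothing matched today
  # Quote "$f": transcript names can contain spaces.
  echo "=== $(basename "$f" .vtt) ===" >> /tmp/today.txt
  cat "$f" >> /tmp/today.txt
  echo >> /tmp/today.txt
done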

3. Ask Claude to synthesize

cat /tmp/today.txt | claude -p "
You are synthesizing meeting transcripts into a daily report.

For each meeting (delimited by === lines), produce:
- A one-sentence summary of what the meeting was about
- Decisions made (bulleted, attributed to speaker where clear)
- Action items (bulleted, with owner and deadline if mentioned)
- Open questions that were raised but not resolved

After all meetings, write a 2-3 sentence executive summary of the day.

Do not invent details. If something is unclear from the transcript, omit it.
"

The report prints to stdout. Redirect to a file or pipe to pbcopy to paste into Notion, email, or wherever your daily standup lives.


How-to guides

Save the report to a dated Markdown file

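DATE=$(date +%Y-%m-%d)
mkdir -p ~/reports

# Start from an empty file so a rerun does not append to stale content.
: > /tmp/today.txt
for f in ~/teams-transcripts/"${DATE}"_*.vtt; do
  [ -e "$f" ] || continue   # nothing matched today
  echo "=== $(basename "$f" .vtt) ===" >> /tmp/today.txt
  cat "$f" >> /tmp/today.txt
done

cat /tmp/today.txt | claude -p "$(cat prompt.txt)" > ~/reports/"${DATE}"-daily-report.md
rm /tmp/today.txt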

Keep your prompt in prompt.txt so you can tune it without editing the pipeline.

Process a specific meeting instead of all of today's

cat ~/teams-transcripts/2024-03-15_architecture-review.vtt | claude -p "
Summarize this meeting. Extract decisions, action items with owners, and any risks flagged.
Format as Markdown.
"

Run synthesis automatically after each download

Add this to your shell profile or a wrapper script:

function teams-daily() {
  local DATE
  DATE=$(date +%Y-%m-%d)
  python ~/bin/teams-transcripts.py --days 1 --out ~/teams-transcripts

  # Start from an empty file so a rerun does not append to stale content.
  : > /tmp/today.txt
  for f in ~/teams-transcripts/"${DATE}"_*.vtt; do
    [ -e "$f" ] || continue   # nothing matched today
    echo "=== $(basename "$f" .vtt) ===" >> /tmp/today.txt
    cat "$f" >> /tmp/today.txt
  done

  mkdir -p ~/reports
  cat /tmp/today.txt \
    | claude -p "$(cat ~/bin/teams-report-prompt.txt)" \
    > ~/reports/"${DATE}"-daily-report.md

  rm /tmp/today.txt
  echo "Report written to ~/reports/${DATE}-daily-report.md"
}

Use the API instead of the CLI (for scripting or automation)

Install the SDK:

pip install anthropic

Save the following as synthesize-report.py:

#!/usr/bin/env python3
"""synthesize-report.py — generate a daily report from VTT transcripts."""

import sys
from datetime import date
from pathlib import Path
import anthropic

PROMPT = """
You are synthesizing meeting transcripts into a daily report.

For each meeting (delimited by === lines), produce:
- A one-sentence summary
- Decisions made (bulleted, attributed where clear)
- Action items (owner and deadline if stated)
- Open questions not resolved

After all meetings, write a 2-3 sentence executive summary of the day.

Do not invent details. Omit anything unclear.
"""

def load_transcripts(directory: Path, date_prefix: str) -> str:
    parts = []
    for f in sorted(directory.glob(f"{date_prefix}_*.vtt")):
        parts.append(f"=== {f.stem} ===")
        parts.append(f.read_text())
    return "\n\n".join(parts)

def main():
    transcript_dir = Path("~/teams-transcripts").expanduser()
    report_dir = Path("~/reports").expanduser()
    report_dir.mkdir(exist_ok=True)

    today = date.today().isoformat()
    transcripts = load_transcripts(transcript_dir, today)

    if not transcripts:
        print(f"No transcripts found for {today}")
        sys.exit(0)

    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=2048,
        messages=[{"role": "user", "content": PROMPT + "\n\n" + transcripts}],
    )

    report = message.content[0].text
    output = report_dir / f"{today}-daily-report.md"
    output.write_text(report)
    print(f"Report written to {output}")

if __name__ == "__main__":
    main()

Run it:

python synthesize-report.py

Tune the output format

The prompt is the only thing that controls the report shape. A few tested variations:

Bullet-heavy (good for Slack)

Summarize each meeting as:
**{meeting name}** -- {one sentence what it was about}
Decisions: {bullet per decision}
Actions: {bullet per action item, @owner format}

Keep each meeting under 10 lines total.

Narrative prose (good for email)

Write a short narrative paragraph for each meeting describing what was decided
and what happens next. Avoid bullet points. Write in past tense, third person.
Conclude with a one-paragraph day summary.

Action items only

Extract every action item across all meetings. Format as a Markdown task list:
- [ ] {task} -- {owner if known} -- {deadline if stated} -- ({meeting name})

Do not include summaries or decisions.
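
If you feed the action-items output into a ticketing system or tracker, you will probably want structured records rather than Markdown. The sketch below parses lines in the task-list shape requested above. It is an illustration under assumptions: parse_tasks is a hypothetical helper, fields are read positionally (task, owner, deadline), and model output can deviate from the requested format, so treat unparsed lines as something to review by hand.

```python
import re

TASK_RE = re.compile(r"^- \[( |x)\] (.+)$")

def parse_tasks(markdown: str) -> list[dict]:
    """Turn '- [ ] task -- owner -- deadline -- (meeting)' lines into dicts."""
    tasks = []
    for line in markdown.splitlines():
        m = TASK_RE.match(line.strip())
        if not m:
            continue  # not a task-list line
        fields = [f.strip() for f in m.group(2).split(" -- ")]
        meeting = None
        # The meeting name, when present, is the trailing parenthesised field.
        if fields and fields[-1].startswith("(") and fields[-1].endswith(")"):
            meeting = fields.pop()[1:-1]
        tasks.append({
            "done": m.group(1) == "x",
            "task": fields[0] if fields else "",
            "owner": fields[1] if len(fields) > 1 else None,
            "deadline": fields[2] if len(fields) > 2 else None,
            "meeting": meeting,
        })
    return tasks

report = """\
- [ ] Write runbook -- Bob Lee -- Thursday EOD -- (2024-03-15_architecture-review)
- [ ] Book room -- (standup)
"""
for task in parse_tasks(report):
    print(task)
```

Note the ambiguity when the model omits the owner but keeps the deadline: the deadline lands in the owner slot. Asking the prompt to write "unknown" for missing fields is one way to keep the positions stable.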

Reference

VTT format primer

A .vtt file looks like this:

WEBVTT

00:00:05.000 --> 00:00:09.000
Jane Smith: We need to decide on the deployment window before Friday.

00:00:09.500 --> 00:00:14.000
Bob Lee: I can have the runbook ready by Thursday EOD.

Each block has a timestamp range and the speaker name prefixed to the text. Claude reads this format without any preprocessing. The timestamps are useful context for the model (it can tell which items came late in the meeting versus early), so you do not need to strip them.
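
Claude does not need the file parsed, but if you want speaker and timing data for your own tooling, the layout shown above is simple to split. This is a minimal sketch that assumes exactly that layout (a timing line followed by "Speaker: text"). Some exports mark speakers with WebVTT voice tags such as <v Name> instead, which this parser does not handle. The Cue type and parse_vtt name are illustrative.

```python
import re
from typing import NamedTuple

class Cue(NamedTuple):
    start: str
    end: str
    speaker: str
    text: str

TIMING_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")

def parse_vtt(vtt: str) -> list[Cue]:
    """Split a VTT document into cues, assuming 'Speaker: text' payload lines."""
    cues = []
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = block.strip().splitlines()
        # Locate the timing line; this skips the WEBVTT header and cue identifiers.
        idx = next((i for i, line in enumerate(lines) if TIMING_RE.match(line)), None)
        if idx is None:
            continue
        start, end = TIMING_RE.match(lines[idx]).groups()
        body = " ".join(line.strip() for line in lines[idx + 1:])
        speaker, sep, text = body.partition(": ")
        if not sep:  # no speaker prefix found
            speaker, text = "", body
        cues.append(Cue(start, end, speaker, text))
    return cues

sample = """WEBVTT

00:00:05.000 --> 00:00:09.000
Jane Smith: We need to decide on the deployment window before Friday.

00:00:09.500 --> 00:00:14.000
Bob Lee: I can have the runbook ready by Thursday EOD.
"""
for cue in parse_vtt(sample):
    print(cue.start, cue.speaker, "|", cue.text)
```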

Prompt variables

These placeholders are useful to parameterize if you template your prompt:

| Variable | Example | Use |
| --- | --- | --- |
| {date} | 2024-03-15 | Anchor the report to a specific day |
| {team} | Platform Engineering | Focus extraction on team-relevant items |
| {output_format} | Markdown, plain text | Control rendering |
| {max_length} | 500 words | Cap verbose output |
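
One way to fill these placeholders is Python's built-in str.format. The template wording below is an example written for this sketch, not a tested prompt, and render_prompt is a hypothetical helper.

```python
PROMPT_TEMPLATE = (
    "You are synthesizing meeting transcripts for {date} into a daily report "
    "for the {team} team.\n"
    "Format the report as {output_format} and keep it under {max_length}.\n"
    "Do not invent details. Omit anything unclear."
)

def render_prompt(date: str, team: str, output_format: str = "Markdown",
                  max_length: str = "500 words") -> str:
    """Fill the four placeholders from the table above."""
    return PROMPT_TEMPLATE.format(date=date, team=team,
                                  output_format=output_format,
                                  max_length=max_length)

print(render_prompt("2024-03-15", "Platform Engineering"))
```

If your prompt text itself contains literal braces, double them ({{ and }}) so str.format leaves them alone.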

Model choice

| Model | Best for |
| --- | --- |
| claude-haiku-4-5 | Fast, cheap batch processing of many short meetings |
| claude-sonnet-4-5 | General use, good balance of quality and cost |
| claude-opus-4-6 | Long or complex transcripts where nuance matters |

For most daily report use, Sonnet is sufficient. Haiku works well if you are processing a full week of transcripts in a batch and cost is a concern.

Context window and transcript length

A one-hour meeting transcript is typically 8,000-15,000 tokens in VTT format. Claude's context window comfortably handles a full day of meetings (4-6 hours of transcripts) in a single request. If you have a day with more than roughly 8 hours of recorded meetings, split into two requests or summarize each meeting individually and then synthesize the summaries.
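
To decide when to split, a rough size check before sending is usually enough. The sketch below groups transcripts greedily under a token budget. Both constants are assumptions: four characters per token is a common rule of thumb rather than a real tokenizer, and the budget is a placeholder you should set for the model you use.

```python
CHARS_PER_TOKEN = 4      # rough rule of thumb, not a real tokenizer
TOKEN_BUDGET = 120_000   # hypothetical per-request budget; adjust for your model

def estimate_tokens(text: str) -> int:
    """Cheap upper-ish estimate of the token count from character length."""
    return len(text) // CHARS_PER_TOKEN + 1

def batch_transcripts(transcripts: dict[str, str],
                      budget: int = TOKEN_BUDGET) -> list[list[str]]:
    """Greedily group meeting names so each group's estimate fits the budget."""
    batches, current, used = [], [], 0
    for name, text in transcripts.items():
        cost = estimate_tokens(text)
        # Close the current batch when adding this transcript would overflow it.
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        batches.append(current)
    return batches

demo = {"standup": "x" * 400, "design-review": "x" * 400, "retro": "x" * 400}
print(batch_transcripts(demo, budget=250))  # [['standup', 'design-review'], ['retro']]
```

A single transcript larger than the budget still ends up in a batch of its own, so very long meetings need per-meeting summarization first.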


Explanation

Why pipe transcripts rather than summarize per file?

Summarizing each file separately loses cross-meeting context. If a decision in the 9am meeting creates an action item that someone references in the 3pm meeting, a per-file approach surfaces them as disconnected. Feeding the full day as a single prompt lets Claude identify these threads and surface them in the executive summary. The cost difference is negligible.

Why VTT rather than DOCX for this workflow?

DOCX files require parsing before you can feed them to Claude. VTT is plain text with a predictable structure Claude reads directly. The speaker labels (Jane Smith:) in VTT are also cleaner than the DOCX table format for attribution. If you need the DOCX for other purposes, download both with two separate runs.

Why not use Teams' built-in Copilot summaries?

Teams Copilot summaries exist inside Teams and require a Copilot license. They are not accessible programmatically, they use a fixed output format you cannot control, and they are only available per-meeting. This pipeline gives you a cross-meeting daily digest, full control over the output structure, and the ability to feed results into downstream tools (reports, ticketing systems, your own memory infrastructure) without manual copy-paste.

#!/usr/bin/env python3
"""
teams-transcripts — download Microsoft Teams meeting transcripts from OneDrive

Setup (one-time):
  1. Go to https://portal.azure.com > App registrations > New registration
  2. Name it anything, set account type to "Accounts in this org directory only"
     (or "any org" if multi-tenant), redirect URI = http://localhost (Public client)
  3. Under "Authentication" > enable "Allow public client flows"
  4. Under "API permissions" > Add > Microsoft Graph > Delegated:
       Files.Read OnlineMeetings.Read
  5. Copy the Application (client) ID
  6. export TEAMS_CLIENT_ID=<your-client-id>
     export TEAMS_TENANT_ID=<your-tenant-id> # find in Azure AD > Overview

Usage:
  python teams-transcripts.py # list + download all transcripts
  python teams-transcripts.py --list # list only, no download
  python teams-transcripts.py --days 30 # only transcripts from last N days
  python teams-transcripts.py --out ~/Downloads/transcripts
  python teams-transcripts.py --format docx # download .docx instead of .vtt
"""

import argparse
import json
import os
import sys
import time
from datetime import datetime, timedelta, timezone
from pathlib import Path

try:
    import msal
    import requests
except ImportError:
    print("Missing dependencies. Run: pip install msal requests")
    sys.exit(1)

# ── Config ────────────────────────────────────────────────────────────────────

CLIENT_ID = os.environ.get("TEAMS_CLIENT_ID")
TENANT_ID = os.environ.get("TEAMS_TENANT_ID", "common")
SCOPES = ["https://graph.microsoft.com/Files.Read",
          "https://graph.microsoft.com/OnlineMeetings.Read"]
CACHE_FILE = Path.home() / ".teams-transcripts-token-cache.json"
GRAPH_BASE = "https://graph.microsoft.com/v1.0"

# ── Auth ──────────────────────────────────────────────────────────────────────

def build_app():
    if not CLIENT_ID:
        print("ERROR: TEAMS_CLIENT_ID is not set. See setup instructions at top of script.")
        sys.exit(1)

    cache = msal.SerializableTokenCache()
    if CACHE_FILE.exists():
        cache.deserialize(CACHE_FILE.read_text())

    app = msal.PublicClientApplication(
        CLIENT_ID,
        authority=f"https://login.microsoftonline.com/{TENANT_ID}",
        token_cache=cache,
    )
    return app, cache


def get_token():
    app, cache = build_app()
    accounts = app.get_accounts()

    result = None
    if accounts:
        result = app.acquire_token_silent(SCOPES, account=accounts[0])

    if not result:
        flow = app.initiate_device_flow(scopes=SCOPES)
        if "user_code" not in flow:
            print(f"Device flow failed: {flow}")
            sys.exit(1)
        print(f"\n{flow['message']}\n")
        result = app.acquire_token_by_device_flow(flow)

    if "access_token" not in result:
        print(f"Auth error: {result.get('error_description', result)}")
        sys.exit(1)

    CACHE_FILE.write_text(cache.serialize())
    CACHE_FILE.chmod(0o600)
    return result["access_token"]

# ── Graph helpers ─────────────────────────────────────────────────────────────

def graph_get(token, url, params=None):
    headers = {"Authorization": f"Bearer {token}"}
    resp = requests.get(url, headers=headers, params=params)
    if resp.status_code == 429:
        wait = int(resp.headers.get("Retry-After", 5))
        print(f" rate limited — waiting {wait}s...")
        time.sleep(wait)
        return graph_get(token, url, params)
    resp.raise_for_status()
    return resp.json()


def graph_get_all(token, url, params=None):
    """Follow @odata.nextLink pagination."""
    results = []
    while url:
        data = graph_get(token, url, params)
        results.extend(data.get("value", []))
        url = data.get("@odata.nextLink")
        params = None # only on first call
    return results

# ── Transcript discovery ──────────────────────────────────────────────────────

def find_transcripts_onedrive(token, days=None, fmt="vtt"):
    """
    Search OneDrive for Teams transcript files.
    Teams stores them under: /Recording/ or /Microsoft Teams Data/
    as <meeting-title>.vtt and <meeting-title>.docx
    """
    ext = f".{fmt}"
    url = f"{GRAPH_BASE}/me/drive/search(q='{ext}')"
    params = {"$top": 200, "$orderby": "lastModifiedDateTime desc",
              "$select": "id,name,lastModifiedDateTime,parentReference,@microsoft.graph.downloadUrl,size"}
    items = graph_get_all(token, url, params)

    cutoff = None
    if days:
        cutoff = datetime.now(timezone.utc) - timedelta(days=days)

    results = []
    for item in items:
        if not item["name"].endswith(ext):
            continue
        # Filter to Teams transcript paths
        path = item.get("parentReference", {}).get("path", "")
        if not any(p in path for p in ["Recording", "Transcripts", "Microsoft Teams"]):
            continue
        modified = datetime.fromisoformat(item["lastModifiedDateTime"].replace("Z", "+00:00"))
        if cutoff and modified < cutoff:
            continue
        results.append(item)
    return results

# ── Download ──────────────────────────────────────────────────────────────────

def sanitize(name):
    return "".join(c if c.isalnum() or c in " ._-" else "_" for c in name)


def download_transcript(token, item, output_dir, dry_run=False):
    modified = item["lastModifiedDateTime"][:10]
    name = sanitize(item["name"])
    filename = f"{modified}_{name}"
    dest = output_dir / filename
    size_kb = item.get("size", 0) // 1024

    if dest.exists():
        print(f" skip {filename} (already exists)")
        return False

    if dry_run:
        print(f" would download {filename} ({size_kb} KB)")
        return False

    url = item.get("@microsoft.graph.downloadUrl")
    if not url:
        # Fall back to content endpoint
        file_id = item["id"]
        drive_id = item["parentReference"].get("driveId", "")
        url = f"{GRAPH_BASE}/drives/{drive_id}/items/{file_id}/content"
        resp = requests.get(url, headers={"Authorization": f"Bearer {token}"}, allow_redirects=True)
    else:
        resp = requests.get(url, allow_redirects=True)
    resp.raise_for_status()

    dest.write_bytes(resp.content)
    print(f" ✓ {filename} ({size_kb} KB)")
    return True

# ── Main ──────────────────────────────────────────────────────────────────────

def main():
    parser = argparse.ArgumentParser(description="Download Teams meeting transcripts")
    parser.add_argument("--list", action="store_true", help="List only, no download")
    parser.add_argument("--days", type=int, default=None, metavar="N",
                        help="Only transcripts from last N days")
    parser.add_argument("--out", type=Path, default=Path.home() / "teams-transcripts",
                        help="Output directory (default: ~/teams-transcripts)")
    parser.add_argument("--format", choices=["vtt", "docx"], default="vtt",
                        help="File format to download (default: vtt)")
    parser.add_argument("--logout", action="store_true", help="Clear cached credentials")
    args = parser.parse_args()

    if args.logout:
        if CACHE_FILE.exists():
            CACHE_FILE.unlink()
            print("Credentials cleared.")
        else:
            print("No cached credentials found.")
        return

    print("Authenticating with Microsoft...")
    token = get_token()
    print("Authenticated.\n")

    label = f"last {args.days} days" if args.days else "all time"
    print(f"Searching OneDrive for .{args.format} transcripts ({label})...")
    items = find_transcripts_onedrive(token, days=args.days, fmt=args.format)

    if not items:
        print("No transcripts found.")
        return

    print(f"Found {len(items)} transcript(s):\n")
    for item in items:
        size_kb = item.get("size", 0) // 1024
        modified = item["lastModifiedDateTime"][:10]
        path = item.get("parentReference", {}).get("path", "")
        print(f" {modified} {item['name']} ({size_kb} KB)")
        print(f" {path}")

    if args.list:
        return

    args.out.mkdir(parents=True, exist_ok=True)
    print(f"\nDownloading to {args.out}/\n")
    downloaded = 0
    for item in items:
        if download_transcript(token, item, args.out):
            downloaded += 1
    print(f"\nDone. {downloaded} new file(s) downloaded.")


if __name__ == "__main__":
    main()