@saamerm
Created May 14, 2025 09:58
YouTube Caption or Subtitle Extractor from the Timed Text API JSON. Live sample: https://codepen.io/saamerm/pen/OPPdprB
<!-- View the sample here https://codepen.io/saamerm/pen/OPPdprB -->
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>YouTube Timed Text Extractor</title>
<style>
body {
font-family: Arial, sans-serif;
padding: 20px;
}
textarea {
width: 100%;
height: 150px;
margin-bottom: 10px;
}
button {
padding: 10px 20px;
font-size: 16px;
margin-bottom: 10px;
}
#output {
white-space: pre-wrap;
background: #f4f4f4;
padding: 10px;
border: 1px solid #ccc;
}
</style>
</head>
<body>
<h1>YouTube Caption or Subtitle Extractor</h1>
<h2>Timed Text API values Extractor</h2>
<div>Use this tool to extract the full subtitle text from any YouTube video that has captions. On the video page, open your browser's developer tools ("Inspect Element") and switch to the Network tab. Look for an "XHR" request whose name contains "timedtext", open its Preview tab, and copy the JSON into the box below.</div>
<p>Paste the JSON below:</p>
<textarea id="jsonInput"></textarea>
<br>
<button onclick="extractText()">Extract Text</button>
<h3>Combined Transcript:</h3>
<div id="output"></div>
<div>If you run into any issues, feel free to reach out to me on <a href="https://linkedin.com/in/saamer">my LinkedIn</a>. If you don't see the timedtext endpoint, you may need to refresh the page with the Network tab open.</div>
<div>For example, the video https://www.youtube.com/watch?v=0Pn4bxa8zX0 has timedtext, and the URL of its timedtext endpoint looks like this: https://www.youtube.com/api/timedtext?v=0Pn4bxa8zX0...</div>
<script>
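function extractText() {
  const input = document.getElementById('jsonInput').value;
  const outputDiv = document.getElementById('output');
  try {
    const data = JSON.parse(input);
    let result = '';
    // Each event may hold an array of segments; each segment's utf8 field is a piece of caption text.
    if (data && Array.isArray(data.events)) {
      data.events.forEach(event => {
        if (event && Array.isArray(event.segs)) {
          event.segs.forEach(seg => {
            if (seg && seg.utf8) {
              result += seg.utf8 + ' ';
            }
          });
        }
      });
    }
    // Valid JSON without any caption segments would otherwise leave the output blank.
    outputDiv.textContent = result.trim() || 'No caption text found in the JSON.';
  } catch (e) {
    // JSON.parse throws on malformed input.
    outputDiv.textContent = 'Invalid JSON input.';
  }
}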
</script>
</body>
</html>
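
The page never shows what the pasted JSON looks like, so here is a minimal sketch of the shape the script reads. It assumes only the fields the code above references (`events`, `segs`, `utf8`). The sample payload and the `extractTranscript` name are made up for illustration, and a real timedtext response may carry additional fields. Unlike `extractText()`, this version collapses all whitespace, including newlines, into single spaces.

```javascript
// Hypothetical sample payload: only events -> segs -> utf8 is assumed,
// because those are the only fields the page script touches.
const sampleJson = JSON.stringify({
  events: [
    { segs: [{ utf8: 'Hello' }, { utf8: ' world' }] },
    { segs: [{ utf8: '\n' }] },
    { id: 1 }, // an event without segs is skipped
    { segs: [{ utf8: 'second line' }] },
  ],
});

// Same traversal as extractText(), written as a pure function so it can
// run outside the browser (for example in Node).
function extractTranscript(jsonText) {
  const data = JSON.parse(jsonText); // throws SyntaxError on invalid JSON
  const events = data && Array.isArray(data.events) ? data.events : [];
  const parts = [];
  for (const event of events) {
    if (!event || !Array.isArray(event.segs)) continue;
    for (const seg of event.segs) {
      if (seg && seg.utf8) parts.push(seg.utf8);
    }
  }
  // Join the pieces, then collapse runs of whitespace into single spaces.
  return parts.join(' ').replace(/\s+/g, ' ').trim();
}

console.log(extractTranscript(sampleJson));
```

Keeping the traversal separate from the DOM code makes it easy to try against a saved response before pasting it into the page.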