Skip to content

Instantly share code, notes, and snippets.

@rajatvikramsingh
Created January 28, 2017 22:03
Show Gist options
  • Save rajatvikramsingh/ed4cd69878f03ea39e296dd6176ba727 to your computer and use it in GitHub Desktop.
Save rajatvikramsingh/ed4cd69878f03ea39e296dd6176ba727 to your computer and use it in GitHub Desktop.
Convert srt to txt file - regex

I recently wanted to extract the text from a srt file and tried different approaches. It can be easily done using regular expressions and a text editor which can search using regex. A typical srt file:

1
00:00:00,180 --> 00:00:03,180
Spoken Line 1

2
00:00:03,180 --> 00:00:07,290
Spoken Line 2

You can use the following regex for identifying the line number and the time range:

\n?([0-9]+)\n([0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3}) --> ([0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3})\n

Use this regex in a text editor like Sublime Text and replace it with blank. You will get the following:

Spoken Line 1
Spoken Line 2

You can do more processing if you want. Try it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment