@glombard
Created January 20, 2014 15:11
Converts Pluralsight transcript HTML to Markdown.
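"""Converts Pluralsight transcript HTML to Markdown.
"""
import sys

from bs4 import BeautifulSoup

# Open in binary mode so BeautifulSoup can detect the encoding itself, and name
# the parser explicitly so results do not depend on which parsers are installed.
with open(sys.argv[1], 'rb') as f:
    soup = BeautifulSoup(f, 'html.parser')

# The course title and description come from the page's <meta> tags.
name = soup.find('meta', itemprop='name')['content']
description = soup.find('meta', property='og:description')['content']
print(name)
print('=' * len(name))
print('\n' + description)

# Each module becomes an underlined heading; each clip becomes a bold sub-title
# followed by its transcript text.
lis = soup.find_all('li', class_='transcript-module')
for li in lis:
    # Use the built-in next(); generators have no .next() method in Python 3.
    title = next(li.strings).strip()
    print('\n' + title)
    print('-' * len(title))
    clips = li.find('ul').find_all('li', class_='transcript-clip')
    for clip in clips:
        it = clip.strings
        sub_title = next(it).strip()
        print('\n**' + sub_title + '**\n')
        # Skip whitespace-only strings between tags.
        print('\n'.join(s for s in it if s.rstrip()))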
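To make the expected input and output concrete, here is a minimal sketch that wraps the same logic in a function and runs it on a tiny, made-up page. The function name `transcript_to_markdown` and the `SAMPLE_HTML` string are assumptions for illustration only; the markup is inferred from the selectors the script uses and may not match the real Pluralsight page in other respects.

```python
from bs4 import BeautifulSoup

# Hypothetical input: just enough markup to satisfy the selectors in the script.
SAMPLE_HTML = (
    '<html><head>'
    '<meta itemprop="name" content="Sample Course">'
    '<meta property="og:description" content="A made-up course.">'
    '</head><body><ul>'
    '<li class="transcript-module">Getting Started<ul>'
    '<li class="transcript-clip">Introduction'
    '<p>Hello and welcome.</p><p>Let us begin.</p></li>'
    '</ul></li>'
    '</ul></body></html>'
)


def transcript_to_markdown(html):
    """Return the Markdown text that the script would print for `html`."""
    soup = BeautifulSoup(html, 'html.parser')
    name = soup.find('meta', itemprop='name')['content']
    description = soup.find('meta', property='og:description')['content']
    out = [name, '=' * len(name), '', description]
    for module in soup.find_all('li', class_='transcript-module'):
        # The first string inside the module <li> is its title.
        title = next(module.strings).strip()
        out += ['', title, '-' * len(title)]
        for clip in module.find('ul').find_all('li', class_='transcript-clip'):
            strings = clip.strings
            # The first string inside the clip <li> is its sub-title.
            sub_title = next(strings).strip()
            out += ['', '**' + sub_title + '**', '']
            # The remaining non-blank strings are the transcript text.
            out += [s for s in strings if s.rstrip()]
    return '\n'.join(out)


if __name__ == '__main__':
    print(transcript_to_markdown(SAMPLE_HTML))
```

Returning a string instead of printing keeps the conversion easy to check against a known input, while the printed result has the same shape as the script's output: an underlined course title, an underlined heading per module, and a bold sub-title per clip.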