@danhammer
Created June 1, 2019 05:14
quick and dirty test of PDF to text script
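from itertools import groupby
from operator import itemgetter

# Read the text extracted from the PDF, one list entry per line.
with open("tester2.txt") as f:
    content = f.readlines()

# Tag each kept line with a block number; a new block starts at each character name.
spoken = []
i = 1
for line in content:
    # Indentation width (lstrip also removes the newline, so blank lines count as 1).
    leading_space = len(line) - len(line.lstrip())
    print(leading_space)  # debug output
    if leading_space < 9 or leading_space > 40:
        # Outside the dialogue indent window: skip the line.
        pass
    else:
        if leading_space > 19:
            # Deeper indent is treated as a character name, which opens a new block.
            i += 1
        spoken.append([line, i])

# Collect consecutive lines that share a block number.
groups = groupby(spoken, itemgetter(1))
character_line = [[item[0] for item in data] for (key, data) in groups]

# The first line of each block is the character name; the rest is the dialogue.
final = []
for xx in character_line:
    character_name = xx[0].strip()
    lines = [x.strip() for x in xx[1:]]
    concat_line = " ".join(lines)
    out_line = character_name + " => " + concat_line
    final.append(out_line)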
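
The same grouping idea can be tried without a file on disk. The following is a minimal sketch, not the gist's own code: the function name `group_dialogue`, its keyword parameters, and the `sample` lines are all hypothetical, and the thresholds (9, 19, 40) are simply copied from the script above. Unlike the script, it also skips whitespace-only lines, which is an assumption about the intended behavior.

```python
from itertools import groupby
from operator import itemgetter


def group_dialogue(lines, min_indent=9, name_indent=19, max_indent=40):
    """Return 'NAME => dialogue' strings (hypothetical helper, thresholds from the script)."""
    spoken = []
    block = 1
    for line in lines:
        indent = len(line) - len(line.lstrip(" "))
        # Skip blank lines and lines outside the dialogue indent window.
        if not line.strip() or indent < min_indent or indent > max_indent:
            continue
        if indent > name_indent:
            block += 1  # a deeply indented line is treated as a character name
        spoken.append((line.strip(), block))

    result = []
    # groupby works here because block numbers only ever increase.
    for _, items in groupby(spoken, itemgetter(1)):
        texts = [text for text, _ in items]
        result.append(texts[0] + " => " + " ".join(texts[1:]))
    return result


# Made-up sample input that mimics screenplay indentation.
sample = [
    "INT. KITCHEN - DAY\n",
    "\n",
    " " * 25 + "ALICE\n",
    " " * 10 + "Did you read\n",
    " " * 10 + "the script?\n",
    "\n",
    " " * 25 + "BOB\n",
    " " * 10 + "Not yet.\n",
]

for entry in group_dialogue(sample):
    print(entry)
```

Dialogue that appears before any character name would end up as the first element of the first block and be printed as if it were a name, so real input may need an extra check.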