
@paulohrpinheiro
Created August 22, 2015 20:53
Discovers the word structure of a text.
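import sys

# Read the content from standard input: one list of words per line
# (str.split() with no arguments already discards surrounding whitespace)
plain_text = [line.split() for line in sys.stdin]

# Build the vocabulary: every distinct word in the content
words = set()
for line in plain_text:
    words.update(line)

# A set has no index() method, so turn it into a list; sorting it keeps
# the word numbering the same from one run to the next
words = sorted(words)

# Map each word to its position in the list: a dict lookup is constant time,
# whereas list.index() scans the whole list on every call
word_index = {w: i for i, w in enumerate(words)}

# Build the coded content: every word replaced by its index in the vocabulary
coded_text = [[word_index[w] for w in line] for line in plain_text]

# How many distinct words?
print('Total of words:', len(words))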
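The same idea can be packaged as a pair of functions, which makes it easy to check that the coding loses no information: looking each index up in the word list gives back the original text. This is a minimal sketch, not part of the original gist; the names `encode`, `decode` and `sample` are hypothetical, and here words are numbered in order of first appearance (via `dict.fromkeys`) so the numbering is deterministic without sorting.

```python
from typing import List, Tuple


def encode(lines: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Return the vocabulary and the text with each word replaced by its index."""
    # dict.fromkeys keeps only the first occurrence of each word, in order
    words = list(dict.fromkeys(w for line in lines for w in line))
    # Word -> position lookup table, so encoding each word is O(1)
    index = {w: i for i, w in enumerate(words)}
    coded = [[index[w] for w in line] for line in lines]
    return words, coded


def decode(words: List[str], coded: List[List[int]]) -> List[List[str]]:
    """Rebuild the word lists by looking each index up in the vocabulary."""
    return [[words[i] for i in line] for line in coded]


# Hypothetical two-line input, already split into words
sample = [
    "the cat sat".split(),
    "the dog sat".split(),
]
words, coded = encode(sample)
print(words)  # ['the', 'cat', 'sat', 'dog']
print(coded)  # [[0, 1, 2], [0, 3, 2]]
```

To run it on standard input as the gist does, pass `[line.split() for line in sys.stdin]` to `encode` (after `import sys`). Repeated words such as "the" and "sat" share one index, which is what exposes the word structure of the text.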