

@LouisdeBruijn
Last active March 11, 2020 13:53
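def read_corpus(corpus_file, binary):
    """Read a corpus file and return the tokenised reviews and their labels.

    Each line holds the genre label, the sentiment label and a third field
    (not used here), followed by the review tokens.

    If binary is True, the labels are the sentiment (positive vs negative);
    otherwise they are the genre (6 classes).
    """
    documents = []
    labels = []
    with open(corpus_file, 'r', encoding='utf-8') as f:
        for line in f:
            tokens = line.strip().split()
            if not tokens:
                # Skip blank lines, which would otherwise raise an IndexError
                continue
            # Everything after the first three fields is the review text
            documents.append(tokens[3:])
            if binary:
                # 2-class problem: positive vs negative
                labels.append(tokens[1])
            else:
                # 6-class problem: books, camera, dvd, health, music, software
                labels.append(tokens[0])
    return documents, labels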
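To make the line layout that read_corpus relies on concrete, here is a worked example on a single line. The line itself, including the label string `neg` and the third field `doc_001.txt`, is invented for illustration; the gist only establishes that field 0 is the genre, field 1 is the sentiment, and the review starts at field 3.

```python
# A hypothetical corpus line (invented for illustration), laid out as
# read_corpus assumes: <genre> <sentiment> <unused field> <review tokens...>
line = "music neg doc_001.txt the sound quality is poor\n"

tokens = line.strip().split()
review = tokens[3:]      # what read_corpus appends to documents
genre = tokens[0]        # label used when binary=False (6-class task)
sentiment = tokens[1]    # label used when binary=True (2-class task)

print(review)     # ['the', 'sound', 'quality', 'is', 'poor']
print(genre)      # music
print(sentiment)  # neg
```

The third field is skipped together with the two labels, so it never ends up in `documents`.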