This gist shows one way to write the topic distribution for each document to a csv file, which can then be loaded into Excel or a similar program for viewing. The topic distribution considered here is created by the Python package lda. See my blog post on lda for more information.
Last active
April 13, 2018 09:59
An example of writing LDA topic probabilities to a csv file
#! /usr/bin/env python
# -*- coding: utf-8 -*-
# vim:fenc=utf-8
#
# Copyright © 2015 Christopher C. Strelioff <[email protected]>
#
# Distributed under terms of the MIT license.

"""
topic_table_lda.py -- write topic table to csv file.
"""
from __future__ import division, print_function

import numpy as np
import lda
import lda.datasets

# document-term matrix
X = lda.datasets.load_reuters()

# the vocab
vocab = lda.datasets.load_reuters_vocab()

# titles for each story
titles = lda.datasets.load_reuters_titles()

# train the model
model = lda.LDA(n_topics=20, n_iter=500, random_state=1)
model.fit(X)

# get results
topic_word = model.topic_word_
doc_topic = model.doc_topic_

# write topic probabilities for each document
# doc_topic has shape (n_docs, n_topics): (395, 20) for this dataset and model
n_docs, n_topics = doc_topic.shape

with open('topic_table.csv', 'w') as f:
    # create header (no spaces after commas, so column names parse cleanly)
    header = 'document'
    for k in range(n_topics):
        header += ',pr_topic_{}'.format(k)
    f.write(header + '\n')

    # write one row for each document
    # col 1 : document number
    # cols 2 -- : topic probabilities
    for d in range(n_docs):
        # format probabilities into string
        str_probs = ','.join('{:.5e}'.format(pr) for pr in doc_topic[d, :])
        # write line to file
        f.write('{},{}\n'.format(d, str_probs))
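
The script above builds each csv line by hand with string formatting. As an alternative, here is a minimal sketch that does the same job with the standard-library csv module. It is not part of the original gist: the helper name write_topic_table, the output file name topic_table_demo.csv, and the small demo matrix are all hypothetical, and the sketch assumes Python 3 (for the newline='' argument to open). The demo matrix stands in for model.doc_topic_ so the example runs without lda installed.

```python
import csv


def write_topic_table(doc_topic, path):
    """Write one row per document: document index, then topic probabilities.

    doc_topic is any sequence of rows (a list of lists or a 2-D NumPy array).
    """
    rows = [list(row) for row in doc_topic]
    n_topics = len(rows[0]) if rows else 0
    # newline='' lets the csv module control line endings (Python 3)
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        # header: document, pr_topic_0, pr_topic_1, ...
        writer.writerow(['document'] +
                        ['pr_topic_{}'.format(k) for k in range(n_topics)])
        for d, row in enumerate(rows):
            # same scientific-notation format as the script above
            writer.writerow([d] + ['{:.5e}'.format(pr) for pr in row])


# Hypothetical stand-in for model.doc_topic_: 3 documents, 4 topics
demo = [[0.70, 0.10, 0.10, 0.10],
        [0.25, 0.25, 0.25, 0.25],
        [0.05, 0.05, 0.80, 0.10]]
write_topic_table(demo, 'topic_table_demo.csv')
```

With a fitted model, the call would presumably be write_topic_table(model.doc_topic_, 'topic_table.csv'). Letting csv.writer handle the delimiters avoids mismatched separators between the header and the data rows.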