@cstrelioff
Last active April 13, 2018 09:59
An example of writing LDA topic probabilities to a csv file

write LDA topic probabilities to csv

This gist shows one way to write the topic distribution for each document to a csv file, which can then be loaded into Excel, or a similar program, for viewing. The topic distribution considered here is created by the Python package lda. See my blog post on lda for more information.

#! /usr/bin/env python
# -*- coding: utf-8 -*-
# vim:fenc=utf-8
#
# Copyright © 2015 Christopher C. Strelioff <[email protected]>
#
# Distributed under terms of the MIT license.
"""
topic_table_lda.py -- write topic table to csv file.
"""
from __future__ import division, print_function
import numpy as np
import lda
import lda.datasets
# document-term matrix
X = lda.datasets.load_reuters()
# the vocab
vocab = lda.datasets.load_reuters_vocab()
# titles for each story
titles = lda.datasets.load_reuters_titles()
# train the model
model = lda.LDA(n_topics=20, n_iter=500, random_state=1)
model.fit(X)
# get results
topic_word = model.topic_word_
doc_topic = model.doc_topic_
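# write topic probabilities for each document
# doc_topic has one row per document and one column per topic
# (395 documents and 20 topics for this dataset and model)
n_docs, n_topics = doc_topic.shape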
with open('topic_table.csv', 'w') as f:
    # create header
    header = 'document'
    for k in range(n_topics):
        header += ', pr_topic_{}'.format(k)
    f.write(header + '\n')
    # write one row for each document
    # col 1 : document number
    # cols 2 -- : topic probabilities
    for k in range(n_docs):
        # format probabilities into string
        str_probs = ','.join(['{:.5e}'.format(pr) for pr in doc_topic[k,:]])
        # write line to file
        f.write('{}, {}\n'.format(k, str_probs))
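The same table could also be written with the standard library's csv module, which takes care of delimiters and line endings. What follows is only a minimal sketch, assuming Python 3: the helper name `write_topic_table` is hypothetical, and the small `doc_topic` list is made-up data standing in for `model.doc_topic_` so the example runs without fitting a model.

```python
import csv
import io


def write_topic_table(doc_topic, fileobj):
    """Write one row per document: document index, then topic probabilities.

    Hypothetical helper, not part of the lda package.
    """
    n_topics = len(doc_topic[0])
    writer = csv.writer(fileobj, lineterminator='\n')
    # header: 'document' followed by one column name per topic
    writer.writerow(['document'] +
                    ['pr_topic_{}'.format(k) for k in range(n_topics)])
    # one row per document, probabilities in the same '{:.5e}' format
    for i, probs in enumerate(doc_topic):
        writer.writerow([i] + ['{:.5e}'.format(pr) for pr in probs])


# Made-up stand-in for model.doc_topic_ (2 documents, 3 topics).
doc_topic = [[0.7, 0.2, 0.1],
             [0.1, 0.1, 0.8]]

# Write to an in-memory buffer so the result can be inspected directly.
buf = io.StringIO()
write_topic_table(doc_topic, buf)
table_text = buf.getvalue()
print(table_text)
```

To write a real file instead, the same helper can be given a file opened with `open('topic_table.csv', 'w', newline='')`, and passed `model.doc_topic_` from the script above. Unlike the hand-built strings, this version puts no space after the commas, so column names load without leading whitespace.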