guangningyu/1915f19148675e5b7d08bf12ec9d2101
Last active March 25, 2017 09:13
Automatically generate text from the bigram conditional probability distribution of a given corpus.
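A bigram conditional distribution records, for each word, which words follow it and how often. Dividing each count by its row total gives the maximum-likelihood estimate of P(second | first). The following is a minimal standard-library sketch of that estimate. It assumes whitespace-tokenised input, and both the function name `bigram_probabilities` and the toy sentence are illustrative, not part of the gist:

```python
from collections import Counter, defaultdict


def bigram_probabilities(tokens):
    """Estimate P(second | first) from adjacent token pairs (maximum likelihood)."""
    counts = defaultdict(Counter)
    # Count each (first, second) pair of adjacent tokens.
    for first, second in zip(tokens, tokens[1:]):
        counts[first][second] += 1
    # Normalise each row so the successor probabilities of a word sum to 1.
    probs = {}
    for first, successors in counts.items():
        total = sum(successors.values())
        probs[first] = {w: c / total for w, c in successors.items()}
    return probs


tokens = "the cat sat on the mat and the cat ran".split()
probs = bigram_probabilities(tokens)
print(probs["the"])  # {'cat': 0.6666666666666666, 'mat': 0.3333333333333333}
```

The final token ("ran") never appears as a first word, so it has no row. Text generation has to handle such dead ends.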
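```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import nltk


def generate_model(cfd, word, num=15):
    '''
    Given a conditional frequency distribution and a starting word, print
    up to `num` words, each time choosing the most frequent successor.
    '''
    for _ in range(num):
        print(word, end=' ')
        if word not in cfd:
            # No observed successor (e.g. the corpus's last word): stop early.
            break
        word = cfd[word].max()
    print()


# Load the Genesis text (all files in the corpus, i.e. several translations;
# pass 'english-kjv.txt' for the King James Version only).
# Run nltk.download('genesis') once if the corpus is not installed.
text = nltk.corpus.genesis.words()
# Build the bigrams (pairs of adjacent words)
bigrams = nltk.bigrams(text)
# For each first word, count how often each second word follows it
cfd = nltk.ConditionalFreqDist(bigrams)
# Starting from a given word, generate text from the conditional distribution
generate_model(cfd, 'living')
```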
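`generate_model` always follows `cfd[word].max()`. As a result, the output falls into a loop as soon as it revisits a word. The sketch below reproduces that greedy behaviour with the standard library only. It then adds a swapped-in alternative that the gist does not use: sampling each next word in proportion to its bigram count. The function names `build_cfd`, `generate_greedy` and `generate_sampled`, and the toy corpus, are hypothetical:

```python
import random
from collections import Counter, defaultdict


def build_cfd(tokens):
    """Map each word to a Counter of the words that follow it (hypothetical helper)."""
    cfd = defaultdict(Counter)
    for first, second in zip(tokens, tokens[1:]):
        cfd[first][second] += 1
    return cfd


def generate_greedy(cfd, word, num=15):
    """Always take the most frequent successor, mirroring cfd[word].max()."""
    out = []
    for _ in range(num):
        out.append(word)
        if word not in cfd:  # no observed successor: stop early
            break
        # Ties go to the successor seen first in the corpus.
        word = cfd[word].most_common(1)[0][0]
    return out


def generate_sampled(cfd, word, num=15, rng=random):
    """Draw each successor at random, weighted by its bigram count."""
    out = []
    for _ in range(num):
        out.append(word)
        if word not in cfd:
            break
        successors = cfd[word]
        word = rng.choices(list(successors), weights=list(successors.values()))[0]
    return out


tokens = "the land of the land and the river of the land".split()
cfd = build_cfd(tokens)
print(" ".join(generate_greedy(cfd, "the", num=8)))  # the land of the land of the land
print(" ".join(generate_sampled(cfd, "the", num=8, rng=random.Random(0))))
```

The greedy version is deterministic but repetitive. The sampled version varies from run to run (pass a seeded `random.Random` for reproducibility). Every step it takes is still a bigram observed in the corpus.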