@guangningyu
Last active March 25, 2017 09:13
Automatically generate text from the bigram conditional probability distribution of a given corpus
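#!/usr/bin/env python
# -*- coding: utf-8 -*-
import nltk


def generate_model(cfd, word, num=15):
    '''
    Given a conditional probability distribution and an arbitrary word,
    generate a passage of text.
    '''
    for i in range(num):
        print(word, end=' ')
        # Greedy step: move on to the most frequent successor of the current word
        word = cfd[word].max()


# Load the text of Genesis (the NLTK "genesis" corpus must already be downloaded)
text = nltk.corpus.genesis.words()
# Build the bigrams
bigrams = nltk.bigrams(text)
# For each first word, count how often each second word follows it
cfd = nltk.ConditionalFreqDist(bigrams)
# Starting from a given word, generate text from the conditional distribution
generate_model(cfd, 'living')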
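
Because `cfd[word].max()` always takes the single most frequent successor, the generated text tends to settle into a repeating cycle. The sketch below illustrates that behaviour on a toy sentence using only the standard library, and shows one possible alternative: sampling the next word in proportion to its bigram count. The names `build_bigram_counts` and `generate`, the `sample` and `rng` parameters, and the toy sentence are all assumptions made for illustration; they are not part of NLTK or of the script above.

```python
import random
from collections import Counter, defaultdict


def build_bigram_counts(tokens):
    """Map each word to a Counter of the words that directly follow it."""
    counts = defaultdict(Counter)
    for first, second in zip(tokens, tokens[1:]):
        counts[first][second] += 1
    return counts


def generate(counts, word, num=15, sample=False, rng=None):
    """Return a list of up to `num` words, starting from `word`.

    sample=False: always pick the most frequent successor (greedy).
    sample=True:  draw the successor with probability proportional to its count.
    """
    rng = rng or random.Random()
    output = []
    for _ in range(num):
        output.append(word)
        followers = counts.get(word)
        if not followers:
            break  # this word never starts a bigram, so there is nothing to follow it
        if sample:
            candidates = list(followers)
            weights = [followers[w] for w in candidates]
            word = rng.choices(candidates, weights=weights, k=1)[0]
        else:
            word = followers.most_common(1)[0][0]
    return output


# Toy corpus (hypothetical), small enough to trace by hand
tokens = "the cat sat on the mat and the cat sat on the rug".split()
counts = build_bigram_counts(tokens)

# Greedy generation loops back on itself: the cat sat on the cat
print(" ".join(generate(counts, "the", num=6)))

# Weighted sampling can escape the loop; the result varies from run to run
print(" ".join(generate(counts, "the", num=6, sample=True)))
```

With the greedy setting the toy corpus yields `the cat sat on the cat`, since "cat" is the most frequent successor of "the". The sampled variant can also reach "mat" or "rug", and it stops early if it lands on a word with no recorded successor.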