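# -*- coding: utf-8 -*-
'A simple forward maximum matching (mechanical) word segmenter with Chinese support. Nothing more to explain; it is only a few dozen lines of code.'
'Sogou dictionary download: http://vdisk.weibo.com/s/7RlE5'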
import string
__dict = {}
def load_dict(dict_file='words.dic'):
    'Load the dictionary into a dict whose key is the first character and whose value is the list of related words.'