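# from hankcs
'''
Finding the most likely hidden-state sequence is one of the three classic HMM problems, and it is usually solved with the Viterbi algorithm. Viterbi finds the shortest path through the HMM when each step is weighted by -log(prob), which is the same as finding the maximum-probability path.
Define V[time][today's weather] = probability. Here "today's weather" means: given that the weather on every earlier day has already been fixed to its most probable value, the probability that today's weather is X. This probability is therefore a running product.
Because my friend went for a walk on the first day, the probability of rain on day one is V[day 1][Rainy] = initial_prob[Rainy] * emission_prob[Rainy][walk] = 0.6 * 0.1 = 0.06; likewise V[day 1][Sunny] = 0.4 * 0.6 = 0.24. Intuitively, she went out on day one and she usually likes to walk when it is sunny, so a sunny first day is more likely. The numbers agree with intuition.
From day two on, each weather Y gets one candidate per possible previous weather X: V[previous day][X] * (probability of X transitioning to Y) * (probability of the friend's activity that day under weather Y). Since X has two possible values, Y has two candidates. Take the larger one as V[day 2][Y], and extend the best path that ended in that X with Y.
'''
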
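The recurrence above can be sketched in plain Python as follows. Only the initial probabilities and the "walk" emissions follow from the text. The transition matrix and the "shop"/"clean" emissions are assumed values taken from the well-known weather example. The names `viterbi`, `start_p`, `trans_p` and `emit_p` are illustrative, not the author's code.

```python
# Assumed model: two weather states, three activities.
states = ("Rainy", "Sunny")
observations = ("walk", "shop", "clean")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {  # assumed transition probabilities
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emit_p = {  # "walk" values match the text; the rest are assumed
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, most likely state path) for the observation sequence."""
    # V[t][y]: probability of the best path that ends in state y at time t
    V = [{y: start_p[y] * emit_p[y][obs[0]] for y in states}]
    path = {y: [y] for y in states}  # best path ending in each state
    for t in range(1, len(obs)):
        V.append({})
        new_path = {}
        for y in states:
            # one candidate per previous state x; keep the larger
            prob, x = max((V[t - 1][x] * trans_p[x][y] * emit_p[y][obs[t]], x)
                          for x in states)
            V[t][y] = prob
            new_path[y] = path[x] + [y]
        path = new_path
    prob, last = max((V[-1][y], y) for y in states)
    return prob, path[last]


prob, best = viterbi(observations, states, start_p, trans_p, emit_p)
print(prob, best)
```

With these assumed parameters the result is roughly 0.01344 for the path Sunny, Rainy, Rainy. For long sequences the products underflow. Summing -log(prob) instead turns the search into the shortest-path form mentioned above.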
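'''
num_epochs: number of epochs to run
train: training data
dev: validation (dev) data
evalp: evaluate on dev once every evalp epochs
model: the model
metric_best: the best metric seen so far
metric_stop: metric value at which training stops
cnt_stop: stop training once the dev metric has failed to beat metric_best this many times
'''
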
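These parameters could fit together in an early-stopping loop like the hypothetical `fit` below. This is a sketch under several assumptions: `train_one_epoch` and `evaluate` are caller-supplied helpers, a higher metric is better, and `cnt_stop` counts consecutive evaluations without improvement.

```python
def fit(num_epochs, train, dev, evalp, model, train_one_epoch, evaluate,
        metric_best=float("-inf"), metric_stop=None, cnt_stop=3):
    """Hypothetical training loop with periodic dev evaluation and early stopping.

    train_one_epoch(model, train) runs one epoch; evaluate(model, dev) returns a
    float where higher is better. Returns (metric_best, epochs_run).
    """
    misses = 0      # consecutive dev evaluations that did not beat metric_best
    epochs_run = 0
    for epoch in range(1, num_epochs + 1):
        train_one_epoch(model, train)
        epochs_run = epoch
        if epoch % evalp != 0:
            continue  # only evaluate every `evalp` epochs
        metric = evaluate(model, dev)
        if metric > metric_best:
            metric_best, misses = metric, 0
        else:
            misses += 1
        if metric_stop is not None and metric >= metric_stop:
            break  # reached the target metric
        if misses >= cnt_stop:
            break  # dev metric stopped improving
    return metric_best, epochs_run


# Usage with dummy helpers: dev scores stall after epoch 2
demo_scores = iter([0.5, 0.6, 0.55, 0.58, 0.59, 0.9])
print(fit(10, train=None, dev=None, evalp=1, model=None,
          train_one_epoch=lambda m, d: None,
          evaluate=lambda m, d: next(demo_scores)))  # (0.6, 5)
```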
# Level: CRITICAL, ERROR, WARNING, INFO, DEBUG, NOTSET
import logging

# Write DEBUG and above to mylog.log, overwriting the file on each run (filemode='w')
logging.basicConfig(level=logging.DEBUG,
                    filename="mylog.log",
                    format='%(asctime)s %(levelname)s %(filename)s %(funcName)s %(lineno)d %(message)s',
                    filemode='w')
logger = logging.getLogger(__name__)
# Uncomment to silence a noisy module's logger:
# logging.getLogger("module_name").setLevel(logging.CRITICAL)

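The commented-out `setLevel(logging.CRITICAL)` line can be seen in action in the sketch below. It reuses the same format but logs to stderr, and it raises a hypothetical `noisy_lib` logger to CRITICAL. As a result, that logger's warnings are dropped while this module's DEBUG messages still pass. `force=True` (Python 3.8+) replaces any handlers configured earlier.

```python
import logging

logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s %(levelname)s %(filename)s %(funcName)s %(lineno)d %(message)s',
                    force=True)  # stderr instead of mylog.log for this demo
logger = logging.getLogger(__name__)

noisy = logging.getLogger("noisy_lib")  # hypothetical chatty dependency
noisy.setLevel(logging.CRITICAL)

logger.debug("shown: the root level is DEBUG")
noisy.warning("dropped: below CRITICAL")
noisy.critical("shown: at CRITICAL")
```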
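def is_generator(obj):
    # True for any iterator (generators included): iter() on an iterator returns the object itself,
    # while containers such as lists return a fresh iterator each call. Raises TypeError if obj is not iterable.
    return iter(obj) is iter(obj)
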
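The check really tests for iterators, so it differs from `inspect.isgenerator`, which only accepts generator objects. A short comparison follows; the sample values are illustrative.

```python
import inspect
from collections.abc import Iterator


def is_generator(obj):
    return iter(obj) is iter(obj)


gen = (x * x for x in range(3))  # generator expression
it = iter([1, 2, 3])             # list iterator, not a generator
lst = [1, 2, 3]                  # re-iterable container

print(is_generator(gen), inspect.isgenerator(gen))   # True True
print(is_generator(it), inspect.isgenerator(it))     # True False
print(is_generator(lst), isinstance(lst, Iterator))  # False False
```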
'''
corpus is a list of lists of strings, e.g.:
[['human', 'interface', 'computer'],
 ['survey', 'user', 'computer', 'system', 'response', 'time'],
 ['eps', 'user', 'interface', 'system'],
 ['system', 'human', 'system', 'eps'],
 ['user', 'response', 'time'],
 ['trees'],
 ['graph', 'trees'],
 ...]
'''

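A corpus in this shape is often turned into a vocabulary and sparse bag-of-words vectors before modelling. The sketch below does that in plain Python with `collections.Counter`. `build_vocab` and `to_bow` are hypothetical helper names, and the corpus is only the seven documents visible above.

```python
from collections import Counter

corpus = [['human', 'interface', 'computer'],
          ['survey', 'user', 'computer', 'system', 'response', 'time'],
          ['eps', 'user', 'interface', 'system'],
          ['system', 'human', 'system', 'eps'],
          ['user', 'response', 'time'],
          ['trees'],
          ['graph', 'trees']]


def build_vocab(corpus):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    for doc in corpus:
        for tok in doc:
            vocab.setdefault(tok, len(vocab))
    return vocab


def to_bow(doc, vocab):
    """Sparse bag-of-words: sorted (token_id, count) pairs; unknown tokens are skipped."""
    counts = Counter(vocab[t] for t in doc if t in vocab)
    return sorted(counts.items())


vocab = build_vocab(corpus)
print(len(vocab))                # 11
print(to_bow(corpus[3], vocab))  # [(0, 1), (5, 2), (8, 1)]
```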
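import unicodedata


def is_number(s):
    """Return True if s parses as a float or is a single numeric Unicode character (e.g. '五', '½')."""
    try:
        float(s)
        return True
    except (TypeError, ValueError):
        pass
    try:
        unicodedata.numeric(s)
        return True
    except (TypeError, ValueError):
        # TypeError: s is not a single character; ValueError: the character has no numeric value
        pass
    return False
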
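Because `is_number` relies on `float()`, it also accepts strings such as 'nan', '-inf', padded whitespace and underscore-separated digits. The `unicodedata.numeric` branch is what catches single characters like '五' or '½' that `float()` rejects. A quick check of both behaviours:

```python
import math
import unicodedata

# Strings float() accepts, so is_number reports them as numbers
for s in ["nan", "-inf", " 42 ", "1_000", "1e-3"]:
    print(repr(s), float(s))
print(math.isnan(float("nan")))  # True

# Single characters float() rejects but unicodedata.numeric() accepts
for ch in ["五", "½"]:
    try:
        float(ch)
    except ValueError:
        print(ch, "-> float() fails, numeric() =", unicodedata.numeric(ch))
```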
# Walk the indices of s from back to front: len(s)-1, len(s)-2, ..., 0
for j in range(len(s)-1, -1, -1):
    pass

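The same back-to-front index order can also be written as `reversed(range(len(s)))`, which is often easier to read. The sample string below is illustrative.

```python
s = "abcd"

manual = list(range(len(s) - 1, -1, -1))
idiomatic = list(reversed(range(len(s))))
print(manual, idiomatic)  # [3, 2, 1, 0] [3, 2, 1, 0]

# Pair each index with its character, last to first
for j in reversed(range(len(s))):
    print(j, s[j])
```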
import logging


class LogMixin:
    @property
    def logger(self):
        # Logger named '<module defining LogMixin>.<subclass name>'
        name = '.'.join([__name__, self.__class__.__name__])
        return logging.getLogger(name)

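`__name__` is evaluated in the module that defines `LogMixin`, so every subclass gets a logger under the mixin's module, wherever the subclass itself lives. If loggers should instead follow the subclass's own module, a variant could read the class attributes. This is an alternative design, not the original, and `Trainer` is a hypothetical subclass.

```python
import logging


class LogMixin:
    """Variant: name the logger after the subclass's own module and class."""
    @property
    def logger(self):
        cls = type(self)
        return logging.getLogger(f"{cls.__module__}.{cls.__qualname__}")


class Trainer(LogMixin):  # hypothetical subclass for illustration
    def run(self):
        self.logger.info("training started")


t = Trainer()
print(t.logger.name)  # e.g. "__main__.Trainer" when run as a script
```

`logging.getLogger` returns the same object for the same name, so creating the logger on every property access is cheap.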
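import re
# Strip every non-word character (anything but Unicode letters, digits and '_'); st_before is the input string
st_after = re.sub(r'\W', '', st_before)
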
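For `str` patterns, `\W` is Unicode-aware in Python 3. CJK characters therefore count as word characters and survive. Passing `re.ASCII` restricts word characters to `[a-zA-Z0-9_]`. The sample string is illustrative:

```python
import re

st_before = "Hello, 世界! 2024_ok"
print(re.sub(r'\W', '', st_before))                  # Hello世界2024_ok
print(re.sub(r'\W', '', st_before, flags=re.ASCII))  # Hello2024_ok
```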
import os
import torch.utils.data as data
from tqdm import tqdm


def train():
    # config saving
    model_path = './ckpt'
    os.makedirs(model_path, exist_ok=True)  # create the checkpoint directory if it is missing
    save_step = 2  # checkpoint saving interval
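Assuming `save_step` is the number of epochs between checkpoints, the saving part could look like the sketch below. `pickle` stands in for `torch.save(model.state_dict(), path)` so that it runs without PyTorch. `save_checkpoints` and the `epoch_<n>.pkl` naming are hypothetical.

```python
import os
import pickle
import tempfile


def save_checkpoints(state_per_epoch, model_path="./ckpt", save_step=2):
    """Write one checkpoint every `save_step` epochs into `model_path`; return the paths."""
    os.makedirs(model_path, exist_ok=True)  # no error if the directory already exists
    saved = []
    for epoch, state in enumerate(state_per_epoch, start=1):
        if epoch % save_step == 0:
            path = os.path.join(model_path, f"epoch_{epoch}.pkl")
            with open(path, "wb") as f:
                pickle.dump(state, f)  # torch.save(state, path) in real PyTorch code
            saved.append(path)
    return saved


# Usage: five epochs of dummy state, checkpoints at epochs 2 and 4
with tempfile.TemporaryDirectory() as tmp:
    print(save_checkpoints([{"w": i} for i in range(5)], os.path.join(tmp, "ckpt")))
```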