Takahashi Kanji kanjirz50

NLPer

yagays / nfkc_compare.txt

Last active May 25, 2018 00:50 — forked from ikegami-yukino/nfkc_compare.txt

List of characters that are normalized by Python's unicodedata.normalize('NFKC')

	# for Python 3.6
	import unicodedata

	# from 0 through 1,114,111 (https://docs.python.org/3.6/library/functions.html#chr)
	for unicode_id in range(1114112):  # range() excludes its stop value
	    char = chr(unicode_id)
	    normalized_char = unicodedata.normalize("NFKC", char)
	    if char != normalized_char:
	        if len(normalized_char) == 1:
	            code_point = ord(normalized_char)
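
As a small illustration of what the listing above enumerates, the following sketch applies NFKC to a handful of characters and reports the ones that change. The sample characters and the helper name `nfkc_changes` are assumptions chosen for illustration; they do not come from the gist.

```python
import unicodedata

# Hand-picked samples (an assumption for illustration, not from the gist).
SAMPLES = [
    "\uff21",  # FULLWIDTH LATIN CAPITAL LETTER A
    "\uff76",  # HALFWIDTH KATAKANA LETTER KA
    "\u2460",  # CIRCLED DIGIT ONE
    "\u3314",  # SQUARE KIRO (expands to two characters)
    "\ufb01",  # LATIN SMALL LIGATURE FI (expands to two characters)
    "a",       # plain ASCII, left unchanged by NFKC
]


def nfkc_changes(chars):
    """Return (original, normalized) pairs for the characters NFKC alters."""
    pairs = []
    for ch in chars:
        normalized = unicodedata.normalize("NFKC", ch)
        if ch != normalized:
            pairs.append((ch, normalized))
    return pairs


def code_points(text):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in text)


for original, normalized in nfkc_changes(SAMPLES):
    print(f"{code_points(original)} -> {code_points(normalized)}")
```

Some characters normalize to more than one code point, which is presumably why the gist checks `len(normalized_char) == 1` before calling `ord`.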

attakei / python_google_home_notify_ja.md

Last active November 23, 2020 13:56

Google home notifications by python

Make Google Home speak a specified text using Python

odashi / bleu.py

Last active September 20, 2019 06:46

BLEU calculator

	# usage (single sentence):
	# ref = ['This', 'is', 'a', 'pen', '.']
	# hyp = ['There', 'is', 'a', 'pen', '.']
	# stats = get_bleu_stats(ref, hyp)
	# bleu = calculate_bleu(stats) # => 0.668740
	#
	# usage (multiple sentences):
	# stats = defaultdict(int)
	# for ref, hyp in zip(refs, hyps):
	#     for k, v in get_bleu_stats(ref, hyp).items():
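
The preview only shows the usage comments, so the following is a minimal sketch of how such a two-step BLEU computation could look, not the gist's actual code. The names `sketch_bleu_stats` and `sketch_bleu`, the statistics keys, and the maximum n-gram order of 4 are assumptions. With those assumptions, the single-sentence example from the usage comment comes out at the same value, 0.668740.

```python
import math
from collections import Counter, defaultdict

MAX_N = 4  # assumed maximum n-gram order


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sketch_bleu_stats(ref, hyp):
    """Collect lengths and clipped n-gram match counts for one sentence pair."""
    stats = {"ref_len": len(ref), "hyp_len": len(hyp)}
    for n in range(1, MAX_N + 1):
        ref_counts = Counter(ngrams(ref, n))
        hyp_counts = Counter(ngrams(hyp, n))
        # Counter intersection clips each hypothesis count by the reference count.
        stats[f"match_{n}"] = sum((hyp_counts & ref_counts).values())
        stats[f"total_{n}"] = max(len(hyp) - n + 1, 0)
    return stats


def sketch_bleu(stats):
    """Geometric mean of the n-gram precisions times the brevity penalty."""
    log_precision = 0.0
    for n in range(1, MAX_N + 1):
        match, total = stats[f"match_{n}"], stats[f"total_{n}"]
        if match == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precision += math.log(match / total) / MAX_N
    # Brevity penalty in log space: 0 when the hypothesis is long enough.
    log_brevity = min(0.0, 1.0 - stats["ref_len"] / stats["hyp_len"])
    return math.exp(log_precision + log_brevity)


ref = ['This', 'is', 'a', 'pen', '.']
hyp = ['There', 'is', 'a', 'pen', '.']
print(f"{sketch_bleu(sketch_bleu_stats(ref, hyp)):.6f}")  # 0.668740

# Corpus-level score: sum the per-sentence statistics first, then score once.
totals = defaultdict(int)
for r, h in [(ref, hyp), (ref, ref)]:
    for k, v in sketch_bleu_stats(r, h).items():
        totals[k] += v
corpus_bleu = sketch_bleu(totals)
```

Keeping the statistics separate from the final score is what makes the multi-sentence usage work: counts are additive across sentences, whereas per-sentence BLEU values are not.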

bwhite / rank_metrics.py

Created September 15, 2012 03:23

Ranking Metrics

	"""Information Retrieval metrics

	Useful Resources:
	http://www.cs.utexas.edu/~mooney/ir-course/slides/Evaluation.ppt
	http://www.nii.ac.jp/TechReports/05-014E.pdf
	http://www.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf
	http://hal.archives-ouvertes.fr/docs/00/72/67/60/PDF/07-busa-fekete.pdf
	Learning to Rank for Information Retrieval (Tie-Yan Liu)
	"""
	import numpy as np
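
The preview stops before any metric is defined. As a rough sketch of the kind of information retrieval metrics the docstring refers to, the following pure-Python functions compute reciprocal rank, precision at k and NDCG at k from a list of relevance labels in ranked order. The function names and the particular DCG formulation (`rel / log2(rank + 1)`) are assumptions; the gist itself uses NumPy and may define these metrics differently.

```python
import math


def reciprocal_rank(relevances):
    """1 / rank of the first relevant item, or 0.0 if nothing is relevant."""
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def precision_at_k(relevances, k):
    """Fraction of the top-k results that are relevant (binary labels)."""
    if k < 1:
        raise ValueError("k must be >= 1")
    return sum(1 for rel in relevances[:k] if rel) / k


def dcg_at_k(relevances, k):
    """Discounted cumulative gain using the rel / log2(rank + 1) form."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


ranked = [0, 1, 0, 1]  # relevance of each result, best-ranked first
print(reciprocal_rank(ranked), precision_at_k(ranked, 2), ndcg_at_k(ranked, 4))
```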

fabianp / ranking.py

Last active December 24, 2025 18:54

Pairwise ranking using scikit-learn LinearSVC

	"""
	Implementation of pairwise ranking using scikit-learn LinearSVC

	Reference:

	"Large Margin Rank Boundaries for Ordinal Regression", R. Herbrich,
	T. Graepel, K. Obermayer 1999

	"Learning to rank from medical imaging data." Pedregosa, Fabian, et al.,
	Machine Learning in Medical Imaging 2012.
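
The core idea behind pairwise ranking is to turn a ranking problem into binary classification on feature differences. The sketch below shows only that transformation step in plain Python. The helper name `pairwise_transform` and the sign-alternation trick for balancing classes are assumptions and are not taken from the gist. The resulting difference vectors could then be passed to a linear classifier such as scikit-learn's LinearSVC.

```python
from itertools import combinations


def pairwise_transform(X, y):
    """Build difference vectors and +1/-1 labels from ranked samples.

    X is a list of feature vectors and y the corresponding ranks/scores.
    """
    X_diff, y_diff = [], []
    for i, j in combinations(range(len(X)), 2):
        if y[i] == y[j]:
            continue  # tied samples carry no ordering information
        diff = [a - b for a, b in zip(X[i], X[j])]
        label = 1 if y[i] > y[j] else -1
        # Flip every other pair so that both classes are represented.
        if len(y_diff) % 2 == 1:
            diff, label = [-d for d in diff], -label
        X_diff.append(diff)
        y_diff.append(label)
    return X_diff, y_diff


X = [[1.0, 0.0], [2.0, 1.0], [3.0, 3.0]]
y = [0, 1, 2]
X_pairs, y_pairs = pairwise_transform(X, y)
print(X_pairs, y_pairs)
```

A linear model fitted on these pairs learns a weight vector `w` such that `w · (x_i - x_j) > 0` when sample `i` should rank above sample `j`, so sorting by `w · x` recovers a ranking.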

mugenen / wn.py

Created March 4, 2012 11:48 — forked from yanbe/wn.py

A frontend for the WordNet-Ja database file (sqlite3 format), which is available at http://nlpwww.nict.go.jp/wn-ja/

	#!/usr/bin/env python
	# encoding: utf-8
	import sys
	import sqlite3
	from collections import namedtuple

	conn = sqlite3.connect("wnjpn.db")

	Word = namedtuple('Word', 'wordid lang lemma pron pos')
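
To show how the `Word` namedtuple can wrap query results, here is a minimal sketch that runs against an in-memory database instead of `wnjpn.db`. The table name `word`, its column layout, the sample rows and the helper `get_words` are assumptions made for illustration; the real database schema and the gist's own query functions may differ.

```python
import sqlite3
from collections import namedtuple

Word = namedtuple('Word', 'wordid lang lemma pron pos')

# In-memory stand-in for wnjpn.db; the schema below is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE word (wordid INTEGER, lang TEXT, lemma TEXT, pron TEXT, pos TEXT)"
)
conn.executemany(
    "INSERT INTO word VALUES (?, ?, ?, ?, ?)",
    [
        (1, "jpn", "\u72ac", None, "n"),  # Japanese lemma for "dog"
        (2, "eng", "dog", None, "n"),
    ],
)


def get_words(connection, lemma):
    """Look up rows by lemma and wrap each row in a Word namedtuple."""
    cursor = connection.execute(
        "SELECT wordid, lang, lemma, pron, pos FROM word WHERE lemma = ?",
        (lemma,),  # parameter binding avoids building SQL by hand
    )
    return [Word(*row) for row in cursor]


for word in get_words(conn, "dog"):
    print(word.wordid, word.lang, word.pos)
```

Wrapping rows in a namedtuple keeps them as lightweight as tuples while letting callers use `word.lemma` instead of positional indexes.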