Ahmed Fasih (fasiha)

fasiha / gist:dd5eb7942c79571dd5895e05f27e8bb9
Created January 18, 2018 02:48 — forked from masayu-a/gist:b3ce862336e47736e84f
English translations of UniDic inflection types, prepared by Irena Srdanovic, 18.1.2013 and 22.1.2013
Inflection type (Ja) | Inflection type (En) | Inflection type - description (En)
カ行変格 | ka_irr | kahen_verb.irregular
サ行変格 | sa_irr | sahen_verb.irregular
ザ行変格 | za_irr | zahen_verb.irregular
上一段-ア行 | V1i.a | kamiichidan_verb_i_row.a_column
上一段-カ行 | V1i.ka | kamiichidan_verb_i_row.ka_column
上一段-ガ行 | V1i.ga | kamiichidan_verb_i_row.ga_column
上一段-ザ行 | V1i.za | kamiichidan_verb_i_row.za_column
上一段-タ行 | V1i.ta | kamiichidan_verb_i_row.ta_column
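As a small illustration of how the table can be used, here is a POSIX shell sketch that maps a UniDic inflection-type string (e.g. 上一段-カ行) to its English abbreviation. The function name `unidic_ctype_en` is hypothetical, and it only knows the rows shown above; any other value is echoed back unchanged.

```shell
#!/bin/sh
# Hypothetical helper: translate a UniDic inflection type (Ja) to the English
# abbreviation from the table above. Only the listed rows are covered;
# unknown values are printed unchanged.
unidic_ctype_en() {
  case "$1" in
    カ行変格) printf '%s\n' ka_irr ;;
    サ行変格) printf '%s\n' sa_irr ;;
    ザ行変格) printf '%s\n' za_irr ;;
    上一段-ア行) printf '%s\n' V1i.a ;;
    上一段-カ行) printf '%s\n' V1i.ka ;;
    上一段-ガ行) printf '%s\n' V1i.ga ;;
    上一段-ザ行) printf '%s\n' V1i.za ;;
    上一段-タ行) printf '%s\n' V1i.ta ;;
    *) printf '%s\n' "$1" ;;  # not in the table: pass through
  esac
}
```

A `case` statement keeps the sketch portable to plain `sh`; a bash associative array would work too, but would require bash 4 or later.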
fasiha / Dockerfile
Created February 2, 2018 18:15 — forked from klokan/Dockerfile
GDAL in Docker - stable GDAL with JP2KAK, MRSID and ECW: https://registry.hub.docker.com/u/klokantech/gdal/
FROM debian:7
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get -qq update \
    && apt-get -qq -y --no-install-recommends install \
        autoconf \
        automake \
        build-essential \
        curl \
# Put new words in a CSV file, one entry per line, with these 13 fields:
# 表層形,左文脈ID,右文脈ID,コスト,品詞,品詞細分類1,品詞細分類2,品詞細分類3,活用型,活用形,原形,読み,発音
# surface_form,left_context_id,right_context_id,cost,part_of_speech,pos_subcategory_1,pos_subcategory_2,pos_subcategory_3,inflection_type,inflection_form,lemma,reading,pronunciation
$ echo "fasihsignal,-1,-1,100,名詞,一般,*,*,*,*,fasihsignal,ファシシグナル,ファシシグナル" > a.csv
# Then use mecab-dict-index to compile the CSV into a user dictionary (a.dic), pointing -d at an existing MeCab system dictionary directory
$ /usr/local/Cellar/mecab/0.996/libexec/mecab/mecab-dict-index -d/usr/local/Cellar/mecab/0.996/lib/mecab/dic/ipadic/ -u a.dic -f utf8 -t utf8 a.csv
# Then use it (-u loads the user dictionary in addition to the system dictionary)
$ mecab -ua.dic
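Before running mecab-dict-index it can help to confirm that every CSV line has all 13 fields, since a missing `*` placeholder is easy to overlook. The sketch below is a hypothetical helper, not part of MeCab. It assumes no field contains a quoted comma; the simple awk split would miscount such a line.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not a MeCab tool): report any line of a
# user-dictionary CSV that does not have exactly 13 comma-separated fields.
# Exits 0 if every line is well-formed, 1 otherwise.
# Limitation: a field quoted to contain a comma is counted as two fields.
check_mecab_csv() {
  awk -F',' '
    NF != 13 { printf "%s:%d: expected 13 fields, found %d\n", FILENAME, NR, NF; bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

Run it as `check_mecab_csv a.csv` and only proceed to the mecab-dict-index command if it exits with status 0.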