Tested on Linux.
First, install the required libraries with pip inside a virtualenv. This setup uses fugashi, but mecab-python3 would work just as well here.
# MeCab-related packages
pip install fugashi unidic-lite
# Tool that builds the EXE
pip install pyinstaller
# script to test degree tokenization related changes.
# https://github.com/explosion/spaCy/pull/9155
import spacy
langs = ("af am ar az bg bn ca cs da de el en es et eu fa fi fr ga grc gu he hi "
         "hr hu hy id is it ja kn ko ky lb lij lt lv mk ml mr nb ne nl pl pt ro "
         "ru sa si sk sl sq sr sv ta te th ti tl tn tr tt uk ur vi xx yo zh").split()
check = ("°c °f °k °C °F °K °c. °f. °k. °C. °F. °K. 1°c 1°f 1°k 1°C 1°F 1°K 1°c. "
         "1°f. 1°k. 1°C. 1°F. 1°K.").split()
1. If you use Transformers
The latest version of Transformers does not use mecab-python3 at all. Run this instead:
pip install transformers[ja]
2. If you use Neologd
Your mecab-python3 version is out of date. First, update mecab-python3.
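Before updating, it can help to confirm which versions are actually installed. The following is a minimal sketch using only the standard library (Python 3.8+); `installed_version` is a hypothetical helper name, and the package names are simply the ones mentioned in these notes.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names are taken from the notes; the helper itself is an assumption.
    for name in ("mecab-python3", "fugashi", "transformers"):
        print(name, installed_version(name) or "not installed")
```

If `mecab-python3` is reported as not installed, the environment may be relying on fugashi instead, which matches the Transformers case above.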
# Specify the -Ochasen output format directly
import MeCab
import ipadic
CHASEN_ARGS = r' -F "%m\t%f[7]\t%f[6]\t%F-[0,1,2,3]\t%f[4]\t%f[5]\n"'
CHASEN_ARGS += r' -U "%m\t%m\t%m\t%F-[0,1,2,3]\t\t\n"'
tagger = MeCab.Tagger(ipadic.MECAB_ARGS + CHASEN_ARGS)
print(tagger.parse("図書館にいた事がバレた"))
# Output
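The format string above produces six tab-separated fields per token, so the result is easy to split back into records. The sketch below assumes that layout plus MeCab's default `EOS` terminator line. The English field names are my own labels based on the usual IPAdic feature order, and `SAMPLE` is hand-written illustrative input, not actual tagger output.

```python
from typing import Dict, List

# Field order follows the -F format string; the key names are assumed labels.
FIELDS = ("surface", "reading", "base_form", "pos", "conj_type", "conj_form")


def parse_chasen(output: str) -> List[Dict[str, str]]:
    """Split chasen-style tab-separated output into one dict per token."""
    tokens = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        parts = line.split("\t")  # split on tabs only so empty fields survive
        parts += [""] * (len(FIELDS) - len(parts))  # pad short lines
        tokens.append(dict(zip(FIELDS, parts)))
    return tokens


# Hand-written sample in the assumed layout (not real MeCab output).
SAMPLE = (
    "図書館\tトショカン\t図書館\t名詞-一般\t\t\n"
    "い\tイ\tいる\t動詞-自立\t一段\t連用形\n"
    "EOS\n"
)

if __name__ == "__main__":
    for token in parse_chasen(SAMPLE):
        print(token["surface"], token["base_form"], token["pos"])
```

Splitting on `"\t"` rather than on arbitrary whitespace matters here, because nouns leave the last two fields empty.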
24731941 年
15955060 日
13733371 月
7032890 大
6115161 本
5634170 学
5352959 人
4568971 中
4437080 国
4403844 一
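The list above reads like per-character counts sorted in descending order. The source does not say how it was produced, so the following is only a sketch of one way to get output in the same "count character" shape, using `collections.Counter`. The function names are hypothetical, and unlike the list above this version counts every non-whitespace character rather than kanji only.

```python
from collections import Counter


def char_frequencies(text: str, top_n: int = 10):
    """Return the top_n (character, count) pairs, most frequent first."""
    counts = Counter(ch for ch in text if not ch.isspace())
    return counts.most_common(top_n)


def format_counts(pairs) -> str:
    """Render pairs as 'count character' lines, matching the list's layout."""
    return "\n".join(f"{count} {char}" for char, count in pairs)


if __name__ == "__main__":
    # Tiny made-up input; a real run would read a corpus file instead.
    print(format_counts(char_frequencies("年年年日日月", top_n=3)))
```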
#!/usr/bin/env python3
"""
Convert GSD conll format to a format the spaCy convert script can use as-is.
There are two main changes:
1. POS tags format is changed slightly.
old: 名詞-普通名詞-一般
[ 2359.097] (WW) Failed to open protocol names file lib/xorg/protocol.txt
[ 2359.098]
X.Org X Server 1.20.7
X Protocol Version 11, Revision 0
[ 2359.100] Build Operating System: Linux Arch Linux
[ 2359.101] Current Operating System: Linux shougeimaru 5.6.5-arch3-1 #1 SMP PREEMPT Sun, 19 Apr 2020 13:14:25 +0000 x86_64
[ 2359.101] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-linux root=UUID=9342fe9f-1bc1-42e5-aa8e-b4f7d26ce115 rw quiet
[ 2359.103] Build Date: 14 January 2020 07:13:52AM
[ 2359.103]
[ 2359.104] Current version of pixman: 0.38.4
[ 2064.502] (WW) Failed to open protocol names file lib/xorg/protocol.txt
[ 2064.503]
X.Org X Server 1.20.7
X Protocol Version 11, Revision 0
[ 2064.505] Build Operating System: Linux Arch Linux
[ 2064.506] Current Operating System: Linux shougeimaru 5.6.5-arch3-1 #1 SMP PREEMPT Sun, 19 Apr 2020 13:14:25 +0000 x86_64
[ 2064.506] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-linux root=UUID=9342fe9f-1bc1-42e5-aa8e-b4f7d26ce115 rw quiet
[ 2064.508] Build Date: 14 January 2020 07:13:52AM
[ 2064.508]
[ 2064.509] Current version of pixman: 0.38.4
# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 304.43 ([email protected]) Sun Aug 19 21:28:54 PDT 2012
# nvidia-settings: X configuration file generated by nvidia-settings
# nvidia-settings: version 260.19.44 ([email protected]) Sun Feb 27 21:50:27 PST 2011
Section "ServerLayout"
    Identifier "Layout0"
    Screen 0 "Screen0" 0 0
    InputDevice "Keyboard0" "CoreKeyboard"