Tested on Linux.
First, install the required libraries with pip inside a virtualenv. This uses fugashi, but mecab-python3 would work the same way here.
# MeCab-related packages
pip install fugashi unidic-lite
# Tool for building the executable
pip install pyinstaller
# script to test degree tokenization related changes.
# https://github.com/explosion/spaCy/pull/9155
import spacy
langs = ("af am ar az bg bn ca cs da de el en es et eu fa fi fr ga grc gu he hi "
         "hr hu hy id is it ja kn ko ky lb lij lt lv mk ml mr nb ne nl pl pt ro "
         "ru sa si sk sl sq sr sv ta te th ti tl tn tr tt uk ur vi xx yo zh").split()
check = ("°c °f °k °C °F °K °c. °f. °k. °C. °F. °K. 1°c 1°f 1°k 1°C 1°F 1°K 1°c. "
         "1°f. 1°k. 1°C. 1°F. 1°K.").split()
1. If you use Transformers
The latest version of Transformers does not use mecab-python3 at all. Run this instead:
pip install transformers[ja]
2. If you use NEologd
Your mecab-python3 version is out of date. First, upgrade mecab-python3.
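Before upgrading, it can help to confirm which versions are actually installed in the active environment. The following is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_version` and the list of packages checked are illustrative assumptions, not part of the original note.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package names taken from the notes above; adjust to your environment.
for name in ("mecab-python3", "fugashi", "transformers"):
    version = installed_version(name)
    print(f"{name}: {version or 'not installed'}")
```

If `mecab-python3` is reported as installed but old, `pip install --upgrade mecab-python3` is the usual way to update it.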
# Specify the -Ochasen output format directly
import MeCab
import ipadic
CHASEN_ARGS = r' -F "%m\t%f[7]\t%f[6]\t%F-[0,1,2,3]\t%f[4]\t%f[5]\n"'
CHASEN_ARGS += r' -U "%m\t%m\t%m\t%F-[0,1,2,3]\t\t\n"'
tagger = MeCab.Tagger(ipadic.MECAB_ARGS + CHASEN_ARGS)
print(tagger.parse("図書館にいた事がバレた"))
# Output
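The `-F` format above emits one tab-separated line per token with six fields: surface, reading (`%f[7]`), base form (`%f[6]`), hyphen-joined part of speech, conjugation type (`%f[4]`), and conjugation form (`%f[5]`). As a rough sketch of how such output could be consumed, the snippet below splits it into named fields. The `ChasenToken` and `parse_chasen` names are hypothetical, and `SAMPLE` is a hand-written illustration of the format, not captured MeCab output.

```python
from typing import List, NamedTuple


class ChasenToken(NamedTuple):
    surface: str
    reading: str
    base: str
    pos: str
    conj_type: str
    conj_form: str


def parse_chasen(text: str) -> List[ChasenToken]:
    """Split ChaSen-style tab-separated output into one record per token."""
    tokens = []
    for line in text.splitlines():
        # Skip blank lines and the end-of-sentence marker.
        if not line or line == "EOS":
            continue
        fields = line.split("\t")
        # Pad short lines so that exactly six fields are always available.
        fields += [""] * (6 - len(fields))
        tokens.append(ChasenToken(*fields[:6]))
    return tokens


# Hand-written sample in the six-column layout (an assumption for illustration).
SAMPLE = (
    "図書館\tトショカン\t図書館\t名詞-一般\t\t\n"
    "に\tニ\tに\t助詞-格助詞-一般\t\t\n"
    "EOS\n"
)

for token in parse_chasen(SAMPLE):
    print(token.surface, token.pos)
```

In practice the string passed to `parse_chasen` would be the return value of `tagger.parse(...)`.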
24731941 年
15955060 日
13733371 月
7032890 大
6115161 本
5634170 学
5352959 人
4568971 中
4437080 国
4403844 一
#!/usr/bin/env python3
"""
Convert GSD conll format to a format the spaCy convert script can use as-is.
There are two main changes:
1. POS tags format is changed slightly.
old: 名詞-普通名詞-一般
[ 2359.097] (WW) Failed to open protocol names file lib/xorg/protocol.txt
[ 2359.098]
X.Org X Server 1.20.7
X Protocol Version 11, Revision 0
[ 2359.100] Build Operating System: Linux Arch Linux
[ 2359.101] Current Operating System: Linux shougeimaru 5.6.5-arch3-1 #1 SMP PREEMPT Sun, 19 Apr 2020 13:14:25 +0000 x86_64
[ 2359.101] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-linux root=UUID=9342fe9f-1bc1-42e5-aa8e-b4f7d26ce115 rw quiet
[ 2359.103] Build Date: 14 January 2020 07:13:52AM
[ 2359.103]
[ 2359.104] Current version of pixman: 0.38.4
[ 2064.502] (WW) Failed to open protocol names file lib/xorg/protocol.txt
[ 2064.503]
X.Org X Server 1.20.7
X Protocol Version 11, Revision 0
[ 2064.505] Build Operating System: Linux Arch Linux
[ 2064.506] Current Operating System: Linux shougeimaru 5.6.5-arch3-1 #1 SMP PREEMPT Sun, 19 Apr 2020 13:14:25 +0000 x86_64
[ 2064.506] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-linux root=UUID=9342fe9f-1bc1-42e5-aa8e-b4f7d26ce115 rw quiet
[ 2064.508] Build Date: 14 January 2020 07:13:52AM
[ 2064.508]
[ 2064.509] Current version of pixman: 0.38.4
# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 304.43 ([email protected]) Sun Aug 19 21:28:54 PDT 2012
# nvidia-settings: X configuration file generated by nvidia-settings
# nvidia-settings: version 260.19.44 ([email protected]) Sun Feb 27 21:50:27 PST 2011
Section "ServerLayout"
    Identifier "Layout0"
    Screen 0 "Screen0" 0 0
    InputDevice "Keyboard0" "CoreKeyboard"