@gthb
Last active September 26, 2023 03:12
Spacy under pyinstaller failing to load a language model package

Minimal example to reproduce spaCy failing to load a language model package under PyInstaller.

Run bash build-linux.sh to reproduce on Linux (assumes python3 is on $PATH), or build-win.bat to reproduce on Windows (assumes Python 3.7.6 is installed at C:/Python37/).

It will fail with something like:

$ ./dist/linux/Foo/foo_r
Traceback (most recent call last):
  File "run.py", line 2, in <module>
    lm = spacy.load('en_core_web_sm')
  File "spacy/__init__.py", line 30, in load
  File "spacy/util.py", line 169, in load_model
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
[1431] Failed to execute script run
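
One plausible reading of this error, offered as an assumption rather than something the traceback proves: spaCy 2.2's is_package appears to check the list of installed distributions (via pkg_resources) rather than the import system, and a frozen bundle can contain an importable module without its distribution metadata. The sketch below uses only the standard library (importlib.metadata, Python 3.8+, standing in for pkg_resources) to show that the two checks can disagree. The function names are hypothetical.

```python
import importlib.metadata
import importlib.util


def is_importable(name):
    """True if the import system can locate a top-level module called `name`."""
    return importlib.util.find_spec(name) is not None


def has_distribution_metadata(name):
    """True if an installed distribution (dist-info/egg-info) is named `name`."""
    try:
        importlib.metadata.distribution(name)
    except importlib.metadata.PackageNotFoundError:
        return False
    return True


if __name__ == "__main__":
    # "json" ships with Python: importable, yet normally not a pip-installed
    # distribution. "en_core_web_sm" depends on the current environment.
    for name in ("json", "en_core_web_sm"):
        print(name, is_importable(name), has_distribution_metadata(name))
```

Running both checks inside the frozen app (for example from run.py, before spacy.load) would show whether the model is importable but lacks metadata there.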

Now edit spacy/util.py to hardcode the assumption that en_core_web_sm is an importable package, e.g. on Linux:

sed -Ei 's/if is_package\(name\):/if is_package\(name\) or name == "en_core_web_sm":/' venv/lib/python3.6/site-packages/spacy/util.py

(or make the corresponding manual edit; adjust python3.6 in the path to match the Python version in your venv)

and rebuild with pyinstaller --clean -y --dist ./dist/linux --workpath /tmp --debug all foo.spec and it will run fine:

$ ./dist/linux/Foo/foo_r
Hi interjection
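
Where sed is not available (for example on Windows), the same edit could be applied with a short Python script. This is a minimal sketch that mirrors the sed pattern and replacement above; the function names patch_source and patch_file are hypothetical, and the path to util.py has to be supplied by the caller.

```python
import re
from pathlib import Path

# Same pattern and replacement as the sed command: only the literal model
# name "en_core_web_sm" is special-cased.
PATTERN = re.compile(r"if is_package\(name\):")
REPLACEMENT = 'if is_package(name) or name == "en_core_web_sm":'


def patch_source(source):
    """Return `source` with the is_package check widened, as the sed edit does."""
    return PATTERN.sub(REPLACEMENT, source)


def patch_file(path):
    """Rewrite the file at `path` in place; return True if anything changed."""
    path = Path(path)
    original = path.read_text(encoding="utf-8")
    patched = patch_source(original)
    if patched != original:
        path.write_text(patched, encoding="utf-8")
    return patched != original
```

In a standard Windows venv the file would be at venv\Lib\site-packages\spacy\util.py. Applying the patch twice is harmless, because the patched line no longer matches the pattern.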

build-linux.sh

# Create and activate an isolated virtual environment
python3 -m venv venv
. venv/bin/activate
pip install -U pip "setuptools<45.0" # because of https://github.com/pypa/setuptools/issues/1963
#pip install pyinstaller==3.6
pip install https://github.com/pyinstaller/pyinstaller/archive/develop.zip
pip install spacy==2.2.4
pip install "https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz#egg=en_core_web_sm"
# Build the frozen app into ./dist/linux using foo.spec
pyinstaller --clean -y --dist ./dist/linux --workpath /tmp --debug all foo.spec

build-win.bat

rem Create and activate an isolated virtual environment
C:\Python37\python.exe -m venv venv
call venv\Scripts\activate.bat
python -m pip install -U pip
rem Pin setuptools because of https://github.com/pypa/setuptools/issues/1963
pip install "setuptools<45.0"
rem pip install pyinstaller==3.6
pip install https://github.com/pyinstaller/pyinstaller/archive/develop.zip
pip install spacy==2.2.4
pip install "https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz#egg=en_core_web_sm"
rem Build the frozen app into ./dist/windows using foo.spec
pyinstaller --clean -y --dist ./dist/windows --workpath /tmp --debug all foo.spec
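
foo.spec

# -*- mode: python ; coding: utf-8 -*-
# Analyze run.py, looking for hooks in the current directory and forcing
# the model package to be treated as an import.
a = Analysis(
    ['run.py'],
    pathex=['./'],
    hookspath=['.'],
    hiddenimports=['en_core_web_sm'],
)
pyz = PYZ(a.pure, a.zipped_data)
# One-folder build: binaries are left out of the executable and gathered by COLLECT.
exe = EXE(
    pyz,
    a.scripts,
    exclude_binaries=True,
    name='foo_r',
)
coll = COLLECT(
    exe,
    a.binaries,
    a.zipfiles,
    a.datas,
    name='Foo',
)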

hook-en_core_web_sm.py

from PyInstaller.utils.hooks import collect_data_files

# Bundle the model package's non-Python data files with the frozen app.
datas = collect_data_files("en_core_web_sm")

hook-spacy.py

"""
Combined hook for spacy and its dependency libraries; should probably be separated.
"""
from PyInstaller.utils.hooks import collect_data_files
import spacy

# Data files only; .py files are left to PyInstaller's normal import analysis.
datas = collect_data_files('spacy', include_py_files=False)
datas.append((spacy.util.get_data_path(), 'spacy/data'))
datas.extend(collect_data_files('thinc', include_py_files=False))

# Modules that PyInstaller's import analysis does not pick up on its own.
hiddenimports = [
    'blis',
    'blis.py',
    'cymem.cymem',
    'murmurhash',
    'preshed.maps',
    'spacy._align',
    'spacy.kb',
    'spacy.lang.en',
    'spacy.lang.es',
    'spacy.lang.fr',
    'spacy.lexeme',
    'spacy.matcher._schemas',
    'spacy.morphology',
    'spacy.parts_of_speech',
    'spacy.strings',
    'spacy.syntax',
    'spacy.syntax._beam_utils',
    'spacy.syntax._parser_model',
    'spacy.syntax.arc_eager',
    'spacy.syntax.ner',
    'spacy.syntax.nn_parser',
    'spacy.syntax.nonproj',
    'spacy.syntax.stateclass',
    'spacy.syntax.transition_system',
    'spacy.tokens._retokenize',
    'spacy.tokens.morphanalysis',
    'spacy.tokens.underscore',
    'srsly.msgpack.util',
    'thinc.extra.search',
    'thinc.linalg',
    'thinc.neural._aligned_alloc',
    'thinc.neural._custom_kernels',
]
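
A hand-maintained hiddenimports list like the one above can drift as the libraries change. PyInstaller offers collect_submodules in PyInstaller.utils.hooks for enumerating a package's modules. The sketch below shows the same idea with only the standard library, so it can be tried outside a build. The helper name list_submodules is hypothetical, and whether bundling every submodule is desirable (it may increase the bundle size) is a judgment call this gist does not settle.

```python
import importlib
import pkgutil


def list_submodules(package_name):
    """Return the sorted dotted names of all modules found under a package."""
    package = importlib.import_module(package_name)
    # walk_packages imports subpackages (not leaf modules) in order to descend.
    found = pkgutil.walk_packages(package.__path__, prefix=package.__name__ + ".")
    return sorted(module.name for module in found)


if __name__ == "__main__":
    # "json" is used only because it is always available; in a hook one
    # would pass "spacy" or "thinc" instead.
    print(list_submodules("json"))
```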
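
run.py

import spacy

# Load the model by package name; this is the call that fails in the frozen app.
lm = spacy.load('en_core_web_sm')
doc = lm('Hi there')
# Print the first token and a description of its part-of-speech tag.
print(doc[0], spacy.explain(doc[0].tag_))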
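
An alternative to patching spacy/util.py, not tested against this gist's build: spaCy model packages expose a load() function, so the model can be loaded by importing the package directly, which avoids the name lookup in spacy.load altogether. The helper name load_model_by_import below is hypothetical, and the sketch assumes the model package and its data files are present in the bundle.

```python
import importlib


def load_model_by_import(package_name, **overrides):
    """Import a model package and call its load() entry point."""
    module = importlib.import_module(package_name)
    return module.load(**overrides)


# Usage inside run.py (requires spaCy and the model package to be installed):
#   lm = load_model_by_import("en_core_web_sm")
#   doc = lm("Hi there")
```

With a fixed model name this reduces to import en_core_web_sm followed by en_core_web_sm.load(), and the explicit import statement also lets PyInstaller's analysis see the dependency.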