@hjhilden
Created December 20, 2021 14:53
Install Swedish natural language processing for Python 3.9

Install and use Swedish natural language processing

These instructions assume conda and Python 3.9. Shoutout to all the creators of these tools! Check each package's own installation instructions before proceeding.

Install Lemmy with pip (https://github.com/sorenlind/lemmy/):

pip install lemmy

Install spaCy 3.0+ with conda (https://spacy.io/usage):

conda install -c conda-forge spacy
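To confirm that the packages ended up in the active conda environment, a small check like the one below can help. It is only a sketch: the helper names `installed_version` and `report` are made up for illustration, and it assumes the packages register standard distribution metadata, which pip installs do and conda-forge packages normally do. The same check can be rerun for spacy-transformers once it is installed.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its version string (None when not installed)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    # Distribution names as used on PyPI / conda-forge.
    for name, version in report(["lemmy", "spacy", "spacy-transformers"]).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```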

Install spaCy-transformers with pip https://github.com/explosion/spacy-transformers

For a GPU installation, find your CUDA version with nvcc --version and add it to the extras in brackets, e.g. spacy[transformers,cuda92] for CUDA 9.2 or spacy[transformers,cuda100] for CUDA 10.0. Without a GPU, use the plain command:

pip install spacy[transformers]
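The CUDA naming pattern above (9.2 becomes cuda92, 10.0 becomes cuda100) can be sketched as a small helper that reads the output of nvcc --version. This is an illustration only: `cuda_extra` and `pip_command` are hypothetical names, the parsing assumes the usual "release X.Y" wording in the nvcc output, and the resulting name is just a candidate, since spaCy only defines extras for certain CUDA versions. Check https://spacy.io/usage for the extras that actually exist.

```python
import re
from typing import Optional


def cuda_extra(nvcc_output: str) -> Optional[str]:
    """Turn `nvcc --version` output into a candidate extra such as 'cuda92'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None  # no CUDA toolkit found, or unexpected output format
    major, minor = match.groups()
    return f"cuda{major}{minor}"


def pip_command(nvcc_output: str) -> str:
    """Build the pip command, falling back to the CPU-only extras."""
    extra = cuda_extra(nvcc_output)
    extras = "transformers" if extra is None else f"transformers,{extra}"
    # Quoting the requirement keeps shells such as zsh from
    # interpreting the square brackets as a glob pattern.
    return f'pip install "spacy[{extras}]"'


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 10.0, V10.0.130"
    print(pip_command(sample))
```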

Install git-lfs (https://git-lfs.github.com/) so that large files can be cloned, then clone the pretrained Swedish multitask models by KB Lab at the National Library of Sweden from https://huggingface.co/KBLab/swedish-spacy-pipeline/tree/main:

git lfs install
git clone https://huggingface.co/KBLab/swedish-spacy-pipeline
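If git-lfs was not active during the clone, the large model files are replaced by tiny text pointer files, and loading the pipeline fails later with a confusing error. The sketch below scans a cloned directory for such pointers. The helper names are hypothetical; the only fact it relies on is that Git LFS pointer files start with the line "version https://git-lfs.github.com/spec/v1".

```python
from pathlib import Path
from typing import List

# First bytes of every Git LFS pointer file.
LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file is an LFS pointer rather than real content."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_PREFIX)) == LFS_PREFIX


def find_lfs_pointers(root) -> List[Path]:
    """List files under root (ignoring .git) that are still LFS pointers."""
    root = Path(root)
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file()
        and ".git" not in p.relative_to(root).parts
        and is_lfs_pointer(p)
    )


if __name__ == "__main__":
    # Assumes the clone sits in the current working directory.
    clone = Path("swedish-spacy-pipeline")
    if clone.is_dir():
        for pointer in find_lfs_pointers(clone):
            print("still a pointer:", pointer)
```

If anything is listed, running git lfs pull inside the clone should fetch the real files.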

Profit? Try to run:

import lemmy
import spacy

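# Load the Swedish Lemmy lemmatizer (not used further in this snippet).
lemmatizer_sv = lemmy.load("sv")
# Replace the path with the location of your cloned swedish-spacy-pipeline.
nlp = spacy.load('Users/PathToClonedModels/swedish-spacy-pipeline/')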

text = """
Detta är ett minne från Warzawa, våren 1932:
Hon stod med ryggen mot mig på perrongen.
Hennes öron var röda och stora, två något 
kålbladsliknande utväxter på var sida om huvudet. 
"""
doc = nlp(text)

# Print the main linguistic attributes of each token.
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
          token.shape_, token.is_alpha, token.is_stop)
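The snippet above loads lemmatizer_sv but never calls it. One possible way to use it is to pass each token's part-of-speech tag and text to Lemmy and compare the result with spaCy's own token.lemma_. This is a rough sketch, not the gist author's method: `lemmy_lemmas` is a hypothetical helper, and it assumes the `lemmatize(pos, word)` call described in the Lemmy README, which returns a list of candidate lemmas. Whether token.lemma_ is populated at all depends on the components of the loaded pipeline.

```python
def lemmy_lemmas(tokens, lemmatizer):
    """Pair each alphabetic token with spaCy's lemma and Lemmy's candidates."""
    rows = []
    for token in tokens:
        if not token.is_alpha:
            continue  # skip punctuation, numbers and whitespace
        # Lemmy takes the universal POS tag first, then the word form.
        candidates = lemmatizer.lemmatize(token.pos_, token.text)
        rows.append((token.text, token.lemma_, candidates))
    return rows


# Usage with the objects from the snippet above:
#
# for text, spacy_lemma, lemmy_candidates in lemmy_lemmas(doc, lemmatizer_sv):
#     print(text, spacy_lemma, lemmy_candidates)
```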