
Bram Vanroy BramVanroy

@BramVanroy
BramVanroy / get_words_of_tokens.py
Created June 15, 2022 13:44
Get original words of tokens in HF Tokenizers
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
text = "It 's a pre-tokenized , silly sentence !"
words = text.split()
# The input is already a list of words, so mark it as pre-tokenized
encoded = tokenizer(words, is_split_into_words=True)

# word_ids() gives, per token, the index of its source word (None for special tokens)
for token, wordid in zip(encoded.tokens(), encoded.word_ids()):
    if wordid is not None:
        print(token, words[wordid])
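The snippet above prints one token/word pair per line. A minimal sketch of the same alignment idea, grouping all subword tokens under their source word, could look like the following. The helper name `group_tokens_by_word` is hypothetical, and the token and word-ID lists are hand-written for illustration; they are not real tokenizer output.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def group_tokens_by_word(
    tokens: Sequence[str],
    word_ids: Sequence[Optional[int]],
    words: Sequence[str],
) -> List[Tuple[str, List[str]]]:
    """Return (word, [tokens]) pairs in word order (hypothetical helper)."""
    grouped: Dict[int, List[str]] = {}
    for token, word_id in zip(tokens, word_ids):
        if word_id is None:
            # Special tokens do not belong to any input word
            continue
        grouped.setdefault(word_id, []).append(token)
    # Key by word index rather than word string, so repeated words stay separate
    return [(words[i], grouped[i]) for i in sorted(grouped)]


# Hand-written illustrative values, NOT actual tokenizer output
words = ["silly", "sentence"]
tokens = ["[CLS]", "sil", "##ly", "sentence", "[SEP]"]
word_ids = [None, 0, 0, 1, None]

for word, word_tokens in group_tokens_by_word(tokens, word_ids, words):
    print(word, word_tokens)
```

With a real tokenizer, `tokens` and `word_ids` would presumably come from `encoded.tokens()` and `encoded.word_ids()` as in the snippet above.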
from typing import List, Optional

import spacy
from spacy import Language, Vocab
from spacy.tokens import Doc


def load_nlp(model_name: str = "en_core_web_sm",
             is_tokenized: bool = False,
             exclude: Optional[List[str]] = None):
    """Load a spaCy model. Disable sentence segmentation and tokenization with is_tokenized.
@BramVanroy
BramVanroy / remote-serve.md
Last active June 15, 2020 08:37
Using a web-serving tool from a remote server

Oftentimes, you may want to use a web-based tool while programming, e.g. a Jupyter notebook, TensorBoard, or Streamlit. These tools are easy to set up locally on your own machine, but that machine may not be as powerful as a server you have access to. This small guide shows how to use such a web-based tool remotely. As an example, we will use TensorBoard, which lets us remotely monitor the live-updated progress of a machine learning system during training. This gist is simply an extension of a Stack Overflow post. It does not cover how to use TensorBoard itself; to get started with that, read through the TensorBoard documentation (it works with TensorFlow as well as PyTorch).
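One common way to reach a web tool running on a remote server is SSH local port forwarding. The visible text of this gist does not spell out its exact method, so the sketch below is an assumption, not the author's procedure. The helper `build_tunnel_command`, the user name, and the host name are hypothetical; 6006 is TensorBoard's default port.

```python
import shlex
from typing import List


def build_tunnel_command(
    user: str,
    host: str,
    remote_port: int = 6006,
    local_port: int = 6006,
) -> List[str]:
    """Build an ssh command that forwards a local port to a port on the server.

    Hypothetical helper: it only assembles the argument list, it does not run ssh.
    """
    return [
        "ssh",
        "-N",  # do not run a remote command, only forward ports
        "-L",  # local port forwarding: local_port -> localhost:remote_port on the server
        f"{local_port}:localhost:{remote_port}",
        f"{user}@{host}",
    ]


# Hypothetical user and host, for illustration only
command = build_tunnel_command("alice", "gpu.example.org")
print(shlex.join(command))
```

Under these assumptions, running the printed command on your own machine and then opening `http://localhost:6006` in a browser would reach the tool that is listening on port 6006 on the server.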

If we started TensorBoard on our own machine, it would create a local server that is accessible through a s

@BramVanroy
BramVanroy / pypi-release-checklist.md
Last active July 2, 2024 08:52 — forked from audreyfeldroy/pypi-release-checklist.md
My PyPI Release Checklist
  • Update HISTORY.rst
  • Commit the changes:
git add HISTORY.rst
git commit -m "Changelog for upcoming release 0.1.1."
  • Update version number (can also be major (x.0.0) or minor (0.x.0) instead of patch (0.0.x)):
bumpversion patch
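To make the major/minor/patch distinction concrete, here is a minimal sketch of the numbering rule only. The function `bump_version` is hypothetical and is not how bumpversion is implemented; the real tool also rewrites the version string in the files it is configured to manage.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next version after bumping the given part (hypothetical helper).

    Assumes a plain "major.minor.patch" string with integer components.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # A major bump resets minor and patch
        return f"{major + 1}.0.0"
    if part == "minor":
        # A minor bump resets patch
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown version part: {part!r}")


print(bump_version("0.1.0", "patch"))  # 0.1.1
print(bump_version("0.1.1", "minor"))  # 0.2.0
print(bump_version("0.2.0", "major"))  # 1.0.0
```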