We'd like you all to get acquainted with the basic syntax of Python so that you feel more comfortable in the workshop and we can build more cool stuff!
Ideally you'd take a look at:
- Variables and Types
- Strings
- Loops
- Conditions
- Functions
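To give a feel for what those five topics look like in practice, here is a small sketch that touches each of them once. The names (`topic`, `seats`, `label`, and so on) are made up for illustration and are not part of any workshop material.

```python
# Variables and types: a name is bound to a value, and each value has a type.
topic = "Python"      # str
seats = 3             # int
price = 0.0           # float

# Strings: concatenation, methods, and f-strings.
greeting = "Hello, " + topic.upper() + "!"
summary = f"{seats} seats left, price {price:.2f}"


# Functions: a reusable block that takes arguments and returns a value.
def label(number):
    # Conditions: pick a branch depending on a test.
    if number % 2 == 0:
        return "even"
    else:
        return "odd"


# Loops: repeat a block once for each item in a sequence.
labels = []
for n in range(1, seats + 1):
    labels.append(f"{n} is {label(n)}")

print(greeting)
print(summary)
print(labels)
```

If you can read this snippet and roughly predict what it prints, you are in good shape for the workshop.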
site: https://tamuhey.github.io/tokenizations/
Natural Language Processing (NLP) has made great progress in recent years thanks to neural networks, which allow us to solve various tasks with end-to-end architectures. However, many NLP systems still require language-specific pre- and post-processing, especially for tokenization. In this article, I describe an algorithm that simplifies one such process: calculating the correspondence between tokens produced by different tokenizers (e.g. BERT vs. spaCy). I also introduce Python and Rust libraries that implement this algorithm. Here are the links to the library and the demo site:
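To make "correspondence between tokens" concrete, here is a minimal sketch of the problem itself. It is not the algorithm described in this article: it assumes the two token lists spell out exactly the same text when concatenated, which real tokenizers often violate (lowercasing, subword markers, dropped characters). The function names `char_owner` and `align` are hypothetical and only serve the illustration.

```python
def char_owner(tokens):
    """For each character of the concatenated text, record the index of its token."""
    owners = []
    for index, token in enumerate(tokens):
        owners.extend([index] * len(token))
    return owners


def align(tokens_a, tokens_b):
    """For each token in A, list the indices of tokens in B sharing a character with it.

    Assumption: both tokenizations concatenate to the same string.
    """
    if "".join(tokens_a) != "".join(tokens_b):
        raise ValueError("both tokenizations must spell the same text")
    a2b = [[] for _ in tokens_a]
    for ia, ib in zip(char_owner(tokens_a), char_owner(tokens_b)):
        # Characters are visited left to right, so a repeat can only be the last entry.
        if not a2b[ia] or a2b[ia][-1] != ib:
            a2b[ia].append(ib)
    return a2b


tokens_a = ["John", "Johanson", "'s", "house"]
tokens_b = ["John", "Johan", "son", "'", "s", "house"]

a2b = align(tokens_a, tokens_b)
b2a = align(tokens_b, tokens_a)
print(a2b)  # [[0], [1, 2], [3, 4], [5]]
print(b2a)  # [[0], [1], [1], [2], [2], [3]]
```

Each entry of `a2b` lists the tokens on the other side that overlap the given token, so `"Johanson"` maps to both `"Johan"` and `"son"`. The strict same-text assumption is the hard part in practice, and relaxing it is where a more careful algorithm is needed.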