Parth Thakkar thakkarparth007

@suhithr
suhithr / python-basics.md
Last active October 20, 2016 14:28
List of simple resources to get up and running with the syntax of Python

Python Basics for Delta's Linux-Python Workshop 2016

We'd like you all to get acquainted with the basic syntax of Python so that you feel more comfortable in the workshop and we can build more cool stuff!

Ideally you'd take a look at:

  • Variables and Types
  • Strings
  • Loops
  • Conditions
  • Functions
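
To give a feel for the five topics above, here is a small sketch that touches each of them once. It is only an illustration, not part of the linked resources; the names `workshop`, `topics`, and `describe` are made up for this example.

```python
# Variables and types: names bound to a string, an integer, and a list.
workshop = "Linux-Python Workshop"
year = 2016
topics = ["variables", "strings", "loops", "conditions", "functions"]

# Strings: build a new string from existing values.
title = "{} {}".format(workshop, year)


# Functions: a reusable block of code that takes an argument and returns a value.
def describe(topic):
    """Return the topic in upper case if it is long, capitalized otherwise."""
    # Conditions: choose a branch based on the length of the string.
    if len(topic) > 8:
        return topic.upper()
    return topic.capitalize()


# Loops: visit every item in the list and collect the results.
labels = []
for topic in topics:
    labels.append(describe(topic))

print(title)
print(labels)
```

Try changing the threshold in `describe` or adding a topic to the list to see how the output changes.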
@tamuhey
tamuhey / tokenizations_post.md
Last active July 27, 2024 14:46
How to calculate the alignment between BERT and spaCy tokens effectively and robustly

site: https://tamuhey.github.io/tokenizations/

Natural Language Processing (NLP) has made great progress in recent years thanks to neural networks, which allow us to solve various tasks with end-to-end architectures. However, many NLP systems still require language-specific pre- and post-processing, especially for tokenization. In this article, I describe an algorithm that simplifies one such process: calculating the correspondence between tokens produced by different tokenizers (e.g. BERT vs. spaCy). I also introduce Python and Rust libraries that implement this algorithm. Here are the library and the demo site links: