site: https://tamuhey.github.io/tokenizations/
Natural Language Processing (NLP) has made great progress in recent years thanks to neural networks, which allow us to solve various tasks with end-to-end architectures. However, many NLP systems still require language-specific pre- and post-processing, especially for tokenization. In this article, I describe an algorithm that simplifies one such process: calculating the correspondence between tokens produced by different tokenizers (e.g. BERT vs. spaCy). I also introduce Python and Rust libraries that implement this algorithm.
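To make "correspondence between tokens" concrete, here is a minimal sketch of the kind of mapping the article is about. It assumes the `get_alignments` function exposed by the Python package (importable as `tokenizations`); the token lists and the printed output are illustrative, not taken from the article.

```python
import tokenizations

# Two tokenizations of the same text:
# one from a word-level tokenizer (e.g. spaCy), one from a wordpiece tokenizer (e.g. BERT).
tokens_spacy = ["John", "Johanson", "'s", "house"]
tokens_bert = ["john", "johan", "##son", "'", "s", "house"]

# a2b[i] lists the indices of BERT tokens aligned to spaCy token i, and vice versa for b2a.
a2b, b2a = tokenizations.get_alignments(tokens_spacy, tokens_bert)

print(a2b)  # expected: [[0], [1, 2], [3, 4], [5]]
print(b2a)  # expected: [[0], [1], [1], [2], [2], [3]]
```

The point is that the two token sequences differ in casing, splitting, and wordpiece markers such as `##`, yet we still want to know which tokens on one side correspond to which tokens on the other.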