Emiel van Miltenburg evanmiltenburg

@evanmiltenburg
evanmiltenburg / excel_dropdown_test.py
Created June 17, 2019 10:14
Generate an Excel worksheet to provide word-level annotations
import xlsxwriter
# Create workbook with a new worksheet.
workbook = xlsxwriter.Workbook('hello.xlsx')
worksheet = workbook.add_worksheet()
# Write the tokens.
worksheet.write('A1', 'Hello')
worksheet.write('B1', 'world')
worksheet.write('C1', '!')
evanmiltenburg / find_people.py
Created April 5, 2020 08:28
Script to find people in Dutch text
import spacy
nlp = spacy.load('nl_core_news_sm')
with open('bordewijk.txt') as f:
    doc = nlp(f.read())
people = [ent.orth_ for ent in doc.ents if ent.label_ == 'PERSON']
print(people)
evanmiltenburg / download_book.py
Last active January 5, 2022 09:35
Download OHLDM
import requests
import re
import time
r = requests.get('https://direct.mit.edu/books/book/5244/The-Open-Handbook-of-Linguistic-Data-Management',
                 stream=True, headers={'User-agent': 'Mozilla/5.0'})
# Escape the dot so that only links really ending in ".pdf" match.
urls = re.findall(r'href="(.*?\.pdf)"', r.text)
base = 'https://direct.mit.edu'
urls = [base + path for path in urls if '/book/' in path]
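The preview ends at the link-filtering step. As a minimal sketch of that step in isolation, the link extraction could be wrapped in a function and checked offline. The function name `extract_pdf_links` and the sample HTML paths are hypothetical, made up for illustration only; they are not taken from the gist or from the MIT Press page.

```python
import re
from urllib.parse import urljoin

BASE = 'https://direct.mit.edu'


def extract_pdf_links(html, base=BASE):
    """Return absolute URLs for hrefs that end in .pdf and contain '/book/'."""
    # [^"]* keeps each match inside one attribute value;
    # \. matches a literal dot rather than any character.
    paths = re.findall(r'href="([^"]*\.pdf)"', html)
    return [urljoin(base, path) for path in paths if '/book/' in path]


# Hypothetical sample markup (assumption: not the real page structure).
sample = (
    '<a href="/books/book/chapter-pdf/1/ch1.pdf">Chapter 1</a>'
    '<a href="/books/book/about">About</a>'
    '<a href="/static/flyer.pdf">Flyer</a>'
)
links = extract_pdf_links(sample)
print(links)
```

Using `urljoin` instead of string concatenation is a design choice for this sketch: it also copes with hrefs that are already absolute.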