Dmitry Nikolayev macleginn

@macleginn
macleginn / get_roberta_word_embeddings.py
Created June 21, 2021 07:17
Code for extracting word embeddings from RoBERTa
def rm_whitespace(s):
    if s.startswith('Ġ'):
        return s[1:]
    else:
        return s
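RoBERTa's byte-level BPE tokenizer marks a token that follows a space with a leading 'Ġ', which is what rm_whitespace strips. The rest of this gist is not visible in the preview, so the following is only a minimal sketch of how that marker can be used to group subword tokens into words. The function name group_tokens_into_words and the token list are hypothetical (hand-written for illustration, not real tokenizer output), and this is not presented as the gist's own method.

```python
def rm_whitespace(s):
    # Same helper as in the preview: drop the leading 'Ġ' marker, if any.
    return s[1:] if s.startswith('Ġ') else s


def group_tokens_into_words(tokens):
    """Group subword tokens into (word, token_indices) pairs.

    Hypothetical helper: a token starts a new word if it is the first
    token or begins with 'Ġ'; any other token continues the previous word.
    """
    words = []
    for i, tok in enumerate(tokens):
        if i == 0 or tok.startswith('Ġ'):
            words.append((rm_whitespace(tok), [i]))
        else:
            prev_word, indices = words[-1]
            words[-1] = (prev_word + tok, indices + [i])
    return words


# Hand-written token list, assumed for illustration only.
tokens = ['Hello', 'Ġworld', 'Ġtoken', 'ization']
print(group_tokens_into_words(tokens))
# [('Hello', [0]), ('world', [1]), ('tokenization', [2, 3])]
```

The index lists can then be used to pool the per-token vectors of a word, for example by averaging them. Note that under this simple rule punctuation without a preceding space is attached to the previous word.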
def get_tokens_with_ranges(input_string, tokenizer):
    '''
    RoBERTa prepends 'Ġ' to the beginning of what it
import pandas as pd
import matplotlib.pyplot as plt
d = pd.read_excel('spectrograms-relative-20.xlsx', header=None)
# Combine the first two columns into a new index
index_col = [ f'{a}-{b}' for a, b in zip(d.iloc[:,0], d.iloc[:,1]) ]
d.index = index_col
# Delete old index columns
del d[0]
del d[1]
macleginn / first_comma_collocates.py
Created October 2, 2020 09:23
A script for extracting first-comma collocates from the Bible corpus.
import re
import os
from math import log
from collections import Counter
import pandas as pd
def logL(p, k, n):
    return k * log(p) + (n - k) * log(1 - p)
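logL is the binomial log-likelihood of k successes in n trials, without the constant binomial coefficient. The preview does not show how the script uses it. One common use in collocation extraction is Dunning's log-likelihood ratio, so the sketch below assumes that setting. The names clamped_logL and log_likelihood_ratio and the clamping constant eps are hypothetical additions, not taken from the gist.

```python
from math import log


def logL(p, k, n):
    # Binomial log-likelihood, as in the preview above.
    return k * log(p) + (n - k) * log(1 - p)


def clamped_logL(p, k, n, eps=1e-12):
    # logL raises ValueError at p == 0 or p == 1 because of log(0);
    # clamping p slightly inside (0, 1) avoids that (an assumption of this sketch).
    p = min(max(p, eps), 1 - eps)
    return logL(p, k, n)


def log_likelihood_ratio(k1, n1, k2, n2):
    """Compare two observed proportions k1/n1 and k2/n2.

    Returns 2 * (log-likelihood with separate rates
                 - log-likelihood with one shared rate).
    """
    p1 = k1 / n1
    p2 = k2 / n2
    p = (k1 + k2) / (n1 + n2)
    return 2 * (clamped_logL(p1, k1, n1) + clamped_logL(p2, k2, n2)
                - clamped_logL(p, k1, n1) - clamped_logL(p, k2, n2))


# Equal proportions give a ratio of (about) zero; unequal ones give a positive score.
print(log_likelihood_ratio(10, 100, 10, 100))
print(log_likelihood_ratio(30, 100, 5, 100))
```

Larger scores indicate that the two proportions are less likely to share one underlying rate, which is how such a statistic can rank candidate collocates.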
from itertools import combinations, permutations
from collections import Counter
import gurobipy as gb
from gurobipy import GRB
def get_pairwise_ordering(all_deprels: set, training_set_constraints: Counter):
    '''
    Solves an integer program and returns a non-loopy ordering
macleginn / pages_from_pdf.sh
Created March 14, 2020 12:33
A bash/zsh function to easily extract pages from .pdf files using qpdf
# params:
# $1 input-file path,
# $2 page range (e.g., "1-1", "10-39", "5,9-12"),
# $3 output-file path
# ex.: pages_from_pdf input.pdf "1,3,8-9" test.pdf
# qpdf should be installed
function pages_from_pdf() {
    # Quote the arguments so that paths containing spaces are passed intact
    qpdf "$1" --pages "$1" "$2" -- "$3"
}
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# test.csv:
# ,b,c,d
# p,1,2,3
# q,4,5,6
# r,7,8,9
import numpy as np
import matplotlib.pyplot as plt
N = 5
menMeans = (20, 35, 30, 35, 27)
womenMeans = (25, 32, 34, 20, 25)
menStd = (2, 3, 4, 1, 2)
womenStd = (3, 5, 2, 3, 3)
ind = np.arange(N) # the x locations for the groups
width = 0.35 # the width of the bars: can also be len(x) sequence
confusion_dict_pos = {}
confusion_dict_paths = {}
# NEW STUFF #
addition_stats_pos = Counter()
addition_stats_rel = Counter()
# NEW STUFF #
strip_direction = lambda x: x.split('_')[0]
macleginn / plotly-offline-plot.py
Created October 23, 2019 08:49
An example of an offline Plotly plot created using Python with Plotly.js included once in the <head> section of the page.
import plotly.express as px
import plotly.offline
template = """
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<script>{plotly}</script>
</head>
// A JS version of Python's "get" method for dicts.
function get(dict: object, key: any, plug: any) {
    if (dict.hasOwnProperty(key))
        return dict[key];
    else
        return plug;
}
function convertToUnicode(input: string): string {