Vít Starý Novotný Witiko

  • Brno, Czech Republic
Neutral
Neutral
Neutral
Positive
Neutral
Neutral
Negative
Neutral
Neutral
Neutral
{
  "embeddings": [
    {
      "tensorName": "The soft VSM with non-regularized word embeddings on the TWITTER dataset",
      "tensorShape": [
        3108,
        3
      ],
      "tensorPath": "https://gist.githubusercontent.com/Witiko/860f86ca52c89ee97714371ac2a91a62/raw/8df9801310d78223e67520fad47ba2cc7db0ac2d/docsim-dense_scm-twitter-1-False-True-True-800--1.0-2-vectors.csv",
      "metadataPath": "https://gist.githubusercontent.com/Witiko/860f86ca52c89ee97714371ac2a91a62/raw/8df9801310d78223e67520fad47ba2cc7db0ac2d/docsim-dense_scm-twitter-1-False-True-True-800--1.0-2-metadata.csv"
#!/bin/sh
# Produces mean amount of financial support by extracting project codes from a PDF document and querying starfos.tacr.cz.
#
# Usage: ./get-mean-tacr-support.sh FILE, where FILE is a PDF document with a table of supported projects, such as
# https://www.tacr.cz/wp-content/uploads/documents/2019/10/29/1572358378_Vyhlaseni_vysledku_eTA_na_web_-_podporene.pdf
set -e
pdfgrep 'TL[0-9]+' "$1" |
sed -r 's/.*\s(TL[0-9]+)(\s.*|$)/\1/' |
Witiko / evaluate-speed-pie-chart.py
Created October 16, 2020 21:53
Creates a pie chart from a GNU Parallel joblog after running OCR-D
# -*- coding:utf-8 -*-
from itertools import dropwhile
import json
import re
import sys
import matplotlib.pyplot as plt
Witiko / interpret_soft_cosine_measure.py
Created March 14, 2021 09:25
Interprets the soft cosine measure in Gensim 4 as a sum of word pair similarities
def interpret_soft_cosine_measure(doc1, doc2, dictionary, similarity_matrix):
    word_pair_importances = dict()
    for word1_id, word1_weight in doc1:
        for word2_id, word2_weight in doc2:
            word_similarity = similarity_matrix.matrix[word1_id, word2_id]
            word_pair_importance = word1_weight * word_similarity * word2_weight
            if word_pair_importance == 0:
                continue
            word1 = dictionary.id2token[word1_id]
            word2 = dictionary.id2token[word2_id]
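
The preview above is cut off in this listing. As a rough illustration of the idea in its description (the soft cosine measure as a sum of word-pair terms), here is a minimal, dependency-free sketch. It does not use Gensim: the `similarity` dictionary keyed by word-ID pairs and the helper names `soft_inner_product` and `soft_cosine` are assumptions made for this example, standing in for `similarity_matrix.matrix`. Documents are lists of `(word_id, weight)` pairs, as in the function above.

```python
from math import sqrt


def soft_inner_product(doc1, doc2, similarity):
    """Sum weight1 * similarity * weight2 over all word pairs.

    `similarity` is a hypothetical dict mapping (word1_id, word2_id)
    to a word similarity; missing pairs count as 0.
    """
    return sum(
        weight1 * similarity.get((id1, id2), 0.0) * weight2
        for id1, weight1 in doc1
        for id2, weight2 in doc2
    )


def soft_cosine(doc1, doc2, similarity):
    """Normalize the soft inner product by both documents' soft norms."""
    norm1 = sqrt(soft_inner_product(doc1, doc1, similarity))
    norm2 = sqrt(soft_inner_product(doc2, doc2, similarity))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return soft_inner_product(doc1, doc2, similarity) / (norm1 * norm2)


# Toy data: two words (IDs 0 and 1) that are half-similar to each other.
similarity = {(0, 0): 1.0, (1, 1): 1.0, (0, 1): 0.5, (1, 0): 0.5}
doc1 = [(0, 1.0)]  # contains only word 0
doc2 = [(1, 1.0)]  # contains only word 1

print(soft_cosine(doc1, doc2, similarity))
```

With an ordinary cosine these two documents would score 0 because they share no words; here the single cross term `1.0 * 0.5 * 1.0` is the only contribution, which is the kind of per-pair breakdown the gist's description refers to.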