Vít Starý Novotný Witiko

Witiko / docsim-dense_scm-twitter-1-False-True-True-800--1.0-2-metadata.csv

Created November 1, 2019 15:06

Witiko / docsim-dense_scm-twitter-1-False-True-True-800--1.0-2.json

Created November 1, 2019 15:07

	{
	  "embeddings": [
	    {
	      "tensorName": "The soft VSM with non-regularized word embeddings on the TWITTER dataset",
	      "tensorShape": [
	        3108,
	        3
	      ],
	      "tensorPath": "https://gist.githubusercontent.com/Witiko/860f86ca52c89ee97714371ac2a91a62/raw/8df9801310d78223e67520fad47ba2cc7db0ac2d/docsim-dense_scm-twitter-1-False-True-True-800--1.0-2-vectors.csv",
	      "metadataPath": "https://gist.githubusercontent.com/Witiko/860f86ca52c89ee97714371ac2a91a62/raw/8df9801310d78223e67520fad47ba2cc7db0ac2d/docsim-dense_scm-twitter-1-False-True-True-800--1.0-2-metadata.csv"

Witiko / get-mean-tacr-support.sh

Created July 28, 2020 13:40

	#!/bin/sh
	# Computes the mean amount of financial support by extracting project codes from a PDF document and querying starfos.tacr.cz.
	#
	# Usage: ./get-mean-tacr-support.sh FILE, where FILE is a PDF document with a table of supported projects, such as
	# https://www.tacr.cz/wp-content/uploads/documents/2019/10/29/1572358378_Vyhlaseni_vysledku_eTA_na_web_-_podporene.pdf

	set -e

	pdfgrep 'TL[0-9]+' "$1" |
	sed -r 's/.*\s(TL[0-9]+)(\s.*|$)/\1/' |

Witiko / evaluate-speed-pie-chart.py

Created October 16, 2020 21:53

Creates a pie chart from a GNU Parallel joblog after running OCR-D

	# -*- coding:utf-8 -*-

	from itertools import dropwhile
	import json
	import re
	import sys

	import matplotlib.pyplot as plt
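
A minimal sketch of the idea in the description above, not the gist's actual code. It assumes the standard tab-separated GNU Parallel joblog columns (Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, Command). Grouping jobs by the first word of the command is an assumption, the function names runtime_by_command and plot_pie are hypothetical, and the sample joblog rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def runtime_by_command(joblog_lines):
    """Sum JobRuntime (seconds) per executable name found in a joblog."""
    reader = csv.DictReader(joblog_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    totals = defaultdict(float)
    for row in reader:
        words = row["Command"].split()
        # Assumption: the first word of the command names the OCR-D processor.
        name = words[0] if words else "(empty)"
        totals[name] += float(row["JobRuntime"])
    return dict(totals)


def plot_pie(totals, output_path):
    """Save a pie chart of the per-command runtime shares to output_path."""
    # Imported here so that the aggregation above works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    labels = sorted(totals)
    fig, ax = plt.subplots()
    ax.pie([totals[label] for label in labels], labels=labels, autopct="%1.1f%%")
    ax.axis("equal")  # draw the pie as a circle
    fig.savefig(output_path)
    plt.close(fig)


# Made-up joblog used only to exercise the functions above.
SAMPLE_JOBLOG = (
    "Seq\tHost\tStarttime\tJobRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n"
    "1\t:\t1602885000.000\t12.500\t0\t0\t0\t0\tocrd-tesserocr-recognize -I A -O B\n"
    "2\t:\t1602885001.000\t4.000\t0\t0\t0\t0\tocrd-olena-binarize -I C -O D\n"
    "3\t:\t1602885002.000\t7.500\t0\t0\t0\t0\tocrd-tesserocr-recognize -I E -O F\n"
)

totals = runtime_by_command(io.StringIO(SAMPLE_JOBLOG))
```

With a real joblog, something like plot_pie(runtime_by_command(open("joblog.tsv")), "speed.png") would write the chart; both file names here are placeholders.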

Witiko / interpret_soft_cosine_measure.py

Created March 14, 2021 09:25

Interprets the soft cosine measure in Gensim 4 as a sum of word pair similarities

	def interpret_soft_cosine_measure(doc1, doc2, dictionary, similarity_matrix):
	    word_pair_importances = dict()
	    for word1_id, word1_weight in doc1:
	        for word2_id, word2_weight in doc2:
	            word_similarity = similarity_matrix.matrix[word1_id, word2_id]
	            word_pair_importance = word1_weight * word_similarity * word2_weight
	            if word_pair_importance == 0:
	                continue
	            word1 = dictionary.id2token[word1_id]
	            word2 = dictionary.id2token[word2_id]
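
The decomposition can be checked on toy data without Gensim. The following is a rough sketch, not the remainder of the gist: documents are lists of (word id, weight) pairs as in the excerpt, the similarity matrix is a plain nested list, and the helper names soft_inner, soft_cosine and pair_contributions are hypothetical. Dividing each word pair's product by the soft norms of both documents is an assumption made here so that the contributions add up to the soft cosine measure; the excerpt itself only shows the unnormalized product.

```python
import math


def soft_inner(doc1, doc2, sim):
    """Bilinear form: sum of w1 * sim[i][j] * w2 over all word pairs."""
    return sum(w1 * sim[i][j] * w2 for i, w1 in doc1 for j, w2 in doc2)


def soft_cosine(doc1, doc2, sim):
    """Soft cosine measure of two bag-of-words documents."""
    norm = math.sqrt(soft_inner(doc1, doc1, sim) * soft_inner(doc2, doc2, sim))
    return soft_inner(doc1, doc2, sim) / norm if norm else 0.0


def pair_contributions(doc1, doc2, sim):
    """Nonzero share of the soft cosine measure owed to each word pair."""
    norm = math.sqrt(soft_inner(doc1, doc1, sim) * soft_inner(doc2, doc2, sim))
    contributions = {}
    if not norm:
        return contributions
    for i, w1 in doc1:
        for j, w2 in doc2:
            value = w1 * sim[i][j] * w2 / norm
            if value != 0:
                contributions[(i, j)] = value
    return contributions


# Toy data: words 0 and 1 are half-similar, word 2 is unrelated to both.
sim = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
doc1 = [(0, 1.0), (2, 1.0)]
doc2 = [(1, 1.0), (2, 1.0)]

score = soft_cosine(doc1, doc2, sim)
contributions = pair_contributions(doc1, doc2, sim)
```

Here the pair (2, 2) accounts for two thirds of the score and the related pair (0, 1) for the remaining third, which is the kind of per-pair reading the function above is after.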