
Tim TimRepke

@TimRepke
TimRepke / README.md
Last active June 29, 2022 08:39
Spark vs Python doc2vec

Spark vs Single-core python

Question: can parallel pre- and postprocessing speed up Gensim Doc2Vec?

  • Spark: 349s
  • Vanilla: 373s

(only one run, so not a very scientific comparison)

Run on a single machine with 16GB RAM and Intel i7-8550U CPU @ 1.80GHz

TimRepke / README.md
Last active October 26, 2023 15:28
Solr import/export

Solr import/export

You need to move from one Solr instance to another and can't be bothered with mismatching versions or whatever? These two scripts will help you :)

First you need to create a new core in the target instance. You may want to use the schema/configset from the originating instance though, as the default schema might not be ideal.

In my scenario I moved from Solr 5.5.5 to Solr 7.4. Therefore I had to (at least) update the solrconfig.xml, where the Lucene version is specified. The exact version you need can be found in the default configset ([solr_root]/server/solr/configsets/...).
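
The scripts themselves are not shown in this preview. As a rough sketch of how a version-independent export could work, the snippet below pages through a core with Solr's `cursorMark` parameter and yields the returned documents. The base URL, the core name, the `id` uniqueKey field, and the helper names `build_select_url`, `fetch_json`, and `export_docs` are all assumptions for illustration, not taken from the original scripts.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url, core, cursor_mark, rows=500, unique_key="id"):
    """Build a /select URL for one page of a cursor-based export."""
    params = {
        "q": "*:*",
        "wt": "json",
        "rows": rows,
        # cursorMark requires a sort that includes the uniqueKey field
        "sort": f"{unique_key} asc",
        "cursorMark": cursor_mark,
    }
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


def fetch_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urlopen(url) as response:
        return json.load(response)


def export_docs(base_url, core, fetch=fetch_json, rows=500, unique_key="id"):
    """Yield every document of a core, one page at a time."""
    cursor = "*"  # "*" marks the start of a cursor walk
    while True:
        page = fetch(build_select_url(base_url, core, cursor, rows, unique_key))
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:
            # Solr returns the same cursor again once the result set is exhausted
            break
        cursor = next_cursor
```

Each yielded document could then be written out as one JSON line and posted to the new core by an import step. Note that `/select` only returns fields Solr can retrieve (stored fields), so index-only fields would have to be rebuilt from the source data.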

TimRepke / README.md
Last active October 23, 2024 13:08
PST Archive to RFC822 (*.eml) script

PST Archive to RFC822

This script extracts all emails from an Outlook PST archive and saves them into some output folder as individual RFC822 compliant *.eml files.
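
To illustrate only the output side, here is a minimal sketch of writing a message to an individual *.eml file with Python's standard `email` package. It does not touch pypff at all: the hand-built message, the `save_as_eml` helper, and the zero-padded file naming are assumptions for illustration and may differ from what the actual script does.

```python
import tempfile
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from pathlib import Path


def save_as_eml(message, out_dir, index):
    """Write one message to out_dir as a numbered .eml file and return its path."""
    out_path = Path(out_dir) / f"{index:06d}.eml"
    # policy.SMTP serialises the message with CRLF line endings
    out_path.write_bytes(message.as_bytes(policy=policy.SMTP))
    return out_path


# Stand-in message; a real run would fill these fields from the PST archive.
msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Hello"
msg.set_content("Hi Bob, this is a test.")

with tempfile.TemporaryDirectory() as tmp:
    path = save_as_eml(msg, tmp, 1)
    raw = path.read_bytes()

# Parse the bytes back to check that the file round-trips
parsed = BytesParser(policy=policy.default).parsebytes(raw)
print(path.name, parsed["Subject"])
```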

Installing the external dependency pypff may not be straightforward (it wasn't for me). I forked the original repository to make it work in Python 3. If you get errors, check their wiki pages for help or try my fork. Below are the steps that worked for me:

Clone https://github.com/libyal/libpff/tree/master/pypff

TimRepke / README.md
Last active July 24, 2018 14:25
Solr Downloader

ElasticSearch Downloader

Very basic script for downloading a specific index from ElasticSearch into a file containing one document per line (json-formatted).

The argparse options should be pretty self-explanatory.

Call the script like this:

python download.py --url=http://my.elastic.search.eu --port=9200 --index=my_index --out=/path/to/output/
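
The script body is not shown in this preview. A minimal sketch of an argument parser that accepts the four flags from the call above might look like this; the help texts, the defaults, the `required` settings, and the `index_endpoint` helper are assumptions, only the flag names come from the example command.

```python
import argparse


def build_parser():
    """Parser for the flags used in the example call."""
    parser = argparse.ArgumentParser(
        description="Download one index into a file with one JSON document per line."
    )
    parser.add_argument("--url", required=True, help="base URL of the server")
    parser.add_argument("--port", type=int, default=9200, help="HTTP port")
    parser.add_argument("--index", required=True, help="name of the index to download")
    parser.add_argument("--out", required=True, help="output directory")
    return parser


def index_endpoint(args):
    """Combine URL, port, and index name into the endpoint for that index."""
    return f"{args.url.rstrip('/')}:{args.port}/{args.index}"


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(index_endpoint(args))
```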
tim@klapprechner ~/workspace/satnavpi/valhalla (git)-[2.1.8] % ./autogen.sh :(
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, '.'.
libtoolize: copying file './ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
libtoolize: copying file 'm4/libtool.m4'
libtoolize: copying file 'm4/ltoptions.m4'
libtoolize: copying file 'm4/ltsugar.m4'
libtoolize: copying file 'm4/ltversion.m4'
libtoolize: copying file 'm4/lt~obsolete.m4'
configure.ac:9: installing './compile'
TimRepke / README.md
Last active April 11, 2016 14:09
Use your ThinkPad 'i'-LED to morse stuff

ThinkPad morse (aka ThinkBlink)

Usage:

$ sudo ./morse.sh "sos"
s . . . 
o - - - 
s . . .
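
The usage output above prints one line per letter followed by its dots and dashes. As a sketch of just that translation step (not the LED control, and not the original `morse.sh`), the same lines could be produced like this; the `to_morse_lines` name and the choice to skip characters outside a-z are assumptions.

```python
# International Morse code for the letters a-z
MORSE = {
    "a": ".-", "b": "-...", "c": "-.-.", "d": "-..", "e": ".",
    "f": "..-.", "g": "--.", "h": "....", "i": "..", "j": ".---",
    "k": "-.-", "l": ".-..", "m": "--", "n": "-.", "o": "---",
    "p": ".--.", "q": "--.-", "r": ".-.", "s": "...", "t": "-",
    "u": "..-", "v": "...-", "w": ".--", "x": "-..-", "y": "-.--",
    "z": "--..",
}


def to_morse_lines(text):
    """Return one 'letter symbol symbol ...' line per translatable letter."""
    lines = []
    for ch in text.lower():
        if ch in MORSE:  # characters without an entry are skipped
            lines.append(f"{ch} {' '.join(MORSE[ch])}")
    return lines


for line in to_morse_lines("sos"):
    print(line)
```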
TimRepke / index.html
Last active September 4, 2015 08:36
Schema.org ScholarlyArticle demo
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>RefMe Publications - Article viewer</title>
</head>
<body>
<div id="refme-cite-widget"></div>
<div itemscope itemtype="http://schema.org/ScholarlyArticle">
<strong>Title:</strong> <span itemprop="name">Reviewing the advantages of reference generators like RefME</span><br/>