Georgios Gousios gousiosg

@gousiosg
gousiosg / json2bson.rb
Created October 9, 2015 08:08
Convert JSON to BSON for importing into MongoDB
#!/usr/bin/env ruby
require 'json'
require 'bson' # Requires bson version > 3
json = JSON.parse(File.read(ARGV[0]))   # expects a top-level JSON array of documents
w = File.open("#{ARGV[0]}.bson", 'wb')  # BSON is binary, so open the output in binary mode
json.each do |j|
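The Ruby script delegates serialization to the bson gem. To show what the conversion actually produces, here is a minimal Python sketch of a BSON encoder that covers only the types JSON can express (strings, numbers, booleans, null, objects, arrays), following the BSON specification. It assumes, as the `json.each` loop suggests, that the input is a top-level JSON array, and it writes the documents back to back, which is the layout of a `.bson` dump file. The function names are hypothetical; this is not the gist's code.

```python
import json
import struct


def _cstring(s):
    """Encode a key as a NUL-terminated UTF-8 C string."""
    b = s.encode("utf-8")
    if b"\x00" in b:
        raise ValueError("BSON keys may not contain NUL bytes")
    return b + b"\x00"


def _element(key, value):
    name = _cstring(key)
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return b"\x08" + name + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + name + struct.pack("<i", value)  # int32
        return b"\x12" + name + struct.pack("<q", value)      # int64 (larger values raise struct.error)
    if isinstance(value, float):
        return b"\x01" + name + struct.pack("<d", value)      # 64-bit double
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        return b"\x02" + name + struct.pack("<i", len(data)) + data
    if value is None:
        return b"\x0a" + name
    if isinstance(value, dict):
        return b"\x03" + name + encode_document(value)
    if isinstance(value, list):
        # Arrays are documents whose keys are "0", "1", "2", ...
        return b"\x04" + name + encode_document({str(i): v for i, v in enumerate(value)})
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_document(doc):
    body = b"".join(_element(k, v) for k, v in doc.items())
    # Total length covers the 4-byte length prefix, the elements and the trailing NUL
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


def json_to_bson(json_text):
    """Concatenate one BSON document per element of a top-level JSON array."""
    return b"".join(encode_document(d) for d in json.loads(json_text))
```

Writing `json_to_bson(open(path).read())` to a file opened with `"wb"` mirrors what the Ruby script does with `w`.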
gousiosg / setup_bcache.sh
Created October 14, 2015 19:54
Setting up SSD as a cache for slow cloud volumes
#!/usr/bin/env bash
lsblk
apt-get install -y lvm2 mdadm bcache-tools
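# Create a Linux RAID autodetect primary partition on each disk
# In fdisk, type: n p 1 (accept the default first/last sectors) t fd w
# i.e. new primary partition 1, change its type to fd (Linux raid autodetect), write the table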
fdisk /dev/sdc
fdisk /dev/sdd
# setup raid
# start the replset nodes
$ mongod --dbpath mongodb/ --replSet ghtorrent
$ mongod --dbpath mongodb-repl1/ --port 27018 --replSet ghtorrent
$ mongod --dbpath mongodb-repl2/ --port 27019 --replSet ghtorrent
# connect to primary
$ mongo
# In mongo shell
ghtorrent:PRIMARY> rs.initiate()
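Called without arguments, `rs.initiate()` creates a set containing only the node you are connected to; the other two `mongod` processes still have to be added with `rs.add()`. `rs.initiate()` also accepts a configuration document listing every member. A small Python sketch that builds that document for the three processes above, assuming they all run on `localhost` on ports 27017 to 27019 (the helper name is hypothetical):

```python
import json


def replset_config(name, ports, host="localhost"):
    """Build a replica set configuration document suitable for rs.initiate()."""
    return {
        "_id": name,  # must match the --replSet name given to every mongod
        "members": [
            {"_id": i, "host": f"{host}:{port}"} for i, port in enumerate(ports)
        ],
    }


if __name__ == "__main__":
    cfg = replset_config("ghtorrent", [27017, 27018, 27019])
    # Paste the printed call into the mongo shell connected to port 27017
    print(f"rs.initiate({json.dumps(cfg)})")
```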
gousiosg / README.md
Last active September 23, 2016 14:55
Experiments with various languages on low level file parsing

So today I was experimenting with various languages in order to make the GHTorrent MySQL "CSV" dumps behave like RFC-compliant CSV files. This involved parsing multi-GB, UTF-8 encoded files and running a small state machine at the character level. I started with Ruby, but it was slow:

$ time ruby csvify.rb projects.csv >/dev/null

real	0m36.714s
user	0m35.689s
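The preview does not show the state machine itself. The sketch below is a hypothetical Python version written under stated assumptions: fields are separated by commas, string fields may be enclosed in double quotes, and a backslash escapes the next character (including quotes and embedded newlines), as with MySQL's `FIELDS ESCAPED BY '\\'` export option. It does not special-case MySQL's `\N` NULL marker. Parsing happens one character at a time, and Python's `csv` writer then emits RFC 4180-style output (doubled quotes, CRLF line endings). It is not the author's csvify.rb and says nothing about relative speed.

```python
import csv
import io

# Parser states (hypothetical input format, see the lead-in)
FIELD_START, UNQUOTED, QUOTED, QUOTE_END, ESCAPE = range(5)


def parse_dump(text):
    """Yield rows (lists of str) from comma-separated, backslash-escaped text."""
    row, field = [], []
    state, resume = FIELD_START, UNQUOTED
    for ch in text:
        if state == ESCAPE:
            field.append(ch)          # the escaped character is taken literally
            state = resume
        elif ch == "\\":
            resume = QUOTED if state == QUOTED else UNQUOTED
            state = ESCAPE
        elif state == QUOTED:
            if ch == '"':
                state = QUOTE_END     # closing quote of an enclosed field
            else:
                field.append(ch)
        elif ch == '"' and state == FIELD_START:
            state = QUOTED            # opening quote
        elif ch == ",":
            row.append("".join(field))
            field, state = [], FIELD_START
        elif ch == "\n":
            row.append("".join(field))
            yield row
            row, field, state = [], [], FIELD_START
        else:
            field.append(ch)
            state = UNQUOTED
    if row or field:                  # last line without a trailing newline
        row.append("".join(field))
        yield row


def csvify(text):
    """Re-emit the parsed rows as RFC 4180-style CSV."""
    out = io.StringIO()
    writer = csv.writer(out)  # default dialect: minimal quoting, "" inside quotes, \r\n endings
    for row in parse_dump(text):
        writer.writerow(row)
    return out.getvalue()
```

For multi-GB inputs you would iterate over an open file (`open(path, encoding="utf-8")`) rather than a single string, writing rows as they are produced.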
gousiosg / unix-compatible.sh
Last active November 20, 2017 10:17
How compatible is your Unix with the original one?
#!/usr/bin/env bash
TEMPFILE=/tmp/unixcount
exist=0
notexist=0
echo 0 0 > $TEMPFILE
curl "https://raw.githubusercontent.com/dspinellis/unix-v4man/master/man0/ptxx"|
grep "(I)"|
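The preview stops after the `grep`. A hypothetical Python rendering of the same idea: pull the section (I) command names out of the Fourth Edition permuted index and count how many of them are found on the current `PATH`. It assumes each relevant index line contains a token of the form `name(I)`; the real `ptxx` layout may need a different pattern. The `which` parameter exists only so the lookup can be replaced in tests.

```python
import re
import shutil

# Assumption: section I entries appear as "name(I)" somewhere on the line
ENTRY = re.compile(r"([A-Za-z0-9_.+-]+)\(I\)")


def section_one_commands(index_text):
    """Return the sorted, de-duplicated command names tagged with section (I)."""
    return sorted({m.group(1) for m in ENTRY.finditer(index_text)})


def compatibility(commands, which=shutil.which):
    """Return (exist, notexist) counts for the given command names."""
    exist = sum(1 for c in commands if which(c))
    return exist, len(commands) - exist
```

Returning both counts from one function sidesteps the need for a temporary counter file; in bash such a file is a common workaround because variables set inside a piped `while read` loop are lost when its subshell exits.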
gousiosg / pink_rubies.dot
Last active April 3, 2018 11:49
33 pink rubies
digraph g {
rankdir=LR;
graph [fontname = "helvetica"];
node [shape=record, fontname = "helvetica"];
edge [fontname = "helvetica"];
1 -> 95;
1 -> 10;
2 -> 78;
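The graph file is truncated in the preview. If you need a graph with the same look generated from data, here is a minimal Python sketch that emits the same header attributes (left-to-right layout, record-shaped nodes, Helvetica for graph, node and edge labels) for a list of edges. The function name and its input format are hypothetical.

```python
def to_dot(edges, font="helvetica"):
    """Render (src, dst) pairs as a left-to-right DOT digraph with record nodes."""
    lines = [
        "digraph g {",
        "rankdir=LR;",
        f'graph [fontname = "{font}"];',
        f'node [shape=record, fontname = "{font}"];',
        f'edge [fontname = "{font}"];',
    ]
    lines += [f"{src} -> {dst};" for src, dst in edges]
    lines.append("}")  # close the digraph
    return "\n".join(lines) + "\n"
```

The output can be rendered with Graphviz, e.g. `dot -Tpng graph.dot -o graph.png`.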
gousiosg / README.md
Last active November 8, 2023 05:20
Restoring the GHTorrent MongoDB database

This is a collection of scripts to restore a full GHTorrent MongoDB database from the dumps available at http://ghtorrent-downloads.ewi.tudelft.nl.

To do the restore:

  1. Open a MongoDB terminal and run the createCollections.js script to create the necessary collections. You can set block_compressor to either snappy or zlib to have your collections stored compressed. I am using none here, as I compress at the filesystem level.

  2. Run restore-cummulative-dumps.sh to restore the cumulative dumps. Wait 3-4 days.
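createCollections.js itself is not reproduced here. As a sketch of what setting block_compressor means, the Python function below builds the per-collection options that MongoDB's `create` command accepts for a WiredTiger collection, restricted to the three values named in step 1. Using it through pymongo, as shown in the comment, is an assumption about how you might apply it, not part of the original scripts; the helper name is hypothetical.

```python
def collection_options(compressor="none"):
    """Return WiredTiger storage options for a new collection."""
    if compressor not in ("none", "snappy", "zlib"):
        raise ValueError(f"unsupported block_compressor: {compressor!r}")
    return {
        "storageEngine": {
            "wiredTiger": {"configString": f"block_compressor={compressor}"}
        }
    }


# Example use with pymongo (requires a running server, not executed here):
#   db.create_collection("commits", **collection_options("zlib"))
```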

#!/usr/bin/env python
# (c) 2018 Georgios Gousios <[email protected]>
#
# Barebones linear equation solving trainer
from __future__ import division
from random import randint
import codecs
import sys
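Only the header and imports of the trainer survive in the preview. A minimal, hypothetical Python 3 sketch of the core of such a trainer: generate an equation `a*x + b = c` with a nonzero integer coefficient and an integer solution, render it, and check an answer. The value ranges and function names are assumptions; passing in a `random.Random` instance makes the sequence of exercises reproducible.

```python
from random import Random


def make_equation(rng, lo=-10, hi=10):
    """Return (a, b, c, x) such that a*x + b == c and a != 0."""
    a = 0
    while a == 0:               # a must be nonzero for a unique solution
        a = rng.randint(lo, hi)
    x = rng.randint(lo, hi)     # the integer solution the trainee should find
    b = rng.randint(lo, hi)
    return a, b, a * x + b, x


def render(a, b, c):
    """Format the equation for display, e.g. '2x - 3 = 5'."""
    sign = "+" if b >= 0 else "-"
    return f"{a}x {sign} {abs(b)} = {c}"


def is_correct(a, b, c, answer):
    return a * answer + b == c
```

A trainer loop would print `render(a, b, c)`, read an answer with `input()`, and report whether `is_correct` holds.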
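# Render foo.rb as syntax-highlighted RTF (seashell theme, Monaco font, size 20) and copy it to the macOS clipboard
highlight -O rtf -s seashell -k Monaco -K 20 foo.rb | pbcopy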
gousiosg / ml4se.bib
Last active December 9, 2020 13:29
My reading list for ML4SE
@article{Alon19,
author = {Alon, Uri and Zilberstein, Meital and Levy, Omer and Yahav, Eran},
title = {Code2Vec: Learning Distributed Representations of Code},
journal = {Proc. ACM Program. Lang.},
issue_date = {January 2019},
volume = {3},
number = {POPL},
month = jan,
year = {2019},
issn = {2475-1421},