@dat-boris
dat-boris / savscan_fix_eml.sh
Created December 21, 2016 15:49
`savscan` breaks when scanning .eml files with attachments
# see https://www.virustotal.com/en/file/d6d29b4e39029b50d3c0f9ab43cb6886ada09cb7eb295bdedc0396d8c80fe2d6/analysis/
# Unpack each message with ripmime first, since savscan fails to handle .eml attachments directly
find "$1" -type f -exec ripmime -i {} -d "$1" \; > /dev/null
savscan -rec -archive -all -mime -f -suspicious -ss -sc "$1"
# Ignore savscan's exit status so the scan does not block setup
true
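The ripmime step unpacks each MIME message into its parts so that savscan only sees plain files. As a minimal sketch, assuming Python 3's standard `email` package is used in place of ripmime (a swapped-in technique, not what the script above does), the same unpacking could look like this; `extract_attachments` is a hypothetical helper name.

```python
import email
import email.policy
from pathlib import Path


def extract_attachments(eml_path, out_dir):
    """Write every attachment of an .eml file into out_dir.

    Hypothetical stand-in for `ripmime -i <file> -d <dir>`; returns the written paths.
    Only the message's direct attachments are extracted (no recursion into
    attached messages), and attachments with the same name overwrite each other.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(eml_path, "rb") as f:
        msg = email.message_from_binary_file(f, policy=email.policy.default)

    written = []
    for i, part in enumerate(msg.iter_attachments()):
        # Keep only the final path component so a crafted filename cannot
        # escape out_dir; fall back to a generated name if none is usable.
        name = Path(part.get_filename() or "").name or f"part-{i}.bin"
        payload = part.get_payload(decode=True) or b""
        target = out / name
        target.write_bytes(payload)
        written.append(target)
    return written
```

The extracted directory can then be handed to `savscan` exactly as in the script.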

Kickstarting Data Organization from scratch

I am hesitant to use the term "Big Data" to describe what I'm working on: having an adjective as part of the definition feels boastful (unless I worked on the LHC, in which case such an adjective might be justified!).

When asked to describe what "Big Data" is, I use the following description:

Big data is when an organization creates such a "data rock" that it can no longer lift it.

[Image: lifting weight]

dat-boris / 01_word_counter.py
Last active January 31, 2017 04:24
A brief demo of a word counter
import re
from collections import Counter
RE_CHAR = re.compile(r'\w')
def count_better_parser(stream):
    counter = Counter()
    word = ''
    for i, char in enumerate(iter(lambda: stream.read(1), '')):
        if char is None:
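The fragment above reads the stream one character at a time and uses `RE_CHAR` to decide whether a character belongs to a word. A minimal sketch of how such a character-by-character counter can be completed, assuming a word is a maximal run of `\w` characters counted case-sensitively; `count_words_streaming` is a hypothetical name, not the gist's actual function.

```python
import io
import re
from collections import Counter

RE_CHAR = re.compile(r"\w")


def count_words_streaming(stream):
    """Count words in a text stream without loading it all into memory.

    Assumption: a word is a maximal run of characters matching RE_CHAR.
    """
    counter = Counter()
    word = []
    # iter(callable, sentinel) keeps calling stream.read(1) until it returns ''.
    for char in iter(lambda: stream.read(1), ""):
        if RE_CHAR.match(char):
            word.append(char)
        elif word:
            # A non-word character ends the current word.
            counter["".join(word)] += 1
            word = []
    if word:
        # Flush a word that runs up to the end of the stream.
        counter["".join(word)] += 1
    return counter


print(count_words_streaming(io.StringIO("the cat, the hat")))
# Counter({'the': 2, 'cat': 1, 'hat': 1})
```

Reading one character at a time keeps memory bounded by the longest word rather than the size of the input.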
import re
from collections import Counter
RE_CHAR = re.compile(r'\w')
def functional_counts(stream):
    """
    How would we scale and distribute a wordcount operation?
    """
    datapipe = pipeline.Pipeline([
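The docstring asks how a word count could be scaled and distributed. One common answer, sketched here without the `pipedream` library (whose API is not shown in this preview), is map/reduce: count each chunk of text independently, then merge the partial `Counter`s, which works because adding counts is associative. `map_count`, `reduce_counts` and `distributed_wordcount` are hypothetical names, and the thread pool only stands in for real remote workers.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import add

RE_WORD = re.compile(r"\w+")


def map_count(chunk):
    """Map step: count the words in one chunk of text."""
    return Counter(RE_WORD.findall(chunk))


def reduce_counts(partials):
    """Reduce step: merge partial counts (Counter addition is associative)."""
    return reduce(add, partials, Counter())


def distributed_wordcount(chunks, workers=4):
    # Each chunk is counted independently, so chunks could live on different machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_count, chunks))
    return reduce_counts(partials)
```

Chunks need to be split on word boundaries (for example, by line); otherwise a word straddling two chunks is counted as two fragments.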
from pipedream import pipeline
from pipedream.store import PipelineStore
def demo_broken_monitoring():
    def check_word_maxlen(word):
        assert 0 < len(word) < 40, "Expected fewer than 40 chars"
    class FrequencyValidator(object):
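`check_word_maxlen` is a data-quality check on the counter's output. As a hedged sketch of running such a check as monitoring rather than as a hard failure, the hypothetical `find_invalid_words` below applies the same rule (1 to 39 characters) to every key of a word `Counter` and reports all offenders. It returns booleans instead of using `assert`, because assertions are stripped when Python runs with `-O`. It is not the gist's `FrequencyValidator`, whose body is not shown.

```python
from collections import Counter

MAX_WORD_LEN = 40  # exclusive upper bound, matching the gist's assertion


def is_valid_word(word):
    """Same rule as check_word_maxlen, but returns a bool instead of asserting."""
    return 0 < len(word) < MAX_WORD_LEN


def find_invalid_words(counts):
    """Return the words in a word Counter that break the length rule."""
    return [word for word in counts if not is_valid_word(word)]


print(len(find_invalid_words(Counter({"ok": 3, "x" * 45: 1}))))  # 1
```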
dat-boris / find_release_locks.sql
Last active September 2, 2017 20:19
Finding and releasing locks on Redshift
select * from
stv_locks
left join stv_recents on pid = lock_owner_pid;
-- http://docs.aws.amazon.com/redshift/latest/dg/PG_TERMINATE_BACKEND.html
select * from svv_transactions t
left join stv_recents r on r.pid = t.pid
order by txn_start;
-- Query for killing jobs:
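The queries above list lock holders (`stv_locks.lock_owner_pid`) and open transactions ordered by `txn_start`, and the linked PG_TERMINATE_BACKEND page describes the function that ends a session. A hedged Python sketch, assuming the rows have already been fetched as `(pid, txn_start)` tuples by whatever client you use: it picks the oldest transaction and builds the statement for you to review before running it. `oldest_terminate_statement` is a hypothetical helper, not the gist's own kill query.

```python
from datetime import datetime


def oldest_terminate_statement(rows):
    """Build a pg_terminate_backend call for the oldest open transaction.

    rows: iterable of (pid, txn_start) tuples, e.g. from svv_transactions.
    Returns None when there is nothing to terminate.
    """
    rows = list(rows)
    if not rows:
        return None
    pid, _txn_start = min(rows, key=lambda row: row[1])
    # int() ensures only a numeric pid is interpolated into the SQL.
    return f"SELECT pg_terminate_backend({int(pid)});"


rows = [(101, datetime(2017, 9, 2, 20, 0)), (42, datetime(2017, 9, 2, 19, 0))]
print(oldest_terminate_statement(rows))  # SELECT pg_terminate_backend(42);
```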
dat-boris / thaler.py
Created May 15, 2017 11:48
My naive attempt to reverse-engineer the distribution of "second guessers" from the results of the Thaler guessing game
#!/usr/bin/env python3
"""
Thaler guessing game
---------------------
https://en.wikipedia.org/wiki/Guess_2/3_of_the_average
How many steps ahead do people guess?
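The game rewards the guess closest to 2/3 of the average guess. A common way to model how many steps ahead people reason is level-k thinking: assuming level-0 players guess 50 on average (an assumption: the midpoint of 0 to 100), a level-k player guesses 50 * (2/3)^k. Inverting that formula gives a naive estimate of the average reasoning depth implied by an observed average guess. This is a sketch of that standard model under the stated assumption, not the gist's own method.

```python
import math

LEVEL0_MEAN = 50.0  # assumption: non-strategic players guess uniformly on [0, 100]
RATIO = 2.0 / 3.0


def level_k_guess(k, level0=LEVEL0_MEAN):
    """Guess of a player who reasons k steps ahead."""
    return level0 * RATIO ** k


def implied_level(average_guess, level0=LEVEL0_MEAN):
    """Invert level_k_guess: how many steps ahead does this average imply?"""
    if not 0 < average_guess <= level0:
        raise ValueError("average guess must be in (0, level0]")
    return math.log(average_guess / level0) / math.log(RATIO)


print(round(implied_level(level_k_guess(2)), 6))  # 2.0
```

A single average cannot separate a population of uniform level-2 players from a mix of shallower and deeper players, which is why this inversion is only a naive first step.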
dat-boris / setup_vpn_docker.sh
Created June 10, 2017 16:52
Simple VPN docker setup
#!/bin/bash -xe
# https://www.digitalocean.com/community/tutorials/how-to-run-openvpn-in-a-docker-container-on-ubuntu-14-04
OVPN_DATA="ovpn-data"
docker run --name "$OVPN_DATA" -v /etc/openvpn busybox
docker run --volumes-from "$OVPN_DATA" --rm kylemanna/openvpn ovpn_genconfig -u udp://fly.techie.im:1194
docker run --volumes-from "$OVPN_DATA" --rm -it kylemanna/openvpn ovpn_initpki
# generate the client - x4SLFY6MbvmqbVfe
docker run --volumes-from "$OVPN_DATA" --rm -it kylemanna/openvpn easyrsa build-client-full pacman nopass
docker run --volumes-from "$OVPN_DATA" --rm kylemanna/openvpn ovpn_getclient pacman > pacman.ovpn
dat-boris / neural_style.md
Last active September 17, 2017 13:08
Getting started with Neural Style learning