@dat-boris
dat-boris / thaler.py
Created May 15, 2017 11:48
My naive attempt to reverse-engineer the distribution of "second guessers" from the results of the Thaler guessing game
#!/usr/bin/env python3
"""
Thaler guessing game
---------------------
https://en.wikipedia.org/wiki/Guess_2/3_of_the_average
Guessing how much people guess ahead?
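
The preview above is truncated, so the following is not the gist's code. It is a minimal sketch of the usual "level-k" reading of the game, assuming a naive player guesses 50 and each further level of second-guessing multiplies the guess by 2/3. The names `level_k_guess` and `implied_depth` and the anchor of 50 are assumptions made for illustration.

```python
import math


def level_k_guess(k, anchor=50.0, factor=2.0 / 3.0):
    """Guess of a player who reasons k steps beyond a naive anchor guess (assumed model)."""
    return anchor * factor ** k


def implied_depth(target, anchor=50.0, factor=2.0 / 3.0):
    """Real-valued number of reasoning steps whose level-k guess equals target."""
    if target <= 0 or anchor <= 0:
        raise ValueError("target and anchor must be positive")
    return math.log(target / anchor) / math.log(factor)


# Depth implied by a hypothetical winning number of 20 (illustrative input, not real data).
depth_for_20 = implied_depth(20.0)
```

Under these assumptions, a winning number hints at how many steps ahead the average player was thinking; a full distribution of depths would need more than a single observed result.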
@dat-boris
dat-boris / find_release_locks.sql
Last active September 2, 2017 20:19
Finding and releasing locks on Redshift
select * from
stv_locks
left join stv_recents on pid = lock_owner_pid;
-- http://docs.aws.amazon.com/redshift/latest/dg/PG_TERMINATE_BACKEND.html
select * from svv_transactions t
left join stv_recents r on r.pid = t.pid
order by txn_start;
-- Query for killing jobs:
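
The kill query itself is cut off in this preview. As a rough sketch only, the helper below builds one `pg_terminate_backend` statement per lock-owning process ID, which is the function the documentation link above describes. The helper name `build_terminate_statements` and the sample PIDs are hypothetical; the statements are only generated as strings, not executed.

```python
def build_terminate_statements(pids):
    """Return one 'select pg_terminate_backend(pid);' statement per process ID."""
    statements = []
    for pid in pids:
        # Accept only plain integers so nothing else can be spliced into the SQL text.
        if not isinstance(pid, int) or isinstance(pid, bool):
            raise TypeError("pid must be an int, got %r" % (pid,))
        statements.append("select pg_terminate_backend(%d);" % pid)
    return statements


# Hypothetical PIDs, e.g. taken from the lock_owner_pid column of the query above.
kill_statements = build_terminate_statements([101, 202])
```

Reviewing the generated statements before running them keeps a human in the loop when terminating sessions.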
from pipedream import pipeline
from pipedream.store import PipelineStore
def demo_broken_monitoring():
def check_word_maxlen(word):
assert 0 < len(word) < 40, "Expected word to be 1 to 39 chars long"
class FrequencyValidator(object):
import re
from collections import Counter
RE_CHAR = re.compile(r'\w')
def functional_counts(stream):
"""
How would we scale and distribute a wordcount operation
"""
datapipe = pipeline.Pipeline([
@dat-boris
dat-boris / 01_word_counter.py
Last active January 31, 2017 04:24
A brief demo of a word counter
import re
from collections import Counter
RE_CHAR = re.compile(r'\w')
def count_better_parser(stream):
counter = Counter()
word = ''
for i,char in enumerate(iter(lambda: stream.read(1), '')):
if char is None:
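
The preview stops before the loop body, so the rest of `count_better_parser` is not shown. The sketch below is one possible way to finish a character-by-character word counter in the same style. It is an assumption, not the gist's actual code, and the name `count_words_by_char` is hypothetical.

```python
import io
import re
from collections import Counter

RE_CHAR = re.compile(r'\w')


def count_words_by_char(stream):
    """Count words by reading a text stream one character at a time."""
    counter = Counter()
    word = ''
    # stream.read(1) returns '' at end of input, which stops the iterator.
    for char in iter(lambda: stream.read(1), ''):
        if RE_CHAR.match(char):
            word += char          # still inside a word
        elif word:
            counter[word] += 1    # a non-word character closes the current word
            word = ''
    if word:
        counter[word] += 1        # flush the last word if the stream ends mid-word
    return counter


counts = count_words_by_char(io.StringIO("the cat and the hat"))
```

Reading one character at a time keeps memory use constant regardless of input size, at the cost of many small reads.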

Kickstarting Data Organization from scratch

I am hesitant to use the term "Big Data" to describe what I work on: having an adjective as part of the definition feels boastful (unless I worked with the LHC, in which case the adjective might be justified!).

When asked to describe what "Big Data" is, I use the following description:

Big data is when an organization creates a "data rock" so heavy that it cannot lift it.

@dat-boris
dat-boris / savscan_fix_eml.sh
Created December 21, 2016 15:49
`savscan` fails when scanning .eml files with attachments
# see https://www.virustotal.com/en/file/d6d29b4e39029b50d3c0f9ab43cb6886ada09cb7eb295bdedc0396d8c80fe2d6/analysis/
# unpack with ripmime first, since savscan fails to handle .eml
find "$1" -type f -print0 | xargs -0 -n1 ripmime -d "$1" -i > /dev/null
savscan -rec -archive -all -mime -f -suspicious -ss -sc "$1"
# always exit 0 so that a scan failure does not block setup
true
@dat-boris
dat-boris / debug_flink_aws.scala
Created December 14, 2016 09:39
Some notes on our process for debugging Flink AWS parameter names
@dat-boris
dat-boris / recorder.py
Last active November 28, 2016 03:30
Simple Quantopian polyfill to allow recording metrics into a DataFrame
from zipline.api import (
    schedule_function, date_rules, time_rules, sid, symbol,
    set_slippage, slippage, set_commission, commission,
    get_datetime, order_target_percent,
    # record,  # Replaced below
    attach_pipeline,
    order_target, get_open_orders, history
)