Skip to content

Instantly share code, notes, and snippets.

View marksimi's full-sized avatar

Mark Simithraaratchy marksimi

View GitHub Profile
@huyng
huyng / matplotlibrc
Created February 8, 2011 15:50
my default matplotlib settings
### MATPLOTLIBRC FORMAT
# This is a sample matplotlib configuration file - you can find a copy
# of it on your system in
# site-packages/matplotlib/mpl-data/matplotlibrc. If you edit it
# there, please note that it will be overridden in your next install.
# If you want to keep a permanent local copy that will not be
# over-written, place it in HOME/.matplotlib/matplotlibrc (unix/linux
# like systems) and C:\Documents and Settings\yourname\.matplotlib
# (win32 systems).
@johntyree
johntyree / getBlockLists.sh
Last active June 4, 2024 12:30
Make one large blocklist from the bluetack lists on iblocklist.com
#!/usr/bin/env sh
# Download lists, unpack and filter, write to stdout
curl -s https://www.iblocklist.com/lists.php \
| sed -n "s/.*value='\(http:.*=bt_.*\)'.*/\1/p" \
| xargs wget -O - \
| gunzip \
| egrep -v '^#'
@tonicebrian
tonicebrian / GBT_CaliforniaHousing.py
Created November 5, 2012 16:22
Gradient Boosting Trees using Python
# =============
# Introduction
# =============
# I've been doing some data mining lately and specially looking into `Gradient
# Boosting Trees <http://en.wikipedia.org/wiki/Gradient_boosting>`_ since it is
# claimed that this is one of the techniques with best performance out of the
# box. In order to have a better understanding of the technique I've reproduced
# the example of section *10.14.1 California Housing* in the book `The Elements of Statistical Learning <http://www-stat.stanford.edu/~tibs/ElemStatLearn/>`_.
# Each point of this dataset represents the house value of a property with some
# attributes of that house. You can get the data and the description of those
@planetoftheweb
planetoftheweb / tweets_json.php
Created July 2, 2013 23:33
This gist will let you retrieve a series of tweets using the twitter v1.1 API. There's a few steps you have to do before you use this. 1. First, you need to go to http://dev.twitter.com/apps and create a new application. 2. You'll also need to download this library. https://github.com/themattharris/tmhOAuth (You only need the tmhOAuth.php file) …
<?php
require 'tmhOAuth.php'; // Get it from: https://github.com/themattharris/tmhOAuth
// Use the data from http://dev.twitter.com/apps to fill out this info
// notice the slight name difference in the last two items)
$connection = new tmhOAuth(array(
'consumer_key' => '',
'consumer_secret' => '',
'user_token' => '', //access token
{
"metadata": {
"name": "unpivoting-csv-pandas"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
@jdmaturen
jdmaturen / bg_nbd.py
Created October 16, 2013 05:36
Implementation of the beta-geometric/NBD (BG/NBD) model from '"Counting Your Customers" the Easy Way: An Alternative to the Pareto/NBD Model' (Fader, Hardie and Lee 2005) http://brucehardie.com/papers/018/fader_et_al_mksc_05.pdf and accompanying technical note http://www.brucehardie.com/notes/004/
"""
Implementation of the beta-geometric/NBD (BG/NBD) model from '"Counting Your Customers" the Easy Way: An Alternative to
the Pareto/NBD Model' (Fader, Hardie and Lee 2005) http://brucehardie.com/papers/018/fader_et_al_mksc_05.pdf and
accompanying technical note http://www.brucehardie.com/notes/004/
Apache 2 License
"""
from math import log, exp
import numpy as np
@stucchio
stucchio / bayesian_ab_test.py
Last active April 2, 2023 03:17
Bayesian A/B test code
from matplotlib import use
from pylab import *
from scipy.stats import beta, norm, uniform
from random import random
from numpy import *
import numpy as np
import os
# Input data
@CamDavidsonPilon
CamDavidsonPilon / 538.json
Last active November 28, 2021 07:37
Use the two files below to mimic graphs on 538. www.dataorigami.net/blogs/fivethirtyeight-mpl
{
"lines.linewidth": 2.0,
"examples.download": true,
"patch.linewidth": 0.5,
"legend.fancybox": true,
"axes.color_cycle": [
"#30a2da",
"#fc4f30",
"#e5ae38",
"#6d904f",
@glamp
glamp / customer-segmentation.py
Last active April 30, 2020 13:40
Analysis for customer segmentation blog post
import pandas as pd
# http://blog.yhathq.com/static/misc/data/WineKMC.xlsx
df_offers = pd.read_excel("./WineKMC.xlsx", sheetname=0)
df_offers.columns = ["offer_id", "campaign", "varietal", "min_qty", "discount", "origin", "past_peak"]
df_offers.head()
df_transactions = pd.read_excel("./WineKMC.xlsx", sheetname=1)
df_transactions.columns = ["customer_name", "offer_id"]
df_transactions['n'] = 1
df_transactions.head()
@jmindek
jmindek / gist:62c50dd766556b7b16d6
Last active January 31, 2024 15:48
DISTINCT ON like functionality for Redshift

distinct column -> For each row returned, return only the unique members of a set. Think of it as for each row in a projection, concatenate all the column values and return only the strings that are unique.

test_db=# SELECT DISTINCT parent_id, child_id, id FROM test.foo_table ORDER BY parent_id, child_id, id LIMIT 10;
parent_id | child_id | id
-----------+------------+-----------------------------
1000040 | 103 | 1000040|2645405726|0001|103