Skip to content

Instantly share code, notes, and snippets.

View cbonnett's full-sized avatar

Christopher Bonnett cbonnett

View GitHub Profile
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
with
dau as (
-- This part of the query can be pretty much anything.
-- The only requirement is that it have three columns:
-- dt, user_id, inc_amt
-- Where dt is a date and user_id is some unique identifier for a user.
-- Each dt-user_id pair should be unique in this table.
-- inc_amt represents the amount of value that this user created on dt.
-- The most common case is
-- inc_amt = incremental revenue from the user on dt
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@0asa
0asa / sklearn-pyspark.py
Created January 27, 2015 11:12
Run a Scikit-Learn algorithm on top of Spark with PySpark
from pyspark import SparkConf, SparkContext
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
import pandas as pd
import numpy as np
conf = (SparkConf()
.setMaster("local[*]")
.setAppName("My app")
.set("spark.executor.memory", "1g"))
@GaelVaroquaux
GaelVaroquaux / mutual_info.py
Last active June 18, 2023 12:25
Estimating entropy and mutual information with scikit-learn: visit https://github.com/mutualinfo/mutual_info
'''
Non-parametric computation of entropy and mutual-information
Adapted by G Varoquaux for code created by R Brette, itself
from several papers (see in the code).
This code is maintained at https://github.com/mutualinfo/mutual_info
Please download the latest code there, to have improvements and
bug fixes.