Stephen Anthony Rose arose13

@arose13
arose13 / StratifiedDummyRegressor.py
Created August 20, 2019 18:50
Computing the mean of a particular model, conditional on some categorical variable
import pandas as pd
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.preprocessing import OneHotEncoder
from sklearn.exceptions import NotFittedError
class StratifiedDummyRegressor(BaseEstimator, RegressorMixin):
    """
    An extremely scalable dummy regression model for computing the mean for each group specified by a column.
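The preview above is cut off, so the following is only a dependency-free sketch of the idea the description names (a per-group mean used as the prediction), not the gist's actual implementation. The class name `GroupMeanRegressor`, the list-based inputs, and the fallback to the global mean for unseen groups are all assumptions made for illustration.

```python
from collections import defaultdict


class GroupMeanRegressor:
    """Hypothetical stand-in: predicts the training mean of each group."""

    def fit(self, groups, y):
        sums = defaultdict(float)
        counts = defaultdict(int)
        # One pass over the data: accumulate a running sum and count per group.
        for group, target in zip(groups, y):
            sums[group] += target
            counts[group] += 1
        self.group_means_ = {g: sums[g] / counts[g] for g in sums}
        # Assumption of this sketch: unseen groups fall back to the global mean.
        self.global_mean_ = sum(sums.values()) / sum(counts.values())
        return self

    def predict(self, groups):
        if not hasattr(self, "group_means_"):
            raise RuntimeError("Call fit before predict.")
        return [self.group_means_.get(g, self.global_mean_) for g in groups]


model = GroupMeanRegressor().fit(["a", "a", "b", "b"], [1.0, 3.0, 10.0, 20.0])
predictions = model.predict(["a", "b", "c"])
```

Because fitting only needs a sum and a count per group, the memory cost grows with the number of groups rather than the number of rows.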
@arose13
arose13 / xgboost_train.py
Last active June 11, 2019 19:27
How to train an XGBoost model in what I believe is the best way (on large data)
import xgboost as xgb
# Notice the large number of trees and the low learning rate.
# Other important parameters, such as `subsample`, `min_child_weight`, and `colsample_bytree`,
# are left to you and a grid search.
gbm = xgb.XGBRegressor(n_estimators=10000, learning_rate=0.01, n_jobs=-1)
# Training with automatic termination
gbm.fit(
    x_train, y_train,
@arose13
arose13 / monte_carlo_pi.py
Created September 7, 2018 21:40
A (hopefully) extremely high precision Monte Carlo estimation of pi
# Extremely high precision Monte Carlo estimation of pi
import numpy as np
import numpy.linalg as la
from sympy import N, pi
def calculate_pi():
    inside, n = 0, 1e6
    for i in range(int(n)):
        nth = i+1
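The preview stops before the sampling loop body, so here is a minimal sketch of the standard Monte Carlo approach, assuming points are drawn uniformly in the unit square and counted when they fall inside the quarter circle. The function name `estimate_pi`, the seed, and the sample count are illustrative choices, and the sketch uses only the standard library rather than the NumPy and SymPy imports shown above.

```python
import random


def estimate_pi(n_samples=100_000, seed=0):
    """Estimate pi from the fraction of random points inside the quarter circle."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        # The quarter circle of radius 1 covers pi/4 of the unit square.
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


estimate = estimate_pi()
```

The error of this estimator shrinks roughly with the square root of the sample count, so each extra digit of precision costs about 100 times more samples.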
@arose13
arose13 / TracyWidomCDF.csv
Created November 19, 2017 15:53
Tracy-Widom cumulative distribution function (CDF) values, stored as natural-log probabilities.
x beta_1 beta_2 beta_4
-10.0 -49.506602584709114 -83.75764935485152 -34.944193713785346
-9.983983983983984 -49.28872553438857 -83.35768930492326 -34.762110705543805
-9.967967967967969 -49.0715029374409 -82.95900944153034 -34.58065343015742
-9.951951951951951 -48.85493377784113 -82.56160770946977 -34.39982084843748
-9.935935935935936 -48.639017039589795 -82.1654820535336 -34.2196119211645
-9.91991991991992 -48.42375170671292 -81.77063041850863 -34.04002560908787
-9.903903903903904 -48.2091367632627 -81.3770507491765 -33.86106087292524
-9.887887887887889 -47.99517119331738 -80.98474099031358 -33.68271667336256
-9.871871871871871 -47.78185398098101 -80.59369908669095 -33.504991971054295
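Since the columns hold ln probabilities, a value is turned back into a probability with the exponential function. The sketch below copies only the first two rows shown above; the dictionary layout and the helper name `to_probability` are assumptions for illustration.

```python
import math

# First two rows of the table above: x -> (beta_1, beta_2, beta_4) ln probabilities.
ln_cdf = {
    -10.0: (-49.506602584709114, -83.75764935485152, -34.944193713785346),
    -9.983983983983984: (-49.28872553438857, -83.35768930492326, -34.762110705543805),
}


def to_probability(ln_p):
    """Convert a natural-log probability back to a plain probability."""
    return math.exp(ln_p)


p_low = to_probability(ln_cdf[-10.0][0])
p_next = to_probability(ln_cdf[-9.983983983983984][0])

# Ratios are safer to compute in log space, by subtracting before exponentiating.
ratio = math.exp(ln_cdf[-9.983983983983984][0] - ln_cdf[-10.0][0])
```

Keeping the values in log space preserves precision in the far left tail, where the plain probabilities are vanishingly small.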
@arose13
arose13 / notebook-steps.sh
Last active December 24, 2018 19:46
Creating a Jupyter Notebook Server on Google Cloud
#########################################################################################
### From Google Cloud Console
# From the navigation menu, go to Networking > VPC Network > Firewall rules
# click 'CREATE FIREWALL RULE'
# set Name
# set Targets to 'All instances in the network'
# set Source IP ranges to '0.0.0.0/0'
# set Protocols and ports to 'Allow all'
# click Create
@arose13
arose13 / dockerCleanup.sh
Created October 5, 2017 15:43
Docker Cleanup Commands
# Kill all running containers
sudo docker kill $(sudo docker ps -q)
# Delete all stopped containers (This is the step that frees the most disk space)
sudo docker rm $(sudo docker ps -a -q)
# Delete all docker images
sudo docker rmi $(sudo docker images -q)
@arose13
arose13 / install-conda.sh
Last active November 11, 2024 05:41
Install Miniconda in Ubuntu
# Setup Ubuntu
sudo apt update --yes
sudo apt upgrade --yes
# Get Miniconda and make it the main Python interpreter
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh -b -p ~/miniconda
rm ~/miniconda.sh
export PATH=~/miniconda/bin:$PATH
@arose13
arose13 / transpose_csv_ooc.py
Last active April 8, 2017 00:45
Out Of Core CSV Transposing. Constant memory use. Arbitrary CSV size.
import csv
def transpose_csv_out_of_core(csv_path, output_csv_path='transposed.csv', delimiter=','):
    """
    On my laptop it can transpose at ~375,000 lines a sec
    :param csv_path:
    :param output_csv_path:
    :param delimiter:
    :return:
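The body of the function is not shown, so the following is one possible way to transpose a CSV without loading it into memory, not necessarily the gist's method. It re-reads the input once per column and streams each cell straight to the output. The name `transpose_csv_streaming` is hypothetical, and the sketch assumes a rectangular file whose cells contain no delimiters, quotes, or newlines.

```python
import csv
import os
import tempfile


def transpose_csv_streaming(csv_path, output_csv_path, delimiter=","):
    """Transpose a CSV while holding only one input row in memory at a time."""
    # Read just the first row to learn how many columns there are.
    with open(csv_path, newline="") as src:
        first_row = next(csv.reader(src, delimiter=delimiter), None)
    n_columns = len(first_row) if first_row else 0

    with open(output_csv_path, "w", newline="") as dst:
        for col in range(n_columns):
            # One full pass over the input per column: more I/O, constant memory.
            with open(csv_path, newline="") as src:
                for i, row in enumerate(csv.reader(src, delimiter=delimiter)):
                    if i:
                        dst.write(delimiter)
                    # Assumption: cells need no quoting, so they are written raw.
                    dst.write(row[col])
            dst.write("\n")


# Small demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "in.csv")
    dst_path = os.path.join(tmp, "out.csv")
    with open(src_path, "w", newline="") as f:
        f.write("a,b,c\n1,2,3\n")
    transpose_csv_streaming(src_path, dst_path)
    with open(dst_path, newline="") as f:
        transposed = f.read()
```

The trade-off is time for memory: the input is scanned once per column, which suits files with many rows and comparatively few columns.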
def checkio(data):
    soln = {
        1: 'I',
        4: 'IV',
        5: 'V',
        9: 'IX',
        10: 'X',
        40: 'XL',
        50: 'L',
        90: 'XC',
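The lookup table above is cut off after 90, and the rest of `checkio` is not shown. As a sketch of how such a table is typically used, the greedy conversion below walks the symbols from largest to smallest. The entries from 100 upward are the standard Roman numeral values rather than lines taken from the gist, and the name `to_roman` and the 1 to 3999 range check are assumptions.

```python
# Value/symbol pairs in descending order, including the subtractive forms.
ROMAN_SYMBOLS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(number):
    """Convert an integer in 1..3999 to a Roman numeral string."""
    if not 0 < number < 4000:
        raise ValueError("number must be between 1 and 3999")
    parts = []
    for value, symbol in ROMAN_SYMBOLS:
        # Take as many copies of the current symbol as fit, then move on.
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


example = to_roman(1994)
```

Listing the subtractive pairs (4, 9, 40, ...) as their own entries is what lets a plain greedy loop produce the conventional spelling.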
@arose13
arose13 / normal_inverse_cdf.py
Created December 27, 2016 19:41
SciPy-free implementation of the normal distribution's inverse CDF
def inverse_normal_cdf(p, mean, std):
    """
    This is the inverse of a normal distribution's CDF.
    Although it is much slower, it means you do not need SciPy as a project requirement.
    :param p: list of probabilities, each in the open interval (0, 1)
    :param mean:
    :param std:
    :return:
    """