arose13’s gists

arose13 / StratifiedDummyRegressor.py

Created August 20, 2019 18:50

Computing the mean of a particular model, conditional on some categorical variable

	import pandas as pd
	from sklearn.base import BaseEstimator, RegressorMixin
	from sklearn.preprocessing import OneHotEncoder
	from sklearn.exceptions import NotFittedError


	class StratifiedDummyRegressor(BaseEstimator, RegressorMixin):
	    """
	    An extremely scalable dummy regression model for computing the mean for each group specified by a column.
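The preview stops before the class body, so the author's implementation is not shown here. As a rough, dependency-free sketch of the idea the docstring describes (predicting the training mean of each group), something like the following could work. The class name `GroupMeanSketch` and the fallback to the global mean for unseen groups are assumptions made for illustration, not part of the gist.

```python
from collections import defaultdict


class GroupMeanSketch:
    """Hypothetical stand-in: predicts the training mean of each group."""

    def fit(self, groups, y):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for group, value in zip(groups, y):
            sums[group] += value
            counts[group] += 1
        n = sum(counts.values())
        if n == 0:
            raise ValueError("cannot fit on empty data")
        # One mean per group, computed in a single pass over the data.
        self.group_means_ = {g: sums[g] / counts[g] for g in sums}
        # Assumption: groups not seen during fit fall back to the global mean.
        self.global_mean_ = sum(sums.values()) / n
        return self

    def predict(self, groups):
        if not hasattr(self, "group_means_"):
            raise RuntimeError("call fit() before predict()")
        return [self.group_means_.get(g, self.global_mean_) for g in groups]
```

A call such as `GroupMeanSketch().fit(["a", "a", "b"], [1.0, 3.0, 10.0]).predict(["a", "b"])` returns one group mean per input row.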

arose13 / xgboost_train.py

Last active June 11, 2019 19:27

How to train an XGBoost model in what I believe is the best way (on large data)

	import xgboost as xgb

	# Notice the large number of trees and the low learning rate.
	# There are other important parameters like `subsample`, `min_child_weight`, and `colsample_bytree`, but I'll leave
	# those up to you and a grid search.
	gbm = xgb.XGBRegressor(n_estimators=10000, learning_rate=0.01, n_jobs=-1)

	# Training with automatic termination
	gbm.fit(
	    x_train, y_train,

arose13 / monte_carlo_pi.py

Created September 7, 2018 21:40

A (hopefully) extremely high precision Monte Carlo estimation of pi

	# Extremely high precision monte carlo estimation of pi
	import numpy as np
	import numpy.linalg as la
	from sympy import N, pi


	def calculate_pi():
	    inside, n = 0, 1e6
	    for i in range(int(n)):
	        nth = i+1
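The preview ends before the sampling logic, so the author's high-precision variant is not visible. For orientation only, here is a minimal sketch of the basic Monte Carlo estimate using the standard library: sample points uniformly in the unit square and count the fraction that lands inside the quarter circle. The function name `estimate_pi` and its `seed` parameter are assumptions for this example.

```python
import random


def estimate_pi(n_samples=1_000_000, seed=0):
    """Estimate pi from the fraction of random points inside a quarter circle."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Quarter-circle area / unit-square area = pi / 4.
    return 4.0 * inside / n_samples
```

The standard error of this estimator shrinks roughly as 1/sqrt(n_samples), so each extra decimal digit costs about 100 times more samples.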

arose13 / TracyWidomCDF.csv

Created November 19, 2017 15:53

Tracy-Widom cumulative distribution function values as ln probabilities.

arose13 / notebook-steps.sh

Last active December 24, 2018 19:46

Creating a Jupyter Notebook Server on Google Cloud

	#########################################################################################
	### From Google Cloud Console
	# From the navigation menu, go to Networking > VPC Network > Firewall rules

	click 'CREATE FIREWALL RULE'
	set Name
	set Targets to 'All instances in the network'
	set source IP range to '0.0.0.0/0'
	set protocols and port to 'Allow all'
	click create

arose13 / dockerCleanup.sh

Created October 5, 2017 15:43

Docker Cleanup Commands

	# Kill all running containers
	sudo docker kill $(sudo docker ps -q)

	# Delete all stopped containers (This is the step that frees the most disk space)
	sudo docker rm $(sudo docker ps -a -q)

	# Delete all docker images
	sudo docker rmi $(sudo docker images -q)

arose13 / install-conda.sh

Last active November 11, 2024 05:41

Install Miniconda in Ubuntu

	# Setup Ubuntu
	sudo apt update --yes
	sudo apt upgrade --yes

	# Get Miniconda and make it the main Python interpreter
	wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
	bash ~/miniconda.sh -b -p ~/miniconda
	rm ~/miniconda.sh

	export PATH=~/miniconda/bin:$PATH

arose13 / transpose_csv_ooc.py

Last active April 8, 2017 00:45

Out Of Core CSV Transposing. Constant memory use. Arbitrary CSV size.

	import csv

	def transpose_csv_out_of_core(csv_path, output_csv_path='transposed.csv', delimiter=','):
	    """
	    On my laptop it can transpose at ~375,000 lines a sec

	    :param csv_path:
	    :param output_csv_path:
	    :param delimiter:
	    :return:
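The function body is cut off in this preview, so the author's technique is not shown. One simple way to transpose without loading the whole table, sketched below under the assumption that the input is rectangular, is to re-read the file once per column and write that column out as a row. The name `transpose_csv_multipass` is hypothetical, and this variant holds one output row in memory at a time, so it is not strictly constant-memory.

```python
import csv


def transpose_csv_multipass(csv_path, output_csv_path="transposed.csv", delimiter=","):
    """Transpose a rectangular CSV by scanning the input once per column."""
    # Read only the first row to learn how many columns there are.
    with open(csv_path, newline="") as f:
        first_row = next(csv.reader(f, delimiter=delimiter), None)
    n_cols = len(first_row) if first_row else 0

    with open(output_csv_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter=delimiter)
        for col in range(n_cols):
            # Each pass keeps a single output row (one input column) in memory.
            with open(csv_path, newline="") as f:
                reader = csv.reader(f, delimiter=delimiter)
                writer.writerow([row[col] for row in reader])
```

This trades extra disk reads (one full scan per column) for a small memory footprint, which suits tall files with few columns better than wide ones.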

arose13 / roman_numerals.py

Created January 13, 2017 04:45

arose13 / normal_inverse_cdf.py

Created December 27, 2016 19:41

SciPy-free implementation of the Normal distribution's inverse CDF

	def inverse_normal_cdf(p, mean, std):
	    """
	    This is the inverse of a normal distribution's CDF.

	    While much slower, this means you do not need SciPy as a project requirement.
	    :param p: list of probabilities, each in the open interval (0, 1)
	    :param mean:
	    :param std:
	    :return:
	    """

Stephen Anthony Rose arose13