Germayne germayneng

"write code so simple there are obviously no bugs in it, or write code so complex that there are no obvious bugs in it" - Tony Hoare

germayneng / export-pyspark-schema-to-json.py

Created September 23, 2020 04:22 — forked from stefanthoss/export-pyspark-schema-to-json.py

Export/import a PySpark schema to/from a JSON file

	import json
	from pyspark.sql.types import *

	# Define the schema
	schema = StructType(
	    [StructField("name", StringType(), True), StructField("age", IntegerType(), True)]
	)

	# Write the schema
	with open("schema.json", "w") as f:
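
The preview cuts off before the write step. As a rough illustration of what ends up in the file, here is a sketch that needs no Spark installation: it assumes the dict below is what `schema.jsonValue()` returns for the two fields defined above, and the `round_trip` helper and temporary path are made up for this example.

```python
import json
import os
import tempfile

# Assumed JSON form of the schema above (the shape StructType.jsonValue() produces).
schema_dict = {
    "type": "struct",
    "fields": [
        {"name": "name", "type": "string", "nullable": True, "metadata": {}},
        {"name": "age", "type": "integer", "nullable": True, "metadata": {}},
    ],
}


def round_trip(schema: dict, path: str) -> dict:
    """Write the schema dict to a JSON file, then read it back (hypothetical helper)."""
    with open(path, "w") as f:
        json.dump(schema, f)
    with open(path) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    restored = round_trip(schema_dict, os.path.join(tmp, "schema.json"))
```

With PySpark available, a dict loaded this way can be passed to `StructType.fromJson` to rebuild the schema object.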

germayneng / BigQueryGeohashEncode.sql

Last active May 27, 2020 12:42 — forked from killerbees/BigQueryGeohashEncode.sql

Big Query STD SQL Gist for Geohash Encode

	#standardSQL
	CREATE TEMPORARY FUNCTION geohashEncode(latitude FLOAT64, logitude FLOAT64, precision FLOAT64)
	RETURNS STRING
	LANGUAGE js
	AS """
	var Geohash = {};
	/* (Geohash-specific) Base32 map */
	Geohash.base32 = '0123456789bcdefghjkmnpqrstuvwxyz';

	lat = Number(latitude);
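
For readers who want the encoding idea without the BigQuery UDF wrapper, here is a minimal Python sketch of standard geohash encoding. It is not a port of the gist's JavaScript; the function name `geohash_encode` and its defaults are assumptions for illustration. Only the base32 alphabet is taken from the preview above.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # same alphabet as in the gist


def geohash_encode(latitude: float, longitude: float, precision: int = 12) -> str:
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    if precision < 1:
        raise ValueError("precision must be at least 1")

    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # value of the 5-bit group currently being built
    bit_count = 0     # number of bits collected in the current group
    even_bit = True   # even bits refine longitude, odd bits refine latitude

    while len(chars) < precision:
        if even_bit:
            mid = (lon_lo + lon_hi) / 2
            if longitude >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if latitude >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        even_bit = not even_bit
        bit_count += 1
        if bit_count == 5:
            # Every 5 bits map to one base32 character.
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0

    return "".join(chars)
```

Each character narrows the bounding box by five bisection steps, alternating between longitude and latitude, so longer hashes share a prefix with shorter hashes of the same point.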

germayneng / aiohttp-example.py

Created April 28, 2020 03:20 — forked from Den1al/aiohttp-example.py

concurrent http requests with aiohttp

	# author: @Daniel_Abeles
	# date: 18/12/2017

	import asyncio
	from aiohttp import ClientSession
	from timeit import default_timer
	import async_timeout


	async def fetch_all(urls: list):
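
The body of `fetch_all` is not shown in the preview. The sketch below illustrates only the general concurrency pattern, using the standard library: `fake_fetch` is a hypothetical stand-in that sleeps instead of calling aiohttp, and `fetch_all_sketch` and `run_demo` are names invented here.

```python
import asyncio
from timeit import default_timer


async def fake_fetch(url: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for an HTTP GET: sleeps instead of touching the network."""
    await asyncio.sleep(delay)
    return f"response from {url}"


async def fetch_all_sketch(urls: list) -> list:
    # gather() schedules every coroutine at once and returns results in input order.
    return await asyncio.gather(*(fake_fetch(url) for url in urls))


def run_demo(urls: list):
    """Run the coroutines concurrently and report the results and elapsed seconds."""
    start = default_timer()
    results = asyncio.run(fetch_all_sketch(urls))
    return results, default_timer() - start
```

Because the waits overlap, five simulated requests of 0.1 s each finish in roughly the time of one rather than five.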

germayneng / distcorr.py

Created March 15, 2020 12:38 — forked from satra/distcorr.py

Distance Correlation in Python

	from scipy.spatial.distance import pdist, squareform
	import numpy as np

	from numba import jit, float32  # NumbaPro is discontinued; jit and float32 are available in numba

	def distcorr(X, Y):
	    """ Compute the distance correlation function

	    >>> a = [1,2,3,4,5]
	    >>> b = np.array([1,2,9,4,4])
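
Since the preview stops inside the docstring, here is a dependency-free sketch of the sample distance correlation for one-dimensional data. It follows the usual definition (double-centered pairwise distance matrices) rather than the gist's SciPy implementation; the function names are assumptions for illustration.

```python
import math


def _centered_distances(values):
    """Pairwise |a - b| distance matrix, double-centered by row, column and grand mean."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The matrix is symmetric, so column means equal row means.
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length sequences of scalars."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2_xy = mean_product(a, b)
    dvar2_x = mean_product(a, a)
    dvar2_y = mean_product(b, b)
    denom = math.sqrt(dvar2_x * dvar2_y)
    if denom == 0:
        return 0.0  # at least one input is constant
    return math.sqrt(max(dcov2_xy, 0.0) / denom)
```

The result lies between 0 and 1, and any exact linear relationship with a nonzero slope between `x` and `y` gives 1.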

germayneng / altair_app.py

Created March 15, 2019 02:29 — forked from gschivley/altair_app.py

Altair plot in Plotly Dash

	# -*- coding: utf-8 -*-
	import dash
	import dash_core_components as dcc
	import dash_html_components as html
	from dash.dependencies import Input, Output
	import pandas as pd
	import sqlalchemy
	import altair as alt
	import io
	from vega_datasets import data

germayneng / parallel.py

Created March 11, 2019 04:42 — forked from MInner/parallel.py

Executing jobs in parallel with a nice progress bar: a tqdm wrapper for joblib.Parallel

	from tqdm.notebook import tqdm  # tqdm.tqdm_notebook is deprecated in favor of tqdm.notebook.tqdm
	from joblib import Parallel, delayed
	import time

	import random

	def func(x):
	    time.sleep(random.randint(1, 10))
	    return x
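
The wrapper itself is not part of the preview. As a rough stand-in for the same idea, the sketch below uses only the standard library: a thread pool from `concurrent.futures` replaces `joblib.Parallel`, and a plain counter written to a stream replaces the tqdm bar. The names `slow_identity` and `run_with_progress` are hypothetical.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_identity(x, delay=0.01):
    """Toy job, similar in spirit to func above but with a short fixed delay."""
    time.sleep(delay)
    return x


def run_with_progress(func, items, max_workers=4, stream=sys.stderr):
    """Run func over items in a thread pool, printing 'done/total' as jobs finish."""
    items = list(items)
    results = [None] * len(items)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its input position so results keep input order.
        futures = {pool.submit(func, item): i for i, item in enumerate(items)}
        for done, future in enumerate(as_completed(futures), start=1):
            results[futures[future]] = future.result()
            stream.write(f"\r{done}/{len(items)} done")
            stream.flush()
    stream.write("\n")
    return results
```

Progress is updated in completion order, while the returned list keeps the order of the inputs.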

germayneng / knitr_header.r

Created September 9, 2018 12:14 — forked from cfljam/knitr_header.r

Global Options Chunk for Knitr in RMarkdown Documents

	###
	### Thanks to Karl Broman http://kbroman.org/knitr_knutshell/pages/Rmarkdown.html

	```{r global_options, include=FALSE}
	rm(list=ls()) ### To clear namespace
	library(knitr)
	opts_chunk$set(fig.width=12, fig.height=8, fig.path='Figs/',
	               echo=TRUE, warning=FALSE, message=FALSE)
	```

germayneng / gist:431f2821c849359fdd697528abf200f2

Last active December 11, 2017 04:57 — forked from conormm/r-to-python-data-wrangling-basics.md

R to Python: Data wrangling with dplyr and pandas (update)

	R to Python: useful data wrangling snippets

	The dplyr package in R makes data wrangling significantly easier.
	The beauty of dplyr is that, by design, the options available are limited.
	Specifically, a set of key verbs form the core of the package.
	Using these verbs you can solve a wide range of data problems effectively in a shorter timeframe.
	While transitioning to Python I have greatly missed the ease with which I can think through and solve problems using dplyr in R.
	The purpose of this document is to demonstrate how to execute the key dplyr verbs when manipulating data using Python (with the pandas package).

	dplyr is organised around six key verbs

germayneng / stratifiedCV.r

Created August 11, 2017 06:19 — forked from mrecos/stratifiedCV.r

Stratified K-folds Cross-Validation with Caret

	require(caret)

	#load some data
	data(USArrests)

	### Prepare Data (positive observations)
	# add a column to be the strata. In this case it is states, it can be sites, or other locations
	# the original data has 50 rows; this labels blocks of 5 consecutive observations, and R recycles the 25 labels to fill all 50 rows
	USArrests$state <- c(rep(c("PA","MD","DE","NY","NJ"), each = 5))
	# this replaces the existing rownames (states) with a simple numerical index