import json
from pyspark.sql.types import *
# Define the schema
schema = StructType(
    [StructField("name", StringType(), True), StructField("age", IntegerType(), True)]
)
# Write the schema
with open("schema.json", "w") as f:
#standardSQL
CREATE TEMPORARY FUNCTION geohashEncode(latitude FLOAT64, logitude FLOAT64, precision FLOAT64)
RETURNS STRING
LANGUAGE js
AS """
var Geohash = {};
/* (Geohash-specific) Base32 map */
Geohash.base32 = '0123456789bcdefghjkmnpqrstuvwxyz';
lat = Number(latitude);
# author: @Daniel_Abeles
# date: 18/12/2017
import asyncio
from aiohttp import ClientSession
from timeit import default_timer
import async_timeout
async def fetch_all(urls: list):
from scipy.spatial.distance import pdist, squareform
import numpy as np
from numbapro import jit, float32
def distcorr(X, Y):
    """ Compute the distance correlation function
    >>> a = [1,2,3,4,5]
    >>> b = np.array([1,2,9,4,4])
# -*- coding: utf-8 -*-
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
import pandas as pd
import sqlalchemy
import altair as alt
import io
from vega_datasets import data
from tqdm import tqdm_notebook as tqdm
from joblib import Parallel, delayed
import time
import random
def func(x):
    time.sleep(random.randint(1, 10))
    return x
###
### Thanks to Karl Broman http://kbroman.org/knitr_knutshell/pages/Rmarkdown.html
```{r global_options, include=FALSE}
rm(list=ls()) ### To clear namespace
library(knitr)
opts_chunk$set(fig.width=12, fig.height=8, fig.path='Figs/',
               echo=TRUE, warning=FALSE, message=FALSE)
```
R to Python: useful data wrangling snippets
The dplyr package in R makes data wrangling significantly easier.
The beauty of dplyr is that, by design, the options available are limited.
Specifically, a set of key verbs forms the core of the package.
Using these verbs, you can solve a wide range of data problems effectively and in less time.
Whilst transitioning to Python, I have greatly missed the ease with which I can think through and solve problems using dplyr in R.
The purpose of this document is to demonstrate how to execute the key dplyr verbs when manipulating data in Python (with the pandas package).
dplyr is organised around six key verbs
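
As a rough orientation, here is a minimal sketch of how those verbs might map onto pandas. It assumes the six verbs meant here are `filter`, `select`, `arrange`, `mutate`, `summarise` and `group_by`; the DataFrame and its column names (`species`, `weight`, `age`) are invented purely for illustration and are not taken from the original document.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame(
    {
        "species": ["cat", "dog", "cat", "dog"],
        "weight": [4.0, 20.0, 5.0, 30.0],
        "age": [2, 5, 3, 7],
    }
)

# filter(): keep the rows that satisfy a condition.
heavy = df[df["weight"] > 4.5]

# select(): keep a subset of columns.
cols = df[["species", "weight"]]

# arrange(): sort rows, here in descending order of weight.
ordered = df.sort_values("weight", ascending=False)

# mutate(): add a derived column without modifying df in place.
with_ratio = df.assign(weight_per_year=df["weight"] / df["age"])

# group_by() + summarise(): one row per group with named aggregates.
summary = df.groupby("species", as_index=False).agg(
    mean_weight=("weight", "mean"),
    n=("weight", "size"),
)

print(summary)
```

Each step returns a new DataFrame, so the calls can be chained in much the same way as a dplyr pipeline, although pandas offers several alternative spellings for most of them (for example `df.query(...)` for filtering).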
require(caret)
# load some data
data(USArrests)
### Prepare Data (positive observations)
# add a column to serve as the strata; here it is states, but it could be sites or other locations
# the original data has 50 rows; the 25 labels below are recycled, so each state labels
# two blocks of 5 consecutive observations (10 observations per state)
USArrests$state <- c(rep(c("PA","MD","DE","NY","NJ"), each = 5))
# this replaces the existing rownames (states) with a simple numerical index