@pierdom
pierdom / python_x11_colors.py
Created June 22, 2018 09:38
[Using X11 colors] #python #visualization
cnames = {
'aliceblue': '#F0F8FF',
'antiquewhite': '#FAEBD7',
'aqua': '#00FFFF',
'aquamarine': '#7FFFD4',
'azure': '#F0FFFF',
'beige': '#F5F5DC',
'bisque': '#FFE4C4',
'black': '#000000',
'blanchedalmond': '#FFEBCD',
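A table like this is usually consumed by turning a name into numeric channels. The sketch below is not part of the gist: it copies three entries from the listing above, and `hex_to_rgb` and `name_to_rgb` are hypothetical helper names introduced only for illustration.

```python
# Subset of the X11 name-to-hex table from the listing above
cnames = {
    'aliceblue': '#F0F8FF',
    'aqua': '#00FFFF',
    'black': '#000000',
}


def hex_to_rgb(hex_code):
    """Convert a '#RRGGBB' string to an (R, G, B) tuple of ints in 0..255."""
    hex_code = hex_code.lstrip('#')
    return tuple(int(hex_code[i:i + 2], 16) for i in (0, 2, 4))


def name_to_rgb(name):
    """Look up a color name (case-insensitive) and return its RGB tuple."""
    return hex_to_rgb(cnames[name.lower()])


print(name_to_rgb('AliceBlue'))  # (240, 248, 255)
```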
@pierdom
pierdom / geodesic_distance.py
Created March 29, 2018 07:47
[Geodesic distance in Python] Calculate distance in meters between coordinates #python #gis
from geographiclib.geodesic import Geodesic
lat1, lon1 = 51.556021, -0.279519
lat2, lon2 = 51.595387, -0.243415
geod = Geodesic.WGS84
g = geod.Inverse(lat1, lon1, lat2, lon2)
print("Distance = {:.2f} meters".format(g['s12']))
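As a rough cross-check that needs no third-party package, the same pair of coordinates can be run through the haversine formula. This is a different technique from the WGS84 ellipsoid solution above: it assumes a spherical Earth, so the result can differ from `g['s12']` by a fraction of a percent. `haversine_m` and `EARTH_RADIUS_M` are names chosen here for illustration.

```python
import math

# Mean Earth radius in meters (assumption: spherical model, not WGS84)
EARTH_RADIUS_M = 6371008.8


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


d = haversine_m(51.556021, -0.279519, 51.595387, -0.243415)
print("Haversine distance = {:.2f} meters".format(d))
```

For these two points the spherical estimate comes out at roughly 5 km, which is a quick sanity check on the ellipsoidal figure.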
@pierdom
pierdom / pandas_and_parquet.py
Created January 5, 2018 09:07
[Pandas DataFrame storage with Apache Parquet] using Apache Arrow (from https://tech.blue-yonder.com/efficient-dataframe-storage-with-apache-parquet/) #python #bigdata #pandas #datascience #parquet
# READING PARQUET FILES TO PANDAS
import pyarrow.parquet as pq
df = pq.read_table('<filename>').to_pandas()
# Only read a subset of the columns
df = pq.read_table('<filename>', columns=['A', 'B']).to_pandas()
# WRITING PARQUET FILES WITH PANDAS
import pyarrow as pa
import pyarrow.parquet as pq
table = pa.Table.from_pandas(data_frame, timestamps_to_ms=True)
@pierdom
pierdom / migrate_to_bitbucket.md
Created December 19, 2017 07:44
[Clone a GitLab repo to Bitbucket] #sysadmin #git
@pierdom
pierdom / pipeline.py
Created December 5, 2017 13:58
[A full data-preparation pipeline in Scikit-learn] #python #datascience #machinelearning #scikit
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import StandardScaler, OneHotEncoder, LabelEncoder, LabelBinarizer
from sklearn.impute import SimpleImputer # replaces preprocessing.Imputer, removed in scikit-learn 0.22
# We will use two separate Pipelines for numerical and categorical attributes
num_attribs = list(housing_num) # list of numerical attributes
cat_attribs = ["ocean_proximity"] # list of categorical attributes
# Define the Pipeline for numerical attributes as a list of (name, encoder) steps; the names are arbitrary
num_pipeline = Pipeline([
('selector', DataFrameSelector(num_attribs)),
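`DataFrameSelector` is not a scikit-learn class; it appears to be a small custom transformer that picks columns out of a pandas DataFrame and hands a NumPy array to the next step. A minimal sketch of such a transformer, assuming that behavior (the gist's own definition is not shown in this preview), could be:

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DataFrameSelector(BaseEstimator, TransformerMixin):
    """Select the given DataFrame columns and return them as a NumPy array."""

    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        # Nothing to learn; the column list is fixed at construction time
        return self

    def transform(self, X):
        return X[self.attribute_names].values


# Hypothetical toy data, only to show the selector in isolation
toy = pd.DataFrame({'rooms': [3, 5], 'income': [2.5, 4.0], 'ocean_proximity': ['INLAND', 'NEAR BAY']})
selected = DataFrameSelector(['rooms', 'income']).fit_transform(toy)
print(selected.shape)  # (2, 2)
```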
@pierdom
pierdom / data_prep.py
Created December 5, 2017 13:46
[Common data-preparation techniques in Scikit-learn] missing values and categorical attributes. From "Hands-on Machine Learning with Scikit-Learn and TensorFlow" #python #datascience #scikit #machinelearning
import pandas as pd
# input dataset
housing = pd.DataFrame(...)
# Dealing with missing values (replace with median for each attribute)
from sklearn.impute import SimpleImputer # Imputer was removed from sklearn.preprocessing in scikit-learn 0.22
imputer = SimpleImputer(strategy="median") # tell it which strategy to use
housing_num = housing.drop("ocean_proximity", axis=1) # remove categorical attributes
imputer.fit(housing_num) # train the estimator with data
imputer.statistics_ # show statistics (check if ok)
X = imputer.transform(housing_num) # apply transf. (get numpy arr)
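To see what the median strategy does on concrete numbers, here is a small self-contained run. The toy values and column names are invented for illustration, and the sketch uses `SimpleImputer` from `sklearn.impute`, which took over from the older `Imputer` class in recent scikit-learn releases.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical numerical attributes with one missing value per column
toy_num = pd.DataFrame({
    'rooms': [1.0, 3.0, np.nan, 5.0],
    'income': [10.0, np.nan, 30.0, 50.0],
})

toy_imputer = SimpleImputer(strategy="median")
toy_imputer.fit(toy_num)                 # learns one median per column
X_toy = toy_imputer.transform(toy_num)   # NumPy array with the gaps filled

print(toy_imputer.statistics_)           # medians: 3.0 for rooms, 30.0 for income
# Wrap the array back into a DataFrame if column names are needed downstream
toy_filled = pd.DataFrame(X_toy, columns=toy_num.columns)
```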
@pierdom
pierdom / marginal_spikes.py
Created November 28, 2017 08:00
[Marginal spikes plot in Holoviews scatter plot] #visualization #python #holoviews #bokeh #datascience
points = points << hv.Spikes(points['y']) << hv.Spikes(points['x'])
@pierdom
pierdom / holoviews_invert_axes.py
Created November 27, 2017 14:58
[Invert X and Y axis in Holoviews] #holoviews #visualization #python
%%opts Histogram [invert_axes=True]
# note that this is different from invert_xaxis, which reverses the direction of an axis
# invert_axes, instead, swaps the X and Y axes
@pierdom
pierdom / holoviews_dynamicmap_xrange.py
Created November 27, 2017 14:56
[Holoviews DynamicMap to dynamically change the x-axis range] #holoviews #bokeh #visualization #python #datascience
import holoviews as hv
import numpy as np
hv.extension('bokeh')
hv.DynamicMap(lambda i: hv.Curve(np.arange(i)), kdims=['i']).redim.range(i=(10, 20)).opts(norm=dict(framewise=True))
@pierdom
pierdom / holoviews_rotate_axis.py
Created November 27, 2017 14:07
[Axis options in Holoviews/Bokeh plots] includes: tick-label rotation, logarithmic scale and other tick options #python #visualization #holoviews #bokeh #datascience
hm = hv.HeatMap((x_names, y_names, od_pivot))
hm = hm.opts(plot={"xrotation": 90})
# Other axis options (keys for the same plot dict):
# 'logy': True
# 'yaxis': None
# 'xrotation': 90
# 'xticks': 3
# 'xticks': [0, 100, 300, 500]