@pierdom
pierdom / pandas_categorical.py
Created November 21, 2017 13:50
[Pandas and categorical data] #pandas #datascience
import calendar
import pandas as pd
[...]
df["day_of_the_week"] = pd.Categorical(df["day_of_the_week"], list(calendar.day_abbr))
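The point of the conversion above is that the column then sorts in calendar order rather than alphabetically. A minimal sketch, assuming a made-up DataFrame (the `visits` column and its values are hypothetical) and taking the day names from `calendar.day_abbr` so the example does not depend on the locale:

```python
import calendar

import pandas as pd

# Abbreviated day names in calendar order (locale-dependent, e.g. Mon..Sun)
days = list(calendar.day_abbr)

# Hypothetical data, deliberately out of order
df = pd.DataFrame({
    "day_of_the_week": [days[2], days[6], days[0]],
    "visits": [30, 10, 20],
})

# Same conversion as in the gist, with the arguments spelled out
df["day_of_the_week"] = pd.Categorical(
    df["day_of_the_week"], categories=days, ordered=True
)

# Sorting now follows the category order, not the alphabet
sorted_days = df.sort_values("day_of_the_week")["day_of_the_week"].tolist()
```

Passing `ordered=True` is an addition to the original call; it also enables comparisons such as `df["day_of_the_week"] < days[5]`.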
pierdom / bokeh_and_jupyter_controls.ipynb
Created November 17, 2017 11:32
[Plot distribution in Bokeh with Jupyter widgets] #statistics #datascience #python #visualization #bokeh
pierdom / histogram_frequency.py
Created September 28, 2017 09:59
[Matplotlib histograms with frequencies] #matplotlib #python #visualization #statistics #datascience
import numpy as np
import matplotlib.pyplot as plt
# my empirical distribution (it could be a numpy array)
mydistr = [...]
# calculate histogram weights
weights = np.ones_like(mydistr)/float(len(mydistr))
# custom binning (just an example with 0.1)
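The snippet is cut off at the binning step. The following is one possible complete version, not the author's original code: the sample values are made up, and the bin edges (width 0.1 over [0, 1]) are an assumption based on the comment above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up empirical distribution, for illustration only
mydistr = np.array([0.05, 0.12, 0.18, 0.33, 0.41, 0.47, 0.58, 0.74, 0.81, 0.96])

# Each observation contributes 1/len(mydistr), so bar heights are relative frequencies
weights = np.ones_like(mydistr) / float(len(mydistr))

# Assumed custom binning: 10 bins of width 0.1 between 0 and 1
bins = np.linspace(0.0, 1.0, 11)

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(mydistr, bins=bins, weights=weights)
ax.set_ylabel("frequency")
```

Because every sample falls inside the bin range, the bar heights in `counts` add up to 1.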
pierdom / annotate.py
Created September 28, 2017 08:10
[Matplotlib annotate with arrow] #matplotlib #python #visualization
ax.annotate(my_text, xy=(arrow_x, arrow_y), xytext=(text_x, text_y),
            arrowprops=dict(facecolor='gray', shrink=0.05))
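A self-contained sketch of the call above. The plotted data, the label text, and the coordinates are all made up for illustration: `xy` is the point the arrow points at, and `xytext` is where the label is placed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3, 4], [0, 1, 4, 1, 0])  # hypothetical data with a peak at (2, 4)

annotation = ax.annotate(
    "peak",                # label text
    xy=(2, 4),             # arrow tip: the point being annotated
    xytext=(3, 5),         # position of the label
    arrowprops=dict(facecolor="gray", shrink=0.05),
)
ax.set_ylim(0, 6)  # leave room for the label above the peak
```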
pierdom / pandas_visualization_options.py
Created September 20, 2017 14:39
[Pandas visualization settings] For example, the number of columns and rows shown in terminals, IPython, Jupyter, etc. #python #pandas #jupyter
import pandas as pd
pd.options.display.max_columns = 40
pd.options.display.max_rows = 999
#details here: https://pandas.pydata.org/pandas-docs/stable/options.html
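The assignments above change the settings for the whole session. If the wider display is only needed for a single output, `pd.option_context` applies the same options temporarily. A small sketch (the variable names are illustrative):

```python
import pandas as pd

default_max_rows = pd.get_option("display.max_rows")

# Options are overridden only inside the with-block
with pd.option_context("display.max_columns", 40, "display.max_rows", 999):
    inside_max_rows = pd.get_option("display.max_rows")
    # print(my_df) here would use the wider limits

# Back to the previous value once the block exits
restored_max_rows = pd.get_option("display.max_rows")
```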
pierdom / dataframe_to_s3.py
Created September 20, 2017 09:18
[Export Pandas dataframe to compressed CSV on Amazon S3] #pandas #python #bigdata
import pandas
import io
import gzip
import boto3
csv_buffer = io.StringIO()
my_df.to_csv(csv_buffer, index=False)
csv_buffer.seek(0)
gz_buffer = io.BytesIO()
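The listing cuts the snippet off after the gzip buffer is created. One plausible way to finish it, not necessarily the author's original code, is sketched below. The helper names `gzip_csv_text` and `upload_gzipped_csv` are hypothetical, and the bucket and key in the usage comment are placeholders. `put_object` is the boto3 S3 client method, passed in as an argument so the compression step can be checked without AWS access.

```python
import gzip
import io


def gzip_csv_text(csv_text):
    """Compress CSV text into an in-memory gzip buffer."""
    gz_buffer = io.BytesIO()
    with gzip.GzipFile(mode="wb", fileobj=gz_buffer) as gz_file:
        gz_file.write(csv_text.encode("utf-8"))
    gz_buffer.seek(0)  # rewind so the buffer can be read from the start
    return gz_buffer


def upload_gzipped_csv(s3_client, csv_text, bucket, key):
    """Upload gzip-compressed CSV text using a boto3-style S3 client."""
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=gzip_csv_text(csv_text).getvalue(),
        ContentType="text/csv",
        ContentEncoding="gzip",
    )


# Hypothetical usage with the buffer from the snippet above:
# upload_gzipped_csv(boto3.client("s3"), csv_buffer.getvalue(),
#                    "my-bucket", "exports/my_df.csv.gz")
```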
pierdom / horizontal_vertical_line.py
Created September 18, 2017 15:47
[Horizontal/Vertical straight lines on Matplotlib] #matplotlib #python #visualization
# horizontal line
ax.axhline(0.5, color="gray")
# vertical line
ax.axvline(0.5, color="gray")
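A runnable version of the two calls, assuming a freshly created figure (the `linestyle` argument is an optional addition). Both methods return the `Line2D` they draw. The line spans the full axes in one direction and sits at the given data coordinate in the other.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# horizontal line at y = 0.5, across the whole width of the axes
hline = ax.axhline(0.5, color="gray", linestyle="--")

# vertical line at x = 0.5, across the whole height of the axes
vline = ax.axvline(0.5, color="gray", linestyle="--")
```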
pierdom / distr_fitting.ipynb
Last active January 19, 2022 10:30
[Find best fitting distributions] Find the best fitting PDFs (probability density functions) from a list of well-known distributions in scipy. Inspired by this: https://stackoverflow.com/questions/6620471/fitting-empirical-distribution-to-theoretical-ones-with-scipy-python #datascience #python #matplotlib #visualization #statistics
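The notebook itself is not shown in this listing. The general idea can be sketched as follows, under the assumption that candidates are ranked by the sum of squared errors between the data histogram and each fitted PDF. The function name, the candidate list, and the synthetic sample are all illustrative and are not taken from the notebook.

```python
import numpy as np
from scipy import stats


def best_fitting_distributions(data, candidates, bins=50):
    """Rank scipy.stats distributions by SSE against the data histogram."""
    observed, edges = np.histogram(data, bins=bins, density=True)
    centers = (edges[:-1] + edges[1:]) / 2.0
    results = []
    for dist in candidates:
        params = dist.fit(data)                  # maximum-likelihood fit
        fitted_pdf = dist.pdf(centers, *params)  # PDF evaluated at bin centers
        sse = float(np.sum((observed - fitted_pdf) ** 2))
        results.append((dist.name, sse, params))
    return sorted(results, key=lambda item: item[1])  # lowest SSE first


# Synthetic sample drawn from a normal distribution, for illustration
rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=0.5, size=2000)
ranking = best_fitting_distributions(
    sample, [stats.norm, stats.expon, stats.uniform]
)
```

The first entry of `ranking` is the candidate with the lowest error. The SSE depends on the number of bins, so it is a rough ranking aid rather than a formal goodness-of-fit test.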
pierdom / matplotlib_color_cycle.py
Last active September 6, 2017 12:04
[Cycle over colors in Matplotlib] First create a color map ('cm') from a given palette, then sample it into as many colors as requested (in the example, the number is taken from the size of an array). From then on, every plot drawn on the axes 'ax' automatically gets the next color. #matplotlib #visualization
# new solution (N is the number of elements)
ax.set_prop_cycle('color',plt.cm.rainbow(np.linspace(0,1,N)))
# the solution below is deprecated
cm = plt.get_cmap('gist_rainbow')
ax.set_color_cycle([cm(1.*i/len(YOUR_LIST)) for i in np.arange(len(YOUR_LIST))])
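A self-contained sketch of the `set_prop_cycle` approach, with made-up data and a hypothetical `N = 5`. Each successive `plot` call picks up the next of the N colors sampled from the color map.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import numpy as np

N = 5  # number of lines, and therefore of colors requested

fig, ax = plt.subplots()
# Sample N evenly spaced colors from the 'rainbow' color map
ax.set_prop_cycle("color", plt.cm.rainbow(np.linspace(0, 1, N)))

x = np.arange(10)
lines = [ax.plot(x, x + i)[0] for i in range(N)]  # no explicit color argument

# Colors actually assigned to each line, as hex strings
line_colors = [mcolors.to_hex(line.get_color()) for line in lines]
```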
pierdom / hive_timestamps.sql
Last active September 6, 2017 08:03
[Apache HIVE timestamp operations] #hive #bigdata #sql
UNIX_TIMESTAMP(timestamp)
FROM_UNIXTIME(timestamp, "format")