Pierdomenico Fiadino pierdom

@pierdom
pierdom / jupyter_header.py
Last active September 6, 2017 08:09
[My Jupyter header for Python notebooks] a common list of libraries I import when creating a new notebook with Jupyter #python #jupyter #datascience
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
%matplotlib inline
@pierdom
pierdom / spark_df_to_hive.py
Last active September 6, 2017 12:37
[Persist/save Spark dataframe to a Hive table in ORC format] #hive #spark #bigdata
df.write.format("orc").saveAsTable("my_table_name")
@pierdom
pierdom / hadoop_s3.md
Last active September 6, 2017 08:08
[Configure Hadoop access to S3 (Hive, Spark)] #hive #spark #bigdata #sysadmin #aws

Add AWS credentials to hdfs-site.xml

fs.s3a.access.key=XXXX
fs.s3a.secret.key=YYYY
<property>
  <name>fs.s3a.access.key</name>
@pierdom
pierdom / get_day_name.sql
Last active September 6, 2017 08:08
[Postgres: from date to weekday] Get the day of the week from a date column in Postgres (full name: monday, tuesday, ...) #postgresql #sql
select to_char(<mydate>, 'day')
from mytable
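For comparison, here is a minimal Python sketch of the same date-to-weekday mapping, using only the standard library. The helper name `day_name` and the `DAY_NAMES` list are hypothetical and not part of the gist. Note that the Postgres `'day'` pattern blank-pads the name to nine characters, while this sketch returns the bare name (comparable to the `'FMday'` pattern).

```python
from datetime import date

# Lowercase English day names, indexed by date.weekday() (Monday == 0).
DAY_NAMES = [
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
]


def day_name(d: date) -> str:
    """Return the full lowercase weekday name for a date (hypothetical helper)."""
    return DAY_NAMES[d.weekday()]


print(day_name(date(2017, 9, 6)))  # wednesday
```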
@pierdom
pierdom / psql_rows_to_array.md
Last active September 6, 2017 08:07
[From rows to arrays in Postgres and Hive] grouping by a key #datascience #bigdata #postgresql #hive #sql

Original table:

 k  | v
----------
 A  | 1
 A  | 2
 A  | 3
 B  | 6
 B  | 7
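
The target shape of this grouping can be sketched in plain Python on the rows visible above. This only illustrates the intended result (one array of `v` values per key `k`); it is not the SQL from the gist, and the names `rows` and `rows_to_arrays` are assumptions made for the example.

```python
from collections import defaultdict

# The rows of the original table shown above, as (k, v) pairs.
rows = [("A", 1), ("A", 2), ("A", 3), ("B", 6), ("B", 7)]


def rows_to_arrays(pairs):
    """Group values by key, preserving the input order within each key."""
    grouped = defaultdict(list)
    for k, v in pairs:
        grouped[k].append(v)
    return dict(grouped)


print(rows_to_arrays(rows))
```

In SQL, the aggregate usually used for this is `array_agg` in Postgres and `collect_list` in Hive, combined with `GROUP BY k`.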
@pierdom
pierdom / hive_queue.sh
Last active September 6, 2017 08:04
[Start Apache HIVE (on Tez) shell in a specific Yarn scheduler queue] #hive #bigdata #sysadmin #yarn #bash
hive -hiveconf tez.queue.name=<name>
@pierdom
pierdom / hive_timestamps.sql
Last active September 6, 2017 08:03
[Apache HIVE timestamp operations] #hive #bigdata #sql
UNIX_TIMESTAMP(timestamp)
FROM_UNIXTIME(timestamp, "format")
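A rough Python analogue of the two functions above may help to check expected values outside Hive. This is a sketch under assumptions: it fixes the time zone to UTC (Hive uses the session's default zone), it uses Python `strftime` codes instead of the Java-style patterns Hive expects (for example `yyyy-MM-dd HH:mm:ss`), and the lowercase function names are hypothetical stand-ins.

```python
from datetime import datetime, timezone

# Python equivalent of the Hive pattern "yyyy-MM-dd HH:mm:ss".
FMT = "%Y-%m-%d %H:%M:%S"


def unix_timestamp(ts: str, fmt: str = FMT) -> int:
    """Parse a timestamp string (assumed UTC) into seconds since the epoch."""
    dt = datetime.strptime(ts, fmt).replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def from_unixtime(seconds: int, fmt: str = FMT) -> str:
    """Format seconds since the epoch as a UTC timestamp string."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)


secs = unix_timestamp("2017-09-06 08:03:00")
print(secs, from_unixtime(secs))
```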
@pierdom
pierdom / matplotlib_color_cycle.py
Last active September 6, 2017 12:04
[Cycle over colors in Matplotlib] First create a color map ('cm') from a given palette, then bin the color map according to the number of requested colors (in the example, taken from the size of an array). After that, every time we plot on the axes 'ax', we get the next color automatically #matplotlib #visualization
# new solution (N is the number of elements)
ax.set_prop_cycle('color',plt.cm.rainbow(np.linspace(0,1,N)))
# the solution below is deprecated
cm = plt.get_cmap('gist_rainbow')
ax.set_color_cycle([cm(1.*i/len(YOUR_LIST)) for i in np.arange(len(YOUR_LIST))])
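The `set_prop_cycle` approach can be tried end to end with a short script. This is a minimal sketch: the value of `N`, the plotted data, and the non-interactive `Agg` backend are assumptions chosen so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no display needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import to_rgba

N = 4  # number of lines, hence number of colors requested
fig, ax = plt.subplots()

# Sample N evenly spaced RGBA colors from the 'rainbow' color map.
colors = [tuple(c) for c in plt.cm.rainbow(np.linspace(0, 1, N))]
ax.set_prop_cycle("color", colors)

# Each call to plot() picks up the next color from the cycle.
x = np.arange(5)
lines = [ax.plot(x, x + i)[0] for i in range(N)]

for line in lines:
    print(to_rgba(line.get_color()))
```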
@pierdom
pierdom / distr_fitting.ipynb
Last active January 19, 2022 10:30
[Find best fitting distributions] Find the best-fitting PDFs (probability density functions) from a list of well-known distributions in scipy. Inspired by this: https://stackoverflow.com/questions/6620471/fitting-empirical-distribution-to-theoretical-ones-with-scipy-python #datascience #python #matplotlib #visualization #statistics
@pierdom
pierdom / horizontal_vertical_line.py
Created September 18, 2017 15:47
[Horizontal/Vertical straight lines on Matplotlib] #matplotlib #python #visualization
# horizontal line
ax.axhline(0.5, color="gray")
# vertical line
ax.axvline(0.5, color="gray")