Yassine Alouini yassineAlouini

⚙️
PyTorch Exploration...
View GitHub Profile
@yassineAlouini
yassineAlouini / csv_from_s3.py
Created June 30, 2017 09:05
Get bz2-compressed CSV files from S3 into a Pandas DataFrame.
import bz2file
import pandas as pd
import s3fs


def get_csv_from_s3(folder_path):
    dfs = []
    s3 = s3fs.S3FileSystem()
    for fp in s3.ls(folder_path):
        if '.csv' in fp:
            with s3.open(fp) as s3f:
                with bz2file.open(s3f) as f:
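The preview stops inside the innermost `with` block. A minimal sketch of how the remaining steps might look, assuming each decompressed file is parsed with `pd.read_csv` and the frames are concatenated at the end. The standard-library `bz2` module stands in for `bz2file` here, and `read_bz2_csvs` with its `open_file` parameter is a hypothetical helper (not part of the gist) so the logic can be tried without S3.

```python
import bz2
import io

import pandas as pd


def read_bz2_csvs(paths, open_file):
    """Read bz2-compressed CSV files and concatenate them into one DataFrame.

    `open_file` is a hypothetical callable returning a binary file object for
    a path; with s3fs it could be `s3fs.S3FileSystem().open`.
    """
    dfs = []
    for fp in paths:
        if '.csv' not in fp:
            continue  # skip anything that is not a CSV file
        with open_file(fp) as raw:
            # Standard-library bz2 stands in for the gist's bz2file here.
            with bz2.open(raw) as f:
                dfs.append(pd.read_csv(f))
    return pd.concat(dfs, ignore_index=True)


# In-memory stand-in for a bucket, so the sketch runs without S3.
fake_bucket = {
    'a.csv.bz2': bz2.compress(b'x,y\n1,2\n'),
    'b.csv.bz2': bz2.compress(b'x,y\n3,4\n'),
    'notes.txt': b'ignored',
}
df = read_bz2_csvs(sorted(fake_bucket), lambda fp: io.BytesIO(fake_bucket[fp]))
```

With real data, `paths` would come from `s3.ls(folder_path)` as in the gist.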
yassineAlouini / compare_dfs.py
Created July 4, 2017 13:50
Compare two Pandas DataFrames
import numpy as np
import pandas as pd


def compare_two_dfs(input_df_1, input_df_2):
    df_1, df_2 = input_df_1.copy(), input_df_2.copy()
    ne_stacked = (df_1 != df_2).stack()
    changed = ne_stacked[ne_stacked]
    changed.index.names = ['id', 'col']
    difference_locations = np.where(df_1 != df_2)
    changed_from = df_1.values[difference_locations]
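The preview ends after `changed_from`. A sketch of one way the comparison might be completed, assuming both frames share the same index and columns. The `changed_to` variable and the `'from'`/`'to'` output columns are assumptions, since the rest of the function is not shown.

```python
import numpy as np
import pandas as pd


def compare_two_dfs(input_df_1, input_df_2):
    """Return one row per differing cell, with the old and new values.

    Assumes both frames share the same index and columns.
    """
    df_1, df_2 = input_df_1.copy(), input_df_2.copy()
    # Boolean Series indexed by (row label, column label), True where cells differ.
    ne_stacked = (df_1 != df_2).stack()
    changed = ne_stacked[ne_stacked]
    changed.index.names = ['id', 'col']
    # Row-major positions of the differing cells, matching the stacked order.
    difference_locations = np.where(df_1 != df_2)
    changed_from = df_1.values[difference_locations]
    changed_to = df_2.values[difference_locations]
    # Assumed output layout: a 'from' and a 'to' column (not shown in the preview).
    return pd.DataFrame({'from': changed_from, 'to': changed_to},
                        index=changed.index)


before = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
after = pd.DataFrame({'a': [1, 5], 'b': [3, 4]})
diff = compare_two_dfs(before, after)
```

Because `NaN != NaN` evaluates to `True`, cells that are missing in both frames would be reported as differences with this approach.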
yassineAlouini / format_time.py
Created July 14, 2017 06:42
Format seconds to human-readable time.
# This function is extracted from this file: https://github.com/dask/dask/blob/master/dask/diagnostics/progress.py
def format_time(t):
    """Format seconds into a human readable form.

    >>> format_time(10.4)
    '10.4s'
    >>> format_time(1000.4)
    '16min 40.4s'
    """
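The `format_time` preview shows only the docstring. One way to produce the two documented outputs is to split the seconds with repeated `divmod` calls. This is a sketch consistent with the doctest, not necessarily the exact dask source, and the format widths are assumptions.

```python
def format_time(t):
    """Format seconds into a human readable form."""
    # Split total seconds into minutes and leftover seconds, then hours.
    m, s = divmod(t, 60)
    h, m = divmod(m, 60)
    if h:
        return '{0:2.0f}hr {1:2.0f}min {2:4.1f}s'.format(h, m, s)
    elif m:
        return '{0:2.0f}min {1:4.1f}s'.format(m, s)
    else:
        return '{0:4.1f}s'.format(s)


short = format_time(10.4)
longer = format_time(1000.4)
```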
yassineAlouini / pickle_info.py
Last active November 6, 2017 08:53
Get information about the pickling process
# Inspired by: https://airflow.incubator.apache.org/_modules/airflow/models.html#BaseOperator
import logging
import pickle
import traceback
from datetime import datetime


def pickle_info(obj, session=None):
    d = {}
    d['is_picklable'] = True
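The preview ends right after the dictionary is initialised. A minimal sketch of how the rest might go, assuming the function tries `pickle.dumps`, records the payload size and elapsed time on success, and stores the traceback on failure. The keys `pickle_len`, `pickling_duration`, and `stacktrace` are assumptions, not taken from the preview.

```python
import pickle
import traceback
from datetime import datetime


def pickle_info(obj, session=None):
    """Report whether `obj` pickles, plus size and timing when it does.

    `session` is kept from the gist's signature and is unused in this sketch.
    """
    d = {'is_picklable': True}
    try:
        start = datetime.now()
        pickled = pickle.dumps(obj)
        d['pickle_len'] = len(pickled)
        d['pickling_duration'] = str(datetime.now() - start)
    except Exception:
        # Any failure during pickling marks the object as not picklable.
        d['is_picklable'] = False
        d['stacktrace'] = traceback.format_exc()
    return d


ok_info = pickle_info([1, 2, 3])
bad_info = pickle_info(lambda: 0)  # lambdas cannot be pickled by reference
```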
yassineAlouini / df_logs.py
Created November 15, 2017 07:54
Log shape and dtypes of a DataFrame
from functools import wraps

from logs import logger


# Two decorators to log the shape and dtypes of a DataFrame.
# Inspired by: https://tomaugspurger.github.io/method-chaining
def log_shape(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
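The preview cuts off inside `wrapper`. A sketch of how the two decorators might be completed, assuming each one calls the wrapped function, logs an attribute of the result, and returns the result unchanged. The standard-library `logging` module replaces the gist's local `logs` module here, and `log_dtypes` is an assumed name for the second decorator.

```python
import logging
from functools import wraps

logger = logging.getLogger(__name__)


def log_shape(func):
    """Log the shape of whatever `func` returns (for example a DataFrame)."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        logger.info("%s shape: %s", func.__name__, result.shape)
        return result
    return wrapper


def log_dtypes(func):
    """Log the dtypes of whatever `func` returns."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        logger.info("%s dtypes: %s", func.__name__, result.dtypes)
        return result
    return wrapper
```

The decorators only read `.shape` and `.dtypes`, so they can be stacked on any function in a method chain that returns a DataFrame.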
yassineAlouini / inplace.md
Last active November 15, 2017 09:57
A link to an explanation of why inplace=True in Pandas is not recommended.
yassineAlouini / test_correct_mean_var_lognormal.py
Last active November 16, 2017 13:01
Prepare a scipy lognorm distribution given its mean and variance
from scipy.stats import lognorm
import numpy as np


def prepare_lognorm(mean, var):
    # Formula from https://en.wikipedia.org/wiki/Log-normal_distribution
    sigma = np.sqrt(np.log(1 + (float(var) / mean ** 2)))
    mu = np.log(mean / np.sqrt(1 + (float(var) / mean ** 2)))
    # Compute the scale for scipy
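The preview ends before the scale is computed. In scipy's parameterization, `lognorm(s, scale=...)` takes `s = sigma` and `scale = exp(mu)`. The sketch below uses only the standard-library `math` module to derive both values and to check that they reproduce the requested mean and variance; `lognorm_params` is a hypothetical helper name.

```python
import math


def lognorm_params(mean, var):
    """Return (sigma, scale) for a log-normal with the given mean and variance."""
    sigma = math.sqrt(math.log(1 + var / mean ** 2))
    mu = math.log(mean / math.sqrt(1 + var / mean ** 2))
    # scipy's `scale` argument for lognorm is exp(mu).
    scale = math.exp(mu)
    return sigma, scale


sigma, scale = lognorm_params(3.0, 2.0)
# Closed-form moments of the log-normal, used here as a round-trip check.
mean_back = scale * math.exp(sigma ** 2 / 2)
var_back = (math.exp(sigma ** 2) - 1) * scale ** 2 * math.exp(sigma ** 2)
# With scipy installed, the distribution would be: lognorm(s=sigma, scale=scale)
```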
yassineAlouini / set_conf.md
Created November 20, 2017 08:56
Good ways to set configuration in Python
yassineAlouini / localize_tms.py
Last active November 28, 2017 09:06
A utility function to localize a UTC timestamp DataFrame column
import pytz


def localize_datetime(input_df, timezone, tms_col):
    """
    Convert datetime column from UTC to another timezone.
    """
    tmz = pytz.timezone(timezone)
    df = input_df.copy()
    return (df.set_index(tms_col)
              .tz_localize(pytz.utc)  # UTC time
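The method chain is cut off after `tz_localize`. A sketch of how it might continue, assuming the next steps are `tz_convert` to the target timezone followed by `reset_index` to restore the column. Timezone names are passed to pandas as strings here instead of `pytz` objects, which is a substitution for brevity.

```python
import pandas as pd


def localize_datetime(input_df, timezone, tms_col):
    """Convert a naive UTC datetime column to another timezone."""
    df = input_df.copy()
    return (df.set_index(tms_col)
              .tz_localize('UTC')    # mark the naive timestamps as UTC
              .tz_convert(timezone)  # shift to the target timezone
              .reset_index())        # turn the index back into a column


utc_df = pd.DataFrame({
    'tms': pd.to_datetime(['2017-11-28 09:00:00']),
    'value': [1],
})
paris_df = localize_datetime(utc_df, 'Europe/Paris', 'tms')
```

The input frame is left untouched because the function works on a copy.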
yassineAlouini / package_jupyter_install.md
Created December 6, 2017 07:21
How to correctly install packages from a Jupyter Notebook
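# With pip: install into the environment of the running kernel
import sys
!{sys.executable} -m pip install <package>

# With conda: install into the environment of the running kernel
import sys
!conda install --yes --prefix {sys.prefix} <package>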
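Outside a notebook cell, the pip variant can be expressed as ordinary Python. A minimal sketch, assuming the standard-library `subprocess` module; `pip_install_command` and `pip_install` are hypothetical helper names, not part of the gist.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build the pip command that targets the running interpreter."""
    # sys.executable is the Python running this kernel, so pip installs into
    # the same environment the notebook imports from.
    return [sys.executable, '-m', 'pip', 'install', package]


def pip_install(package):
    """Run the install and raise CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package))
```

Calling a bare `pip` from the shell may resolve to a different Python installation than the kernel's, which is why the interpreter path is used explicitly.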