George Vyshnya gvyshnya

@gvyshnya
gvyshnya / Datetime-related feature engineering
Created July 7, 2020 08:25
Demo on how to do Datetime-related and trend/lag feature engineering (with COVID-19 pandemic data as a case study)
import numpy as np
import pandas as pd
import pdpipe as pdp
# ... data reading code goes here
# set up a transformation pipeline
pipeline_1 = pdp.ApplyByCols(
    ['lat', 'lon', 'lat_inspection_location', 'lon_inspection_location'],
    lambda col: pd.to_numeric(col)
)
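The description of this gist mentions datetime-related and trend/lag features. A minimal sketch of what such features can look like in plain pandas follows. It assumes a hypothetical daily series with columns `date` and `cases`; neither name comes from the gist, and this is not the author's pipeline.

```python
import pandas as pd

# Hypothetical daily case counts; the column names are illustrative assumptions.
df = pd.DataFrame({
    "date": pd.date_range("2020-03-01", periods=6, freq="D"),
    "cases": [1, 2, 4, 7, 11, 16],
})

# Datetime-related features derived from the timestamp.
df["day_of_week"] = df["date"].dt.dayofweek  # Monday=0 ... Sunday=6
df["month"] = df["date"].dt.month

# Lag feature: the value observed one day earlier (NaN for the first row).
df["cases_lag_1"] = df["cases"].shift(1)

# Trend feature: day-over-day change (NaN for the first row).
df["cases_diff_1"] = df["cases"].diff(1)
```

The first row of each lag and difference column is NaN because there is no earlier observation. Whether to drop or impute such rows depends on the model.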
@gvyshnya
gvyshnya / gbm_model_dense.py
Created December 17, 2017 09:36
GBM prediction model, dense input
# GBM prediction
import numpy as np
import pandas as pd
from sklearn import *
import datetime as dt
def RMSLE(y, pred):
    # plain RMSE; this equals RMSLE only if y and pred are already log1p-transformed
    return metrics.mean_squared_error(y, pred) ** 0.5
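The function in this preview takes a square root of the mean squared error, so it yields a root mean squared *logarithmic* error only when its inputs are already on a log scale. A minimal sketch of the relationship, assuming non-negative raw targets and using NumPy only (the helper names `rmse` and `rmsle` are illustrative, not from the gist):

```python
import numpy as np

def rmse(y, pred):
    """Root mean squared error of two equal-length arrays."""
    y = np.asarray(y, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((y - pred) ** 2)))

def rmsle(y, pred):
    """RMSLE on raw values: RMSE computed after a log1p transform of both inputs."""
    return rmse(np.log1p(y), np.log1p(pred))

# Illustrative values only.
y_true = np.array([3.0, 10.0, 25.0])
y_pred = np.array([4.0, 8.0, 30.0])
print("RMSLE:", rmsle(y_true, y_pred))
```

If the targets were log1p-transformed before training, calling `rmse` on the transformed values gives the same number as `rmsle` on the raw ones.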
@gvyshnya
gvyshnya / gbm_model_sparse.py
Created December 17, 2017 09:33
GBM prediction model, sparse data input
# GBM prediction
import numpy as np
import pandas as pd
from sklearn import *
import datetime as dt
def RMSLE(y, pred):
    # plain RMSE; this equals RMSLE only if y and pred are already log1p-transformed
    return metrics.mean_squared_error(y, pred) ** 0.5
@gvyshnya
gvyshnya / sparsity.py
Created December 17, 2017 09:28
The code to test the sparsity of your data input
import numpy as np
def sparsity_ratio(X):
    # fraction of zero entries in the 2-D input X
    return 1.0 - np.count_nonzero(X) / float(X.shape[0] * X.shape[1])
print("input sparsity ratio:", sparsity_ratio(X))
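As a quick check of the ratio above, here is a small worked example on a dense NumPy array. The array `X_demo` is made up for illustration. The function body is repeated so the snippet runs on its own.

```python
import numpy as np

def sparsity_ratio(X):
    # Share of entries in a 2-D array that are exactly zero.
    return 1.0 - np.count_nonzero(X) / float(X.shape[0] * X.shape[1])

# Illustrative 2x3 array with two non-zero entries out of six.
X_demo = np.array([[0, 0, 3],
                   [0, 5, 0]])

ratio = sparsity_ratio(X_demo)
print("input sparsity ratio:", ratio)
```

Four of the six entries are zero, so the ratio is 4/6. A value close to 1.0 suggests the input may be a candidate for a sparse representation.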
@gvyshnya
gvyshnya / preprocessing.py
Created December 8, 2017 11:26
Basic pre-processing and feature engineering script for Recruit Restaurant Visitor Forecasting contest
# Project/Contest: Recruit Restaurant Visitor Forecasting (https://www.kaggle.com/c/recruit-restaurant-visitor-forecasting)
#
# Summary: a basic pre-processing and feature engineering script that transforms the original input data from the Customer
# into modelling-ready training and testing sets
#
# inspirations:
# - https://www.kaggle.com/the1owl/surprise-me/
import numpy as np
import pandas as pd
#!/usr/bin/python
import pandas as pd
import numpy as np
import fbprophet as fbpro
import sklearn.metrics as skm
import math
import datetime as dt
class ProphetModeller(object):
@gvyshnya
gvyshnya / gist:bc69fe987fa49f34de98af67d99ee684
Created November 21, 2017 06:37
xgboost running with tree_method = 'hist'
import xgboost as xgb
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
rng = np.random.RandomState(1994)
digits = load_digits(n_class=2)
X = digits['data']
y = digits['target']
@gvyshnya
gvyshnya / dvc repro code
Created August 20, 2017 19:22
DVC repro command power
# Improve ensemble configuration
$ vi code/config.R
# Commit all the changes.
$ git commit -am "Updated weights of the models in the ensemble"
# Reproduce the ensemble prediction
$ dvc repro data/submission_ensemble.csv
@gvyshnya
gvyshnya / mad_with_holidays.py
Created August 8, 2017 23:44
Simple benchmark prediction of Wikipedia traffic with the median (by page, weekday, and holiday) and consistent holiday handling
# Project/Competition: https://www.kaggle.com/c/web-traffic-time-series-forecasting/
# Simple benchmark prediction with median (median by page, weekdays, and holidays)
#
# - You should install Workalendar directly from its GitHub repo
# >>> pip install git+https://github.com/novafloss/workalendar.git
import pandas as pd
import pandas.tseries.holiday as hol
import re