George Vyshnya gvyshnya

A Data Scientist & Software Dev with blended industrial experience in software development, IT, DevOps, operation and project management, and C-level roles

Kyiv, Ukraine - Warsaw, Poland
www.linkedin.com/in/gvyshnya, www.kaggle.com/gvyshnya

gvyshnya / Datetime-related feature engineering

Created July 7, 2020 08:25

Demo on how to do Datetime-related and trend/lag feature engineering (with COVID-19 pandemic data as a case study)
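As a rough illustration of the kind of features this demo covers, the sketch below derives datetime, lag, and trend columns with pandas. The data frame, its column names (`date`, `cases`), and the case counts are hypothetical and made up for the example; they are not taken from the gist.

```python
import pandas as pd

# Hypothetical daily case counts; the values are made up for illustration.
df = pd.DataFrame({
    "date": pd.date_range("2020-03-01", periods=10, freq="D"),
    "cases": [1, 2, 4, 7, 11, 16, 22, 29, 37, 46],
})

# Datetime-related features derived from the timestamp column.
df["day_of_week"] = df["date"].dt.dayofweek   # Monday=0 ... Sunday=6
df["month"] = df["date"].dt.month
df["is_weekend"] = df["day_of_week"] >= 5

# Lag feature: the value observed one day earlier.
df["cases_lag_1"] = df["cases"].shift(1)

# Trend features: day-over-day change and a 3-day rolling mean.
df["cases_diff_1"] = df["cases"].diff(1)
df["cases_roll_mean_3"] = df["cases"].rolling(window=3).mean()
```

The first rows of the lag, difference, and rolling columns are NaN because no earlier observations exist; how to fill or drop them depends on the model being trained.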

gvyshnya / PdPipe Pipeline Example

Created July 1, 2020 09:52

	import numpy as np
	import pandas as pd
	import pdpipe as pdp

	# ... data reading code goes here

	# set up a transformation pipeline
	pipeline_1 = pdp.ApplyByCols(
	    ['lat', 'lon', 'lat_inspection_location', 'lon_inspection_location'],
	    lambda col: pd.to_numeric(col)
	)

gvyshnya / gbm_model_dense.py

Created December 17, 2017 09:36

GBM prediction model, dense input

	# GBM prediction

	import numpy as np
	import pandas as pd
	from sklearn import *
	import datetime as dt

	def RMSLE(y, pred):
	    # Plain RMSE; it equals RMSLE only when y and pred are already log1p-transformed
	    return metrics.mean_squared_error(y, pred) ** 0.5
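The `RMSLE` helper above computes a plain root mean squared error. That matches the root mean squared logarithmic error only if the targets and predictions were log1p-transformed beforehand, which the preview does not show, so treat that as an assumption. The standard-library sketch below checks the equivalence on made-up numbers; the function names `rmse` and `rmsle` are illustrative and not part of the gist.

```python
import math

def rmse(y, pred):
    """Root mean squared error over two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y))

def rmsle(y, pred):
    """Root mean squared logarithmic error on the raw (untransformed) scale."""
    return math.sqrt(
        sum((math.log1p(a) - math.log1p(b)) ** 2 for a, b in zip(y, pred)) / len(y)
    )

# Made-up visitor counts and predictions, for illustration only.
y_true = [10.0, 25.0, 3.0]
y_pred = [12.0, 20.0, 4.0]

# RMSE on log1p-transformed values equals RMSLE on the raw values.
log_true = [math.log1p(v) for v in y_true]
log_pred = [math.log1p(v) for v in y_pred]
same = math.isclose(rmse(log_true, log_pred), rmsle(y_true, y_pred))
```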

gvyshnya / gbm_model_sparse.py

Created December 17, 2017 09:33

GBM prediction model, sparse data input

	# GBM prediction

	import numpy as np
	import pandas as pd
	from sklearn import *
	import datetime as dt

	def RMSLE(y, pred):
	    # Plain RMSE; it equals RMSLE only when y and pred are already log1p-transformed
	    return metrics.mean_squared_error(y, pred) ** 0.5

gvyshnya / sparsity.py

Created December 17, 2017 09:28

The code to test the sparsity of your data input

	import numpy as np

	def sparsity_ratio(X):
	    return 1.0 - np.count_nonzero(X) / float(X.shape[0] * X.shape[1])

	print("input sparsity ratio:", sparsity_ratio(X))
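To make the ratio concrete, here is a small self-contained run of the same formula on a dense NumPy array. The matrix `X_demo` is hypothetical and exists only for this example.

```python
import numpy as np

def sparsity_ratio(X):
    """Fraction of entries in a 2-D array that are exactly zero."""
    return 1.0 - np.count_nonzero(X) / float(X.shape[0] * X.shape[1])

# Hypothetical 3x4 feature matrix with 9 zeros out of 12 entries.
X_demo = np.array([
    [0, 0, 1, 0],
    [0, 2, 0, 0],
    [0, 0, 0, 3],
])

ratio = sparsity_ratio(X_demo)  # 1 - 3/12 = 0.75
print("input sparsity ratio:", ratio)
```

A ratio close to 1.0 suggests the data may be better served by a sparse representation; the exact threshold is a judgment call and depends on the model and library in use.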

gvyshnya / preprocessing.py

Created December 8, 2017 11:26

Basic pre-processing and feature engineering script for Recruit Restaurant Visitor Forecasting contest

	# Project/Contest: Recruit Restaurant Visitor Forecasting (https://www.kaggle.com/c/recruit-restaurant-visitor-forecasting)
	#
	# Summary: a basic pre-processing and feature engineering script that transforms the original input data from the Customer
	# into ready-for-modelling training and testing sets
	#
	# inspirations:
	# - https://www.kaggle.com/the1owl/surprise-me/

	import numpy as np
	import pandas as pd

gvyshnya / ProphetModeller.py: Class wrapper over the Prophet TS forecasting library

Created December 1, 2017 18:10

	#!/usr/bin/python
	import pandas as pd
	import numpy as np
	import fbprophet as fbpro
	import sklearn.metrics as skm
	import math
	import datetime as dt


	class ProphetModeller(object):

gvyshnya / gist:bc69fe987fa49f34de98af67d99ee684

Created November 21, 2017 06:37

xgboost running with tree_method = 'hist'

	import xgboost as xgb
	import numpy as np
	from sklearn.datasets import load_digits
	from sklearn.model_selection import train_test_split

	rng = np.random.RandomState(1994)

	digits = load_digits(n_class=2)
	X = digits['data']
	y = digits['target']

gvyshnya / dvc repro code

Created August 20, 2017 19:22

The power of the DVC repro command

	# Improve ensemble configuration
	$ vi code/config.R

	# Commit all the changes.
	$ git commit -am "Updated weights of the models in the ensemble"

	# Reproduce the ensemble prediction
	$ dvc repro data/submission_ensemble.csv

gvyshnya / mad_with_holidays.py

Created August 8, 2017 23:44

Simple benchmark prediction of Wikipedia traffic with median (median by page, weekdays, and holidays) and consistent Holidays management

	# Project/Competition: https://www.kaggle.com/c/web-traffic-time-series-forecasting/
	# Simple benchmark prediction with median (median by page, weekdays, and holidays)
	#
	# - You should install Workalendar directly from its GitHub repo
	# >>> pip install git+https://github.com/novafloss/workalendar.git


	import pandas as pd
	import pandas.tseries.holiday as hol
	import re
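The benchmark idea described above (a median per page, split by day type) can be sketched with a pandas group-by. This is a simplified illustration, not the gist's implementation: the long-format table, its column names, and the visit counts are made up, and only weekends are flagged as days off, whereas the gist also accounts for holidays.

```python
import pandas as pd

# Hypothetical long-format traffic data; page names and counts are made up.
traffic = pd.DataFrame({
    "page": ["A"] * 7 + ["B"] * 7,
    "date": list(pd.date_range("2017-08-07", periods=7, freq="D")) * 2,
    "visits": [10, 12, 11, 13, 12, 30, 34, 5, 6, 5, 7, 6, 1, 3],
})

# Flag weekends; a holiday calendar could mark extra days as non-working.
traffic["is_day_off"] = traffic["date"].dt.dayofweek >= 5

# Benchmark: the median of past visits per (page, day-type) group.
benchmark = (
    traffic.groupby(["page", "is_day_off"])["visits"]
    .median()
    .rename("predicted_visits")
    .reset_index()
)
```

Predictions for future dates would then be looked up by joining on `page` and the day-type flag of each target date.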

	# COVID-19 case study

	# import packages
	import pandas as pd
	import pdpipe as pdp
	import numpy as np

	from sklearn import preprocessing
	import time
	from datetime import datetime