# RMSLE error function
import math

def rmsle_error(y, y_pred):
    """Root Mean Squared Logarithmic Error between true values y and predictions y_pred."""
    assert len(y) == len(y_pred)
    to_sum = [(math.log(pred + 1) - math.log(actual + 1)) ** 2.0 for actual, pred in zip(y, y_pred)]
    return (sum(to_sum) * (1.0 / len(y))) ** 0.5
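A quick sanity check on toy values (illustrative only, not part of the original code):

y_true = [3.0, 5.0, 2.5]
y_hat = [2.5, 5.0, 3.0]
print(rmsle_error(y_true, y_hat))  # ~0.11, small because the predictions are close to the targets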
def num_feature(df, vectorizer_nums=None, scale=None, training=True):
    """
    During EDA and manual review of the text I found that a lot of numerical information
    is available in it, e.g. "10ml", "2 packs", "10 lipsticks", "512gb", and that this
    numerical data has an impact on the price of an item. This function vectorizes it:
    we first find patterns like "10ml", "160gb", "2 packs", etc. For a sentence such as
    "250ml 2 packs of xyz company", the "ml" column gets the value 250 and the "packs"
    column gets the value 2. The output is a sparse matrix.
    """
    def get_featuers(phrase):
        """
        Finds all possible numerical patterns in the training data and updates the
        feature dictionary accordingly.
        """
# ref - https://www.kaggle.com/gspmoreira/cnn-glove-single-model-private-lb-0-41117-35th
import numpy as np

def generate_cbs_stats(train, test):
    # Aggregate log-price statistics per category/brand/shipping (cbs) group.
    df_group = train.groupby('cat_brand_ship', as_index=False).agg(
        {"shipping": len,
         "log_price": [np.median, np.mean, np.std, np.min, np.max]})
    df_group.columns = ['cat_brand_ship', 'cbs_count', 'cbs_log_price_median', 'cbs_log_price_mean',
                        'cbs_log_price_std', 'cbs_log_price_min', 'cbs_log_price_max']
    # Groups with a single row have NaN std; treat them as zero spread.
    df_group['cbs_log_price_std'] = df_group['cbs_log_price_std'].fillna(0)
    df_group['cbs_log_price_conf_variance'] = df_group['cbs_log_price_std'] / df_group['cbs_log_price_mean']
    df_group['cbs_log_count'] = np.log1p(df_group['cbs_count'])
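These group statistics are only useful once they are merged back onto the individual rows. Below is a rough sketch of how that could look; the column names category_name, brand_name and shipping used to build the cat_brand_ship key are assumptions for Mercari-style data, and the original key-building/merge code is not part of the snippet above.

import pandas as pd

def add_cbs_key(df):
    # Build the grouping key from category, brand and shipping flag (assumed column names).
    df['cat_brand_ship'] = (df['category_name'].fillna('missing') + '_' +
                            df['brand_name'].fillna('missing') + '_' +
                            df['shipping'].astype(str))
    return df

# train = add_cbs_key(train); test = add_cbs_key(test)
# train = train.merge(df_group, on='cat_brand_ship', how='left')
# test = test.merge(df_group, on='cat_brand_ship', how='left')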
import numpy as np
from hyperopt import hp, fmin, tpe, Trials, STATUS_OK

# Hyperparameter search space for the LightGBM regressor.
lgb_reg_params = {
    'learning_rate': hp.uniform('learning_rate', 0.1, 1),
    'max_depth': hp.choice('max_depth', np.arange(2, 100, 1, dtype=int)),
    'min_child_weight': hp.choice('min_child_weight', np.arange(1, 50, 1, dtype=int)),
    'colsample_bytree': hp.uniform('colsample_bytree', 0.4, 1),
    'subsample': hp.uniform('subsample', 0.6, 1),
    'num_leaves': hp.choice('num_leaves', np.arange(1, 200, 1, dtype=int)),
    'min_split_gain': hp.uniform('min_split_gain', 0, 1),
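For context, here is a hedged sketch of how such a hyperopt search space is typically consumed with fmin and a LightGBM regressor. It assumes the lgb_reg_params dictionary above is closed after the last entry shown, and it uses toy data; the objective function, metric and max_evals value are illustrative, not the original tuning code.

import lightgbm as lgb
import numpy as np
from hyperopt import fmin, tpe, Trials, STATUS_OK
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Toy data standing in for the real feature matrix and log-price target.
X, y = make_regression(n_samples=2000, n_features=20, noise=0.1, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

def objective(params):
    params = dict(params)
    params['num_leaves'] = max(2, int(params['num_leaves']))  # LightGBM requires num_leaves >= 2
    model = lgb.LGBMRegressor(n_estimators=200, **params)
    model.fit(X_train, y_train)
    preds = model.predict(X_valid)
    rmse = float(np.sqrt(np.mean((preds - y_valid) ** 2)))  # RMSE on the (log) target
    return {'loss': rmse, 'status': STATUS_OK}

trials = Trials()
best = fmin(fn=objective, space=lgb_reg_params, algo=tpe.suggest,
            max_evals=25, trials=trials)
print(best)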
# This is a sample build configuration for a Docker app.
# Check our guides at https://confluence.atlassian.com/x/e8YWN for more examples.
# Only use spaces to indent your .yml configuration.
# -----
# You can specify a custom Docker image from Docker Hub as your build environment.
image: ubuntu

pipelines:
  default:
    - step:
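        # NOTE: the original snippet is truncated after "- step:". The lines below
        # are an illustrative completion only; the step name and script commands
        # are assumptions, not the project's actual pipeline.
        name: Build and test
        script:
          - apt-get update && apt-get install -y python3 python3-pip
          - pip3 install -r requirements.txt
          - python3 -m pytest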