
@shreyas90999
shreyas90999 / bitbucket-pipelines.yml
Last active January 6, 2022 07:11
bitbucket deploy to VM
# This is a sample build configuration for docker app
# Check our guides at https://confluence.atlassian.com/x/e8YWN for more examples.
# Only use spaces to indent your .yml configuration.
# -----
# You can specify a custom docker image from Docker Hub as your build environment.
image: ubuntu
pipelines:
  default:
    - step:
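The step body is cut off in this listing. One plausible completion for a deploy-to-VM pipeline uses Atlassian's ssh-run pipe; the pipe version, variable names, and deploy script here are illustrative assumptions, not from the gist:

```yaml
image: ubuntu
pipelines:
  default:
    - step:
        name: Deploy to VM
        script:
          # Run a command on the target VM over SSH. DEPLOY_USER and
          # DEPLOY_HOST would be configured as repository variables;
          # deploy.sh is a placeholder for the actual deploy command.
          - pipe: atlassian/ssh-run:0.4.0
            variables:
              SSH_USER: $DEPLOY_USER
              SERVER: $DEPLOY_HOST
              COMMAND: "./deploy.sh"
```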
@shreyas90999
shreyas90999 / lgbm_hyperopt_optimization.py
Last active September 19, 2024 18:17
Lgbm optimization using hyperopt
from hyperopt import hp, fmin, tpe, Trials, STATUS_OK
import numpy as np  # needed for the np.arange calls below

lgb_reg_params = {
    'learning_rate': hp.uniform('learning_rate', 0.1, 1),
    'max_depth': hp.choice('max_depth', np.arange(2, 100, 1, dtype=int)),
    'min_child_weight': hp.choice('min_child_weight', np.arange(1, 50, 1, dtype=int)),
    'colsample_bytree': hp.uniform('colsample_bytree', 0.4, 1),
    'subsample': hp.uniform('subsample', 0.6, 1),
    # LightGBM requires num_leaves >= 2, so the range starts at 2
    'num_leaves': hp.choice('num_leaves', np.arange(2, 200, 1, dtype=int)),
    'min_split_gain': hp.uniform('min_split_gain', 0, 1),
}
# ref - https://www.kaggle.com/gspmoreira/cnn-glove-single-model-private-lb-0-41117-35th
import numpy as np
import pandas as pd

def generate_cbs_stats(train, test):
    """Aggregate log_price statistics per category/brand/shipping group."""
    df_group = train.groupby('cat_brand_ship', as_index=False).agg(
        {"shipping": len,
         "log_price": [np.median, np.mean, np.std, np.min, np.max]})
    df_group.columns = ['cat_brand_ship', 'cbs_count', 'cbs_log_price_median',
                        'cbs_log_price_mean', 'cbs_log_price_std',
                        'cbs_log_price_min', 'cbs_log_price_max']
    # std is NaN for single-row groups
    df_group['cbs_log_price_std'] = df_group['cbs_log_price_std'].fillna(0)
    # coefficient of variation of log_price within each group
    df_group['cbs_log_price_conf_variance'] = (
        df_group['cbs_log_price_std'] / df_group['cbs_log_price_mean'])
    df_group['cbs_log_count'] = np.log1p(df_group['cbs_count'])
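generate_cbs_stats is truncated in this listing, so how the group statistics get back onto the frames is not shown. A self-contained sketch of the presumed usage on toy data, using pandas named aggregation rather than the gist's column-flattening (the merge step is an assumption):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the train frame used by generate_cbs_stats.
train = pd.DataFrame({
    'cat_brand_ship': ['a', 'a', 'b'],
    'shipping': [1, 0, 1],
    'log_price': [2.3, 2.9, 1.1],
})

df_group = train.groupby('cat_brand_ship', as_index=False).agg(
    cbs_count=('log_price', 'size'),
    cbs_log_price_median=('log_price', 'median'),
    cbs_log_price_mean=('log_price', 'mean'),
    cbs_log_price_std=('log_price', 'std'),
    cbs_log_price_min=('log_price', 'min'),
    cbs_log_price_max=('log_price', 'max'),
)
df_group['cbs_log_price_std'] = df_group['cbs_log_price_std'].fillna(0)
df_group['cbs_log_count'] = np.log1p(df_group['cbs_count'])

# Presumed final step: attach the per-group stats to each row.
train = train.merge(df_group, on='cat_brand_ship', how='left')
print(train.columns.tolist())
```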
@shreyas90999
shreyas90999 / num_vectorizer.py
Last active October 6, 2020 03:49
This function extracts numerical features from textual data. For the sentence "We are selling 10 packs of 512gb SSD of XYZ company", vectorization yields {packs: 10, gb: 512}
def num_feature(df, vectorizer_nums=None, scale=None, training=True):
    """
    After some EDA and manually reviewing the textual data, I found that a lot
    of numerical information was available in the text, e.g. "10ml", "2 packs",
    "10 lipsticks", "512gb". This numerical data had an impact on the price of
    an item, so this function vectorizes it. We first find patterns like
    "10ml", "160gb", "2 packs", etc. For the sentence "250ml 2 packs of xyz
    company", the "ml" column gets "250" and the "packs" column gets "2".
    The output is a sparse matrix.
    """
    def get_featuers(phrase):
        """
        Find all numerical patterns in the training data and update the
        feature dictionary accordingly.
        """
@shreyas90999
shreyas90999 / rmsle.py
Last active October 5, 2020 17:23
RMSLE error function
#RMSLE error function
import math

def rmsle_error(y, y_pred):
    assert len(y) == len(y_pred)
    to_sum = [(math.log(y_pred[i] + 1) - math.log(y[i] + 1)) ** 2.0
              for i in range(len(y_pred))]
    return (sum(to_sum) * (1.0 / len(y))) ** 0.5
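For array inputs the same error can be computed vectorized with NumPy; this is an alternative formulation, not from the gist, using np.log1p for the log(x + 1) terms:

```python
import numpy as np

def rmsle_error_np(y, y_pred):
    """Vectorized RMSLE: sqrt of the mean squared difference of log1p values."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    assert y.shape == y_pred.shape
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y)) ** 2)))

print(rmsle_error_np([1, 2, 3], [1, 2, 3]))  # perfect predictions give 0.0
```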