Luca Massaron lmassaron

Data scientist, author of books on AI, machine learning, deep learning & Kaggle. GDE. Kaggle Competitions Grandmaster, previously 7th worldwide competition rank

lmassaron / gist:35f91b5168f5c09f9d0493499f1390b3

Last active April 3, 2019 08:42

Explaining GBM

	https://xgboost.readthedocs.io/en/latest/tutorials/model.html
	https://towardsdatascience.com/entropy-how-decision-trees-make-decisions-2946b9c18c8
	https://github.com/Microsoft/LightGBM/issues/2062#issuecomment-477120125
	https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf
	https://explained.ai/gradient-boosting/
	https://www.youtube.com/watch?v=5CWwwtEM2TA

lmassaron / gist:22bd63516dd98d584aba9cbcb538ab1d

Last active May 8, 2019 12:20

On Retinanet

	https://uk.mathworks.com/help/vision/ug/faster-r-cnn-basics.html
	https://medium.com/@jonathan_hui/map-mean-average-precision-for-object-detection-45c121a31173
	https://medium.com/@14prakash/the-intuition-behind-retinanet-eb636755607d
	https://cv-tricks.com/object-detection/faster-r-cnn-yolo-ssd/
	https://towardsdatascience.com/retinanet-how-focal-loss-fixes-single-shot-detection-cb320e3bb0de
	https://medium.com/data-from-the-trenches/object-detection-with-deep-learning-on-aerial-imagery-2465078db8a9

	https://medium.com/deep-learning-journals/fast-scnn-explained-and-implemented-using-tensorflow-2-0-6bd17c17a49e

	https://github.com/Dharun/Tensorflow-License-Plate-Detection/blob/master/numplate_recognition_detection.py

lmassaron / target_encode

Last active September 13, 2024 05:16

Preprocessing scheme for high-cardinality categorical attributes

	import numpy as np

	def add_noise(series, noise_level):
	    # Multiply each value by Gaussian noise centred on 1
	    return series * (1 + noise_level * np.random.randn(len(series)))

	def target_encode(trn_series=None, tst_series=None, target=None, k=1, f=1, noise_level=0):
	    """
	    Encoding is computed as described in:

	    Micci-Barreca, Daniele. "A preprocessing scheme for high-cardinality categorical attributes in classification and prediction problems." ACM SIGKDD Explorations Newsletter 3.1 (2001): 27-32.

	    trn_series (pd.Series) : categorical feature in-sample
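
The preview above is cut off before the encoding itself. As a rough, dependency-free sketch of the smoothing idea in the cited Micci-Barreca paper, each category mean is blended with the global mean using a sigmoid weight on the category count. The helper name `smoothed_target_means` is hypothetical, and reading `k` as the count at which the weight reaches 0.5 and `f` as the steepness of the transition is an assumption about what the `k` and `f` arguments in the signature above mean. The noise step is left out.

```python
import math
from collections import defaultdict


def smoothed_target_means(categories, targets, k=1, f=1):
    """Map each category to a smoothed target mean (hypothetical helper)."""
    prior = sum(targets) / len(targets)  # global target mean
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    encoding = {}
    for cat, n in counts.items():
        # Sigmoid weight: near 1 for frequent categories, near 0 for rare ones.
        lam = 1.0 / (1.0 + math.exp(-(n - k) / f))
        encoding[cat] = prior * (1.0 - lam) + (sums[cat] / n) * lam
    return encoding, prior


cats = ["a", "a", "a", "b"]
ys = [1, 1, 0, 0]
encoding, prior = smoothed_target_means(cats, ys, k=1, f=1)
# Categories never seen in training fall back to the prior.
encoded_test = [encoding.get(c, prior) for c in ["a", "b", "z"]]
```

With these toy values, category `b` (one observation) is pulled halfway towards the global mean, while `a` (three observations) stays closer to its own mean.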

lmassaron / ResNeXt_gan.py

Created August 8, 2019 15:01 — forked from mjdietzx/ResNeXt_gan.py

Keras/tensorflow implementation of GAN architecture where generator and discriminator networks are ResNeXt.

	from keras import layers
	from keras import models
	import tensorflow as tf


	#
	# generator input params
	#

	rand_dim = (1, 1, 2048) # dimension of the generator's input tensor (gaussian noise)

lmassaron / gist:0bce501423823ea857b9cd2375a93ccc

Created August 30, 2019 06:27

Creating a class for your model's hyper-parameters

	class AllMyFields:
	    def __init__(self, dictionary):
	        # Expose every dictionary key as an attribute
	        for k, v in dictionary.items():
	            setattr(self, k, v)

	o = AllMyFields({'alpha': 1, 'beta': 2})

	o.alpha  # -> 1
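
For comparison, the standard library offers a similar container. The sketch below uses `types.SimpleNamespace`. The hyper-parameter names are illustrative only, and this is an alternative, not necessarily what the gist intends.

```python
from types import SimpleNamespace

params = {"alpha": 1, "beta": 2}  # illustrative hyper-parameters
o = SimpleNamespace(**params)

o.alpha            # attribute access instead of params["alpha"]
o.gamma = 0.5      # new fields can be attached later
as_dict = vars(o)  # back to a plain dict, e.g. for logging
```

Unlike a hand-written class, `SimpleNamespace` also provides a readable `repr` and equality comparison out of the box.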

lmassaron / gist:ee6f926e2fa3eb1fe204e47e1ae60c88

Last active September 5, 2021 07:04

Reduce memory usage of a pandas DataFrame

	# Derived from the original script https://www.kaggle.com/gemartin/load-data-reduce-memory-usage
	# by Guillaume Martin

	def reduce_mem_usage(df, verbose=True):
	    numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
	    start_mem = df.memory_usage().sum() / 1024**2
	    for col in df.columns:
	        col_type = df[col].dtypes
	        if col_type in numerics:
	            c_min = df[col].min()
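
The preview stops right after the column minimum is read. The usual next step in this kind of script is to pick the narrowest dtype whose range covers the column's minimum and maximum. A minimal sketch of that selection for integer columns, in plain Python so it runs without pandas, is shown below. `smallest_int_dtype` is a hypothetical helper and the inclusive bounds are an assumption, not taken from the gist.

```python
# Value ranges of signed integer dtypes, from narrowest to widest.
INT_RANGES = [
    (f"int{bits}", -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    for bits in (8, 16, 32, 64)
]


def smallest_int_dtype(c_min, c_max):
    """Return the name of the narrowest signed integer dtype holding [c_min, c_max]."""
    for name, lo, hi in INT_RANGES:
        if lo <= c_min and c_max <= hi:
            return name
    raise ValueError("range does not fit in int64")


small = smallest_int_dtype(0, 100)     # fits in 8 bits
medium = smallest_int_dtype(0, 40000)  # exceeds the int16 maximum of 32767
```

The returned name could then be passed to `df[col].astype(...)`. Float columns need a separate check, because narrowing them loses precision as well as range.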

lmassaron / gist:f4c00689ba2bab53c1fd7b5b63730a34

Created September 3, 2019 08:31

ClassifierTransformer

	from sklearn.base import BaseEstimator, TransformerMixin
	from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

	class ClassifierTransformer(BaseEstimator, TransformerMixin):
	    """
	    Classifier's estimates of a regression problem using oof
	    """
	    def __init__(self, estimator=None, n_classes=2, cv=3):
	        self.estimator = estimator
	        self.n_classes = n_classes

lmassaron / polyloss

Created April 29, 2022 21:33

	import tensorflow as tf

	def poly1_cross_entropy(logits, labels, epsilon=1.0):
	    # pt, CE, and Poly1 have shape [batch].
	    pt = tf.reduce_sum(labels * tf.nn.softmax(logits), axis=-1)
	    CE = tf.nn.softmax_cross_entropy_with_logits(labels, logits)
	    Poly1 = CE + epsilon * (1 - pt)
	    return Poly1

	def poly1_focal_loss(logits, labels, epsilon=1.0, gamma=2.0):
	    # p, pt, FL, and Poly1 have shape [batch, num of classes].
	    p = tf.math.sigmoid(logits)
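
To make the Poly-1 cross-entropy term concrete, here is a small plain-Python sketch for a single example, so the arithmetic can be checked by hand without TensorFlow. The function `poly1_cross_entropy_single` and the use of an integer class index instead of one-hot labels are assumptions made for illustration.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def poly1_cross_entropy_single(logits, label_index, epsilon=1.0):
    """Poly-1 cross-entropy for one example with an integer class label."""
    pt = softmax(logits)[label_index]  # predicted probability of the true class
    ce = -math.log(pt)                 # standard cross-entropy
    return ce + epsilon * (1.0 - pt)   # extra first-order polynomial term


# Two equal logits give pt = 0.5, so the loss is log(2) + 0.5.
loss = poly1_cross_entropy_single([0.0, 0.0], label_index=0)
```

Setting `epsilon=0` recovers ordinary cross-entropy, which is a quick sanity check for any implementation.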

lmassaron / gist:08022e925ae1f40d2e39bac6703a881e

Created October 18, 2022 06:14

0-1 transformation

	from scipy.stats import beta, norm
	import numpy as np

	data = np.array([0.0, 0.0, 0.1, 0.1, 0.2, 0.4, 0.5, 0.7, 0.8, 0.8, 0.9, 1.0, 1.0, 1.0])
	eps = 0.000001

	data[data==0.0] += eps
	data[data==1.0] -= eps

	a, b, loc, scale = beta.fit(data, floc=0, fscale=1)

lmassaron / gist:493384b4d84e941860b766069fa1101e

Created October 27, 2022 09:29

0-1 Beta regression

	dealing with zeros and ones in a beta regression
	------------------------------------------------
	Smithson, M. & Verkuilen, J.
	A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables.
	Psychol. Methods 11, 54–71 (2006).
	DOI: 10.1037/1082-989X.11.1.54

	https://stats.stackexchange.com/questions/31300/dealing-with-0-1-values-in-a-beta-regression

	zero-one inflated beta regression
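
One remedy discussed in the Smithson and Verkuilen paper cited above is to squeeze the response away from the boundaries with y' = (y * (n - 1) + 0.5) / n, where n is the sample size. A minimal sketch in plain Python follows. The function name `squeeze_unit_interval` and the toy data are illustrative.

```python
def squeeze_unit_interval(values):
    """Map values in [0, 1] into (0, 1) with y' = (y * (n - 1) + 0.5) / n."""
    n = len(values)
    return [(y * (n - 1) + 0.5) / n for y in values]


data = [0.0, 0.1, 0.5, 1.0]
squeezed = squeeze_unit_interval(data)
# With n = 4, exact zeros map to 0.125 and exact ones to 0.875.
```

The shift shrinks as n grows, so large samples are barely altered, while a zero-one inflated model treats the boundary values as a separate process instead.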
