karimkhanp’s gists

karimkhanp / train.tsv

Last active December 18, 2016 05:34

training data (Numerical val prediction)

date	hr_of_day	vals
2014-05-01	0	72
2014-05-01	1	127
2014-05-01	2	277
2014-05-01	3	411
2014-05-01	4	666
2014-05-01	5	912
2014-05-01	6	1164
2014-05-01	7	1119
2014-05-01	8	951

karimkhanp / test.tsv

Created December 18, 2016 05:35

Test data (numerical val prediction)

date	hr_of_day	vals
2014-05-01	0	0
2014-05-01	1	0
2014-05-01	2	0
2014-05-01	3	0
2014-05-01	4	0
2014-05-01	5	0
2014-05-01	6	0
2014-05-01	7	0
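Here is a minimal sketch of one way the two files above might be used. It assumes both are tab-separated with the `date`, `hr_of_day`, `vals` header shown, and that the zero `vals` in test.tsv are placeholders to be predicted. The per-hour mean baseline and the helpers `read_rows` and `hourly_mean_baseline` are hypothetical illustrations, not the author's method.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def read_rows(f):
    """Parse (date, hr_of_day, vals) rows from a tab-separated file object."""
    reader = csv.DictReader(f, delimiter="\t")
    return [(r["date"], int(r["hr_of_day"]), float(r["vals"])) for r in reader]


def hourly_mean_baseline(train_rows, test_rows):
    """Hypothetical baseline: predict each test row's vals as the mean
    training vals for the same hour of day."""
    by_hour = defaultdict(list)
    for _, hour, val in train_rows:
        by_hour[hour].append(val)
    overall = mean(v for _, _, v in train_rows)  # fallback for hours unseen in training
    return [
        (date, hour, mean(by_hour[hour]) if hour in by_hour else overall)
        for date, hour, _ in test_rows
    ]


# Usage with a few rows taken from the previews above (in-memory instead of files)
train = read_rows(io.StringIO("date\thr_of_day\tvals\n2014-05-01\t0\t72\n2014-05-01\t1\t127\n"))
test = read_rows(io.StringIO("date\thr_of_day\tvals\n2014-05-01\t0\t0\n2014-05-01\t5\t0\n"))
preds = hourly_mean_baseline(train, test)
print(preds)
```

A baseline like this is mainly useful as a reference point that any real model should beat.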

karimkhanp / numerical_analysis

Created October 31, 2017 16:12

Steps and resources for numerical analysis

	Links

	http://bridgei2i.com/ebook/churn-propensity-model/#page/8

karimkhanp / nltk_functions.py

Created February 12, 2018 11:49

Contains various NLTK functions

	import sys

	"""
	NltkSentTokenize Class for all nltk sent tokenize
	"""
	class NltkSentTokenize(object):
	    """
	    Initialization function of NltkSentTokenize Class
	    """
	    def __init__(self):

karimkhanp / mlbasics

Last active June 22, 2018 14:01

Basics of Machine learning

	Regression: (https://www.quora.com/What-is-regression)
	Regression is the dependence of one variable on another. It is the statistical method that estimates the unknown value of one variable (the dependent variable) from the known value of a related variable (the independent variable).

	Regression estimates the relationship among variables for prediction.
	Regression analysis helps to understand how the dependent variable changes when some of the independent variables are varied, while the other independent variables are held fixed.
	It determines the relationship between one dependent variable and a number of other independent variables.
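The "held fixed" idea can be made concrete with a minimal multiple linear regression sketch. It uses ordinary least squares via the normal equations (XᵀX)b = Xᵀy and solves them with plain Gaussian elimination, so it needs only the standard library. The function names and the toy data, generated exactly from y = 3 + 2*x1 - x2, are illustrative assumptions. Each fitted coefficient is the change in y for a one-unit change in its variable while the other variable is held fixed.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_linear_regression(X, y):
    """Ordinary least squares; returns [intercept, b1, b2, ...]."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Toy data (made up): y = 3 + 2*x1 - 1*x2 with no noise
X = [(1, 0), (2, 1), (3, 5), (4, 2), (0, 3)]
y = [5, 6, 4, 9, 0]
coef = fit_linear_regression(X, y)
print(coef)  # approximately intercept 3, x1 coefficient 2, x2 coefficient -1
```

Forming XᵀX explicitly can be numerically unstable for ill-conditioned data. Libraries such as NumPy (`numpy.linalg.lstsq`) use more stable factorizations instead.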

	Linear Regression
	Simple linear regression models a linear relationship between two variables: one independent variable x and one dependent variable y. For example, it can be used to estimate the relation between ice cream sales and average temperature.
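As a sketch, the closed-form least-squares fit for simple linear regression is slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept = ȳ - slope·x̄. The temperature and sales numbers below are synthetic, made up to lie exactly on sales = 10·temperature - 50. They are not real ice cream data.

```python
def simple_linear_regression(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # co-variation
    sxx = sum((x - mean_x) ** 2 for x in xs)                        # variation in x
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


temperature = [20, 25, 30, 35]    # synthetic average temperatures
sales = [150, 200, 250, 300]      # synthetic sales = 10 * temperature - 50
slope, intercept = simple_linear_regression(temperature, sales)
print(slope, intercept)
```

On Python 3.10+, `statistics.linear_regression(x, y)` from the standard library computes the same slope and intercept.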

karimkhanp / nlp_preprocess.py

Created July 23, 2018 14:54

NLP preprocessing using NLTK

	import sys
	import pdb
	import pprint

	import nltk
	from nltk.tokenize import word_tokenize
	from nltk.tokenize import sent_tokenize

	# Author's own helper modules (not part of NLTK or the standard library)
	from nlp_opn import CuriaNLP
	from mongo_op import MongoOperation

	"""
	NltkSentTokenize Class for all nltk sent tokenize
	"""

karimkhanp / datascience_internview_question

Last active December 27, 2021 13:57

Data science interview questions

	https://www.simplilearn.com/tutorials/deep-learning-tutorial/deep-learning-interview-questions
	https://www.javatpoint.com/deep-learning-interview-questions

	Difference between training, dev and test sets
	Training: a training dataset is a dataset of examples used during the learning process to fit the parameters (e.g., weights) of, for example, a classifier.[7][8]
	Dev/Validation: a validation dataset is a dataset of examples used to tune the hyperparameters (e.g., the architecture) of a classifier. It is sometimes also called the development set or the "dev set". An example of a hyperparameter for artificial neural networks is the number of hidden units in each layer.
	Test: a test dataset is independent of the training dataset but follows the same probability distribution. It is used to estimate how well the final model generalizes.
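A minimal sketch of a random three-way split, assuming the examples are independent and identically distributed. The 80/10/10 fractions, the fixed seed, and the name `train_dev_test_split` are illustrative choices, not prescribed values. For time-ordered data such as hourly readings, a chronological split is usually more appropriate than shuffling.

```python
import random


def train_dev_test_split(examples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once with a fixed seed, then cut into disjoint train/dev/test parts."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # reproducible shuffle
    n = len(items)
    n_test = int(n * test_frac)
    n_dev = int(n * dev_frac)
    test = items[:n_test]
    dev = items[n_test:n_test + n_dev]
    train = items[n_test + n_dev:]
    return train, dev, test


train, dev, test = train_dev_test_split(range(100))
print(len(train), len(dev), len(test))
```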

	What is bias?
	Bias is the difference between the average prediction of our model and the correct value which we are trying to predict. Model with high bias pays very little attent
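The definition above can be sketched in code. Fit the same kind of model on several different training sets, average its predictions at one input, and subtract the correct value. The constant-predictor model, the helper names, and the toy training sets (drawn from y = x²) are hypothetical and chosen only to show a deliberately high-bias model.

```python
from statistics import mean


def fit_constant(train):
    """Deliberately oversimplified (high-bias) model: ignores x, predicts mean training y."""
    c = mean(y for _, y in train)
    return lambda x: c


def bias_at(x, true_fn, fit, train_sets):
    """Average prediction at x over models fitted on each training set, minus the correct value."""
    avg_prediction = mean(fit(train)(x) for train in train_sets)
    return avg_prediction - true_fn(x)


# Toy training sets sampled from y = x**2 (made up for illustration)
train_sets = [[(0, 0), (1, 1), (2, 4)], [(1, 1), (2, 4), (3, 9)]]
b = bias_at(3, lambda x: x ** 2, fit_constant, train_sets)
print(b)  # strongly negative: the constant model underestimates y at x = 3
```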

karimkhanp / opennmt_cheatshit

Created October 8, 2018 07:42

OpenNMT documentation

About Perplexity - https://planspace.org/2013/09/23/perplexity-what-it-is-and-what-yours-is/
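Perplexity is the exponential of the average negative log-likelihood per token: PPL = exp(-(1/N) Σ log p(w_i | context)). The minimal sketch below assumes you already have the model's probability for each target token. The `perplexity` helper is hypothetical and not an OpenNMT function.

```python
import math


def perplexity(token_probs):
    """Perplexity from per-token probabilities p(w_i | context)."""
    n = len(token_probs)
    avg_nll = -sum(math.log(p) for p in token_probs) / n  # average negative log-likelihood
    return math.exp(avg_nll)


# A model that always assigns probability 1/4 to the correct token
# is as uncertain as a uniform choice among 4 options.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Lower perplexity means the model assigns higher probability to the observed tokens.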

karimkhanp / hexatext

Last active December 14, 2018 13:11

How to handle the "hexacode" issue in Python (UTF-8 bytes wrongly decoded as Latin-1, which show up as \xd8\xad-style escapes) when dealing with non-English text. This problem usually occurs with non-English news, tweets, or similar data.

	import re

	# UTF-8 bytes that were wrongly decoded as Latin-1 (one code point per byte)
	a = u'\xd8\xad\xd9\x83\xd9\x88\xd9\x85\xd8\xa9 \xd9\x85\xd8\xad\xd9\x85\xd8\xaf \xd8\xa8\xd9\x86 \xd8\xb3\xd9\x84\xd9\x85\xd8\xa7\xd9\x86 \xd8\xa3\xd9\x86\xd9\x81\xd9\x82\xd8\xaa \xd9\x85\xd9\x84\xd9\x8a\xd8\xa7\xd8\xb1\xd8\xa7\xd8\xaa \xd8\xa7\xd9\x84\xd8\xaf\xd9\x88\xd9\x84\xd8\xa7\xd8\xb1\xd8\xa7\xd8\xaa \xd9\x84\xd8\xaf\xd8\xb9\xd9\x85 \xd8\xb3\xd9\x88\xd9\x82 \xd8\xa7\xd9\x84\xd8\xa3\xd8\xb3\xd9\x87\xd9\x85 \xd8\xa7\xd9\x84\xd9\x85\xd8\xad\xd9\x84\xd9\x8a\xd8\xa9 \xd9\x88\xd9\x85\xd9\x88\xd8\xa7\xd8\xac\xd9\x87\xd8\xa9 \xd9\x85\xd9\x88\xd8\xac\xd8\xa7\xd8\xaa \xd8\xa7\xd9\x84\xd8\xa8\xd9\x8a\xd8\xb9 \xd8\xa8\xd8\xb9\xd8\xaf \xd9\x85\xd9\x82\xd8\xaa\xd9\x84\xe2\x80\xa64'

	def convert(s):
	    # Re-encode the matched run to its raw bytes, then decode those bytes as UTF-8
	    try:
	        return s.group(0).encode('latin1').decode('utf8')
	    except UnicodeDecodeError:
	        # Leave runs that are not valid UTF-8 unchanged
	        return s.group(0)

	a = re.sub(r'[\x80-\xFF]+', convert, a)
	print(a)
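To see where these escapes come from, here is a small round trip. It assumes the text was UTF-8 encoded and then wrongly decoded as Latin-1 somewhere upstream, which is an assumption about the data pipeline. The word "café" is a made-up example.

```python
original = "café"

# The problem: UTF-8 bytes decoded with the wrong codec (Latin-1),
# so each byte becomes its own code point (two characters instead of 'é')
garbled = original.encode("utf8").decode("latin1")
print(repr(garbled))

# The fix used above: Latin-1 maps each code point back to the same byte,
# and those bytes are then decoded correctly as UTF-8
repaired = garbled.encode("latin1").decode("utf8")
print(repaired)
```

The run-by-run `re.sub` version above is safer for strings that mix garbled runs with correctly decoded non-ASCII characters, because runs that fail to decode are left unchanged.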

karimkhanp / stopwordlist

Created December 17, 2018 06:58

List of stopwords for different languages

https://www.ranks.nl/stopwords
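A minimal sketch of how such a list is typically used: drop stopwords from tokenized text before further processing. The tiny English stopword set below is hypothetical, and the whitespace tokenization is a simplification. A real list would come from a source such as the page linked above.

```python
# Hypothetical tiny English stopword set, for illustration only
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def remove_stopwords(text, stopwords=STOPWORDS):
    """Lowercase, split on whitespace, and drop tokens found in the stopword set."""
    return [tok for tok in text.lower().split() if tok not in stopwords]


print(remove_stopwords("The price of the stock is rising"))
```

Storing the stopwords in a `set` keeps each membership check O(1) on average.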

Karimkhan karimkhanp