Karimkhan karimkhanp

@karimkhanp
karimkhanp / train.tsv
Last active December 18, 2016 05:34
training data (Numerical val prediction)
date hr_of_day vals
2014-05-01 0 72
2014-05-01 1 127
2014-05-01 2 277
2014-05-01 3 411
2014-05-01 4 666
2014-05-01 5 912
2014-05-01 6 1164
2014-05-01 7 1119
2014-05-01 8 951
karimkhanp / test.tsv
Created December 18, 2016 05:35
Test data (numerical val prediction)
date hr_of_day vals
2014-05-01 0 0
2014-05-01 1 0
2014-05-01 2 0
2014-05-01 3 0
2014-05-01 4 0
2014-05-01 5 0
2014-05-01 6 0
2014-05-01 7 0
karimkhanp / numerical_analysis
Created October 31, 2017 16:12
Steps and resources for numerical analysis
Links
http://bridgei2i.com/ebook/churn-propensity-model/#page/8
karimkhanp / nltk_functions.py
Created February 12, 2018 11:49
Contains various NLTK functions
import sys
"""
NltkSentTokenize Class for all nltk sent tokenize
"""
class NltkSentTokenize(object):
    """
    Initialization function of NltkSentTokenize Class
    """
    def __init__(self):
karimkhanp / mlbasics
Last active June 22, 2018 14:01
Basics of Machine learning
Regression: (https://www.quora.com/What-is-regression)
Regression is the dependence of one variable on another. The statistical method that estimates the unknown value of one variable (the dependent variable) from the known value of a related variable (the independent variable) is called regression.
Regression estimates the relationship among variables for prediction.
Regression analysis helps to understand how the dependent variable changes when some of the independent variables are varied, while the other independent variables are held fixed.
It determines the relationship between one dependent variable and a number of other independent variables.
Linear Regression
Simple linear regression lets you estimate a linear functional dependency between two sets of numbers. For example, we can use regression to determine the relation between ice cream sales and average temperature.
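The ice cream example can be sketched in plain Python with an ordinary least squares fit. The temperature and sales figures below are made-up illustrative numbers, not real data, and fit_line is a hypothetical helper name used only for this sketch.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: average temperature (deg C) and ice cream sales (units).
temperature_c = [15.0, 20.0, 25.0, 30.0, 35.0]
ice_cream_sales = [120.0, 180.0, 260.0, 310.0, 390.0]

slope, intercept = fit_line(temperature_c, ice_cream_sales)

# Predict sales for a temperature that is not in the data.
predicted_sales = slope * 28.0 + intercept
print("slope=%.2f intercept=%.2f predicted=%.1f" % (slope, intercept, predicted_sales))
```

Here temperature is the independent variable and sales is the dependent variable; the slope says how many extra units are sold per additional degree under this linear assumption.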
karimkhanp / nlp_preprocess.py
Created July 23, 2018 14:54
nlp pre processing using nltk
import sys, pdb
import nltk, pprint
from nltk.tokenize import word_tokenize
from nltk.tokenize import sent_tokenize
from nlp_opn import CuriaNLP
from mongo_op import MongoOperation
"""
NltkSentTokenize Class for all nltk sent tokenize
"""
karimkhanp / datascience_internview_question
Last active December 27, 2021 13:57
Data science interview questions
https://www.simplilearn.com/tutorials/deep-learning-tutorial/deep-learning-interview-questions
https://www.javatpoint.com/deep-learning-interview-questions
Difference between training, dev and test set
Training: A training dataset is a set of examples used during learning to fit the parameters (e.g., weights) of a model such as a classifier.
Dev/Validation: A validation dataset is a set of examples used to tune the hyperparameters (e.g., the architecture) of a classifier. It is also called the development set or "dev set". For an artificial neural network, one such hyperparameter is the number of hidden units in each layer.
Test: A test dataset is independent of the training dataset but follows the same probability distribution. It is used only to evaluate the final model.
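A minimal sketch of producing the three sets from one pool of examples, assuming a simple random split is appropriate for the data. The function name split_dataset, the 80/10/10 proportions, and the fixed seed are illustrative choices, not something the notes above prescribe.

```python
import random


def split_dataset(examples, dev_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle once, then cut into disjoint train/dev/test lists."""
    shuffled = list(examples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    n_dev = int(round(len(shuffled) * dev_fraction))
    test = shuffled[:n_test]
    dev = shuffled[n_test:n_test + n_dev]
    train = shuffled[n_test + n_dev:]
    return train, dev, test


# Hypothetical dataset: 100 example ids.
examples = list(range(100))
train, dev, test = split_dataset(examples)
print(len(train), len(dev), len(test))
```

Because all three lists come from the same shuffled pool, they follow the same distribution while sharing no examples. Time-ordered data would usually need a chronological split instead.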
What is bias?
Bias is the difference between the average prediction of our model and the correct value which we are trying to predict. Model with high bias pays very little attent
karimkhanp / opennmt_cheatshit
Created October 8, 2018 07:42
OpenNMT documentation
About Perplexity - https://planspace.org/2013/09/23/perplexity-what-it-is-and-what-yours-is/
karimkhanp / hexatext
Last active December 14, 2018 13:11
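How to handle hex-escaped text in Python when dealing with non-English text. This problem usually occurs with non-English news, tweets, or similar data: the UTF-8 bytes of the text end up stored as individual Latin-1 characters (\xd8\xad...), so each run has to be encoded back to Latin-1 and decoded as UTF-8.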
a = u'\xd8\xad\xd9\x83\xd9\x88\xd9\x85\xd8\xa9 \xd9\x85\xd8\xad\xd9\x85\xd8\xaf \xd8\xa8\xd9\x86 \xd8\xb3\xd9\x84\xd9\x85\xd8\xa7\xd9\x86 \xd8\xa3\xd9\x86\xd9\x81\xd9\x82\xd8\xaa \xd9\x85\xd9\x84\xd9\x8a\xd8\xa7\xd8\xb1\xd8\xa7\xd8\xaa \xd8\xa7\xd9\x84\xd8\xaf\xd9\x88\xd9\x84\xd8\xa7\xd8\xb1\xd8\xa7\xd8\xaa \xd9\x84\xd8\xaf\xd8\xb9\xd9\x85 \xd8\xb3\xd9\x88\xd9\x82 \xd8\xa7\xd9\x84\xd8\xa3\xd8\xb3\xd9\x87\xd9\x85 \xd8\xa7\xd9\x84\xd9\x85\xd8\xad\xd9\x84\xd9\x8a\xd8\xa9 \xd9\x88\xd9\x85\xd9\x88\xd8\xa7\xd8\xac\xd9\x87\xd8\xa9 \xd9\x85\xd9\x88\xd8\xac\xd8\xa7\xd8\xaa \xd8\xa7\xd9\x84\xd8\xa8\xd9\x8a\xd8\xb9 \xd8\xa8\xd8\xb9\xd8\xaf \xd9\x85\xd9\x82\xd8\xaa\xd9\x84\xe2\x80\xa64'
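import re

def convert(s):
    # Re-interpret a run of Latin-1 characters as the UTF-8 bytes they came from.
    try:
        return s.group(0).encode('latin1').decode('utf8')
    except UnicodeError:
        # The run is not valid UTF-8, so leave it unchanged.
        return s.group(0)

a = re.sub(r'[\x80-\xFF]+', convert, a)
print(a)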
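A self-contained round trip of the same repair, written for Python 3. It first creates the garbled form from a known Arabic word and then repairs it. The wrapper name fix_mojibake is hypothetical and used only for this sketch.

```python
import re


def fix_mojibake(text):
    """Decode runs of Latin-1 characters that are really UTF-8 bytes."""
    def convert(match):
        try:
            return match.group(0).encode("latin1").decode("utf8")
        except UnicodeError:
            # Genuine Latin-1 text (not valid UTF-8) is left untouched.
            return match.group(0)
    return re.sub(r"[\x80-\xFF]+", convert, text)


# The Arabic word for "market", written with escapes to stay ASCII-safe.
original = "\u0633\u0648\u0642"

# Simulate the damage: UTF-8 bytes misread one byte per character.
garbled = original.encode("utf8").decode("latin1")

repaired = fix_mojibake(garbled)
print(repaired == original)
```

Runs that do not form valid UTF-8, such as a lone accented Latin-1 letter, fall into the except branch and are returned as they were.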
karimkhanp / stopwordlist
Created December 17, 2018 06:58
List of stopwords for different languages
https://www.ranks.nl/stopwords