JamesG jamespaultg

ML practitioner

jamespaultg / readme.txt

Created February 5, 2019 12:10

Installing R package from tar.gz files

	For example, to install the birch package:

	1. Download birch_1.2-3.tar.gz from https://cran.r-project.org/src/contrib/Archive/birch/.

	2. Install Rtools on Windows by following http://jtleek.com/modules/01_DataScientistToolbox/02_10_rtools/#6.
	3. After installation, check that the Windows environment variable PATH contains the Rtools and gcc directories.
	You can check from within R with grepl("Rtools", Sys.getenv("PATH")), which should return TRUE. Alternatively, use grep("Rtools", strsplit(Sys.getenv("PATH"), ";")[[1]], value=TRUE) to display the Rtools entries present in PATH.
	If Rtools is missing, use the following code to add it to PATH.

	# If the rtools path is not added in the environment variable PATH
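The PATH check described in step 3 can also be sketched outside R. The following Python snippet is an illustration only, not the elided R code that adds Rtools to PATH; the function name `rtools_entries` is hypothetical, and the match is case-sensitive like the `grep` call above.

```python
import os


def rtools_entries(path_value, sep=";"):
    # Split the PATH string on the separator and keep the entries
    # that mention Rtools (case-sensitive, like grep("Rtools", ...) in R).
    return [entry for entry in path_value.split(sep) if "Rtools" in entry]


if __name__ == "__main__":
    # os.pathsep is ";" on Windows, which matches the separator used in the R call.
    print(rtools_entries(os.environ.get("PATH", ""), os.pathsep))
```

An empty list corresponds to `grepl` returning FALSE, meaning Rtools still has to be added to PATH.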

jamespaultg / oracle_date_manipulations.sql

Created September 11, 2018 13:53

Handy SQL scripts to get dates

	-- Get the last date of the previous year (31-Dec-yyyy)
	select trunc(current_date, 'yyyy')-1 from dual;

	-- Get the first date of the previous year (01-Jan-yyyy)
	select add_months(trunc(current_date, 'yyyy'),-12) from dual;
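For comparison, the same two dates can be derived outside the database. This is a minimal Python sketch, not part of the original script: the function names are hypothetical, and it works on a plain `datetime.date`, so the time of day and session time zone carried by Oracle's `current_date` are ignored.

```python
from datetime import date, timedelta


def last_day_of_previous_year(today):
    # Truncate to 1 January of the current year, then step back one day,
    # mirroring trunc(current_date, 'yyyy') - 1.
    return date(today.year, 1, 1) - timedelta(days=1)


def first_day_of_previous_year(today):
    # Mirrors add_months(trunc(current_date, 'yyyy'), -12).
    return date(today.year - 1, 1, 1)


if __name__ == "__main__":
    today = date.today()
    print(last_day_of_previous_year(today), first_day_of_previous_year(today))
```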

jamespaultg / gist:0f62a21ea1481f18888783c4f1de91f9

Created August 22, 2018 09:17

Pip install behind a proxy (corporate firewall)

pip3 install <package_name> --proxy <proxy_server:port>
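When the install has to be scripted, the same command can be assembled as an argument list. This is a sketch under assumptions: the helper name `pip_install_command`, the package `requests` and the proxy address `proxy.example.com:8080` are placeholders, not values from the original note. Only pip's `--proxy` option is taken from the command above.

```python
import sys


def pip_install_command(package, proxy):
    # Run pip through the current interpreter so the package is installed
    # into the same environment the script is running in.
    return [sys.executable, "-m", "pip", "install", package, "--proxy", proxy]


if __name__ == "__main__":
    cmd = pip_install_command("requests", "proxy.example.com:8080")
    # Printed rather than executed; pass cmd to subprocess.run(cmd, check=True)
    # to perform the install.
    print(" ".join(cmd))
```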

jamespaultg / Venndiagram.R

Last active July 3, 2018 09:39

Venn diagram in R

	# Suppose you have more than two sets, want to find the elements that overlap between them,
	# and want to display that overlap with the VennDiagram package
	library(VennDiagram)
	library(gplots)
	library(reshape2)


	# We have three different data frames keyed by customer ID (Key), each with some additional fields
	set1 = data.frame(Key = c(100,200,300), place = c('NY','IS','AZ'))
	set2 = data.frame(Key = c(200,300,400), val2 = c(12,12,53))
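The numbers a Venn diagram displays come from plain set operations on the Key column. As a rough illustration in Python rather than R, using only the two data frames shown above (the function name `venn_regions` is hypothetical):

```python
# Key columns of set1 and set2 from the data frames above.
set1_keys = {100, 200, 300}
set2_keys = {200, 300, 400}


def venn_regions(a, b):
    # The three regions of a two-set Venn diagram:
    # elements only in a, elements in both, elements only in b.
    return {"only_a": a - b, "both": a & b, "only_b": b - a}


regions = venn_regions(set1_keys, set2_keys)
for name, keys in regions.items():
    print(name, sorted(keys))
```

With these keys the overlap is 200 and 300, while 100 appears only in set1 and 400 only in set2.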

jamespaultg / VizTree.py

Created March 16, 2018 07:18

Visualise Decision Tree


	# visualise the decision tree

	from io import StringIO  # sklearn.externals.six is no longer available in recent scikit-learn releases
	from IPython.display import Image
	from sklearn.tree import export_graphviz
	import pydotplus
	dot_data = StringIO()

	# ensure that the variable tree holds the fitted decision tree and features contains the feature names

jamespaultg / DecisionTree.py

Created March 16, 2018 07:15

Decision tree and feature importance

	import numpy as np
	import matplotlib.pyplot as plt
	from sklearn.tree import DecisionTreeClassifier, export_graphviz
	tree = DecisionTreeClassifier(max_depth=3,random_state=0)
	tree.fit(X_train,y_train)
	plt.figure(figsize=(20, 10))
	indices = np.argsort(tree.feature_importances_)[::-1]
	#indices = np.argsort(tree.feature_importances_)[::1]

	# Visualise the importance of the features
	# To get your top 10 feature names
	features_sorted = []
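The ranking step can be sketched without any dependencies. This is not the remainder of the original script: it swaps `np.argsort(...)[::-1]` for Python's built-in `sorted`, and the importances and feature names below are made-up illustration values.

```python
def top_features(importances, names, k=10):
    # Indices ordered by importance, largest first. With no tied values this is
    # the same order that np.argsort(importances)[::-1] produces.
    order = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    return [names[i] for i in order[:k]]


# Hypothetical values standing in for tree.feature_importances_ and the column names.
importances = [0.05, 0.60, 0.00, 0.35]
names = ["age", "income", "city", "tenure"]
print(top_features(importances, names, k=3))
```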

jamespaultg / linearSVCgridsearch.py

Created March 16, 2018 07:12

Linear SVC grid search in Python

	import numpy as np
	from sklearn.pipeline import Pipeline
	from sklearn.svm import LinearSVC
	from sklearn.model_selection import GridSearchCV
	from sklearn.preprocessing import StandardScaler

	SVCpipe = Pipeline([('scale', StandardScaler()),
	('SVC',LinearSVC())])

	# Gridsearch to determine the value of C
	param_grid = {'SVC__C':np.arange(0.01,100,10)}
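It helps to see which candidates `np.arange(0.01, 100, 10)` actually contains: ten evenly spaced values from 0.01 to 90.01. The sketch below reproduces that grid in plain Python with a hypothetical `arange` helper. The log-spaced grid next to it is an assumption, not from the original script; it is a common alternative for regularisation strengths.

```python
import math


def arange(start, stop, step):
    # Plain-Python stand-in for np.arange: evenly spaced values in [start, stop).
    n = max(0, math.ceil((stop - start) / step))
    return [start + i * step for i in range(n)]


# The grid used above: 0.01, 10.01, ..., 90.01.
linear_grid = arange(0.01, 100, 10)

# Assumed alternative: powers of ten from 0.01 to 100.
log_grid = [10.0 ** e for e in range(-2, 3)]

print([round(c, 2) for c in linear_grid])
print(log_grid)
```

The linear grid puts only one candidate below 10, whereas the log-spaced grid covers several orders of magnitude with five values.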

jamespaultg / logregCV.py

Created March 16, 2018 07:10

Logistic regression with Grid search in Python

	# Logistic regression
	from sklearn.pipeline import Pipeline
	from sklearn.linear_model import LogisticRegression
	from sklearn.model_selection import GridSearchCV
	from sklearn.preprocessing import StandardScaler

	logregpipe = Pipeline([('scale', StandardScaler()),
	('logreg',LogisticRegression(multi_class="multinomial",solver="lbfgs"))])

	# Gridsearch to determine the value of C