@jamespaultg
jamespaultg / readme.txt
Created February 5, 2019 12:10
Installing an R package from tar.gz files
For example, to install the birch package:
1. Download birch_1.2-3.tar.gz from https://cran.r-project.org/src/contrib/Archive/birch/.
2. Install Rtools on Windows by following http://jtleek.com/modules/01_DataScientistToolbox/02_10_rtools/#6
3. After installation, check that the Windows environment variable PATH contains Rtools and gcc.
You can check from within R with grepl("Rtools", Sys.getenv("PATH")), which should return TRUE. Alternatively, use grep("Rtools", strsplit(Sys.getenv("PATH"), ";")[[1]], value=TRUE) to display the Rtools entries present in the PATH variable.
If Rtools is not in PATH, use the following code to add it.
# If the rtools path is not added in the environment variable PATH
jamespaultg / oracle_date_manipulations.sql
Created September 11, 2018 13:53
Handy SQL scripts to get dates
-- Get the last date of the previous year (31-Dec-yyyy)
select trunc(current_date, 'yyyy')-1 from dual;
-- Get the first date of the previous year (01-Jan-yyyy)
select add_months(trunc(current_date, 'yyyy'),-12) from dual;
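To sanity-check the values these two queries should return without an Oracle session, the same arithmetic can be sketched in Python with the standard library. The function names below are hypothetical and only mirror the two SQL expressions; this is an illustration, not part of the original gist.

```python
from datetime import date, timedelta


def last_day_of_previous_year(today: date) -> date:
    # Mirrors trunc(current_date, 'yyyy') - 1:
    # truncate to 1 January of the current year, then step back one day.
    start_of_year = today.replace(month=1, day=1)
    return start_of_year - timedelta(days=1)


def first_day_of_previous_year(today: date) -> date:
    # Mirrors add_months(trunc(current_date, 'yyyy'), -12):
    # truncate to 1 January of the current year, then go back one year.
    start_of_year = today.replace(month=1, day=1)
    return start_of_year.replace(year=start_of_year.year - 1)


sample = date(2018, 9, 11)
print(last_day_of_previous_year(sample))   # 2017-12-31
print(first_day_of_previous_year(sample))  # 2017-01-01
```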
jamespaultg / gist:0f62a21ea1481f18888783c4f1de91f9
Created August 22, 2018 09:17
Pip install behind a proxy (corporate firewall)
pip3 install <package_name> --proxy <proxy_server:port>
jamespaultg / Venndiagram.R
Last active July 3, 2018 09:39
Venn diagram in R
# Suppose you have more than two sets and want to find the elements that overlap between them,
# and you would like to visualise the overlap using VennDiagram
library(VennDiagram)
library(gplots)
library(reshape2)
# We have three different dataframes with the customer-id as Key, and some additional fields
set1 = data.frame(Key = c(100,200,300), place = c('NY','IS','AZ'))
set2 = data.frame(Key = c(200,300,400), val2 = c(12,12,53))
jamespaultg / VizTree.py
Created March 16, 2018 07:18
Visualise Decision Tree
# visualise the decision tree
from io import StringIO  # sklearn.externals.six was removed in scikit-learn 0.23
from IPython.display import Image
from sklearn.tree import export_graphviz
import pydotplus
dot_data = StringIO()
# Ensure that the variable tree holds the fitted decision tree and features holds the feature names
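The preview stops before the export call, so here is a minimal sketch of how such an export might look. It assumes the iris dataset as stand-in data and uses `out_file=None`, which makes `export_graphviz` return the DOT source as a string instead of writing into a `StringIO` buffer. The dataset and the `dot_text` name are illustrative choices, not taken from the original gist.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Stand-in data (assumption): the gist expects `tree` and `features` to exist already.
iris = load_iris()
features = list(iris.feature_names)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(iris.data, iris.target)

# With out_file=None the DOT description is returned as a string.
dot_text = export_graphviz(
    tree,
    out_file=None,
    feature_names=features,
    class_names=list(iris.target_names),
    filled=True,
    rounded=True,
)
print(dot_text[:200])
```

If Graphviz is installed, the DOT string can then be rendered in a notebook, for example with `Image(pydotplus.graph_from_dot_data(dot_text).create_png())`, which is presumably what the imports in the snippet above are for.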
jamespaultg / DecisionTree.py
Created March 16, 2018 07:15
Decision tree and feature importance
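import matplotlib.pyplot as plt
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_graphviz
# X_train and y_train are assumed to be defined already
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
plt.figure(figsize=(20, 10))
# Feature indices ordered from most to least important
indices = np.argsort(tree.feature_importances_)[::-1]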
#indices = np.argsort(tree.feature_importances_)[::1]
# Visualise the importance of the features
# To get your top 10 feature names
features_sorted = []
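The preview is cut off before the list is filled, so the following is only a sketch of one way to collect the top 10 feature names. It assumes the breast cancer dataset bundled with scikit-learn as stand-in data; `top_n` and `importances_sorted` are illustrative names, not from the original gist.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): any feature matrix with named columns would do.
data = load_breast_cancer()

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(data.data, data.target)

# Indices of the features, ordered from most to least important.
indices = np.argsort(tree.feature_importances_)[::-1]

top_n = 10
features_sorted = [data.feature_names[i] for i in indices[:top_n]]
importances_sorted = tree.feature_importances_[indices[:top_n]]

for name, score in zip(features_sorted, importances_sorted):
    print(f"{name}: {score:.3f}")
```

A shallow tree uses only a handful of features, so several of the ten entries will typically have an importance of zero.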
jamespaultg / linearSVCgridsearch.py
Created March 16, 2018 07:12
Linear SVC grid search in Python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
SVCpipe = Pipeline([('scale', StandardScaler()),
                    ('SVC', LinearSVC())])
# Grid search to determine the value of C
param_grid = {'SVC__C': np.arange(0.01, 100, 10)}
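The preview ends at the parameter grid. A minimal sketch of how the search itself might be run is shown below, assuming synthetic data from `make_classification`, 5-fold cross-validation, and a raised `max_iter` to help convergence at large `C`. None of those three choices comes from the original gist.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

SVCpipe = Pipeline([('scale', StandardScaler()),
                    ('SVC', LinearSVC(max_iter=10000))])

# The step name 'SVC' plus a double underscore addresses the C parameter of LinearSVC.
param_grid = {'SVC__C': np.arange(0.01, 100, 10)}

search = GridSearchCV(SVCpipe, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Putting the scaler inside the pipeline means it is refitted on each training fold, so the held-out fold does not leak into the scaling statistics.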
jamespaultg / logregCV.py
Created March 16, 2018 07:10
Logistic regression with Grid search in Python
# Logistic regression
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
# Note: the multi_class argument is deprecated as of scikit-learn 1.5
logregpipe = Pipeline([('scale', StandardScaler()),
                       ('logreg', LogisticRegression(multi_class="multinomial", solver="lbfgs"))])
# Gridsearch to determine the value of C
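The grid itself is not visible in the preview, so the sketch below uses a hypothetical logarithmic grid for `C` and the iris dataset as stand-in data. It also leaves out `multi_class`, on the assumption that a recent scikit-learn is installed, where the `lbfgs` solver already fits a multinomial model for multiclass targets.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): three-class iris dataset.
X, y = load_iris(return_X_y=True)

logregpipe = Pipeline([('scale', StandardScaler()),
                       ('logreg', LogisticRegression(solver='lbfgs', max_iter=1000))])

# Hypothetical grid: five values of C from 0.01 to 100 on a log scale.
param_grid = {'logreg__C': np.logspace(-2, 2, 5)}

search = GridSearchCV(logregpipe, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```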