| For example, to install the birch package: | |
| 1. Download birch_1.2-3.tar.gz from https://cran.r-project.org/src/contrib/Archive/birch/. | |
| 2. Install Rtools on Windows by following this guide: http://jtleek.com/modules/01_DataScientistToolbox/02_10_rtools/#6 | |
| 3. After installation, check whether the Windows environment variable PATH contains Rtools and gcc. | |
| You can check from within R with grepl("Rtools", Sys.getenv("PATH")), which should return TRUE. Alternatively, use grep("Rtools", strsplit(Sys.getenv("PATH"), ";")[[1]], value=TRUE) to display the Rtools entries present in PATH. | |
| If Rtools is missing, use the following code to add it to PATH. | |
| # If the Rtools path has not been added to the PATH environment variable |
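The same PATH check can be sketched outside R. The Python snippet below is only an illustration, assuming a Windows-style `;`-separated PATH; the helper name `entries_containing` and the sample directories are hypothetical and not part of the original snippet.

```python
import os


def entries_containing(needle, path_value=None, sep=os.pathsep):
    """Return the PATH entries that contain `needle` (case-sensitive)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [entry for entry in path_value.split(sep) if needle in entry]


# Hypothetical PATH value, for illustration only
sample_path = r"C:\Windows\system32;C:\Rtools\bin;C:\Rtools\mingw_64\bin"

rtools_entries = entries_containing("Rtools", sample_path, sep=";")
print(rtools_entries)        # the entries that mention Rtools
print(bool(rtools_entries))  # True when at least one entry matches
```

Calling `entries_containing("Rtools")` with no other arguments inspects the real PATH of the current process.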
| -- Get the last date of the previous year (31-Dec-yyyy) | |
| select trunc(current_date, 'yyyy')-1 from dual; | |
| -- Get the first date of the previous year (01-Jan-yyyy) | |
| select add_months(trunc(current_date, 'yyyy'),-12) from dual; |
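For comparison, the same date arithmetic can be sketched with Python's standard library. This is a minimal illustration and not part of the original SQL; the function name `previous_year_bounds` is hypothetical.

```python
from datetime import date, timedelta


def previous_year_bounds(today=None):
    """Return (first day, last day) of the year before `today`."""
    today = today or date.today()
    # Equivalent of trunc(current_date, 'yyyy'): 1 January of the current year
    first_of_this_year = date(today.year, 1, 1)
    # Subtracting one day gives 31 December of the previous year
    last_of_previous = first_of_this_year - timedelta(days=1)
    # Equivalent of add_months(trunc(current_date, 'yyyy'), -12)
    first_of_previous = date(today.year - 1, 1, 1)
    return first_of_previous, last_of_previous


first_day, last_day = previous_year_bounds(date(2024, 5, 17))
print(first_day, last_day)
```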
| pip3 install <package_name> --proxy <proxy_server:port> |
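When the install has to be triggered from a script, the same command can be assembled in Python. The sketch below only builds the argument list; the helper name `pip_install_command` and the proxy address are hypothetical placeholders, and `requests` is used purely as an example package name.

```python
import sys


def pip_install_command(package, proxy):
    """Build the argument list for installing `package` through `proxy`."""
    # Using "python -m pip" targets the interpreter that runs this script
    return [sys.executable, "-m", "pip", "install", package, "--proxy", proxy]


# Hypothetical proxy address, for illustration only
cmd = pip_install_command("requests", "http://proxy.example.com:8080")
print(cmd[1:])

# To actually run the install:
#   import subprocess
#   subprocess.run(cmd, check=True)
```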
| # Suppose you have more than two sets and want to find the elements that overlap between them, | |
| # and to visualise that overlap with a Venn diagram | |
| require(VennDiagram) | |
| library(gplots) | |
| library(reshape2) | |
| # We have three different dataframes with the customer-id as Key, and some additional fields | |
| set1 = data.frame(Key = c(100,200,300), place = c('NY','IS','AZ')) | |
| set2 = data.frame(Key = c(200,300,400), val2 = c(12,12,53)) |
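Before drawing anything, it can help to see which keys a Venn diagram would place in each region. The Python sketch below uses only the Key values of the two data frames shown above; the third data frame is not visible in the snippet, so it is left out here rather than guessed.

```python
# Key columns copied from set1 and set2 above
set1_keys = {100, 200, 300}
set2_keys = {200, 300, 400}

in_both = set1_keys & set2_keys      # overlap region of the diagram
only_set1 = set1_keys - set2_keys    # keys unique to set1
only_set2 = set2_keys - set1_keys    # keys unique to set2

print(sorted(in_both))
print(sorted(only_set1))
print(sorted(only_set2))
```

The sizes of these three groups are the counts a two-set Venn diagram would display.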
| # visualise the decision tree | |
| from io import StringIO  # sklearn.externals.six has been removed from recent scikit-learn releases | |
| from IPython.display import Image | |
| from sklearn.tree import export_graphviz | |
| import pydotplus | |
| dot_data = StringIO() | |
| # ensure that the variable tree holds the fitted decision tree and that features contains the feature names |
| import numpy as np | |
| import matplotlib.pyplot as plt | |
| from sklearn.tree import DecisionTreeClassifier, export_graphviz | |
| tree = DecisionTreeClassifier(max_depth=3,random_state=0) | |
| tree.fit(X_train,y_train) | |
| plt.figure(figsize=(20, 10)) | |
| indices = np.argsort(tree.feature_importances_)[::-1] | |
| #indices = np.argsort(tree.feature_importances_)[::1] | |
| # Visualise the importance of the features | |
| # To get your top 10 feature names | |
| features_sorted = [] |
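The snippet is cut off at this point, so the following is not the original continuation. It is a dependency-free sketch of the ranking idea: order the feature indices by importance, largest first, and keep the top names. The importance values, the feature names, and the helper `top_features` are all hypothetical.

```python
def top_features(importances, names, k=10):
    """Return up to k feature names ordered by decreasing importance."""
    # Same idea as np.argsort(importances)[::-1]: indices sorted by importance, descending
    order = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    return [names[i] for i in order[:k]]


# Hypothetical values standing in for tree.feature_importances_ and the column names
importances = [0.05, 0.60, 0.00, 0.35]
names = ["age", "income", "region", "tenure"]

top = top_features(importances, names, k=3)
print(top)
```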
| import numpy as np | |
| from sklearn.pipeline import Pipeline | |
| from sklearn.svm import LinearSVC | |
| from sklearn.model_selection import GridSearchCV | |
| from sklearn.preprocessing import StandardScaler | |
| SVCpipe = Pipeline([('scale', StandardScaler()), | |
| ('SVC',LinearSVC())]) | |
| # Gridsearch to determine the value of C | |
| param_grid = {'SVC__C':np.arange(0.01,100,10)} |
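To make the grid concrete, the sketch below lists the candidate values that `np.arange(0.01, 100, 10)` produces, computed in plain Python, and splits the key `'SVC__C'` into its pipeline step name and parameter name. It is an illustration only; the variable names are hypothetical.

```python
import math

start, stop, step = 0.01, 100, 10

# arange yields start + i * step for i in 0 .. ceil((stop - start) / step) - 1
count = math.ceil((stop - start) / step)
c_values = [start + i * step for i in range(count)]
print([round(v, 2) for v in c_values])

# In a Pipeline parameter grid, "<step name>__<parameter>" addresses one step's parameter
step_name, param_name = "SVC__C".split("__")
print(step_name, param_name)
```

The grid therefore tries ten evenly spaced values of C between 0.01 and 90.01.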
| # Logistic regression | |
| from sklearn.pipeline import Pipeline | |
| from sklearn.linear_model import LogisticRegression | |
| from sklearn.model_selection import GridSearchCV | |
| from sklearn.preprocessing import StandardScaler | |
| logregpipe = Pipeline([('scale', StandardScaler()), | |
| ('logreg',LogisticRegression(multi_class="multinomial",solver="lbfgs"))]) | |
| # Gridsearch to determine the value of C |