install.packages("devtools")
devtools::install_github("rstudio/sparklyr")
library(sparklyr)
# Check which Spark versions are installed
spark_installed_versions()
# Install specific Spark versions
spark_install(version = "1.6.2")
spark_install(version = "2.0.0")
# Load the sparklyr library into the R environment
library(sparklyr)
# Connect to a local Spark cluster
sc <- spark_connect(master = "local", version = "2.1.0")
# Print the Spark version
spark_version(sc)
# List the tables in the local Spark cluster
src_tbls(sc) # Returns NULL or character(0) if no table has been copied to the cluster yet
# Copy data to the local Spark instance
flights_tbl <- copy_to(sc, nycflights13::flights, "flights", overwrite = TRUE)
Wine | Alcohol | Malic.acid | Ash | Acl | Mg | Phenols | Flavanoids | Nonflavanoid.phenols | Proanth | Color.int | Hue | OD | Proline
---|---|---|---|---|---|---|---|---|---|---|---|---|---
1 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8 | 3.06 | .28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065
1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | .26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050
1 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | .3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185
1 | 14.37 | 1.95 | 2.5 | 16.8 | 113 | 3.85 | 3.49 | .24 | 2.18 | 7.8 | .86 | 3.45 | 1480
1 | 13.24 | 2.59 | 2.87 | 21 | 118 | 2.8 | 2.69 | .39 | 1.82 | 4.32 | 1.04 | 2.93 | 735
1 | 14.2 | 1.76 | 2.45 | 15.2 | 112 | 3.27 | 3.39 | .34 | 1.97 | 6.75 | 1.05 | 2.85 | 1450
1 | 14.39 | 1.87 | 2.45 | 14.6 | 96 | 2.5 | 2.52 | .3 | 1.98 | 5.25 | 1.02 | 3.58 | 1290
1 | 14.06 | 2.15 | 2.61 | 17.6 | 121 | 2.6 | 2.51 | .31 | 1.25 | 5.05 | 1.06 | 3.58 | 1295
1 | 14.83 | 1.64 | 2.17 | 14 | 97 | 2.8 | 2.98 | .29 | 1.98 | 5.2 | 1.08 | 2.85 | 1045
library(plyr) # for create_progress_bar()
library(randomForest)
data <- iris
# In this cross-validation example, we use the iris data set to
# predict Sepal.Length from the other variables in the data set
# with a random forest model
k <- 5 # number of folds
# Randomly shuffle the data
yourdata <- yourdata[sample(nrow(yourdata)), ]
# Create 10 equally sized folds
folds <- cut(seq(1, nrow(yourdata)), breaks = 10, labels = FALSE)
# Perform 10-fold cross-validation
for (i in 1:10) {
  # Segment your data by fold using the which() function
  testIndexes <- which(folds == i, arr.ind = TRUE)
  testData <- yourdata[testIndexes, ]
  trainData <- yourdata[-testIndexes, ]
  # Fit your model on trainData and evaluate it on testData here
}
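The fold-splitting pattern above can be run end to end on the iris data mentioned earlier. This is only a minimal sketch: it swaps in base R's `lm()` for the random forest so that it needs no extra packages, and the seed, the `rmse` vector, and the variable names `test_idx`, `train`, and `test` are assumptions made for illustration.

```r
# Sketch: 5-fold cross-validation predicting Sepal.Length on iris.
# Assumption: lm() stands in for randomForest(); the seed is arbitrary.
set.seed(42)
data <- iris[sample(nrow(iris)), ]          # shuffle the rows
k <- 5                                      # number of folds
folds <- cut(seq_len(nrow(data)), breaks = k, labels = FALSE)

rmse <- numeric(k)                          # one error value per fold
for (i in seq_len(k)) {
  test_idx <- which(folds == i)
  train <- data[-test_idx, ]
  test  <- data[test_idx, ]
  fit   <- lm(Sepal.Length ~ ., data = train)
  pred  <- predict(fit, newdata = test)
  rmse[i] <- sqrt(mean((test$Sepal.Length - pred)^2))
}

mean(rmse)                                  # average out-of-fold RMSE
```

Replacing the `lm()` call with `randomForest(Sepal.Length ~ ., data = train)` should give the random forest variant, provided the randomForest package is installed.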
Say I have
x=data.frame(q=1,w=2,e=3, ...and many many columns...)
What is the most elegant way to rename an arbitrary subset of columns, whose positions I don't necessarily know, to some other arbitrary names?
E.g., say I want to rename "q" and "e" to "A" and "B". What is the most elegant code to do this?
Obviously, I can do a loop:
oldnames=c("q","e")
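One loop-free way to approach the question is to look up the positions of the old names with `match()` and assign the new names in a single step. The sketch below is only an illustration: the three-column data frame stands in for the elided wide one, and `newnames` and `idx` are names introduced here.

```r
# Sketch: rename a subset of columns by name, without knowing their positions.
# Assumption: a small data frame stands in for the wide one in the question.
x <- data.frame(q = 1, w = 2, e = 3)

oldnames <- c("q", "e")
newnames <- c("A", "B")                 # hypothetical target names

idx <- match(oldnames, names(x))        # positions of the old names
stopifnot(!anyNA(idx))                  # fail early if a name is missing
names(x)[idx] <- newnames

names(x)
```

If data.table is available, `data.table::setnames(x, oldnames, newnames)` performs the same rename by reference.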
library(data.table)
DT[, coltodelete := NULL]
# OR
DT[, c("col1", "col20") := NULL]
# OR
DT[, (125:135) := NULL]
# OR
DT[, (variableHoldingNamesOrNumbers) := NULL]
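The deletion forms above can be tried on a small table. This is a sketch that assumes the data.table package is installed; the table `DT` and the variable `drop_cols` are made up for the example.

```r
# Sketch: deleting columns by reference with := NULL (assumes data.table is installed).
library(data.table)

DT <- data.table(id = 1:3, a = 4:6, b = 7:9, c = 10:12, d = 13:15)

DT[, a := NULL]                   # a single column by bare name
DT[, c("b", "c") := NULL]         # several columns by name

drop_cols <- "d"                  # hypothetical variable holding column names
DT[, (drop_cols) := NULL]         # parentheses make data.table evaluate the variable

names(DT)
```

Because `:=` modifies the table in place, no reassignment such as `DT <- ...` is needed.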
# Method 1: using join() from plyr
# Merge these four separate data frames into a single table on SCHCD
x.data <- join(enrolrep.data, facilty.data, by = "SCHCD")
y.data <- join(basic.data, x.data, by = "SCHCD")
# Join onto y.data (not x.data) so that basic.data is included in the result
master.data <- join(y.data, teachr.data, by = "SCHCD")
# Method 2: using setDT() from data.table. I still have to find out how it's done with setDT()
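Method 2 is left open above. One possible way to do it with data.table is to convert each data frame and fold `merge()` over the list with `Reduce()`. The sketch below is an assumption about what was intended, not the author's solution: the four toy data frames and their non-key columns are invented, and `as.data.table()` is used rather than `setDT()` so the inputs are copied instead of converted in place.

```r
# Sketch: merge four tables on SCHCD with data.table (assumes data.table is installed).
# The toy data and all columns other than SCHCD are hypothetical.
library(data.table)

basic.data    <- data.frame(SCHCD = 1:3, name = c("A", "B", "C"))
enrolrep.data <- data.frame(SCHCD = 1:3, enrolled = c(100, 150, 120))
facilty.data  <- data.frame(SCHCD = c(1L, 3L), toilets = c(2, 4))
teachr.data   <- data.frame(SCHCD = 1:3, teachers = c(5, 7, 6))

tables <- lapply(
  list(basic.data, enrolrep.data, facilty.data, teachr.data),
  as.data.table
)

# all.x = TRUE keeps every row of the left table, like plyr::join()'s default left join
master.data <- Reduce(
  function(a, b) merge(a, b, by = "SCHCD", all.x = TRUE),
  tables
)

master.data
```

Schools missing from one of the right-hand tables keep their row and get `NA` in that table's columns.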
detach("package:mice", unload = TRUE)
# The code below is adapted from the answer by user `Thomas` posted on StackOverflow: https://stackoverflow.com/questions/26573368/uninstall-remove-r-package-with-dependencies
library("tools")
removeDepends <- function(pkg, recursive = FALSE){
  # Dependencies of every installed package
  d <- package_dependencies(db = installed.packages(), recursive = recursive)
  # Dependencies of the package to be removed
  depends <- if(!is.null(d[[pkg]])) d[[pkg]] else character()
  # Packages that any other installed package still depends on
  needed <- unique(unlist(d[!names(d) %in% c(pkg, depends)]))
  # Dependencies that nothing else needs
  toRemove <- depends[!depends %in% needed]
  if(length(toRemove)){
    toRemove <- select.list(c(pkg, sort(toRemove)), multiple = TRUE,
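The listing above is cut off, but the selection logic in its first lines can be traced on a toy dependency list without touching any installed package. In this sketch the list `d` and the package names `pkgA`, `pkgB`, `dep1`, and `dep2` are hypothetical stand-ins for the output of `package_dependencies()`.

```r
# Sketch: which dependencies of pkgA are safe to remove?
# Assumption: d mimics the named list returned by tools::package_dependencies().
d <- list(
  pkgA = c("dep1", "dep2"),
  pkgB = c("dep2"),
  dep1 = character(),
  dep2 = character()
)
pkg <- "pkgA"

depends  <- if (!is.null(d[[pkg]])) d[[pkg]] else character()
# Dependencies still required by packages other than pkg and its dependencies
needed   <- unique(unlist(d[!names(d) %in% c(pkg, depends)]))
# Only dependencies that nothing else requires are candidates for removal
toRemove <- depends[!depends %in% needed]

toRemove
```

Here `dep2` is kept because `pkgB` still depends on it, so only `dep1` is a removal candidate.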