## generate integer-encoded categoricals
for SIZE in 1; do
time R --vanilla --quiet << EOF
library(data.table)
d1 <- as.data.frame(fread("train-${SIZE}m.csv"))
d2 <- as.data.frame(fread("test.csv"))
spark-1.3.0-bin-hadoop2.4/bin/spark-shell --driver-memory 100G --executor-memory 100G
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
## get the data
for yr in 2005 2006 2007; do
  wget http://stat-computing.org/dataexpo/2009/$yr.csv.bz2
  bunzip2 $yr.csv.bz2
done
## install R and data.table
import numpy as np
from scipy.stats import chi2
from sklearn.ensemble import RandomForestClassifier
n = 1000
p = 100
def genr_data(n,p):
    X = np.random.randn(n,p)
    y = np.zeros(n)
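A minimal sketch of the labelling rule this generator appears to implement, going by the R version of the same function (`sum(x^2) > qchisq(0.5, p)`). It is not the author's code: to stay dependency-free it uses only the standard library, and it swaps the exact chi-square quantile for the Wilson-Hilferty approximation of the median, `p * (1 - 2/(9p))**3`. The names `chi2_median_approx` and `genr_data_sketch` are hypothetical.

```python
import random


def chi2_median_approx(p):
    """Wilson-Hilferty approximation to the median of a chi-square(p) variable.

    Assumption: used here as a stand-in for scipy.stats.chi2.ppf(0.5, p)
    (R's qchisq(0.5, p)) so the sketch needs no third-party packages.
    """
    return p * (1.0 - 2.0 / (9.0 * p)) ** 3


def genr_data_sketch(n, p, seed=None):
    """Return (X, y): n rows of p standard normals, y = 1 if sum(x^2) > median."""
    rng = random.Random(seed)
    threshold = chi2_median_approx(p)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # A row is labelled 1 when its squared norm exceeds the chi-square median,
    # so the two classes should be roughly balanced.
    y = [1 if sum(v * v for v in row) > threshold else 0 for row in X]
    return X, y


X, y = genr_data_sketch(n=1000, p=100, seed=0)
print(sum(y) / len(y))  # share of positive labels, expected to be near 0.5
```

With NumPy and SciPy available, the threshold could instead come from `chi2.ppf(0.5, p)` and the row sums from `(X**2).sum(axis=1)`.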
library(randomForest)
library(parallel)
genr_data <- function(n,p) {
  X <- matrix(rnorm(n*p),n,p)
  y <- as.factor(apply(X,1, function(x)
    ifelse(sum(x^2)>qchisq(0.5,p),"+","-")))
  ## Hastie et al. 10.2
  data.frame(X,y)
abbr <- validinp_character(input$inp_abbr)
## or:
abbr <- validinp_character(input$inp_abbr, pattern="^((CA)|(NY))$")
abbr <- input$inp_abbr
dbGetQuery(con, paste0("select * from states where abbr = '",abbr,"'"))
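Pasting `input$inp_abbr` straight into the SQL string lets a crafted value change the query. Below is a hedged sketch of the two usual defenses, written in Python with the standard `sqlite3` module rather than R/DBI: a whitelist check in the spirit of `validinp_character` (its implementation is not shown here, so `valid_character` is a hypothetical analog), plus a bound parameter instead of string concatenation. The `states` rows and the `get_state` helper are invented for the demo.

```python
import re
import sqlite3


def valid_character(value, pattern=r"[A-Z]{2}"):
    """Hypothetical analog of validinp_character: reject non-matching input."""
    if not isinstance(value, str) or re.fullmatch(pattern, value) is None:
        raise ValueError("invalid input")
    return value


# Throwaway in-memory database with made-up rows.
con = sqlite3.connect(":memory:")
con.execute("create table states (abbr text, name text)")
con.executemany(
    "insert into states values (?, ?)",
    [("CA", "California"), ("NY", "New York")],
)


def get_state(con, raw_abbr):
    # Step 1: only accept values on the whitelist.
    abbr = valid_character(raw_abbr, pattern=r"(CA)|(NY)")
    # Step 2: the ? placeholder sends abbr as data, never as SQL text.
    return con.execute(
        "select * from states where abbr = ?", (abbr,)
    ).fetchall()


print(get_state(con, "CA"))
try:
    get_state(con, "CA' or '1'='1")
except ValueError as err:
    print("rejected:", err)
```

Either defense alone blocks the injection in this example; using both keeps the query safe even if the validation pattern is later loosened.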
## Minimal example of R's data.table vs pandas aggregation and join benchmark
## ( more detailed but still basic benchmark here:
## http://datascience.la/dplyr-and-a-very-basic-benchmark/ )
## Just copy and paste into R and IPython, respectively
## Timings on a decent server with data.table 1.9.4 & pandas 0.15.1 (Nov 2014)
library(dplyr)
read.csv("members_export.csv") %>%
  mutate(pcrnk = percent_rank(CONFIRM_TIME), wdbl = ifelse(LinkedIn!="",2,1),
         w = (3-2*pcrnk)*wdbl) %>%
  sample_n(30, weight = w) %>%
  mutate(name = paste(First.Name,Last.Name)) %>% select(name)
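The pipeline draws 30 members with weights that favor early sign-ups (`3 - 2*pcrnk` falls from 3 to 1 as the `CONFIRM_TIME` rank rises) and doubles the weight of anyone with a non-empty `LinkedIn` field. The same arithmetic as a standard-library Python sketch: `percent_rank` follows dplyr's definition `(min_rank - 1) / (n - 1)`, while the sequential weighted draw without replacement is an assumption about how to mimic `sample_n(..., weight = w)`, not a port of its internals. The member records and the `weighted_sample` name are invented.

```python
import random


def percent_rank(values):
    """(min_rank - 1) / (n - 1); tied values share the lowest rank."""
    n = len(values)
    if n < 2:
        return [0.0] * n  # convention chosen for this sketch only
    ordered = sorted(values)
    # list.index returns the first position, i.e. min_rank - 1.
    return [ordered.index(v) / (n - 1) for v in values]


def weighted_sample(members, k, seed=None):
    """Draw up to k distinct members, one at a time, proportionally to w."""
    rng = random.Random(seed)
    pcrnk = percent_rank([m["CONFIRM_TIME"] for m in members])
    pool = []
    for m, r in zip(members, pcrnk):
        wdbl = 2 if m["LinkedIn"] != "" else 1
        pool.append((m, (3 - 2 * r) * wdbl))
    chosen = []
    for _ in range(min(k, len(pool))):
        weights = [w for _, w in pool]
        i = rng.choices(range(len(pool)), weights=weights)[0]
        chosen.append(pool.pop(i)[0])  # remove so nobody is drawn twice
    return [m["First.Name"] + " " + m["Last.Name"] for m in chosen]


# Invented records using the column names from the pipeline above.
members = [
    {"First.Name": "Ann", "Last.Name": "A", "CONFIRM_TIME": "2014-01-05", "LinkedIn": "x"},
    {"First.Name": "Bob", "Last.Name": "B", "CONFIRM_TIME": "2014-02-10", "LinkedIn": ""},
    {"First.Name": "Cy", "Last.Name": "C", "CONFIRM_TIME": "2014-03-15", "LinkedIn": "y"},
    {"First.Name": "Di", "Last.Name": "D", "CONFIRM_TIME": "2014-04-20", "LinkedIn": ""},
    {"First.Name": "Ed", "Last.Name": "E", "CONFIRM_TIME": "2014-05-25", "LinkedIn": ""},
]
winners = weighted_sample(members, k=3, seed=1)
print(winners)
```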
| --- | |
| runtime: shiny | |
| output: html_document | |
| --- | |
| ```{r, echo=FALSE} | |
| inputPanel( | |
| selectInput("n", label = "n:", choices = c(10,50)), | |
| submitButton("Update") |