tmastny · June 7, 2018 20:33
diff --git a/omahar_nse_presentation.Rmd b/omahar_nse_presentation.Rmd
 ---
 title: "Standard Non-Standard Evaluation"
 output: html_notebook
 ---


 ```{r}
 library(tidyverse)
 ```

 ```{r}
 mtcars
 ```
 ```{r}
 mtcars %>%
  summarise(avg = mean(mpg))
 ```

 ```{r}
 mtcars %>%
  summarise(avg = mean(hp))
 ```

 ```{r}
 mtcars %>%
  summarise(avg = mean(wt))
 ```

 ## Hadley's Rule

 Repeat yourself three times? Time for a function

 ```{r}
 meaner <- function(data, column) {
  data %>%
    summarise(avg = mean(column))
 }
 ```

 ```{r}
 meaner(mtcars, hp)
 ```

 ```{r}
 hp
 ```

 ```{r}
 meaner(mtcars, "hp")
 ```

 ```{r}
 mtcars %>%
  summarise(avg = mean("hp"))
 ```

 ```{r}
 meaner2 <- function(data, column) {
  column <- enquo(column)
  data %>%
    summarise(avg = mean(!!column))
 }
 ```

 ```{r}
 meaner2(mtcars, hp)
 ```

 ## Enquo Magic!!!

 `!!` and `enquo` magic worked!

 Why?




 ## Names and Variables

 ```{r}
 foo <- 4
 ```


 The name of the variable is `foo`, but the value is 4
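
 The difference between a name and its value can be seen directly in base R. This is only a small sketch; `foo_name` is a made-up variable for illustration:

 ```{r}
 foo <- 4

 # quote() captures the name itself without looking up its value.
 foo_name <- quote(foo)

 # The captured object is a name (also called a symbol), not a number.
 class(foo_name)

 # eval() looks the name up and returns the value stored under it.
 eval(foo_name)
 ```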


 But what about `hp`?

 ```{r}
 mtcars %>%
  summarise(avg = mean(hp))
 ```

 `hp` is not a variable. It has no value.

 ```{r}
 hp
 ```

 But `hp` has meaning in the context of `mtcars`:

 ```{r}
 with(mtcars, mean(hp))
 ```

 ```{r}
 mean(mtcars$hp)
 ```


 So in the tidyverse, names have special meaning.
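
 One way to picture this is with base R's `quote()` and `eval()`. The sketch below is an approximation of the idea, not how dplyr is implemented internally; `mean_hp_expr` is a made-up variable:

 ```{r}
 # quote() captures the expression mean(hp) without evaluating it.
 mean_hp_expr <- quote(mean(hp))

 # eval() accepts a data frame as the place to look names up,
 # so `hp` resolves to the column mtcars$hp.
 eval(mean_hp_expr, mtcars)
 ```

 The result is the same number as `mean(mtcars$hp)`.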



 So why doesn't this work?

 ```{r}
 meaner <- function(data, column) {
  data %>%
    summarise(avg = mean(column))
 }
 ```
 ```{r}
 meaner(mtcars, hp)
 ```

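 Because `summarise` treats `column` as a *name* to look up. `mtcars` has no column called `column`, so it falls back to the function argument, which holds `hp`, and `hp` does not exist outside the data frame.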




 Tidyverse only cares about *names* not *values*.

 ```{r}
 meaner2 <- function(data, column) {
  column <- enquo(column)
  data %>%
    summarise(avg = mean(!!column))
 }
 ```
 ```{r}
 meaner2(mtcars, hp)
 ```
 `enquo` looks for the *name* typed in by the user: `hp`


 `!!` tells tidyverse functions to use the *value* (which is a name)
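
 To see what `enquo` actually holds, we can turn the captured expression back into text. This is a sketch: `show_captured` is a hypothetical helper, and it assumes a version of rlang that provides `as_label()`:

 ```{r}
 library(dplyr)

 # Hypothetical helper: returns the text of whatever the caller typed.
 show_captured <- function(column) {
   column <- enquo(column)    # capture the expression, not its value
   rlang::as_label(column)    # convert the captured expression to a string
 }

 show_captured(hp)
 ```

 The function never needs `hp` to exist as a variable, because the name is captured before anything tries to evaluate it.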




 ## Why is it so complicated???

 * most other programming languages don't use non-standard evaluation:

 > Zen of Python: Explicit is better than implicit

 * programmers are taught to use variable *values* not *names*





 ## What are the benefits?

 * Less typing during interactive programming

 ```{r}
 mtcars %>%
  filter(cyl == 6, mpg > 20, am == 1)
 ```
 vs.
 ```{r}
 mtcars[mtcars$cyl == 6 & mtcars$mpg > 20 & mtcars$am == 1, ]
 ```


 * Can use complicated expressions in functions

 ```{r}
 mtcars %>%
  transmute(standardized = (hp - mean(hp)) / sd(hp))
 ```

 ```{r}
 standardizer <- function(data, col) {
  col <- enquo(col)
  data %>%
    transmute(standardized = ((!!col) - mean(!!col)) / sd(!!col))
 }
 ```
 ```{r}
 standardizer(mtcars, hp)
 ```

 * Can use `!!` + `enquo` like normal variables:
  - multiply, divide
  - built-in functions
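
 As an illustration of the list above, here is a sketch that divides one captured column by another. `ratio_mean`, `num`, and `den` are made-up names; the parentheses around each `!!` are there only to keep operator precedence unambiguous:

 ```{r}
 library(dplyr)

 # Hypothetical helper: mean of the row-wise ratio of two columns.
 ratio_mean <- function(data, num, den) {
   num <- enquo(num)
   den <- enquo(den)
   data %>%
     summarise(ratio = mean((!!num) / (!!den)))
 }

 ratio_mean(mtcars, hp, wt)
 ```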







 ## What about strings?




 I want this:

 ```{r}
 mtcars %>%
  group_by(cyl) %>%
  summarise(count = n())
 ```

 By using a string:

 ```{r}
 column <- "cyl"
 mtcars %>%
  group_by(column) %>%
  summarise(count = n())
 ```

 Same problem as before:
 - tidyverse uses the *name* not the *value*

 There is no column with *name* `column`







 Does `enquo` + `!!` work?

 ```{r}
 enquoed_column <- enquo(column)
 mtcars %>%
  group_by(!!enquoed_column) %>%
  summarise(count = n())
 ```

 No.



 `enquo` only works for *names*. 






 We want the *value* `"cyl"` to be converted to a *name*.




 Introducing: `sym`

 ```{r}
 symed_column <- sym(column)
 mtcars %>%
  group_by(!!symed_column) %>%
  summarise(count = n())
 ```

 `!!` stays the same

 We use `sym` instead of `enquo` if we want to turn 
 a *value* into a *name*
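
 A quick check of what `sym` returns, written here as `rlang::sym` so the chunk does not depend on which packages are attached:

 ```{r}
 column <- "cyl"

 # sym() converts the string into a name (symbol).
 symed_column <- rlang::sym(column)

 # It is a name, not a string ...
 is.name(symed_column)

 # ... and it is the same object that typing cyl unquoted would produce.
 identical(symed_column, quote(cyl))
 ```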




 We can also put this into a function

 ```{r}
 grouper <- function(data, col) {
  col <- sym(col)
  data %>%
    group_by(!!col) %>%
    summarise(count = n())
 }
 ```
 ```{r}
 grouper(mtcars, "cyl")
 ```



 ## Any questions?

 ### Any functions you want to see?