Created
April 30, 2020 14:14
Created on Skills Network Labs
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Writing functions which include `tidyverse` grouping operations" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Sometimes you will have repetitive code chunks that use functions from the `tidyverse`. Rewriting such code as a function\n", | |
"is not quite as straightforward as for \"regular\" R functions, especially if the arguments of your function are column names used inside the `tidyverse` parts of the function (e.g. in `group_by()` statements), because these functions capture column names unevaluated. The example below is deliberately simplified to make it easier to understand." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 30, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"library(tidyverse)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's have a look at an example which is not yet functionalized... \n", | |
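"Count the cars in each combination of cylinders (`cyl`) and gears (`gear`), then add a column with the fraction each combination makes up within its `cyl` group (`tally()` drops the last grouping level, so `sum(n)` is computed per `cyl`)." | |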
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 31, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
"<caption>A grouped_df: 8 × 4</caption>\n", | |
"<thead>\n", | |
"\t<tr><th scope=col>cyl</th><th scope=col>gear</th><th scope=col>n</th><th scope=col>frac</th></tr>\n", | |
"\t<tr><th scope=col><dbl></th><th scope=col><dbl></th><th scope=col><int></th><th scope=col><dbl></th></tr>\n", | |
"</thead>\n", | |
"<tbody>\n", | |
"\t<tr><td>4</td><td>3</td><td> 1</td><td>0.09090909</td></tr>\n", | |
"\t<tr><td>4</td><td>4</td><td> 8</td><td>0.72727273</td></tr>\n", | |
"\t<tr><td>4</td><td>5</td><td> 2</td><td>0.18181818</td></tr>\n", | |
"\t<tr><td>6</td><td>3</td><td> 2</td><td>0.28571429</td></tr>\n", | |
"\t<tr><td>6</td><td>4</td><td> 4</td><td>0.57142857</td></tr>\n", | |
"\t<tr><td>6</td><td>5</td><td> 1</td><td>0.14285714</td></tr>\n", | |
"\t<tr><td>8</td><td>3</td><td>12</td><td>0.85714286</td></tr>\n", | |
"\t<tr><td>8</td><td>5</td><td> 2</td><td>0.14285714</td></tr>\n", | |
"</tbody>\n", | |
"</table>\n" | |
], | |
"text/latex": [ | |
"A grouped\\_df: 8 × 4\n", | |
"\\begin{tabular}{llll}\n", | |
" cyl & gear & n & frac\\\\\n", | |
" <dbl> & <dbl> & <int> & <dbl>\\\\\n", | |
"\\hline\n", | |
"\t 4 & 3 & 1 & 0.09090909\\\\\n", | |
"\t 4 & 4 & 8 & 0.72727273\\\\\n", | |
"\t 4 & 5 & 2 & 0.18181818\\\\\n", | |
"\t 6 & 3 & 2 & 0.28571429\\\\\n", | |
"\t 6 & 4 & 4 & 0.57142857\\\\\n", | |
"\t 6 & 5 & 1 & 0.14285714\\\\\n", | |
"\t 8 & 3 & 12 & 0.85714286\\\\\n", | |
"\t 8 & 5 & 2 & 0.14285714\\\\\n", | |
"\\end{tabular}\n" | |
], | |
"text/markdown": [ | |
"\n", | |
"A grouped_df: 8 × 4\n", | |
"\n", | |
"| cyl <dbl> | gear <dbl> | n <int> | frac <dbl> |\n", | |
"|---|---|---|---|\n", | |
"| 4 | 3 | 1 | 0.09090909 |\n", | |
"| 4 | 4 | 8 | 0.72727273 |\n", | |
"| 4 | 5 | 2 | 0.18181818 |\n", | |
"| 6 | 3 | 2 | 0.28571429 |\n", | |
"| 6 | 4 | 4 | 0.57142857 |\n", | |
"| 6 | 5 | 1 | 0.14285714 |\n", | |
"| 8 | 3 | 12 | 0.85714286 |\n", | |
"| 8 | 5 | 2 | 0.14285714 |\n", | |
"\n" | |
], | |
"text/plain": [ | |
" cyl gear n frac \n", | |
"1 4 3 1 0.09090909\n", | |
"2 4 4 8 0.72727273\n", | |
"3 4 5 2 0.18181818\n", | |
"4 6 3 2 0.28571429\n", | |
"5 6 4 4 0.57142857\n", | |
"6 6 5 1 0.14285714\n", | |
"7 8 3 12 0.85714286\n", | |
"8 8 5 2 0.14285714" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"frac1_tab <- mtcars %>% \n", | |
" group_by(cyl, gear) %>%\n", | |
" tally() %>%\n", | |
" mutate(frac = n/sum(n))\n", | |
"# show output\n", | |
"frac1_tab" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Similarly, you can perform the same task with any other pair of categorical variables (e.g. `am` and `vs`)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 32, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
"<caption>A grouped_df: 4 × 4</caption>\n", | |
"<thead>\n", | |
"\t<tr><th scope=col>am</th><th scope=col>vs</th><th scope=col>n</th><th scope=col>frac</th></tr>\n", | |
"\t<tr><th scope=col><dbl></th><th scope=col><dbl></th><th scope=col><int></th><th scope=col><dbl></th></tr>\n", | |
"</thead>\n", | |
"<tbody>\n", | |
"\t<tr><td>0</td><td>0</td><td>12</td><td>0.6315789</td></tr>\n", | |
"\t<tr><td>0</td><td>1</td><td> 7</td><td>0.3684211</td></tr>\n", | |
"\t<tr><td>1</td><td>0</td><td> 6</td><td>0.4615385</td></tr>\n", | |
"\t<tr><td>1</td><td>1</td><td> 7</td><td>0.5384615</td></tr>\n", | |
"</tbody>\n", | |
"</table>\n" | |
], | |
"text/latex": [ | |
"A grouped\\_df: 4 × 4\n", | |
"\\begin{tabular}{llll}\n", | |
" am & vs & n & frac\\\\\n", | |
" <dbl> & <dbl> & <int> & <dbl>\\\\\n", | |
"\\hline\n", | |
"\t 0 & 0 & 12 & 0.6315789\\\\\n", | |
"\t 0 & 1 & 7 & 0.3684211\\\\\n", | |
"\t 1 & 0 & 6 & 0.4615385\\\\\n", | |
"\t 1 & 1 & 7 & 0.5384615\\\\\n", | |
"\\end{tabular}\n" | |
], | |
"text/markdown": [ | |
"\n", | |
"A grouped_df: 4 × 4\n", | |
"\n", | |
"| am <dbl> | vs <dbl> | n <int> | frac <dbl> |\n", | |
"|---|---|---|---|\n", | |
"| 0 | 0 | 12 | 0.6315789 |\n", | |
"| 0 | 1 | 7 | 0.3684211 |\n", | |
"| 1 | 0 | 6 | 0.4615385 |\n", | |
"| 1 | 1 | 7 | 0.5384615 |\n", | |
"\n" | |
], | |
"text/plain": [ | |
" am vs n frac \n", | |
"1 0 0 12 0.6315789\n", | |
"2 0 1 7 0.3684211\n", | |
"3 1 0 6 0.4615385\n", | |
"4 1 1 7 0.5384615" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"frac2_tab <- mtcars %>%\n", | |
" group_by(am, vs) %>%\n", | |
" tally() %>%\n", | |
" mutate(frac = n/sum(n))\n", | |
"# show output\n", | |
"frac2_tab" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Rewrite as a function using `enquo()` and `!!` (bang bang):" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 33, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"add_frac_column <- function(data, group_one, group_two) {\n", | |
" group_one <- enquo(group_one)\n", | |
" group_two <- enquo(group_two)\n", | |
" \n", | |
" data %>%\n", | |
" group_by(!! group_one, !! group_two) %>%\n", | |
" tally() %>%\n", | |
" mutate(frac = n/sum(n))\n", | |
"}" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 34, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"frac1_tab_func <- add_frac_column(mtcars, group_one = cyl, group_two = gear)\n", | |
"frac2_tab_func <- add_frac_column(mtcars, group_one = am, group_two = vs)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 35, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"TRUE" | |
], | |
"text/latex": [ | |
"TRUE" | |
], | |
"text/markdown": [ | |
"TRUE" | |
], | |
"text/plain": [ | |
"[1] TRUE" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"TRUE" | |
], | |
"text/latex": [ | |
"TRUE" | |
], | |
"text/markdown": [ | |
"TRUE" | |
], | |
"text/plain": [ | |
"[1] TRUE" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"# Test for equality with non-functionalized counterparts\n", | |
"all_equal(frac1_tab_func, frac1_tab)\n", | |
"all_equal(frac2_tab_func, frac2_tab)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Clearly, writing a function may not be necessary for such a small task. However, as soon as the pipeline grows by a few lines or you call it repeatedly with different grouping variables, the functionalized form is likely the better choice." | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "R", | |
"language": "R", | |
"name": "conda-env-r-r" | |
}, | |
"language_info": { | |
"codemirror_mode": "r", | |
"file_extension": ".r", | |
"mimetype": "text/x-r-source", | |
"name": "R", | |
"pygments_lexer": "r", | |
"version": "3.5.1" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 4 | |
} |
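
The `enquo()` and `!!` pattern used in `add_frac_column()` above can also be written with the embrace operator `{{ }}`, which quotes and unquotes an argument in one step. The following is only a minimal sketch, assuming rlang 0.4.0 or later is installed (an assumption; the notebook does not state its package versions). The function name `add_frac_column_embrace` and the object `frac1_tab_embrace` are hypothetical and do not appear in the original notebook.

```r
library(dplyr)

# Hypothetical variant of add_frac_column(); assumes rlang >= 0.4.0,
# which introduced the embrace operator {{ }}.
add_frac_column_embrace <- function(data, group_one, group_two) {
  data %>%
    # {{ arg }} stands in for the enquo(arg) followed by !! arg pair
    group_by({{ group_one }}, {{ group_two }}) %>%
    # count rows per group; the result stays grouped by the first variable
    tally() %>%
    # fraction of each combination within its first-variable group
    mutate(frac = n / sum(n))
}

# Same call style as in the notebook: bare column names as arguments
frac1_tab_embrace <- add_frac_column_embrace(mtcars, group_one = cyl, group_two = gear)
frac1_tab_embrace
```

If this runs in your environment, the result should match `frac1_tab` from the notebook. The explicit `enquo()` form remains useful when the quoted argument has to be inspected or modified before it is used.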