@PatrickRWright
Created April 30, 2020 14:14
Created on Skills Network Labs
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Writing functions which include `tidyverse` grouping operations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sometimes you will have repetitive code chunks that use functionality from the `tidyverse`. Rewriting such code as a function\n",
"is not quite as straightforward as for \"regular\" R functions, especially when the arguments of your function are column names used inside the `tidyverse` parts of the function (e.g. in `group_by()` statements). The example below is deliberately simplified to make it easier to follow."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"library(tidyverse)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's first look at an example that is not yet functionalized. \n",
"We add a column `frac` that gives, within each cylinder group (`cyl`), the fraction of cars with each number of forward gears (`gear`)."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<caption>A grouped_df: 8 × 4</caption>\n",
"<thead>\n",
"\t<tr><th scope=col>cyl</th><th scope=col>gear</th><th scope=col>n</th><th scope=col>frac</th></tr>\n",
"\t<tr><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;dbl&gt;</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"\t<tr><td>4</td><td>3</td><td> 1</td><td>0.09090909</td></tr>\n",
"\t<tr><td>4</td><td>4</td><td> 8</td><td>0.72727273</td></tr>\n",
"\t<tr><td>4</td><td>5</td><td> 2</td><td>0.18181818</td></tr>\n",
"\t<tr><td>6</td><td>3</td><td> 2</td><td>0.28571429</td></tr>\n",
"\t<tr><td>6</td><td>4</td><td> 4</td><td>0.57142857</td></tr>\n",
"\t<tr><td>6</td><td>5</td><td> 1</td><td>0.14285714</td></tr>\n",
"\t<tr><td>8</td><td>3</td><td>12</td><td>0.85714286</td></tr>\n",
"\t<tr><td>8</td><td>5</td><td> 2</td><td>0.14285714</td></tr>\n",
"</tbody>\n",
"</table>\n"
],
"text/latex": [
"A grouped\\_df: 8 × 4\n",
"\\begin{tabular}{llll}\n",
" cyl & gear & n & frac\\\\\n",
" <dbl> & <dbl> & <int> & <dbl>\\\\\n",
"\\hline\n",
"\t 4 & 3 & 1 & 0.09090909\\\\\n",
"\t 4 & 4 & 8 & 0.72727273\\\\\n",
"\t 4 & 5 & 2 & 0.18181818\\\\\n",
"\t 6 & 3 & 2 & 0.28571429\\\\\n",
"\t 6 & 4 & 4 & 0.57142857\\\\\n",
"\t 6 & 5 & 1 & 0.14285714\\\\\n",
"\t 8 & 3 & 12 & 0.85714286\\\\\n",
"\t 8 & 5 & 2 & 0.14285714\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"A grouped_df: 8 × 4\n",
"\n",
"| cyl &lt;dbl&gt; | gear &lt;dbl&gt; | n &lt;int&gt; | frac &lt;dbl&gt; |\n",
"|---|---|---|---|\n",
"| 4 | 3 | 1 | 0.09090909 |\n",
"| 4 | 4 | 8 | 0.72727273 |\n",
"| 4 | 5 | 2 | 0.18181818 |\n",
"| 6 | 3 | 2 | 0.28571429 |\n",
"| 6 | 4 | 4 | 0.57142857 |\n",
"| 6 | 5 | 1 | 0.14285714 |\n",
"| 8 | 3 | 12 | 0.85714286 |\n",
"| 8 | 5 | 2 | 0.14285714 |\n",
"\n"
],
"text/plain": [
" cyl gear n frac \n",
"1 4 3 1 0.09090909\n",
"2 4 4 8 0.72727273\n",
"3 4 5 2 0.18181818\n",
"4 6 3 2 0.28571429\n",
"5 6 4 4 0.57142857\n",
"6 6 5 1 0.14285714\n",
"7 8 3 12 0.85714286\n",
"8 8 5 2 0.14285714"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"frac1_tab <- mtcars %>% \n",
" group_by(cyl, gear) %>%\n",
" tally() %>%\n",
" mutate(frac = n/sum(n))\n",
"# show output\n",
"frac1_tab"
]
},
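{
"cell_type": "markdown",
"metadata": {},
"source": [
"Why is `frac` computed per cylinder group rather than over all 32 cars? `tally()` summarises away the last grouping variable (`gear`), so the result is still grouped by `cyl` and `sum(n)` in the following `mutate()` is taken within each `cyl` group. The minimal sketch below rebuilds the counts from `mtcars` so it runs on its own, then uses `dplyr::group_vars()` to show the remaining grouping and checks that `frac` adds up to 1 within each group. The name `counts` is just an illustrative variable."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"library(dplyr)  # already attached by library(tidyverse); repeated so this cell also runs on its own\n",
"\n",
"counts <- mtcars %>%\n",
" group_by(cyl, gear) %>%\n",
" tally()\n",
"\n",
"# tally() drops the last grouping variable (gear); the result stays grouped by cyl\n",
"group_vars(counts)\n",
"\n",
"# hence sum(n) in mutate() is taken per cyl group, and frac adds up to 1 within each group\n",
"counts %>%\n",
" mutate(frac = n/sum(n)) %>%\n",
" summarise(total_frac = sum(frac))"
]
},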
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same task can be performed for any other pair of categorical variables, e.g. transmission type (`am`) and engine shape (`vs`), but only by copying the whole pipeline and changing the column names."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<caption>A grouped_df: 4 × 4</caption>\n",
"<thead>\n",
"\t<tr><th scope=col>am</th><th scope=col>vs</th><th scope=col>n</th><th scope=col>frac</th></tr>\n",
"\t<tr><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;dbl&gt;</th><th scope=col>&lt;int&gt;</th><th scope=col>&lt;dbl&gt;</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"\t<tr><td>0</td><td>0</td><td>12</td><td>0.6315789</td></tr>\n",
"\t<tr><td>0</td><td>1</td><td> 7</td><td>0.3684211</td></tr>\n",
"\t<tr><td>1</td><td>0</td><td> 6</td><td>0.4615385</td></tr>\n",
"\t<tr><td>1</td><td>1</td><td> 7</td><td>0.5384615</td></tr>\n",
"</tbody>\n",
"</table>\n"
],
"text/latex": [
"A grouped\\_df: 4 × 4\n",
"\\begin{tabular}{llll}\n",
" am & vs & n & frac\\\\\n",
" <dbl> & <dbl> & <int> & <dbl>\\\\\n",
"\\hline\n",
"\t 0 & 0 & 12 & 0.6315789\\\\\n",
"\t 0 & 1 & 7 & 0.3684211\\\\\n",
"\t 1 & 0 & 6 & 0.4615385\\\\\n",
"\t 1 & 1 & 7 & 0.5384615\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"A grouped_df: 4 × 4\n",
"\n",
"| am &lt;dbl&gt; | vs &lt;dbl&gt; | n &lt;int&gt; | frac &lt;dbl&gt; |\n",
"|---|---|---|---|\n",
"| 0 | 0 | 12 | 0.6315789 |\n",
"| 0 | 1 | 7 | 0.3684211 |\n",
"| 1 | 0 | 6 | 0.4615385 |\n",
"| 1 | 1 | 7 | 0.5384615 |\n",
"\n"
],
"text/plain": [
" am vs n frac \n",
"1 0 0 12 0.6315789\n",
"2 0 1 7 0.3684211\n",
"3 1 0 6 0.4615385\n",
"4 1 1 7 0.5384615"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"frac2_tab <- mtcars %>%\n",
" group_by(am, vs) %>%\n",
" tally() %>%\n",
" mutate(frac = n/sum(n))\n",
"# show output\n",
"frac2_tab"
]
},
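{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see why turning this pipeline into a function is not straightforward, consider a naive, hypothetical `add_frac_column_naive()` that uses its arguments directly inside `group_by()`. `group_by()` looks up the symbol `group_one` itself: `mtcars` has no column with that name, so it falls back to the function argument, and evaluating that argument searches for an ordinary R object called `cyl` in the caller's environment. No such object exists, so the call fails. This sketch catches the error with `tryCatch()` so the notebook keeps running and prints the error message instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"library(dplyr)\n",
"\n",
"# Hypothetical naive version: the arguments are used directly inside group_by()\n",
"add_frac_column_naive <- function(data, group_one, group_two) {\n",
" data %>%\n",
"  group_by(group_one, group_two) %>%\n",
"  tally() %>%\n",
"  mutate(frac = n/sum(n))\n",
"}\n",
"\n",
"# cyl is evaluated as a regular R object, not as a column of mtcars, so this errors\n",
"naive_result <- tryCatch(\n",
" add_frac_column_naive(mtcars, cyl, gear),\n",
" error = function(e) conditionMessage(e)\n",
")\n",
"naive_result"
]
},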
{
"cell_type": "markdown",
"metadata": {},
"source": [
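"## Rewriting it as a function using `enquo()` and `!!` (bang-bang)\n",
"\n",
"Inside the function, `enquo()` captures the expression the caller supplied (e.g. `cyl`) as a *quosure*, i.e. the unevaluated expression together with the caller's environment. `!!` then unquotes that quosure inside `group_by()`, so it is evaluated as a column of `data` rather than as an ordinary R variable."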
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"add_frac_column <- function(data, group_one, group_two) {\n",
" group_one <- enquo(group_one)\n",
" group_two <- enquo(group_two)\n",
" \n",
" data %>%\n",
" group_by(!! group_one, !! group_two) %>%\n",
" tally() %>%\n",
" mutate(frac = n/sum(n))\n",
"}"
]
},
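{
"cell_type": "markdown",
"metadata": {},
"source": [
"What `enquo()` stores can be inspected directly. In this minimal sketch, the hypothetical helper `capture_arg()` simply returns the quosure for its argument; `rlang::quo_get_expr()` shows that the captured expression is the bare symbol `cyl` (not a value), and `rlang::as_label()` converts it to a string. The `rlang::` prefix is used so that no additional package needs to be attached."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical helper: capture whatever expression the caller passes as x\n",
"capture_arg <- function(x) rlang::enquo(x)\n",
"\n",
"q <- capture_arg(cyl)\n",
"\n",
"q                          # a quosure: the expression plus the caller's environment\n",
"rlang::quo_get_expr(q)     # the bare, unevaluated symbol cyl\n",
"rlang::as_label(q)         # the same expression as a character string"
]
},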
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"frac1_tab_func <- add_frac_column(mtcars, group_one = cyl, group_two = gear)\n",
"frac2_tab_func <- add_frac_column(mtcars, group_one = am, group_two = vs)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"TRUE"
],
"text/latex": [
"TRUE"
],
"text/markdown": [
"TRUE"
],
"text/plain": [
"[1] TRUE"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"TRUE"
],
"text/latex": [
"TRUE"
],
"text/markdown": [
"TRUE"
],
"text/plain": [
"[1] TRUE"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Test for equality with the non-functionalized counterparts\n",
"all_equal(frac1_tab_func, frac1_tab)\n",
"all_equal(frac2_tab_func, frac2_tab)"
]
},
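{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since rlang 0.4.0, the `enquo()` + `!!` pair can be written in a single step with the embrace operator `{{ }}` (curly-curly). The sketch below, assuming a reasonably recent rlang/dplyr, defines the same function in that style under the hypothetical name `add_frac_column_embrace()` and compares its result with the explicit pipeline for `cyl` and `gear`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"library(dplyr)\n",
"\n",
"# {{ x }} captures the argument and unquotes it in one step (like enquo() followed by !!)\n",
"add_frac_column_embrace <- function(data, group_one, group_two) {\n",
" data %>%\n",
"  group_by({{ group_one }}, {{ group_two }}) %>%\n",
"  tally() %>%\n",
"  mutate(frac = n/sum(n))\n",
"}\n",
"\n",
"embrace_tab <- add_frac_column_embrace(mtcars, group_one = cyl, group_two = gear)\n",
"\n",
"# Reference: the explicit, non-functionalized pipeline\n",
"reference_tab <- mtcars %>%\n",
" group_by(cyl, gear) %>%\n",
" tally() %>%\n",
" mutate(frac = n/sum(n))\n",
"\n",
"isTRUE(all.equal(embrace_tab, reference_tab))"
]
},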
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Admittedly, writing a function may not be necessary for such a small task. However, as soon as the pipeline grows by a few more lines, or is called repeatedly with different parameters, the functionalized form is likely to pay off."
]
}
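,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The function above always takes exactly two grouping columns. If you expect to call it with a varying number of groups, one option is to forward `...` straight to `group_by()`, which needs no explicit quoting at all. A minimal sketch under the hypothetical name `add_frac_column_dots()`; note that with a single grouping column `tally()` drops the only group, so `frac` becomes a share of all 32 cars."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"library(dplyr)\n",
"\n",
"# Everything in ... is passed unchanged to group_by(), so any number of grouping columns works\n",
"add_frac_column_dots <- function(data, ...) {\n",
" data %>%\n",
"  group_by(...) %>%\n",
"  tally() %>%\n",
"  mutate(frac = n/sum(n))\n",
"}\n",
"\n",
"# Two grouping columns, as before: frac is computed within each cyl group\n",
"two_groups <- add_frac_column_dots(mtcars, cyl, gear)\n",
"\n",
"# Three grouping columns: frac is computed within each cyl/gear combination\n",
"three_groups <- add_frac_column_dots(mtcars, cyl, gear, am)\n",
"\n",
"# One grouping column: the result is ungrouped, so frac is relative to all cars\n",
"one_group <- add_frac_column_dots(mtcars, cyl)\n",
"one_group"
]
}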
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "conda-env-r-r"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 4
}