Created April 30, 2020 12:54
Created on Skills Network Labs
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Calculate a summary table with percentages per group"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"R ships with the `mtcars` dataset. From it we will use the number of cylinders (`cyl`) and gears (`gear`) as grouping variables and produce a summary table of counts and percentages. For this we need to load the `tidyverse`."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"library(tidyverse)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
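"First, `group_by` selects the grouping variables. Then `tally` counts the rows in each group and stores the totals in `n`; it also drops the last grouping variable (`gear`), so the result stays grouped by `cyl`. Finally, `mutate` adds a column with each count as a percentage of its cylinder group."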
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<caption>A grouped_df: 8 × 4</caption>\n",
"<thead>\n",
"\t<tr><th scope=col>cyl</th><th scope=col>gear</th><th scope=col>n</th><th scope=col>percent</th></tr>\n",
"\t<tr><th scope=col><dbl></th><th scope=col><dbl></th><th scope=col><int></th><th scope=col><chr></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"\t<tr><td>4</td><td>3</td><td> 1</td><td>9.09 % </td></tr>\n",
"\t<tr><td>4</td><td>4</td><td> 8</td><td>72.73 %</td></tr>\n",
"\t<tr><td>4</td><td>5</td><td> 2</td><td>18.18 %</td></tr>\n",
"\t<tr><td>6</td><td>3</td><td> 2</td><td>28.57 %</td></tr>\n",
"\t<tr><td>6</td><td>4</td><td> 4</td><td>57.14 %</td></tr>\n",
"\t<tr><td>6</td><td>5</td><td> 1</td><td>14.29 %</td></tr>\n",
"\t<tr><td>8</td><td>3</td><td>12</td><td>85.71 %</td></tr>\n",
"\t<tr><td>8</td><td>5</td><td> 2</td><td>14.29 %</td></tr>\n",
"</tbody>\n",
"</table>\n"
],
"text/latex": [
"A grouped\\_df: 8 × 4\n",
"\\begin{tabular}{llll}\n",
" cyl & gear & n & percent\\\\\n",
" <dbl> & <dbl> & <int> & <chr>\\\\\n",
"\\hline\n",
"\t 4 & 3 & 1 & 9.09 \\% \\\\\n",
"\t 4 & 4 & 8 & 72.73 \\%\\\\\n",
"\t 4 & 5 & 2 & 18.18 \\%\\\\\n",
"\t 6 & 3 & 2 & 28.57 \\%\\\\\n",
"\t 6 & 4 & 4 & 57.14 \\%\\\\\n",
"\t 6 & 5 & 1 & 14.29 \\%\\\\\n",
"\t 8 & 3 & 12 & 85.71 \\%\\\\\n",
"\t 8 & 5 & 2 & 14.29 \\%\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"A grouped_df: 8 × 4\n",
"\n",
"| cyl <dbl> | gear <dbl> | n <int> | percent <chr> |\n",
"|---|---|---|---|\n",
"| 4 | 3 | 1 | 9.09 % |\n",
"| 4 | 4 | 8 | 72.73 % |\n",
"| 4 | 5 | 2 | 18.18 % |\n",
"| 6 | 3 | 2 | 28.57 % |\n",
"| 6 | 4 | 4 | 57.14 % |\n",
"| 6 | 5 | 1 | 14.29 % |\n",
"| 8 | 3 | 12 | 85.71 % |\n",
"| 8 | 5 | 2 | 14.29 % |\n",
"\n"
],
"text/plain": [
" cyl gear n percent\n",
"1 4 3 1 9.09 % \n",
"2 4 4 8 72.73 %\n",
"3 4 5 2 18.18 %\n",
"4 6 3 2 28.57 %\n",
"5 6 4 4 57.14 %\n",
"6 6 5 1 14.29 %\n",
"7 8 3 12 85.71 %\n",
"8 8 5 2 14.29 %"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"mtcars %>%\n",
" group_by(cyl, gear) %>%\n",
" tally() %>%\n",
" mutate(percent = paste(round((n/sum(n)) * 100, digits = 2), \"%\"))"
]
},
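{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of an alternative (not part of the original pipeline), the same table could be built with `count()` followed by an explicit `group_by(cyl)`, which makes it visible that the percentages are computed within each cylinder group. Keeping `percent` numeric instead of pasting a `%` sign is an assumption made here so the column can still be summed or sorted."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"library(dplyr)\n",
"\n",
"mtcars %>%\n",
"  count(cyl, gear) %>%   # one row per cyl/gear combination, counts in n\n",
"  group_by(cyl) %>%      # percentages are taken within each cylinder group\n",
"  mutate(percent = round(n / sum(n) * 100, digits = 2)) %>%\n",
"  ungroup()              # drop the grouping so later steps are not affected"
]
},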
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The proportions in the last column should add up to `3`: `tally` leaves the table grouped by `cyl`, the proportions within each cylinder group sum to `1`, and there are three cylinder groups. `pull` without arguments returns the last column (here `percentage`)."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"3"
],
"text/latex": [
"3"
],
"text/markdown": [
"3"
],
"text/plain": [
"[1] 3"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"mtcars %>% group_by(cyl, gear) %>% tally() %>% mutate(percentage = n/sum(n)) %>% pull() %>% sum()"
]
}
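,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To check the groups one by one rather than through the overall total, a sketch like the following sums the proportions per cylinder group; each `total` should come out as 1, up to floating-point rounding. The column name `total` is chosen here for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"library(dplyr)\n",
"\n",
"mtcars %>%\n",
"  group_by(cyl, gear) %>%\n",
"  tally() %>%                           # result is still grouped by cyl\n",
"  mutate(percentage = n / sum(n)) %>%   # proportion within each cyl group\n",
"  summarise(total = sum(percentage))    # one row per cyl"
]
}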
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "conda-env-r-r"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 4
}