Created
July 19, 2018 13:34
Lightning talk at Partners R User Group Meeting on 2018-07-19
--- | |
title: "tableone (Lightning talk at Partners R User Group Meeting)" | |
author: "Kazuki Yoshida" | |
date: "`r format(Sys.time(), '%Y-%m-%d')`" | |
output: html_document | |
--- | |
```{r, message = FALSE, tidy = FALSE, echo = FALSE}
## knitr configuration: http://yihui.name/knitr/options#chunk_options | |
library(knitr) | |
showMessage <- FALSE | |
showWarning <- TRUE | |
set_alias(w = "fig.width", h = "fig.height", res = "results") | |
opts_chunk$set(comment = "##", error = TRUE, warning = showWarning, message = showMessage,
               tidy = FALSE, cache = FALSE, echo = TRUE,
               fig.width = 7, fig.height = 7, dev.args = list(family = "sans"))
## for rgl | |
## knit_hooks$set(rgl = hook_rgl, webgl = hook_webgl) | |
## for animation | |
opts_knit$set(animation.fun = hook_ffmpeg_html) | |
## R configuration | |
options(width = 116, scipen = 5) | |
``` | |
## What is this? | |
This is the material for a lightning talk at the [Partners R User Group](https://rc.partners.org/support-training/training/partners-r-user-group) meeting on 2018-07-19.
## References | |
- CRAN: https://cran.r-project.org/web/packages/tableone/index.html | |
- [Introduction](https://cran.r-project.org/web/packages/tableone/vignettes/introduction.html) | |
- [Using SMD](https://cran.r-project.org/web/packages/tableone/vignettes/smd.html) | |
## Introduction | |
tableone is an R package that assists in creating "Table 1", the summary of patient baseline characteristics in the format often seen in biomedical journals.
## Load packages | |
```{r} | |
library(tidyverse) | |
library(tableone) | |
``` | |
## Load data | |
We load the pbc (primary biliary cirrhosis) dataset from Mayo Clinic. | |
```{r} | |
data(pbc, package = "survival") | |
pbc <- as_tibble(pbc)  # as_data_frame() is deprecated in favor of as_tibble()
pbc | |
``` | |
## Overall tables | |
Invocation of CreateTableOne() with just the data argument shows all variables. | |
```{r} | |
CreateTableOne(data = pbc) | |
``` | |
Some variables are not appropriate as patient baseline characteristics, so let's specify variables via the vars argument. Here we remove patient ID and outcome variables (time and status). | |
```{r} | |
dput(names(pbc)) | |
vars <- c("trt", "age", "sex", "ascites", "hepato",
          "spiders", "edema", "bili", "chol", "albumin", "copper", "alk.phos",
          "ast", "trig", "platelet", "protime", "stage")
CreateTableOne(vars = vars, data = pbc) | |
``` | |
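As a small sketch of an alternative, the same variable list can be built by exclusion instead of typing every name. The object names `pbc_raw` and `vars_by_exclusion` are made up for this illustration and are not used elsewhere in this document.

```{r}
## Illustration only: pbc_raw is a fresh copy of the data,
## so the objects used in the rest of this document are untouched.
pbc_raw <- survival::pbc
## Drop the patient ID and the outcome variables; keep everything else in column order.
vars_by_exclusion <- setdiff(names(pbc_raw), c("id", "time", "status"))
vars_by_exclusion
```

This avoids typos in long variable lists, at the cost of silently picking up any column that is later added to the data.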
See ?pbc to better understand the dataset. | |
``` | |
pbc package:survival R Documentation | |
Mayo Clinic Primary Biliary Cirrhosis Data | |
Description: | |
This data is from the Mayo Clinic trial in primary biliary
cirrhosis (PBC) of the liver conducted between 1974 and 1984. A | |
total of 424 PBC patients, referred to Mayo Clinic during that | |
ten-year interval, met eligibility criteria for the randomized | |
placebo controlled trial of the drug D-penicillamine. The first | |
312 cases in the data set participated in the randomized trial and | |
contain largely complete data. The additional 112 cases did not | |
participate in the clinical trial, but consented to have basic | |
measurements recorded and to be followed for survival. Six of | |
those cases were lost to follow-up shortly after diagnosis, so the | |
data here are on an additional 106 cases as well as the 312 | |
randomized participants. | |
A nearly identical data set found in appendix D of Fleming and | |
Harrington; this version has fewer missing values. | |
Usage: | |
pbc | |
Format: | |
age: in years | |
albumin: serum albumin (g/dl) | |
alk.phos: alkaline phosphotase (U/liter) | |
ascites: presence of ascites | |
ast: aspartate aminotransferase, once called SGOT (U/ml) | |
bili: serum bilirunbin (mg/dl) | |
chol: serum cholesterol (mg/dl) | |
copper: urine copper (ug/day) | |
edema: 0 no edema, 0.5 untreated or successfully treated | |
1 edema despite diuretic therapy | |
hepato: presence of hepatomegaly or enlarged liver | |
id: case number | |
platelet: platelet count | |
protime: standardised blood clotting time | |
sex: m/f | |
spiders: blood vessel malformations in the skin | |
stage: histologic stage of disease (needs biopsy) | |
status: status at endpoint, 0/1/2 for censored, transplant, dead | |
time: number of days between registration and the earlier of death, | |
transplantion, or study analysis in July, 1986 | |
trt: 1/2/NA for D-penicillmain, placebo, not randomised | |
trig: triglycerides (mg/dl) | |
Source: | |
T Therneau and P Grambsch (2000), _Modeling Survival Data: | |
Extending the Cox Model_, Springer-Verlag, New York. ISBN: | |
0-387-98784-3. | |
``` | |
We can see that some variables are numerically coded categorical variables (ascites, edema, hepato, spiders, stage, trt). Here we convert these to factors for correct handling. For binary variables, make the second level the one you want to show the percentage for.
```{r}
pbc <- pbc %>%
    mutate(ascites = factor(ascites, levels = c(0,1), labels = c("Absent","Present")),
           edema   = factor(edema, levels = c(0, 0.5, 1),
                            labels = c("No edema","Untreated or successfully treated","Edema despite diuretic therapy")),
           hepato  = factor(hepato, levels = c(0,1), labels = c("Absent","Present")),
           spiders = factor(spiders, levels = c(0,1), labels = c("Absent","Present")),
           stage   = factor(stage),
           trt     = factor(trt, levels = c(1,2), labels = c("D-penicillamine", "Placebo")))
```
Now these variables are handled better. | |
```{r} | |
CreateTableOne(vars = vars, data = pbc) | |
``` | |
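If you would rather not modify the data, `CreateTableOne()` also has a `factorVars` argument that treats the named variables as categorical for the table only. The sketch below works on a fresh copy of the raw data. The names `pbc_raw`, `vars_raw`, and `tab_fv` are made up for this illustration. The level labels will be the raw numeric codes, so converting to labelled factors as above usually gives a more readable table.

```{r}
library(tableone)
## Illustration only: start again from the raw, numerically coded data.
pbc_raw <- survival::pbc
vars_raw <- c("age", "sex", "ascites", "hepato", "spiders", "edema", "stage")
## factorVars marks numerically coded variables as categorical without touching the data.
tab_fv <- CreateTableOne(vars = vars_raw, data = pbc_raw,
                         factorVars = c("ascites", "hepato", "spiders", "edema", "stage"))
tab_fv
```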
Show missing proportions with the missing option to the print method. | |
```{r} | |
print(CreateTableOne(vars = vars, data = pbc), missing = TRUE) | |
``` | |
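As a cross-check on the missing column, the percentage of missing values per variable can be computed directly in base R. This is only a sketch on a fresh copy of the raw data. The names `pbc_raw` and `pct_missing` are made up for this illustration.

```{r}
## Illustration only: fresh copy of the raw data.
pbc_raw <- survival::pbc
## Percentage of missing values in each column, rounded to one decimal place.
pct_missing <- round(100 * colMeans(is.na(pbc_raw)), 1)
## Show only the variables that have any missing values.
pct_missing[pct_missing > 0]
```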
## Group-stratified tables | |
trt is the treatment assignment variable, so we stratify the table by it. P-values are added using reasonable default test functions.
```{r} | |
vars <- setdiff(vars, "trt") | |
CreateTableOne(vars = vars, strata = "trt", data = pbc) | |
``` | |
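To make the default tests less of a black box, here is a sketch of what they should correspond to. It assumes the defaults documented in `?CreateTableOne` (`oneway.test` with `var.equal = TRUE` for continuous variables and `chisq.test` with continuity correction for categorical variables). Please check the help page for your installed version. The names `pbc_raw`, `p_age`, and `p_sex` are made up for this illustration.

```{r}
## Illustration only: fresh copy of the raw data (trt is coded 1/2/NA here).
pbc_raw <- survival::pbc
## Continuous variable: one-way ANOVA assuming equal variances,
## which for two groups gives the same p-value as the equal-variance t-test.
p_age <- oneway.test(age ~ trt, data = pbc_raw, var.equal = TRUE)$p.value
## Categorical variable: chi-squared test with continuity correction.
p_sex <- chisq.test(table(pbc_raw$sex, pbc_raw$trt), correct = TRUE)$p.value
c(age = p_age, sex = p_sex)
```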
Some continuous variables are quite skewed, as most biomarkers are. Median [IQR] may be a preferred format for these. Note that, as the test column indicates, the p-values for these variables come from a different function: a rank-based nonparametric test, which corresponds to the Wilcoxon rank sum test in this two-group case.
```{r} | |
print(CreateTableOne(vars = vars, strata = "trt", data = pbc), nonnormal = c("bili","chol")) | |
``` | |
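The following sketch illustrates the two-group equivalence mentioned above. It assumes the default nonnormal test is `kruskal.test` (see the `testNonNormal` argument in `?CreateTableOne`). For two groups, the Kruskal-Wallis test gives the same p-value as the Wilcoxon rank sum test using the normal approximation without continuity correction. The names `pbc_raw`, `p_kw`, and `p_wx` are made up for this illustration.

```{r}
## Illustration only: fresh copy of the raw data.
pbc_raw <- survival::pbc
## Kruskal-Wallis test of bilirubin by treatment arm (rows with missing trt are dropped).
p_kw <- kruskal.test(bili ~ trt, data = pbc_raw)$p.value
## Wilcoxon rank sum test, normal approximation, no continuity correction.
p_wx <- wilcox.test(bili ~ trt, data = pbc_raw, exact = FALSE, correct = FALSE)$p.value
c(kruskal = p_kw, wilcoxon = p_wx)
all.equal(p_kw, p_wx)
```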
In propensity score analyses, standardized mean differences (SMDs) are often preferred over p-values for assessing covariate balance. Use the smd argument of the print method to show them.
```{r} | |
print(CreateTableOne(vars = vars, strata = "trt", data = pbc), nonnormal = c("bili","chol"), smd = TRUE, test = FALSE) | |
``` | |
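For intuition, the SMD of a continuous variable between two groups is commonly defined as the absolute difference in means divided by the square root of the average of the two group variances. The sketch below computes this by hand for age. It is meant as an illustration of that common definition, not as a reproduction of the package internals. The names `pbc_raw`, `age_by_trt`, and `smd_age` are made up for this illustration.

```{r}
## Illustration only: fresh copy of the raw data.
pbc_raw <- survival::pbc
## Split age by treatment arm; split() drops rows where trt is missing.
age_by_trt <- split(pbc_raw$age, pbc_raw$trt)
m <- sapply(age_by_trt, mean)
v <- sapply(age_by_trt, var)
## Absolute mean difference scaled by the pooled (averaged-variance) standard deviation.
smd_age <- abs(m[[1]] - m[[2]]) / sqrt((v[[1]] + v[[2]]) / 2)
smd_age
```

Unlike a p-value, this quantity does not shrink or grow with sample size alone, which is why it is popular for balance assessment.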
## Variable labels | |
Variable names are typically short and not appropriate for the final version of the table. Use the labelled package to assign variable labels. | |
```{r} | |
var_label_list <- list(age = "Age in years",
                       sex = "Female",
                       ascites = "Ascites",
                       hepato = "Hepatomegaly",
                       spiders = "Spider angioma",
                       edema = "Edema",
                       bili = "Serum bilirubin, mg/dl",
                       chol = "Serum cholesterol, mg/dl",
                       copper = "Urine copper, ug/day",
                       stage = "Histologic stage of disease",
                       trig = "Triglycerides, mg/dl",
                       albumin = "Serum albumin, g/dl",
                       alk.phos = "Alkaline phosphatase, U/liter",
                       ast = "Aspartate aminotransferase, U/ml",
                       platelet = "Platelet count",
                       protime = "Prothrombin time in seconds")
labelled::var_label(pbc) <- var_label_list | |
labelled::var_label(pbc) | |
``` | |
Let's see the table with variable labels. | |
```{r} | |
print(CreateTableOne(vars = vars, strata = "trt", data = pbc), nonnormal = c("bili","chol"), smd = TRUE, test = FALSE, varLabels = TRUE) | |
``` | |
Once binary categories look OK, we can suppress level indication. | |
```{r} | |
print(CreateTableOne(vars = vars, strata = "trt", data = pbc), nonnormal = c("bili","chol"), smd = TRUE, test = FALSE, varLabels = TRUE, dropEqual = TRUE) | |
``` | |
## Export to a CSV file | |
The print method invisibly returns a matrix object, which we can export to a file. In the console, the output is aligned with padding spaces, but we don't need them when exporting. The noSpaces option removes them. If assigning the matrix is all you need, you can turn off printing with the printToggle option.
```{r} | |
tab1mat <- print(CreateTableOne(vars = vars, strata = "trt", data = pbc),
                 nonnormal = c("bili","chol"), smd = TRUE, test = FALSE,
                 varLabels = TRUE, dropEqual = TRUE,
                 noSpaces = TRUE, printToggle = FALSE)
``` | |
Now this is just a matrix of text. | |
```{r} | |
tab1mat | |
``` | |
You can write to a CSV file easily. | |
```{r} | |
write.csv(tab1mat, file = "./tab1.csv") | |
``` | |
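When reading such a file back, note that `write.csv()` stores the matrix row names (the variable labels) as the first, unnamed column. The sketch below shows the round trip on a tiny stand-in matrix with placeholder cell values, written to a temporary file. The names `demo_mat`, `demo_file`, and `demo_back` are made up for this illustration.

```{r}
## Illustration only: a small character matrix with placeholder values,
## shaped like the matrix returned by the print method.
demo_mat <- matrix(c("n1", "stat1", "n2", "stat2"), nrow = 2,
                   dimnames = list(c("n", "age (mean (SD))"), c("group1", "group2")))
## Write to a temporary file so that nothing in the working directory is overwritten.
demo_file <- tempfile(fileext = ".csv")
write.csv(demo_mat, file = demo_file)
## row.names = 1 restores the first column as row names;
## check.names = FALSE keeps the column headers exactly as written.
demo_back <- read.csv(demo_file, row.names = 1, check.names = FALSE,
                      stringsAsFactors = FALSE)
demo_back
```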
-------------------- | |
- Top Page: http://rpubs.com/kaz_yos/ | |
- Github: https://github.com/kaz-yos | |
- Twitter: https://twitter.com/kaz_yos |