kaz-yos · July 19, 2018 13:34
diff --git a/tableone_partners_r_group.Rmd b/tableone_partners_r_group.Rmd
 ---
 title: "tableone (Lightning talk at Partners R User Group Meeting)"
 author: "Kazuki Yoshida"
 date: "`r format(Sys.time(), '%Y-%m-%d')`"
 output: html_document
 ---

 ```{r, message = FALSE, tidy = FALSE, echo = FALSE}
 ## knitr configuration: http://yihui.name/knitr/options#chunk_options
 library(knitr)
 showMessage <- FALSE
 showWarning <- TRUE
 set_alias(w = "fig.width", h = "fig.height", res = "results")
 opts_chunk$set(comment = "##", error = TRUE, warning = showWarning, message = showMessage,
                tidy = FALSE, cache = FALSE, echo = TRUE,
               fig.width = 7, fig.height = 7, dev.args = list(family = "sans"))
 ## for rgl
 ## knit_hooks$set(rgl = hook_rgl, webgl = hook_webgl)
 ## for animation
 opts_knit$set(animation.fun = hook_ffmpeg_html)

 ## R configuration
 options(width = 116, scipen = 5)
 ```

 ## What is this?
 This is the material for a lightning talk at the [Partners R User Group](https://rc.partners.org/support-training/training/partners-r-user-group) meeting on 2018-07-19.


 ## References
 - CRAN: https://cran.r-project.org/web/packages/tableone/index.html
 - [Introduction](https://cran.r-project.org/web/packages/tableone/vignettes/introduction.html)
 - [Using SMD](https://cran.r-project.org/web/packages/tableone/vignettes/smd.html)


 ## Introduction
 tableone is an R package that assists in creating "Table 1", the table of patient baseline characteristics, in a format often seen in biomedical journals.


 ## Load packages

 ```{r}
 library(tidyverse)
 library(tableone)
 ```


 ## Load data
 We load the pbc (primary biliary cirrhosis) dataset from Mayo Clinic.
 ```{r}
 data(pbc, package = "survival")
 ## as_tibble() replaces the deprecated as_data_frame()
 pbc <- as_tibble(pbc)
 pbc
 ```


 ## Overall tables

 Calling CreateTableOne() with just the data argument summarizes all variables.

 ```{r}
 CreateTableOne(data = pbc)
 ```

 Some variables are not appropriate as patient baseline characteristics, so let's specify variables via the vars argument. Here we remove patient ID and outcome variables (time and status).

 ```{r}
 dput(names(pbc))
 vars <- c("trt", "age", "sex", "ascites", "hepato",
          "spiders", "edema", "bili", "chol", "albumin", "copper", "alk.phos",
          "ast", "trig", "platelet", "protime", "stage")
 CreateTableOne(vars = vars, data = pbc)
 ```

 See ?pbc to better understand the dataset.
 ```
 pbc                  package:survival                  R Documentation

 Mayo Clinic Primary Biliary Cirrhosis Data

 Description:

     D This data is from the Mayo Clinic trial in primary biliary
     cirrhosis (PBC) of the liver conducted between 1974 and 1984.  A
     total of 424 PBC patients, referred to Mayo Clinic during that
     ten-year interval, met eligibility criteria for the randomized
     placebo controlled trial of the drug D-penicillamine.  The first
     312 cases in the data set participated in the randomized trial and
     contain largely complete data.  The additional 112 cases did not
     participate in the clinical trial, but consented to have basic
     measurements recorded and to be followed for survival.  Six of
     those cases were lost to follow-up shortly after diagnosis, so the
     data here are on an additional 106 cases as well as the 312
     randomized participants.

     A nearly identical data set found in appendix D of Fleming and
     Harrington; this version has fewer missing values.

 Usage:

     pbc

 Format:

       age:       in years
       albumin:   serum albumin (g/dl)
       alk.phos:  alkaline phosphotase (U/liter)
       ascites:   presence of ascites
       ast:       aspartate aminotransferase, once called SGOT (U/ml)
       bili:      serum bilirunbin (mg/dl)
       chol:      serum cholesterol (mg/dl)
       copper:    urine copper (ug/day)
       edema:     0 no edema, 0.5 untreated or successfully treated
                  1 edema despite diuretic therapy
       hepato:    presence of hepatomegaly or enlarged liver
       id:        case number
       platelet:  platelet count
       protime:   standardised blood clotting time
       sex:       m/f
       spiders:   blood vessel malformations in the skin
       stage:     histologic stage of disease (needs biopsy)
       status:    status at endpoint, 0/1/2 for censored, transplant, dead
       time:      number of days between registration and the earlier of death,
                  transplantion, or study analysis in July, 1986
       trt:       1/2/NA for D-penicillmain, placebo, not randomised
       trig:      triglycerides (mg/dl)

 Source:

     T Therneau and P Grambsch (2000), _Modeling Survival Data:
     Extending the Cox Model_, Springer-Verlag, New York.  ISBN:
     0-387-98784-3.
 ```

 We can see that some variables are numerically coded categorical variables (ascites, edema, hepato, spiders, stage, trt). Here we convert these to factors for correct handling. For binary variables, make the second level the one you want to show the percentage for.

 ```{r}
 pbc <- pbc %>%
     mutate(ascites = factor(ascites, levels = c(0,1), labels = c("Absent","Present")),
            edema = factor(edema, levels = c(0, 0.5, 1),
                           labels = c("No edema","Untreated or successfully treated","Edema despite diuretic therapy")),
            hepato = factor(hepato, levels = c(0,1), labels = c("Absent","Present")),
            ## spiders is also a 0/1 indicator, so treat it as categorical
            spiders = factor(spiders, levels = c(0,1), labels = c("Absent","Present")),
            stage = factor(stage),
            trt = factor(trt, levels = c(1,2), labels = c("D-penicillamine", "Placebo")))
 ```

 Now these variables are summarized as categorical variables.

 ```{r}
 CreateTableOne(vars = vars, data = pbc)
 ```
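
 As a quick check of the level ordering described above, here is a minimal base-R sketch on a made-up 0/1 vector (x_demo, f_demo, and f_demo_rev are hypothetical names, not part of the pbc data). It shows which level ends up second, and one way to put a different category in that position.

 ```{r}
 ## Made-up 0/1 indicator, for illustration only
 x_demo <- c(0, 1, 1, 0, 1)
 f_demo <- factor(x_demo, levels = c(0, 1), labels = c("Absent", "Present"))
 ## The second level is the one whose percentage we want shown
 levels(f_demo)[2]
 ## To show the other category instead, list it second
 f_demo_rev <- factor(f_demo, levels = c("Present", "Absent"))
 levels(f_demo_rev)[2]
 ```

 Reordering the levels changes only which category is listed second; the underlying values are unchanged.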

 Show the proportion of missing values with the missing argument of the print method.

 ```{r}
 print(CreateTableOne(vars = vars, data = pbc), missing = TRUE)
 ```


 ## Group-stratified tables

 trt is the treatment assignment variable, so we stratify the table by it. P-values are added using reasonable default test functions.

 ```{r}
 vars <- setdiff(vars, "trt")
 CreateTableOne(vars = vars, strata = "trt", data = pbc)
 ```

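 Some continuous variables are quite skewed, as most biomarkers are. Median [IQR] may be the preferred format for these. Note that, as the test column indicates, the p-values for these variables are based on a different function: a nonparametric rank test (kruskal.test() by default, which with two groups corresponds to the Wilcoxon rank-sum test).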

 ```{r}
 print(CreateTableOne(vars = vars, strata = "trt", data = pbc), nonnormal = c("bili","chol"))
 ```
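
 To see why the median [IQR] can be a better summary for skewed variables, here is a minimal base-R sketch on made-up numbers (x_skew is a hypothetical vector, not taken from pbc). A single large value pulls the mean far from the bulk of the data, whereas the median and quartiles are unaffected.

 ```{r}
 ## Made-up skewed values, for illustration only
 x_skew <- c(1, 2, 3, 4, 100)
 mean(x_skew)
 median(x_skew)
 ## First and third quartiles (default quantile type)
 quantile(x_skew, probs = c(0.25, 0.75))
 ```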

 In propensity score analyses, standardized mean differences (SMDs) are often preferred to p-values. Use the smd argument of the print method to show them; here test = FALSE also hides the p-values.

 ```{r}
 print(CreateTableOne(vars = vars, strata = "trt", data = pbc), nonnormal = c("bili","chol"), smd = TRUE, test = FALSE)
 ```
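
 For intuition about what an SMD measures, here is a minimal base-R sketch for one continuous variable in two groups. It uses a common definition: the absolute mean difference divided by the square root of the average of the two group variances. The vectors g1 and g2 and the helper smd_two_groups() are hypothetical and only for illustration; the package's own computation also handles categorical variables, which this sketch does not.

 ```{r}
 ## Hypothetical helper: SMD for a continuous variable in two groups
 smd_two_groups <- function(x1, x2) {
     abs(mean(x1) - mean(x2)) / sqrt((var(x1) + var(x2)) / 2)
 }
 ## Made-up data, for illustration only
 g1 <- c(1, 2, 3, 4, 5)
 g2 <- c(2, 3, 4, 5, 6)
 smd_two_groups(g1, g2)
 ```

 Because the difference is scaled by the spread of the variable, the SMD does not depend on the measurement unit or directly on the sample size, which is one reason it is used to assess covariate balance.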


 ## Variable labels

 Variable names are typically short and not appropriate for the final version of the table. Use the labelled package to assign variable labels.
 ```{r}
 var_label_list <- list(age = "Age in years",
                       sex = "Female",
                       ascites = "Ascites",
                       hepato = "Hepatomegaly",
                       spiders = "Spider angioma",
                       edema = "Edema",
                       bili = "Serum bilirubin, mg/dl",
                       chol = "Serum cholesterol, mg/dl",
                       copper = "Urine copper, ug/day",
                       stage = "Histologic stage of disease",
                       trig = "Triglycerides, mg/dl",
                       albumin = "Serum albumin, g/dl",
                       alk.phos = "Alkaline phosphatase, U/liter",
                       ast = "Aspartate aminotransferase, U/ml",
                       platelet = "Platelet count",
                       protime = "Prothrombin time in seconds")
 labelled::var_label(pbc) <- var_label_list
 labelled::var_label(pbc)
 ```

 Let's see the table with variable labels.
 ```{r}
 print(CreateTableOne(vars = vars, strata = "trt", data = pbc), nonnormal = c("bili","chol"), smd = TRUE, test = FALSE, varLabels = TRUE)
 ```

 Once the displayed levels of the binary variables look right, we can suppress the level indication with the dropEqual option.
 ```{r}
 print(CreateTableOne(vars = vars, strata = "trt", data = pbc), nonnormal = c("bili","chol"), smd = TRUE, test = FALSE, varLabels = TRUE, dropEqual = TRUE)
 ```


 ## Export to a CSV file

 The print method invisibly returns a matrix object, which we can export to a file. In the console, the columns are aligned via padding spaces, but we don't need them when exporting. The noSpaces option removes them. If assigning the matrix is all you need, you can turn off printing with the printToggle option.

 ```{r}
 tab1mat <- print(CreateTableOne(vars = vars, strata = "trt", data = pbc), nonnormal = c("bili","chol"), smd = TRUE, test = FALSE, varLabels = TRUE, dropEqual = TRUE, noSpaces = TRUE, printToggle = FALSE)
 ```

 Now this is just a matrix of text.
 ```{r}
 tab1mat
 ```

 You can write to a CSV file easily.
 ```{r}
 write.csv(tab1mat, file = "./tab1.csv")
 ```
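
 To confirm that the export round-trips, one option is to read the file back with read.csv(). The sketch below uses a small made-up character matrix (m_demo, with placeholder values) and a temporary file instead of tab1mat and ./tab1.csv, so that it runs on its own. Here row.names = 1 restores the row labels, check.names = FALSE keeps column names unaltered, and colClasses = "character" keeps every cell as text.

 ```{r}
 ## Made-up stand-in for the table matrix, for illustration only
 m_demo <- matrix(c("10", "1.0 (0.5)"), ncol = 1,
                  dimnames = list(c("n", "x (mean (SD))"), "Overall"))
 f_demo_csv <- tempfile(fileext = ".csv")
 write.csv(m_demo, file = f_demo_csv)
 ## Read it back as text, using the first column as row names
 back <- read.csv(f_demo_csv, row.names = 1, check.names = FALSE,
                  colClasses = "character")
 as.matrix(back)
 ```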

 --------------------
 - Top Page: http://rpubs.com/kaz_yos/
 - Github: https://github.com/kaz-yos
 - Twitter: https://twitter.com/kaz_yos