Skip to content

Instantly share code, notes, and snippets.

@dKvale
Last active August 4, 2016 15:02
Show Gist options
  • Save dKvale/ad9b5a70f5a2189eae47190bea6656a6 to your computer and use it in GitHub Desktop.
Save dKvale/ad9b5a70f5a2189eae47190bea6656a6 to your computer and use it in GitHub Desktop.
---
title: "Non-detect method examples"
output:
html_document:
highlight: tango
theme: readable
toc: yes
toc_depth: 3
---
<style type="text/css">
body, td {font-size: 18px;}
code.r{font-size: 18px;}
pre {font-size: 18px}
</style>
## Data
The example data is organized by monitoring site, region, and date. Different sites and years may have different detection limits.
```{r kable, message=F, echo=F}
library(knitr)
library(dplyr)
library(readr)
df <- read_csv(
'"Site_ID","Region","Date","Concentration","Detect_Limit"
270535501,"1A","2009-07-30",0.00148,0.1
270535501,"1A","2009-09-30",0.00064,0.1
270535501,"1A","2009-11-30",0.24256,0.1
270535501,"1A","2009-12-30",0.00064,0.1
270535501,"1A","2010-01-30",0.16001,0.1
270535501,"1A","2010-03-30",0.19300,0.1
270535502,"1A","2009-03-30",0.21219,0.01
270535502,"1A","2009-07-30",0.01113,0.01
270535502,"1A","2009-09-30",0.00044,0.01
270535502,"1A","2009-11-30",0.00127,0.01
270535502,"1A","2009-12-30",0.00613,0.01
270535502,"1A","2010-01-30",0.02216,0.01
270535502,"1A","2010-03-30",0.08113,0.02')
kable(df)
```
## Methods
For air monitoring data we currently use the `EnvStats` package to generate annual summary values and confidence intervals. The package provides multiple non-detect methods that will automatically calculate confidence intervals for you. The methods below will help reduce the effects of different detection limits between sites and years.
<br>
To begin, add a column indicating if the sample was detected or not. When the concentration is less than the detection limit the _Censored_ column is set to `TRUE`.
```{r}
df$Censored <- (df$Concentration) < (df$Detect_Limit)
```
`r kable(head(df))`
</br>
Similar to _NADA_, `EnvStats` will assume a detection limit is located at the value of the lowest detected concentration. To force the summary to show the correct detection limits you can substitute the detection limit in for the non-detected values.
```{r}
# Save the original mean of the raw values
raw_mean <- mean(df$Concentration, na.rm=T)
# Replace non-detects with the detection limit
df$Concentration <- ifelse(df$Censored, df$Detect_Limit, df$Concentration)
```
</br>
Next we can use `EnvStats` to estimate the mean for the entire region. Because there are multiple detection limits the potential methods we can use are more limited. For multiple detection limits with limited amounts of detections (less than 10), Helsel recommends using Kaplan-Meier. Here's an example of running Kaplan-Meier in `EnvStats` using the function `enparCensored()`.
### Kaplan-Meier
```{r message=F, warning=F}
library(EnvStats)
# Kaplan-Meier
# summary for all sites and dates
km_summary <- enparCensored(df$Concentration,
df$Censored,
ci=T,
ci.method = "bootstrap",
n.bootstraps = 3000)
print(km_summary)
```
<br>
To pull out the `mean` value from the results use: `$parameters[[1]]`
``` {r, message=F, warning=F}
km_mean <- enparCensored(df$Concentration, df$Censored)$parameters[[1]]
paste("The Kaplan-Meier mean is", round(km_mean, 3))
```
<br>
### NADA
Compare with the `NADA` package
``` {r, message=F, warning=F}
library(NADA)
Nada_KM_Mean <- cenfit(df$Concentration, df$Censored)
print(Nada_KM_Mean)
```
<br>
### ROS and MLE
When you have larger data sets there are additional methods such as ROS and MLE, which use the the observed distribution of the detected values and applies it to predict the location of non-detect values. These methods usually generate less conservative (lower estimates of the mean), but in most cases are a closer approximation to reality. Below are 2 examples of using the ROS and MLE methods in the `EnvStats` package.
```{r warning=F, message=F}
library(EnvStats)
# ROS
# Summary across all sites and dates
ros_summary <- enormCensored(df$Concentration,
df$Censored,
method = "rROS",
ci = F,
lb.impute = min(df$Detect_Limit) * 0.10,
ub.impute = max(df$Detect_Limit))
# The lower bound imputed limt (lb.impute) sets the minimum assigned value for non-detects. If you know the background concentration for a body of water that could be a good minimum value. The example above uses 10% of the minimum detection limit for the lower bound.
# The upper bound imputed limt (ub.impute) sets the maximum assigned value for non-detects. It's set at the highest detection limit above.
print(ros_summary)
```
<br>
```{r warning=F, message=F}
# MLE
#summary across all sites and dates
mle_summary <- enormCensored(df$Concentration,
df$Censored,
method = "mle",
ci = F)
print(mle_summary)
```
<br>
### 1/2 the DL
Compare to substitution of 1/2 the detection limt
``` {r, message=F, warning=F}
sub_half <- df
sub_half$Concentration <- ifelse(df$Censored,
df$Detect_Limit * 0.5,
df$Concentration)
sub_half_DL_Mean <- mean(sub_half$Concentration)
print(sub_half_DL_Mean)
```
<br>
### Random substitution
Compare to random substitution between 0 and the detection limit
``` {r, message=F, warning=F}
set.seed(8*3*2016)
sub_random <- df
sub_random$Concentration <- ifelse(df$Censored,
runif(1, 0, df$Detect_Limit),
df$Concentration)
sub_random_Mean <- mean(sub_random$Concentration)
print(sub_random_Mean)
```
<br>
### Single detection limit
When your data has a single detection limit, as for air monitoring data from a single year, annual statistics can be generated using the imputed MLE method below.
```{r message=F, warning=F, results="hide"}
# Imputed MLE
#summary across a single site with one detection limit
site1 <- filter(df, Site_ID == 270535501)
mle_summary <- enormCensored(site1$Concentration,
site1$Censored,
method = "impute.w.mle",
lb.impute = min(site1$Detect_Limit) * 0.10,
ci = T,
ci.method = "bootstrap")
#print(mle_summary)
```
<br>
## Result Summary
Below is a summary table of the non-detect methods used for the region overall. For comparison I've also included the mean of the original values before censoring, and the mean when we replace all of the non-detects with the detection limit.
```{r}
library(dplyr)
summary <- df %>%
summarize('Raw values' = raw_mean,
'Kap-M' = enparCensored(Concentration, Censored)$parameters[[1]],
'ROS' = enormCensored(Concentration,
Censored,
method = "rROS",
lb.impute = min(df$Detect_Limit)*0.10,
ub.impute = max(df$Detect_Limit))$parameters[[1]],
'MLE' = enormCensored(Concentration,
Censored,
method = "mle")$parameters[[1]],
'Sub 1/2 DL' = sub_half_DL_Mean,
'Sub random' = sub_random_Mean,
'Sub DL' = mean(df$Concentration)
)
knitr::kable(summary, digits = 3)
```
<br>
## References
DENNIS HELSEL - http://annhyg.oxfordjournals.org/content/54/3/257.full
RONALD C. ANTWEILER - http://pubs.acs.org/doi/pdf/10.1021/es071301c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment