Skip to content

Instantly share code, notes, and snippets.

@hadley
Created December 28, 2013 19:02
Show Gist options
  • Save hadley/8162972 to your computer and use it in GitHub Desktop.
Save hadley/8162972 to your computer and use it in GitHub Desktop.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double sdSample(NumericVector x) {
int n = x.size();
NumericVector sampled(n);
for (int i = 0; i < n; ++i) {
sampled[i] = x[rand() % n];
}
return sd(sampled);
}
// Don't store the resampled x, instead compute a running variance. This
// should be faster because it doesn't have to store the intermediate values.
// Algorithm http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm
// [[Rcpp::export]]
double sdSample2(NumericVector x) {
int n = x.size();
double mean = 0, m2 = 0;
for (int i = 0; i < n; ++i) {
double y = x[rand() % n];
double delta = y - mean;
mean += delta / (i + 1);
m2 += delta * (y - mean);
}
double var = m2 / (n - 1);
return sqrt(var);
}
/*** R
options(digits = 3)
library(microbenchmark)
x <- runif(1e3)
microbenchmark(
sd(sample(x, rep = T)),
sdSample(x),
sdSample2(x)
)
# Unit: microseconds
# expr min lq median uq max neval
# sd(sample(x, rep = T)) 47.2 50.8 52.1 53.2 114.2 100
# sdSample(x) 11.8 12.7 13.6 14.4 26.2 100
# sdSample2(x) 11.6 11.9 12.1 12.7 32.0 100
*/
@hadley
Copy link
Author

hadley commented Dec 28, 2013

Note that using rand() % n is biased.

@dickoa
Copy link

dickoa commented Dec 28, 2013

You can use RcppArmadillo:sample it's faster and you'll avoid the rand() % n trick

// -*- mode:poly-C++R -*-
#include <RcppArmadilloExtensions/sample.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;

//[[Rcpp::export]]
double sdSample3(NumericVector x) {
    int n = x.size();
    NumericVector sampled = RcppArmadillo::sample(x, n, true);
    return sd(sampled);
}


// [[Rcpp::export]]
double sdSample(NumericVector x) {
    int n = x.size();
    NumericVector sampled(n);
    for (int i = 0; i < n; ++i) {
    sampled[i] = x[rand() % n];
    }
    return sd(sampled);
}



/*** R
options(digits = 3)
library(microbenchmark)

x <- runif(1e3)

microbenchmark(
    sd(sample(x, rep = T)),
    sdSample(x),
    sdSample3(x)
    )
## Unit: microseconds
##                    expr  min   lq median   uq   max neval
##  sd(sample(x, rep = T)) 39.1 42.0   43.3 44.4 798.7   100
##             sdSample(x) 22.0 23.7   24.1 24.6  37.8   100
##            sdSample3(x) 14.5 16.5   17.2 17.9  23.3   100

*/

@hadley
Copy link
Author

hadley commented Dec 29, 2013

@dickoa great idea - thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment