Created
September 10, 2013 13:22
-
-
Save bobthecat/6509321 to your computer and use it in GitHub Desktop.
Rcpp implementation of the Dice coefficient
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
title: Dice coefficient with RcppEigen | |
author: David Ruau | |
license: GPL (>= 2) | |
tags: Rcpp RcppEigen | |
summary: Compute the Dice coefficient (1945) between column of a matrix. | |
--- | |
The Dice coefficient is a simple measure of similarity / dissimilarity (depending how you take it). It is intended to compare asymmetric binary vectors, meaning one of the combination (usually 0-0) is not important and agreement (1-1 pairs) have more weight than disagreement (1-0 or 0-1 pairs). Imagine the following contingency table: | |
```{} | |
1 0 | |
1 a b | |
0 c d | |
``` | |
The Dice coef is: `(2*a) / (2*a +b + c)` | |
The implementation of the Dice coefficient in the package `arules` relied to the algorithm from Leisch (2005) using the `tcrossproduct()` function in R. It was still too slow for me. So I wrote a Rcpp version of crossprod that is 2-3 time faster. This is based on the example code in the `RcppEigen` vignette. | |
```{} | |
library(Rcpp) | |
library(RcppEigen) | |
library(inline) | |
crossprodCpp <- ' | |
using Eigen::Map; | |
using Eigen::MatrixXi; | |
using Eigen::Lower; | |
const Map<MatrixXi> A(as<Map<MatrixXi> >(AA)); | |
const int m(A.rows()), n(A.cols()); | |
MatrixXi AtA(MatrixXi(n, n).setZero().selfadjointView<Lower>().rankUpdate(A.adjoint())); | |
return wrap(AtA); | |
' | |
fcprd <- cxxfunction(signature(AA = "matrix"), crossprodCpp, "RcppEigen") | |
``` | |
Using the following wrapper function in R we compute the Dice coefficient. | |
```{r} | |
diceR <- function(X){ | |
a <- fcprd(X) | |
nx <- ncol(X) | |
rsx <- colSums(X) | |
c <- matrix(rsx, nrow = nx, ncol = nx) - a | |
b <- t(c) | |
m <- (2 * a) / (2*a + b + c) | |
return(m) | |
} | |
``` | |
Let's do a little benchmark. | |
This new function is roughly 3 time faster than the one in arules. | |
``` | |
library(microbenchmark) | |
library(arules) | |
# test matrix | |
x <- rbinom(1:200000, 1, 0.5) | |
X <- matrix(x, nrow = 200, ncol = 1000) | |
m <- microbenchmark(diceR(X), dissimilarity(t(X), method="dice"), times=100) | |
m | |
Unit: milliseconds | |
expr min lq median uq max neval | |
diceR(X) 77.12359 163.8737 168.0566 171.4417 185.0053 100 | |
dissimilarity(t(X), method = "dice") 275.06503 365.3696 372.0783 453.1885 1118.4639 100 | |
``` | |
This implementation above use the `inline` library. Below is the pure C++ function. This is the form you want if you want to put this code in a R package as a source file. | |
``` | |
#include <Rcpp.h> | |
#include <RcppEigen.h> | |
using namespace Rcpp; | |
SEXP crossprodCpp(SEXP binaryMat){ | |
using Eigen::Map; | |
using Eigen::MatrixXi; | |
using Eigen::Lower; | |
const Map<MatrixXi> A(as<Map<MatrixXi> >(binaryMat)); | |
const int m(A.rows()), n(A.cols()); | |
MatrixXi AtA(MatrixXi(n, n).setZero().selfadjointView<Lower>().rankUpdate(A.adjoint())); | |
return wrap(AtA); | |
} | |
``` | |
Will do, probably during the soccer game this evening.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Hi,
If you could do it that would be great. I have no idea about the new Rcpp attributes.