Skip to content

Instantly share code, notes, and snippets.

@bobthecat
Created September 10, 2013 13:22
Show Gist options
  • Select an option

  • Save bobthecat/6509321 to your computer and use it in GitHub Desktop.

Select an option

Save bobthecat/6509321 to your computer and use it in GitHub Desktop.
Rcpp implementation of the Dice coefficient
---
title: Dice coefficient with RcppEigen
author: David Ruau
license: GPL (>= 2)
tags: Rcpp RcppEigen
summary: Compute the Dice coefficient (1945) between column of a matrix.
---
The Dice coefficient is a simple measure of similarity / dissimilarity (depending how you take it). It is intended to compare asymmetric binary vectors, meaning one of the combination (usually 0-0) is not important and agreement (1-1 pairs) have more weight than disagreement (1-0 or 0-1 pairs). Imagine the following contingency table:
```{}
1 0
1 a b
0 c d
```
The Dice coef is: `(2*a) / (2*a +b + c)`
The implementation of the Dice coefficient in the package `arules` relied to the algorithm from Leisch (2005) using the `tcrossproduct()` function in R. It was still too slow for me. So I wrote a Rcpp version of crossprod that is 2-3 time faster. This is based on the example code in the `RcppEigen` vignette.
```{}
library(Rcpp)
library(RcppEigen)
library(inline)
crossprodCpp <- '
using Eigen::Map;
using Eigen::MatrixXi;
using Eigen::Lower;
const Map<MatrixXi> A(as<Map<MatrixXi> >(AA));
const int m(A.rows()), n(A.cols());
MatrixXi AtA(MatrixXi(n, n).setZero().selfadjointView<Lower>().rankUpdate(A.adjoint()));
return wrap(AtA);
'
fcprd <- cxxfunction(signature(AA = "matrix"), crossprodCpp, "RcppEigen")
```
Using the following wrapper function in R we compute the Dice coefficient.
```{r}
diceR <- function(X){
a <- fcprd(X)
nx <- ncol(X)
rsx <- colSums(X)
c <- matrix(rsx, nrow = nx, ncol = nx) - a
b <- t(c)
m <- (2 * a) / (2*a + b + c)
return(m)
}
```
Let's do a little benchmark.
This new function is roughly 3 time faster than the one in arules.
```
library(microbenchmark)
library(arules)
# test matrix
x <- rbinom(1:200000, 1, 0.5)
X <- matrix(x, nrow = 200, ncol = 1000)
m <- microbenchmark(diceR(X), dissimilarity(t(X), method="dice"), times=100)
m
Unit: milliseconds
expr min lq median uq max neval
diceR(X) 77.12359 163.8737 168.0566 171.4417 185.0053 100
dissimilarity(t(X), method = "dice") 275.06503 365.3696 372.0783 453.1885 1118.4639 100
```
This implementation above use the `inline` library. Below is the pure C++ function. This is the form you want if you want to put this code in a R package as a source file.
```
#include <Rcpp.h>
#include <RcppEigen.h>
using namespace Rcpp;
SEXP crossprodCpp(SEXP binaryMat){
using Eigen::Map;
using Eigen::MatrixXi;
using Eigen::Lower;
const Map<MatrixXi> A(as<Map<MatrixXi> >(binaryMat));
const int m(A.rows()), n(A.cols());
MatrixXi AtA(MatrixXi(n, n).setZero().selfadjointView<Lower>().rankUpdate(A.adjoint()));
return wrap(AtA);
}
```
@eddelbuettel

Copy link
Copy Markdown

Nice. I think we need to a) standardize the tags (just 'eigen') and b) convert the code to use the newer Rcpp attributes (see the corresponding vignette). I can do that quickly if you;d want me to.

@bobthecat

Copy link
Copy Markdown
Author

Hi,
If you could do it that would be great. I have no idea about the new Rcpp attributes.

@eddelbuettel

Copy link
Copy Markdown

Will do, probably during the soccer game this evening.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment