Last active
January 24, 2017 20:50
Convert a dgCMatrix to libsvm format
#' Convert a dgCMatrix to libsvm format
#' @param sm A sparse matrix of class "dgCMatrix"
#' @param label A vector of labels, one per row of sm; defaults to all 0
#' @return A character vector with one "label index:value ..." string per row
#' @examples
#' library(Matrix)
#' regMat <- matrix(runif(16), 4, 4)
#' regMat[sample(16, 5)] <- 0
#' sparseMat <- Matrix(regMat, sparse = TRUE)
#' conv2libsvm(sparseMat)
conv2libsvm <- function(sm, label = rep(0, dim(sm)[1])) {
  stopifnot(dim(sm)[1] == length(label))
  # Transpose so that each column of tsm holds one row of sm
  tsm <- Matrix::t(sm)
  i <- tsm@i  # zero-based indices of the non-zero entries
  p <- tsm@p  # column pointers into i and x
  x <- tsm@x  # non-zero values
  vapply(seq_len(dim(tsm)[2]), function(c) {
    # seq_len() keeps idx empty for an all-zero row
    idx <- p[c] + seq_len(p[c + 1] - p[c])
    paste(label[c], paste(i[idx], x[idx], sep = ":", collapse = " "))
  }, FUN.VALUE = character(1))
}
Thanks for this 👍 I think the index is off by 1 though. The suggested fix is to add 1 to the index (`i[idx] + 1`), which gives this first line of output:
1 3:1 10:1 11:1 21:1 30:1 34:1 36:1 40:1 41:1 53:1 58:1 65:1 69:1 77:1 86:1 88:1 92:1 95:1 102:1 105:1 117:1 124:1
which matches the first line of the file `xgboost-master/demo/binary_classification/agaricus.txt.train`.
Without the +1, a zero-based index shows up by the 3rd row of output, which is not consistent with R being 1-based.
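To make the off-by-one concrete, here is a minimal sketch of a one-based variant, assuming the fix amounts to adding 1 to the zero-based `@i` slot of the transposed matrix. The name `conv2libsvm1` and the tiny test matrix are hypothetical and not taken from the gist.

```r
library(Matrix)

# Hypothetical one-based variant of conv2libsvm (the name is not from the gist).
conv2libsvm1 <- function(sm, label = rep(0, nrow(sm))) {
  stopifnot(nrow(sm) == length(label))
  tsm <- Matrix::t(sm)  # each column of tsm holds one row of sm
  i <- tsm@i + 1L       # slot i is zero-based; shift to one-based indices
  p <- tsm@p
  x <- tsm@x
  vapply(seq_len(ncol(tsm)), function(k) {
    idx <- p[k] + seq_len(p[k + 1] - p[k])  # empty for an all-zero row
    paste(label[k], paste(i[idx], x[idx], sep = ":", collapse = " "))
  }, FUN.VALUE = character(1))
}

# 2 x 3 matrix: row 1 has 5 in column 1 and 7 in column 3; row 2 has 9 in column 2
m <- sparseMatrix(i = c(1, 1, 2), j = c(1, 3, 2), x = c(5, 7, 9), dims = c(2, 3))
out <- conv2libsvm1(m, label = c(1, 0))
print(out)
# "1 1:5 3:7" "0 2:9"
```

Without the `+ 1L`, the same matrix would come out as `1 0:5 2:7` and `0 1:9`, with the first column labelled as feature 0.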