Skip to content

Instantly share code, notes, and snippets.

@umbertogriffo
Created October 26, 2016 08:05
Show Gist options
  • Save umbertogriffo/25050693923cd751105fe98443caa156 to your computer and use it in GitHub Desktop.
Save umbertogriffo/25050693923cd751105fe98443caa156 to your computer and use it in GitHub Desktop.
Utility Methods to Transpose a org.apache.spark.mllib.linalg.distributed.RowMatrix
def transposeRowMatrix(m: RowMatrix): RowMatrix = {
val transposedRowsRDD = m.rows.zipWithIndex.map{case (row, rowIndex) => rowToTransposedTriplet(row, rowIndex)}
.flatMap(x => x) // now we have triplets (newRowIndex, (newColIndex, value))
.groupByKey
.sortByKey().map(_._2) // sort rows and remove row indexes
.map(buildRow) // restore order of elements in each row and remove column indexes
new RowMatrix(transposedRowsRDD)
}
def rowToTransposedTriplet(row: Vector, rowIndex: Long): Array[(Long, (Long, Double))] = {
val indexedRow = row.toArray.zipWithIndex
indexedRow.map{case (value, colIndex) => (colIndex.toLong, (rowIndex, value))}
}
def buildRow(rowWithIndexes: Iterable[(Long, Double)]): Vector = {
val resArr = new Array[Double](rowWithIndexes.size)
rowWithIndexes.foreach{case (index, value) =>
resArr(index.toInt) = value
}
Vectors.dense(resArr)
}
@nikhilbaby
Copy link

Can you share the PySpark version for this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment