# Fast sparse, exact KNN Manhattan Distance Given a 1xN (row) vector and a MxN matrix, the code below calculates the manhattan distance between the row vector and each row in the matrix. It uses Numba for significant speadup. Currenlty I'm using this to calculate distances between a 1 x 60K row vector and a 64k x 60x matrix (with about 500k non-empty values) in ~ 10ms on a single processor ## Example usage: K nearest neightbors ```py import numpy as np import sparse_manhattan_distance from sparse_manhattan_distance def sparse_manhattan_KNN(mat,vec,N,sorted=True): """ @param mat: A sparse M x N matrix @param vec: A sparse N vector to be compared to each row of `mat` @param N: The number of nearest neighbors to @param sorted: If `True`, sort the row indexes before returing @return a vector of row indices of the nearest N rows in `mat` to `vec` """ # cant return more observations than data... N = min(N, mat.data.shape[0]) distances = sparse_manhattan_distance(mat, vec) # note that argpartition and then sorting the top N indexes is *much* # faster than sorting the full distance vector and then choosing the top N nearest_N_rows = np.argpartition(distances, -N)[-N:] if sorted: return nearest_N_rows[np.argsort(-distances[nearest_N_rows])] return nearest_N_rows ```