Last active
August 1, 2016 20:07
-
-
Save Abhiroop/2a452a230f2a78ac91fa2dbd8d4046ed to your computer and use it in GitHub Desktop.
Top k elements from a vector
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Timing the performance of various ways of extracting top k elements from a vector: | |
Here sequence size is about 10000 | |
and k=300 | |
Here is a randomly generated vector containing 10K maps. Each of the maps look like this | |
{ | |
:id 123 | |
:data (0 1 2 3 ....) | |
:more-data (0 1 2 3 ....) | |
} | |
Each object weighs about 4.3 KB | |
Function for generating the vector of maps: | |
(def | |
big-objects-vec | |
(shuffle | |
(vec | |
(map | |
(fn [i] {:id i :data (range 100) :more-data (range 1000)}) | |
(range 10000))))) | |
Extracting top k elemnts from the vector can be done in 3 ways. Lets profile them: | |
1. (take 300 (sort-by :id big-objects-vec)) | |
Timing 1 iteration | |
user=> (time (take 300 (sort-by :id big-objects-vec))) | |
"Elapsed time: 11.052344 msecs" | |
Timing 1000 iterations | |
user=> (time (dotimes [i 1000] (do (take 300 (sort-by :id big-objects-vec))))) | |
"Elapsed time: 9939.033409 msecs" | |
2. | |
(defn f [coll] | |
(apply sorted-set-by (fn [x y] (< (:id x) (:id y))) coll)) | |
(take 300 (f big-objects-vec)) | |
Timing 1 iteration | |
user=> (time (take 300 (f big-objects-vec))) | |
"Elapsed time: 30.529017 msecs" | |
Timing 1000 iterations | |
user=> (time (dotimes [i 1000] (do (take 300 (f big-objects-vec))))) | |
"Elapsed time: 27963.321466 msecs" | |
3.Using heap of size k | |
(defn top-by [k f coll] | |
(reduce | |
(fn [top x] | |
(let [top (conj top x)] | |
(if (> (count top) k) | |
(disj top (first top)) | |
top))) | |
(sorted-set-by #(< (f %1) (f %2))) | |
coll)) | |
Timing 1 iteration | |
user=> (time (top-by 300 :id big-objects-vec)) | |
"Elapsed time: 27.388446 msecs" | |
Timing 1000 iterations | |
(time (dotimes [i 1000] (do (top-by 300 :id big-objects-vec)))) | |
"Elapsed time: 26540.44524 msecs" | |
4. Using lazy quick sort. Creates a variant of quick select. | |
(defn qsort [xs k] | |
(lazy-seq | |
(if (seq xs) | |
(let [pivot (rand-nth xs) | |
smaller (filterv #(> (pivot k) (% k)) xs) | |
larger (filterv #(< (pivot k) (% k)) xs) | |
same (filterv #(= (pivot k) (% k)) xs)] | |
(concat (qsort smaller k) | |
same | |
(qsort larger k)))))) | |
Timing 1 iteration | |
user=> (time (take 300 (qsort big-objects-vec :id))) | |
"Elapsed time: 0.029915 msecs" | |
Timing 1000 iterations | |
(time (dotimes [i 1000] (do (take 300 (qsort big-objects-vec :id))))) | |
"Elapsed time: 0.294812 msecs" | |
Even more... | |
Timing 100000 iterations | |
(time (dotimes [i 100000] (do (take 300 (qsort big-objects-vec :id))))) | |
"Elapsed time: 9.96925 msecs" | |
Even if you increase k to 300000 | |
(time (dotimes [i 100000] (do (take 300000 (qsort big-objects-vec :id))))) | |
"Elapsed time: 11.255201 msecs" | |
nil | |
Moral of the story: Quick select kills the other algos in my machine. Should profile this in other machines. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment