-
-
Save nathanmarz/9d8805d5e7d9c689cd629bd0551e84f1 to your computer and use it in GitHub Desktop.
Benchmark code: https://github.com/nathanmarz/specter/blob/master/scripts/benchmarks.clj#L176 | |
---------------------------------------------------------------------------------------------- | |
Benchmark: first value of a size 10 vector (10000000 iterations) | |
Avg(ms) vs best Code | |
261.98 1.00 (select-any FIRST data) | |
277.11 1.06 (select-any ALL data) | |
337.43 1.29 (select-first ALL data) | |
561.89 2.14 (first data) |
You can get the first element of a vector with optimal performance with (nth v 0)
, but first
is the higher-level way of expressing that. Saying use a different data structure if you care about first
is strange to me. This isn't like prepending an element where a list is clearly the superior data structure (I think that should be supported too but that's a totally separate discussion where we have major philosophical differences). This is an operation which the data structure is perfectly capable of doing extremely fast, as (nth v 0)
shows.
I also question whether or not there's even a tradeoff here if implemented carefully. If first()
is factored out of ISeq
into its own interface, why can't vectors implement that? Are there concerns about inline caches becoming megamorphic?
@ztellman: vectors are persistent collections and are always fully realized, so having it "stay in memory" is naturally part of the deal. I'm not arguing with you that first
could be always O(1) on vectors. The point I made above is that the sequence abstraction concerns both first
and rest
and the latter is the one that seems harder to me.
@nathanmarz the last suggestion is interesting. would need some criteria and tests to evaluate more.
@puredanger Thanks for the explanation.
Re: prioritization, understood. As far as the performance, I haven't looked at this code in a bit, but looks like the seq implementation is just using offsets to
nth
: https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/APersistentVector.java#L464. This means thatfirst
is always O(log n), and the entire collection stays in memory as you iterate over it.In the alternate implementation I linked to,
first
is always O(1), since it peels off a 32-wide buffer from the front of the collection whenever there's an underflow. I acknowledge this doesn't make a big difference most of the time, but I spent a bunch of time weighing different approaches so I'm going to take this opportunity to be pedantic.