Skip to content

Instantly share code, notes, and snippets.

@rocketnia
Created November 18, 2019 07:53
Show Gist options
  • Save rocketnia/e4091dc9b6bfe4790501e3062a3fbb62 to your computer and use it in GitHub Desktop.
Save rocketnia/e4091dc9b6bfe4790501e3062a3fbb62 to your computer and use it in GitHub Desktop.
A benchmark comparing Rebellion's `sorting` to a lazy merge sort. See https://github.com/jackfirth/rebellion/issues/301 for context.
#lang racket/base
(require racket/contract/base)
(provide
(contract-out
[in-merge-sorted (-> (-> any/c any/c boolean?) stream? stream?)]))
(require racket/match
racket/promise
racket/stream)
;@------------------------------------------------------------------------------
;; Returns a nonempty stream that lazily forces the given promise. The promise
;; must return a nonempty stream.
(define (nonempty-stream-promise->nonempty-stream nonempty-stream-promise)
(stream-cons
(stream-first (force nonempty-stream-promise))
(stream-rest (force nonempty-stream-promise))))
;; Returns a stream that interleaves the elements of two given streams. If the
;; given comparison procedure behaves similarly to `<=` and the given streams
;; are already sorted according to that comparison procedure, then the result
;; will be sorted as well, and the sort will be stable: Elements from the first
;; stream will occur before elements from the second stream.
;;
;; The result stream is returned in `O(1)` time to begin with. The first time
;; the stream is forced with `stream-first` or `stream-rest`, it will force up
;; to two elements of the input streams, make up to one comparison, and other
;; than that, return in constant time. The first time each nonempty tail of the
;; stream is forced with `stream-first` or `stream-rest`, it will force up to
;; one element of the input streams, make up to one comparison, and other than
;; that, return in constant time.
;;
(define (in-merge-step <=? as bs)
;; If either input stream is empty, merging them is trivial.
(cond
[(stream-empty? as) bs]
[(stream-empty? bs) as]
[else
;; Otherwise, we return a stream that lazily forces the first elements of
;; the two input streams, compares them to determine which one is the
;; first element of the result, and uses recursion to merge the remaining
;; elements.
(nonempty-stream-promise->nonempty-stream
(delay
(let ([a (stream-first as)] [b (stream-first bs)])
(if (<=? a b)
(stream-cons a (in-merge-step <=? (stream-rest as) bs))
(stream-cons b (in-merge-step <=? as (stream-rest bs)))))))]))
;; Given a nonempty list and a two-argument merge function, this zig-zags
;; through the list, merging the elements pairwise. On the first pass, this
;; merges the first and second elements, the third and fourth elements, etc. If
;; there's a leftover element at the end, the first pass doesn't modify it. On
;; the second pass, this merges the second to last element with the last, the
;; fourth to last element with the third to last, etc. If there's a leftover
;; elementat the beginning, the second pass doesn't modify it. The third pass is
;; the same as the first. At some point there will be only one element left
;; over, and that element will be the return value.
;;
;; As a result of this approach, when this is called with `(N + 1)` elements, it
;; takes `O(N)` time and calls the merge function `N` times along the way. Those
;; `N` calls depend on each other in a balanced binary tree shape.
;;
;; The merging is stable: The order this passes arguments to the merge function
;; always reflects the order the values' ancestors appeared in the original
;; list.
;;
(define (merge-back-and-forth elems merge)
(let loop ([elems elems]
[merge merge]
[rev-elems (list)]
[rev-merge (λ (a b) (merge b a))])
(match elems
;; If there are two or more unprocessed elements left in the current pass,
;; we merge them. We put the result in `rev-elems`, the collection of
;; values that will be merged again on the next pass.
[(list* a b elems)
(loop elems merge (cons (merge a b) rev-elems) rev-merge)]
[_
(match rev-elems
;; If there's only one element left altogether, we return it. Note that
;; we never have fewer than one element left, and when we do, it's in
;; `elems`, not in `rev-elems`.
[(list)
(match elems [(list elem) elem])]
;; If there are fewer than two unprocessed elements left in the current
;; pass but there are still one or more elements that need to be
;; processed in the next pass, we add this element to those, and we
;; flip our perspective: What we called `rev-elems` and `rev-merge`, we
;; now call `elems` and `merge`, and vice versa. This begins the next
;; pass, going the other way through the list.
[_ (loop (append elems rev-elems) rev-merge (list) merge)])])))
;; Given a list of `N` instances of the value `#f` and a promise of a stream,
;; returns a stream that iterates through the first `N` elements of the given
;; stream. Forcing the result stream using `stream-rest` doesn't force any tails
;; of the input stream; the tails of the input are only forced when the elements
;; of the result stream are forced.
;;
;; If the given stream has fewer than `N` elements, the result stream will still
;; have a length of `N`, but forcing elements that don't exist in the original
;; stream will cause an error.
;;
(define (stream-adopt-length n xs-promise)
(match n
[(list) empty-stream]
[(cons #f n)
(stream-cons (stream-first (force xs-promise))
(stream-adopt-length n (delay (stream-rest (force xs-promise)))))]))
;; Returns a stream that iterates through the elements of the given stream in
;; some order. If the given comparison procedure behaves similarly to `<=`, the
;; resulting stream will be a stably sorted permutation of the original stream.
;;
;; (NOTE: The following computational complexity claims are pretty tricky and
;; might be incorrect.)
;;
;; This returns the result stream in `O(1)` time to begin with. The first time
;; the result stream is forced with `stream-first` or `stream-rest`, it will
;; force the entire input stream and take `O(N)` time aside from that, where `N`
;; is the input stream's length. Forcing the other tails of the result stream
;; will take `O(1)` time. Forcing the first element will make `(N - 1)`
;; comparisons. Forcing element `(K + 1)` will also force element `K`, and on
;; top of that cost, it will take up to `O(log N)` additional time and make up
;; to `(ceil(log_2 N) - 1)` additional comparisons. As `K` increases, the actual
;; amount of time and number of comparisons taken will tend to decrease.
;;
;; As a result of those properties, for nonzero `K`, using `in-merge-sorted` to
;; find the first `K` elements of a stream of length `N` will take up to
;; `O(max(N, K * log N))` time and make no more than
;; `(N - 1 + (K - 1) * (ceil(log_2 N) - 1))` comparisons, which simplifies to
;; `(N - K + (K - 1) * ceil(log_2 N))`. The actual bound is a bit tighter,
;; presumably closer to the *sorting numbers* as K approaches N.
;;
;; NOTE: We have some options for what computational complexity to expect of the
;; `stream-rest` operations:
;;
;; * On the first `stream-rest`, compute the entire length of the input in
;; `O(N)` time. Use that to determine the rest of the tails' `stream-rest`
;; operations in constant time and without memory leaks. (This is the one
;; we've chosen. This is cheaper than what we'd have to settle for if Racket
;; streams caused the head of a stream to be forced whenever the tail was.)
;;
;; * Compute every `stream-rest` in constant time, but use a reference to the
;; original stream to detect when we're at the end. While this is very cheap
;; in terms of time, it means the tails of the original stream might be
;; retained longer than necessary by the garbage collector.
;;
;; Using a combination of `delay/sync`, `wrap-evt`, `choice-evt`, and
;; `delay/idle`, we might be able to get the best of both worlds, at first
;; retaining the tails of the original stream, but then leaving those behind as
;; soon as the length is computed some other way. However, this approach would
;; make use of short-lived threads, which might have higher constant factor
;; costs and would likely be more difficult to maintain.
;;
(define (in-merge-sorted <=? xs)
(if (stream-empty? xs)
;; If the original stream is empty, it's already sorted.
empty-stream
;; Otherwise, we return a stream that, the first time it's forced, forces
;; the entire input stream, wraps each element in a single-element sorted
;; stream, and performs a balanced binary tree of merge operations to make
;; them all into one sorted stream.
;;
;; This expression returns in `O(N)` time, where `N` is the length of the
;; stream. It doesn't perform any comparisons. Instead, further computation
;; and comparisons are performed when parts of the resulting stream are
;; forced.
;;
(nonempty-stream-promise->nonempty-stream
(delay
(let ([xs (stream->list xs)])
(stream-adopt-length (for/list ([x (in-list xs)]) #f)
(merge-back-and-forth (for/list ([x (in-list xs)]) (stream x))
(λ (as bs) (in-merge-step <=? as bs)))))))))
#lang racket
(require "lazy-merge-sort.rkt"
math/number-theory
racket/random
rebellion/collection/entry
rebellion/collection/hash
rebellion/collection/list
rebellion/streaming/transducer
rebellion/type/record
rebellion/type/wrapper)
(define-record-type gemstone (kind weight))
(define (random-gemstone weight)
(gemstone #:kind (random-ref (set 'ruby 'sapphire 'emerald 'topaz))
#:weight (random 1 (add1 weight))))
(define (random-gems count)
(for/vector ([_ (in-range count)])
(random-gemstone count)))
(define (bottom-10/racket gems)
(transduce gems
(transducer-pipe
(batching into-list)
(append-mapping
(λ (lst) (in-list (sort lst < #:key gemstone-weight)))))
(taking 10)
#:into into-list))
(define (bottom-10/lazy-merge-sort gems)
(transduce gems
(transducer-pipe
(batching into-list)
(append-mapping
(λ (lst)
(in-merge-sorted
(λ (a b) (<= (gemstone-weight a) (gemstone-weight b)))
(in-list lst)))))
(taking 10)
#:into into-list))
(define (bottom-10/rebellion gems)
(transduce gems
(sorting #:key gemstone-weight)
(taking 10)
#:into into-list))
(define-record-type timing-data (label cpu-time real-time gc-time))
(define (measure-runtime label thunk)
(define-values (_ cpu real gc) (time-apply thunk empty-list))
(timing-data #:label label #:cpu-time cpu #:real-time real #:gc-time gc))
(define (bottom-10-benchmark input-length)
(define gems (random-gems input-length))
(set (measure-runtime "Standard racket sort (milliseconds)"
(λ () (bottom-10/racket gems)))
(measure-runtime "Lazy merge sort (milliseconds)"
(λ () (bottom-10/lazy-merge-sort gems)))
(measure-runtime "Rebellion lazy sort (milliseconds)"
(λ () (bottom-10/rebellion gems)))))
(define-record-type stats (sum count max min average))
(define (single-datum-stats x)
(stats #:sum x #:count 1 #:max x #:min x #:average x))
(define (stats+ s p)
(define sum (+ (stats-sum s) (stats-sum p)))
(define count (+ (stats-count s) (stats-count p)))
(stats #:sum sum
#:count count
#:max (max (stats-max s) (stats-max p))
#:min (min (stats-min s) (stats-min p))
#:average (/ sum count)))
(define (run-benchmark benchmark #:size size #:iterations iterations)
(define timing-stats-by-label
(transduce (make-list iterations size)
(append-mapping benchmark)
(bisecting timing-data-label timing-data-cpu-time)
(mapping-values single-datum-stats)
#:into (combine-into-hash stats+)))
(transduce (in-hash-pairs timing-stats-by-label)
(bisecting car cdr)
#:into into-hash))
(module+ main
(require profile)
(match (current-command-line-arguments)
[(vector "profile")
(profile-thunk
(lambda ()
(void (bottom-10/rebellion (random-gems 1000))))
#:repeat 1000
#:order 'self)]
[(vector "benchmark")
(run-benchmark bottom-10-benchmark
#:size 1000
#:iterations 1000)]))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment