#time
#r "bin/Release/Streams.Core.dll"
open Nessos.Streams.Core
let rnd = new System.Random()
let data = [|1..10000000|] |> Array.map (fun _ -> int64 <| rnd.Next(1000000))
#r "../../packages/FSharp.Collections.ParallelSeq.1.0/lib/net40/FSharp.Collections.ParallelSeq.dll"
open FSharp.Collections.ParallelSeq
// Real: 00:00:03.095, CPU: 00:00:14.679, GC gen0: 3, gen1: 2, gen2: 2
data
|> PSeq.map (fun x -> x + 1L)
|> PSeq.sortBy id
|> PSeq.toArray
// Real: 00:00:00.917, CPU: 00:00:04.321, GC gen0: 1, gen1: 0, gen2: 0
data
|> ParStream.ofArray
|> ParStream.map (fun x -> x + 1L)
|> ParStream.sortBy id
In Streams the order of operations follows the pattern
source/generator |> lazy |> lazy |> lazy |> eager/reduce
which means that you can have an arbitrary number of lazy operations in any order, but the composition must start with a source/generator and end with an eager/reduce step. The performance so far is great; for more info check our GitHub repo https://github.com/nessos/Streams.
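A minimal sketch of that pattern, using a toy `Stream` type defined here for illustration rather than the library's own. The `ToyStream` module, its function names, and the simplified `unit`-returning callback are assumptions made for this example. The lazy steps only wrap callbacks; no element is touched until the eager step runs.

```fsharp
// Toy stream (an assumption for illustration): a function that pushes
// every element into the callback it is given.
type Stream<'T> = Stream of (('T -> unit) -> unit)

module ToyStream =
    // source/generator
    let ofArray (xs : 'T []) : Stream<'T> =
        Stream (fun iterf -> for x in xs do iterf x)

    // lazy: wraps the callback, touches no elements
    let map (f : 'T -> 'R) (stream : Stream<'T>) : Stream<'R> =
        let (Stream streamf) = stream
        Stream (fun iterf -> streamf (fun value -> iterf (f value)))

    // lazy: forwards only the elements that satisfy the predicate
    let filter (p : 'T -> bool) (stream : Stream<'T>) : Stream<'T> =
        let (Stream streamf) = stream
        Stream (fun iterf -> streamf (fun value -> if p value then iterf value))

    // eager/reduce: this call is what actually drives the iteration
    let sum (stream : Stream<int64>) : int64 =
        let (Stream streamf) = stream
        let acc = ref 0L
        streamf (fun value -> acc.Value <- acc.Value + value)
        acc.Value

// Counts how often the mapping function has been invoked.
let calls = ref 0

let pipeline =
    [| 1L .. 10L |]
    |> ToyStream.ofArray                                   // source
    |> ToyStream.filter (fun x -> x % 2L = 0L)             // lazy
    |> ToyStream.map (fun x ->                             // lazy
        calls.Value <- calls.Value + 1
        x + 1L)

// Nothing has run yet, so the counter is still untouched here.
let callsBeforeReduce = calls.Value

let total = ToyStream.sum pipeline                         // eager/reduce
```

Building `pipeline` costs only a few closure allocations; the single pass over the array happens inside `sum`.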
What is it that makes the Stream operations so fast? Is it because they are inlined? Seq also "streams" its elements through its operations. That's why I am surprised that there is such a difference. And does the size of elements matter? Have you benchmarked them with custom structs and records as elements?
Yes, inlining is part of the answer, and the composition of the lazy steps is also more lightweight. Please try it yourself and share your perf results with us. If you have time, also check this "F# microbenchmark study": http://codedivine.org/2014/07/21/f-microbenchmark-study/
Could you please elaborate on what "more lightweight" composition of the lazy steps means? I'll definitely benchmark this later. Your Stream API looks really interesting! I also just checked out https://speakerdeck.com/biboudis/clash-of-the-lambdas. It would have been interesting if you had included this optimization for F# in that benchmark. "Seq (optimized)" and "Par (optimized)" were referring to ScalaBlitz and LinqOptimizer, respectively, right? Thanks!
I'll try to elaborate a little bit more. In Streams the map function is implemented like
let inline map (f : 'T -> 'R) (stream : Stream<'T>) : Stream<'R> =
    let (Stream streamf) = stream
    Stream (fun iterf -> streamf (fun value -> iterf (f value)))
It is easy to see that the essence of Streams is the "CPS composition of lazy operations".
It is more lightweight because you pay only one virtual call per element per operation.
Check this https://github.com/nessos/Streams/blob/master/src/Streams.Core/Streams.fs
for more operations. Maybe another way to understand it is that in Streams we compose lazy operations for internal iteration, whereas LINQ/IEnumerable/IEnumerator is based on external iteration.
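To make the internal/external distinction concrete, here is a small self-contained sketch (not the library's code). The names `sumExternal`, `sumInternal`, `source`, and `mapped` are made up for this example. With external iteration the consumer pulls each element through `MoveNext` and `Current`; with internal iteration the source pushes each element into one composed callback.

```fsharp
// External iteration: the consumer drives the loop and pays a
// MoveNext + Current pair per element for each enumerator in the chain.
let sumExternal (xs : seq<int64>) : int64 =
    use e = xs.GetEnumerator()
    let mutable acc = 0L
    while e.MoveNext() do
        acc <- acc + e.Current
    acc

// Internal iteration: the source drives the loop and invokes a single
// callback per element.
let sumInternal (iterate : (int64 -> unit) -> unit) : int64 =
    let acc = ref 0L
    iterate (fun value -> acc.Value <- acc.Value + value)
    acc.Value

let small = [| 1L .. 5L |]

// Pull-based pipeline: Seq.map wraps the array enumerator in another enumerator.
let viaPull = small |> Seq.map (fun x -> x + 1L) |> sumExternal

// Push-based pipeline: "map" is just a callback wrapped around a callback (CPS).
let source : (int64 -> unit) -> unit =
    fun iterf -> for x in small do iterf x
let mapped : (int64 -> unit) -> unit =
    fun iterf -> source (fun value -> iterf (value + 1L))
let viaPush = sumInternal mapped
```

Both pipelines compute the same result; the difference is only in who drives the loop and how many calls each element costs on the way through.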
Yes, "optimized" in Clash of the Lambdas refers to ScalaBlitz and LinqOptimizer.
Ah, I see. Thanks for the detailed explanations!
Btw, it might be helpful to add that explanation to the readme. The first time I discovered your Streams API and read "inspired by Java 8 Streams", I didn't understand what makes Java 8 Streams special, since I thought that Java 8 Streams were just the Java version of Seq/IEnumerable+LINQ.
Great suggestion, thanks for the feedback!
Great work! Does the order of operations matter?