Unordered means that we don't care about the order of results from the evalMap action.
It allows for higher throughput because it never holds back a new job while waiting on an earlier job's result. It has N job permits (backed by a semaphore), and the moment one job finishes, it requests the next element from the stream and starts running it.
When you use parEvalMap with ordered results, it only begins the next job once the oldest input's job is ready to emit.
This matters when individual elements can take a variable amount of time to complete - and that's the case here, because backfill can take more or less time depending on how many transactions are present within the time window.
Suppose we have 4 jobs we want to run with up to 2 at a time. Job 1 takes 60 seconds to complete, and all the rest take 10 seconds.
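Using parEvalMap would mean the entire set of inputs would take ~70 seconds to complete: jobs 1 and 2 start together, job 2 finishes after 10 seconds, but jobs 3 and 4 cannot start until job 1 emits at the 60-second mark, and they then need another 10 seconds.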
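To check this scenario empirically, here is a minimal sketch that runs the four jobs through both operators and prints the elapsed time. It assumes fs2 3.x with cats-effect 3; the `job` function and the `OrderedVsUnordered` object are hypothetical stand-ins for the real backfill code, and the durations are scaled down from seconds to tens of milliseconds so it runs quickly. If the description above holds, the ordered run should come out near 700 ms and the unordered run near 600 ms, but treat the printed numbers as the source of truth rather than this estimate.

```scala
import cats.effect.{IO, IOApp}
import fs2.Stream
import scala.concurrent.duration._

object OrderedVsUnordered extends IOApp.Simple {

  // Hypothetical stand-in for a backfill job: job 1 is slow, the rest are fast.
  // 600 ms and 100 ms stand for the 60 s and 10 s in the example above.
  def job(id: Int): IO[Int] =
    IO.sleep(if (id == 1) 600.millis else 100.millis).as(id)

  val inputs: Stream[IO, Int] = Stream.emits(List(1, 2, 3, 4))

  // Runs a stream to completion, then prints the wall-clock time and emission order.
  def timed(label: String, s: Stream[IO, Int]): IO[Unit] =
    s.compile.toList.timed.flatMap { case (elapsed, out) =>
      IO.println(s"$label: ${elapsed.toMillis} ms, emitted $out")
    }

  // At most 2 jobs in flight in both cases; only the ordering guarantee differs.
  val run: IO[Unit] =
    timed("parEvalMap", inputs.parEvalMap(2)(job)) *>
      timed("parEvalMapUnordered", inputs.parEvalMapUnordered(2)(job))
}
```

The emission order printed for the unordered run is also worth a look: results arrive as jobs finish, so anything downstream that relies on input order would need to handle that itself.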