Skip to content

Instantly share code, notes, and snippets.

@dotemacs
Forked from philandstuff/clojurex-2014.org
Created June 30, 2017 21:39
Show Gist options
  • Save dotemacs/a14360a0a4ed827f3fd07eea91181d23 to your computer and use it in GitHub Desktop.
Save dotemacs/a14360a0a4ed827f3fd07eea91181d23 to your computer and use it in GitHub Desktop.
Clojure eXchange 2014

Korny Sietsma, Pragmatic Performance Testing

quotes

  • “we don’t care about perf”
  • “clojure should be fast”
  • “we’re using mongo!”
  • “clojure might not be fast enough, we should use type hints”

JMeter

classic perf testing approach:

  • build it, then test it

monitor all the things

  • instrument with graphite
(GET "/slow" [] (t/time! (t/timer "slow-page")
                         (do (Thread/sleep (+ 100 (rand-int 100))) "oy!")))
  • “maybe we could use a macro” – no!
  • madness. use bidi!

robert/hooke

  • use robert/hooke to instrument anything
(add-hook #'function #'hook)

laziness messes with measurement

  • (postwalk identity r) to disable laziness recursively

outcomes

  • fix 2: don’t iterate over slow things
    • remove Schema
    • just return fields we need
    • or maybe paginate
      • don’t do this prematurely!
  • fix 3: tags rarely change - we can memoize
    • (what about cache invalidation?)
  • fix 4: queries without indices
  • mongoDB handy setting:
// tablescans are now banned!
db.getSiblingDB("admin").runCommand(
    { setParameter: 1, notablescan: 1}
);
  • fix 5: unrolled queries
    • temptation:
(->> (all-tumblrs db)
     (map (partial all-posts-for-tumble db))
     (map (partial all-pics-for-post db)))
  • slow in code, fast in SQL

digging deeper - profiling

  • can roll your own profiler
  • JVisualVM - free, but limited
  • Yourkit - free for big OS projects, or 15-day demo
  • Riemann?
  • fix 6: RTFM
    • stencil/render-string not memoized
    • use stencil/render-file instead
  • fix 7: more dakka
  • fix 8: async work queues
  • fix 10: tweak clojure
    • “clojure high perf programming” by Shantanu Kumar

browser speed

  • whole nothing issue. esp mobile browsers
  • hard to test at scale

testing options

  • scale “from cheap, fast feedback, flexible” to “accurate, slow, rigid”
  • don’t need to go full sophisticated immediately!
  • “performance test pyramid” by analogy with test pyramid

Q&A

  • simulant
    • don’t know, sounds like a good idea
  • is memoization a harder problem?
    • memoization is a cheat
      • the first person still gets horrible perf
      • if things change, the req which gets the change gets horrible perf
  • aren’t most of your problems RTFM? aren’t you solving a cultural problem with tech?
    • we shouldn’t have needed JMeter to find stencil slowness
    • reading manuals goes out of the window under pressure
  • what do you do when you hit a JVM limit? eg integer parsing <JDK8 had a global lock
    • throw up your hands and say screw it
    • it’s hard to estimate how much time you should put into perf testing

David Pollak, Dragonmark

intro

  • distributed core.async and associated libraries

-

naming things

objects

  • have named “slots”
    • code or data
    • no method/field distinction
  • q: how do you send an “object” from Ponyville to Equestria?

yay clojure

  • separation of data and schema and behaviour
  • immutable data structures: not cyclical
  • transit: richer than json, cross-platform serialization

async

  • nothing is synchronous
    • but we fake it
    • multiple cores and NUMA
  • but
    • we humans gloss over it in day-to-day life
    • in computer systems, we have to think of everything as asynchronous

actors and channels

  • akka: Any => Unit
    • ie take any value, return no response
    • “I’m going to the pub”
  • X => LAFuture[Y]
    • “bruce, I’d like a hug”
    • timeouts
  • Channels > actors
    • because:
      • backpressure (paging @RobAshton)
      • separation of concerns
        • fanout
      • actors can be modelled with channels (not supervisors though), but not vice versa

linearization

  • script:
    • Spike: find the book
    • Pinky: find Applejack
    • S&P: Look for gems
  • when S&P are done, Twilight does stuff with gems, applejack, and the book aka the gofor macro
  • gofor: for comprehension, go-style
(gofor
 [a (foo channel {:thing val})]
 :let [b (inc a)]
 [c (baz other-chan)
  :timeout 45000]
 (println a b c)
 :error (println &err &at))

sends a message to a channel

simultaneous dispatch

documentation

  • you can query services for documentation

dragonmark

lift comet rant

  • lift is the best thing for server push around
    • and has been for 8 yeras
    • currently http long poll
    • actors server & client side
      • multiple actors but only one http connection
    • developers don’t have to worry about guts
      • retries, multiplexing, etc

plumbing

  • the best plumbing “just works”
  • focus on taking a shower, rather than source or destination of water
  • pipes accessible but not in your face
    • so you can fix them when you need to
    • or call a plumber
  • eg Lift Comet
    • parameters for retry backoff tuning
    • how to surface errors, in browser and in server

why dragonmark?

  • isolate logic from transport for testing
    • separate thinking about domain logic from thinking about REST calls
    • fewer developer context switches
  • faster dev cycles
  • discover services, send messages

herding cattle at mixradio, @neilprosser

intro

  • @noahcampbell - “treat your servers like cattle, not pets”
  • REFful µservices
  • github.com/mixradio/mr-clojure
  • legacy java, php
  • riak, rabbit, elasticsearch

problems

  • continuous delivery since 2010
  • snowflake servers
    • configuration drift
    • variation of size, spec, versions
    • slow provisioning (2 week lead time)
    • slow deployment
  • configuration confusion
    • database issues & rollbacks
      • no audit, so hard to know
  • escaped own tin dc, migrated to aws

environments

  • dev account, prod

services

  • naming scheme: mr-*

mr-klink

  • command line swiss army knife
  • written in go
    • JVM dependency would suck :(
    • also considered node & python (but rejected)
    • go is cross-platform
  • uses our RESTful services
  • create app, find images, deploy, list boxes

faraday

  • github.com/ptaoussanis/faraday

mr-baker

  • bakes machine images

phoenix servers

  • throw away old servers and deploy new ones to upgrade
  • no upgrades
    • handled by new bake
    • puppet for a few important things

process

  • amazon linux base image
  • mixradio base image
  • service image | ad-hoc image (testing)
  • shells out to packer
  • uses Raynes/conch

mr-tyrant

  • per-environment configuration
  • mixradio/mr-tyrant
  • backed by github for config storage
  • app is a readonly frontend
  • Raynes/tentacles
  • app properties
    • db conn strings
    • port numbers
  • deployment parameters
    • capacity, security groups
  • launch data
    • ie userdata

mr-pedantic

  • keeping infrastructure in sync
  • mr-pedantic
  • “puppet for cloud infrastructure”
    • configuration exists
    • gets compared with running env
    • running env gets corrected to match config
    • idempotent
  • lets us roll out in another region
  • backed by github for config storage
  • clojure for configuration
    • clojail to avoid stupidity

mr-maestro

  • deployment orchestration
  • mr-maestro
  • red/black deployment
    • bring up new version
    • healthcheck
    • add to load balancer pool (but keep old version)
    • when happy, remove old version
    • (if unhappy, can always flip back to old version)
  • asgard was good for finding what we wanted
  • maestro can deploy itself
    • the new servers will receive the messages which will kill the old versions

Nikita Prokopov, DataScript for web dev

datascript is a post-modern db

  • someone else’s good idea redone on a more primitive level
  • datalog queryable in-memory database
  • triple store:
    • <entity, attribute, value>
  • database as an immutable value
  • completely in-memory
  • written in cljs, js bindings available

how does it work?

(defn create-conn [schema]
  (atom (empty-db schema)
        :meta {:listeners (atom {})}))

(defrecord DB
    [schema
     eavt aevt avet ; three indices
     max-eid max-t])

(defn with [db datoms]
  (-> db
      (update-in [:eavt into datoms])
      (update-in [:aevt into datoms])))

(defn transact [conn datoms]
  (swap! ;...
   ))
  • impl: BTSet: B+ tree (sorted set)
  • perf comparable with sorted-set:
    • slower conj, but faster iterate
  • binary search lookups
  • fast range scans
  • reverse iteration
  • fast first-time creation
  • clojurescript, but heavy use of js arrays and APIs
  • datascript is lightweight
    • 700 loc btset
    • 550 loc query engine
    • 1700 loc total
    • 1200 loc tests
  • unfortunate to be associated with the word “database”
    • no networking
    • query over memory

how to use it

  • every app has an ad-hoc state
  • put everything in a database instead
  • non-trivial SPA has complex state
  • KV stores do not cut it
  • no inherent hierarchy
    • natural for any data: sparse, irregular, hierarchical, graph
  • faster data retrieval from big datasets
  • server sync
  • undo/redo
  • local caching
  • audit

example architecture

  • datascript + react
  • db update causes full re-render, top-down
    • immutability makes it fast

cleanup

  • datomic never removes anything
  • datascript will clean up old values, to ensure storage is bound

example

  • acha-acha.co

whole db prefetch:

  • no server fetch on any navigation, all queries and aggregations happen on client

project status

  • considered alpha

Ephemeral-first data structures, Michał Marczyk

  • ephemeral? transient?
  • persistent-first, but not always?

basic concepts

  • Persistent data structures
    • immutable
    • permit update-like operations
      • with at-most logarithmic slowdown when compared with mutable counterpart
    • “fully persistent data structures”
      • full branching history remains available
  • transients
    • mutable, but share structure with persistent data structures
    • enforce (<1.7) / demand (>= 1.7?) thread isolation
      • change in 1.7 removes the enforcement
      • still requires a single thread to “own”
      • can hand-off via safe publication, so long as single ownership remains
    • can be frozen to create a persistent DS
  • ephemeral data structures: another term for mutable data structures
  • implementation strategies
    • trees with path copying for efficient PDS ‘updates’
    • a notion of ownership for subtrees to support transients

brief history

  • Bagwell (2000), Ideal Hash Trees
    • mutable Hash Array Mapped Tries
      • contribution: traditional hash tables have an occasional expensive resizing operation
      • the unlucky insertion has to wait while the whole hash table is copied
      • HAMTs avoids this by using a tree structure
      • demonstrate comparable performance in best case, while avoiding whole-table copy resize ops
  • Hickey (2007), Clojure
    • persistent hash maps based on HAMTs with path copying
    • also vectors based on a modification of the idea behind HAMTs
  • Hickey (2009), Clojure
    • transient vectors
    • transient maps added later by @cgrand
    • c.c/transient originally called mutable, persistent! - immutable!
      • thread isolation introduced later
  • Prokopec, Bronson, Bagwell, Odesky (2011)
    • Cache-Aware Lock-Free Concurrent Hash Tries
    • Concurrent Tries with Efficient Non-Blocking Snapshots
    • Ctries - lock-free mutable HAMTs with O(1) snapshots
    • the snapshots are first-class Ctries independent of originals

closer look at transients

  • key notion: subtree ownership
  • distinguished central location stores an ‘edit’ marker
    • an AtomicReference
  • and so do individual nodes
  • the transient instance itself owns nodes with its edit marker
  • other nodes might be shared
  • updates check edit markers, mutate own nodes in place, copy paths otherwise
  • new paths get new edit markers, subsequent updates will be in-place
  • persistent! invalidates the transient by resetting its edit marker to nil

persistent-first approach in a mutable settings

  • data.avl vs JDK’s NavigableMaps
    • NavigableMap: nearest neighbour queries and subsetting
      • two impls: java util TreeMap & java util SkipListMap
    • JDK impls return subcollections as views
    • modifications to the original reflected in the view (and vice versa)
    • cannot add items to views outside original bounds
    • inappropriate when the subcoll is to be passed on for arbitrary use
  • however, BSTs (including red-black and AVL trees) support join/split/range
  • join: merge/concat for pairs of trees with an ordering precondition
  • split: produce subcolls including keys < and > a given key
  • range: extract subcoll bounded by limits
  • all in logarithmic time
  • in mutable setting, these modify original tree
  • intention of NavigableMap is different
    • that it leaves original untouched
    • TreeMap is forced into view-based impl
  • AVL doesn’t have this problem
    • return new first-class NavigableMaps, leave original untouched
    • better than Java’s own NavigableMaps in this respect
  • …unless we add transients
    • a transient can be stored in a j.u.TreeMap-workalike wrapper
    • an independent snapshot can be produced by invalidating the transient
    • the original wrapper is free to install a new edit marker immediately
    • the snapshot can be subsetted to produce first-class subcolls
    • these can be safely returned
    • cost: must store edit markers, a little slower for updates
    • no benefit for SortedMap, only NavigableMap

ctries - overview

  • map data structure with support for concurrent updates
  • comparable to regular HAMTs in perf of HAMT ops
  • snapshotable in O(1) time with little degradation to perf of other ops
  • snapshots are completely independent first-class ctries
  • now impl in pure clojure - ctries.clj (not on github yet)

internals

  • ctries: internally similar to clj PersitentHashMap
    • though different range of node types
  • key structure difference:
    • indirection nodes
    • ensure linearizability of history at a slight perf cost
  • CAS-based in-place updates to tree structure
  • “generation” marker used to determine subtree ownership
    • with transients, establishing ownership happens at every level of the tree
    • with ctries, can skip levels
    • would benefit from a generalized CAS operation
      • GCAS: original contribution of second ctries paper
      • RDCSS: Harris, Fraser, Pratt (2002), A Practical Multi-Word Compare-and-Swap Operation
      • based on descriptor objects stored at the modified location
      • GCAS only creates descriptors upon failure

blending in with Clojure PDSs

  • at first glacne, ctries could support persistent and transient APIs simultaneously
  • persistent ops would take snapshot and modify snapshot
    • no go because transients reuse method names from the persistent API
  • instead, ctries.clj maps are “transient first”
    • persistent! creates immutable snapshots that behave like persistent maps
    • deref creates mutable snapshots in the form of independent ctries
    • deref also works on immutable snapshots (returns mutable independent ctries)
  • persistent ctrie-based maps to be optimized further

Martin Trojer

Rob Rees, Trojan horsing clojure with js

  • Wisp, a lisp for js
  • clojurescript as npm modules
    • mori
      • compiled clojurescript
      • prefix notation
      • includes quite a few clojure standard library functions
      • clojure data structures
    • immutablejs
      • javascript
      • prototype extension
  • om
    • love your library, not your language
    • had a huge influence on javascript single-page apps
      • clearly faster
      • clearly fewer bugs
  • omniscient
    • javascript clone of om
      • immutablejs/react
      • immstruct
        • implement cursors from om

-

js developers don’t want to get into clojure

-

Q&A

  • do js people really understand the Om influence on the way single-page apps are going?
    • two-way data binding is dying
  • is wisp ready for prod?
    • you know what you’re shipping
    • you’ll have to test the js that comes out of it

Adrian Mowat, My Componentised Clojure Crusade

  • Glasgow-based, Arnold Clark
  • stuart sierra component workflow
    • juxt/jig – superceded by stuart sierra’s work this year

James Henderson, Flow: learnings from writing a clojurescript DSL

  • github: james-henderson
  • twitter: jarohen

of models and widgets

               +------------+
   listens     |  Atom      |  updates  
  /------------|            |<--------\ 
  v            +------------+         | 
Widget                              Model
  |            +------------+         ^ 
  \----------->|  Channel   |---------/ 
   events      +------------+  reacts to

a typical model

  • core.async

after ClojureX 2013

  • lots of boilerplate
  • could a library help here?
  • led to clidget
    • ensured DOM was updated every time atom changed
    • evolved into Flow
  • 2 weeks later, Om was announced
  • why would any sane person carry on with Flow?!
    • couldn’t shake the feeling that things could be simpler
    • Om & Reagent introduce a lot of new concepts

Flow aims

  • 100% declarative
  • minimise number of new concepts
  • perform “well enough” to be useful (but don’t compete with React)
  • https://github.com/james-henderson/flow
  • (also, facebook created a thing called “flow” too. not the same)

hello world

(:require [flow.core :as f :include-macros true])

(defn hello-world []
  (f/root js/document.body
    (f/el
     [:p.message {::f/style {:font-weight :bold
                             :color "#a00"}}
      "Hello world!"])))

;; dynamic
(defn counter-component [!counter]
  (f/el
   [:p "value is " (<< !counter)]))
  • the << operator continuously reads from an atom

subcompenents

  • important for composable, maintainable code
  • the other operator !<<

steps for writing a compiler

  • input
  • lexing
  • parsing
  • code synthesis
  • output

lexer/parser

  • read-string

synthesis - translating the AST

  • notions:
DynamicElement :: AppState -> (Element, DynamicElement)

DynamicValue :: AppState -> Value

looking at if

(if <DynamicValue>
  <DynamicElement>
  <DynamicElement>)

(defmethod fc/compile-el-form :if [[_ test then else] opts]
  `(build-if (fn [] ~(fc/compile-value-form test opts))
             (fn [] ~(fc/compile-el-form then opts))
             (fn [] ~(fc/compile-el-form else opts))))

rules of macro club

  1. don’t write macros
  2. don’t write macros!
  3. (unless you know you have to)
  4. Get out of macro-lang as soon as you can
    • changing the execution order?
    • analysing a form?

if, cont’d

  • what’s in build-if?
    • cached state (of the current value of the test)

-

aside: clojurescript is getting quicker

  • the bottleneck is becoming me, rather than the clojurescript compiler

what’s next for flow?

  • stable release
  • more tutorials, examples, docs
  • next version: ???

what can you do now?

  • feedback please!
  • give it a go!
  • get involved!

Tom Coupland: Supercharging Cyanite

  • @tcoupland
  • MixRadio
  • graphite
    • graphical frontend for viewing metrics
  • carbon
    • default data store for graphite
    • (technically, whisper, but no matter)
    • flat files on disk
    • round-robin rollup
  • cyanite from @pyr
    • elasticsearch/cassandra backend for graphite, replacement for whisper
  • core.async primer
    • channels, take/put, go blocks
      • thread pool executors

measurements

  • YourKit agent added to cyanite
  • start CPU profiling
  • culprit: add-path
    • doing blocking I/O inside go-loop
    • don’t block the thread pool
    • Single Responsibility Principle

fix: change topology

  • changed topology to read from one input, write to two separate output channels (elasticsearch + cassandra)
  • but… performs even worse!
    • less CPU usage, but less metrics/s and less network usage

fix: extra go blocks to handle communicating with elasticsearch & casandra

  • faster, but stil not great
  • 600 metrics/s
  • 50k/s traffic in, 160k/s traffic out
  • deserves a cup of tea :)
  • eventually we find a problem:
    • AssertionError: No more than 1024 pending puts are allowed on a single channel
    • channel has in-built buffer
    • but there’s an extra “buffer” of pending-puts (and takes)
    • which has a hard limit of 1024
    • we hit it
    • helpfully, the error message suggests “windowed-buffer”
      • which drops work on the floor
      • nope nope nope!
  • (@aphyr from the twitter gallery: LITTLES LAW)

batching

  • core.async partition
    • (<! (partition 1000 chan))
    • consumes messages in batches of 1000

SRP

  • add-path fn is doing too much
    • split into separate CSP processes
    • check full path
    • break up
    • insert
  • now:
    • 2500 metrics/s
    • network
      • 400 k/s in
      • 1500 k/s out

can we do better?

  • are we stressing cyanite enough to see problems?
  • soak testing
    • generate vast quantities of data to stress cyanite
  • use yourkit profiling
    • observe most of what we’re doing is waiting for elasticsearch
    • current elasticsearch client uses clj-http
    • others could use httpkit or similar (which use java nio)

The Joy of Open Source (improving elasticsearch client)

  • being open source really pays off here
  • we can just look inside the elasticsearch driver
  • we can copy-paste it and swap out another http client

what else can we see?

  • qbits.hayt.cql – cassandra query builder
  • half the system’s time is spent on clojure.string/join
    • this isn’t so good
  • I’m using the library wrong
    • it’s for building nice queries
    • I’m just doing inserts, I don’t care about queries
    • should just do batching myself

caching

  • now
    • CPU usage halved to 16% (from 30%)
    • memory usage down
    • 2500 metrics/s, same as before
    • network
      • in: 400k/s
      • out: 880k/s (down from 1500k/s, due to batching)

back to YourKit

  • all our time is in the format_processor
  • this looks good

Back pressure

wrap up

  • Single Responsibility Principle
    • works at all the abstraction levels
    • it’s always a good idea
      • except when it isn’t, like all good ideas
  • open source is brilliant
    • but RTFM
      • if their motivation doesn’t align with yours, it might not help you
  • core.async
    • did it deliver on its promise?
    • it’s a nice way of breaking things down and following SRP
    • check the JIRA before betting the business on it
  • performance tuning
    • it’s brilliant, do it
    • go use YourKit

@matthiasn, BirdWatch

  • BirdWatch, a twitter processing thingy
  • joy of clojure
  • react
  • Om
  • transducers
    • listened to rich’s talk several times
      • each time learned something new
  • A farewell note to a programming language (namely scala)

the technical part

  • twitter streaming API
  • clojure stream client
  • percolation queries for elasticsearch
    • sends data only to those clients who are interested via websockets

inspection

  • logging, pprinting
    • bah
    • tail -F didn’t work, wanted multiline
  • log to core.async channels with dropping-buffers
    • can inspect latest state by reading from them

Thomas Kristiensen, More Open Source Systems, Please

  • composition
    • micro -> macro:
      • immutability, pure fns, idiomatic clojure
      • core.async, component
      • prod
    • but we spend more of our time at the high level – where are our composition tools here?

composition experience reports

  • wish-list for experience report:
    1. context
      • I want to know why I need this thing
    2. limitations in the group
    3. values in the group
    4. alternatives
      • and why didn’t you choose them?
    5. high-level view of component
      • before the details are presented. won’t under the details without this
    6. interesting details

example:bifrost

  • goal: archive data from kafka to S3, without any hadoop dependencies
  • main app:
    • nginx
    • ruby webapp
    • comparison service + mongo
    • collector + MySQL
  • all logs sent to kafka
  • kafka keeps a sliding window, keeping latest n messages
  • behind kafka:
    • a loader
    • sends data to buckets
    • “blueshift” - uswitch tool that sends stuff to aws redshift

limitations

  • time
  • S3
  • part of larger piece of work
  • streaming (no longer batching)

alternatives

  • secor – has a dependency on hadoop

link

let’s do this more!

  • 30 minute slots – please contact me!
  • first meetup jan 2015

Chris Ford, “journey through the looking glass”

  • lenses! <3

clojure/core has:

  • update-in
  • separates the focus of context from the applied function
  • the same paths work with get-in and assoc-in
  • however, update-in is specialised to one kind of focus

functors

  • “functors are structure-preserving maps between categories”
    • Barr and Wells
  • functors are functions that lift functions into a context.
  • functors compose as functions

examples

  • sequence functor
    • (defn fsequence [f] (partial map f))
  • identity functor
    • (defn fidentity [f] f)
  • constant functor
    • (defn fconst [_] identity)
  • “in” functor
    • (defn fin [k f] (fn [x] (update-in x [k] f)))
    • ((fin :x inc) {:x 1 :y 1}) => {:x 2 :y 1}

lenses

  • Lenses are functions that lift contextualising functions into a context.
  • update, put, view can all be represented by one function.
  • lenses compose as functions.
  • I don’t think my notes can do this talk justice. it’s really good, watch the video :)

traversals

  • lenses that can have more than one target
  • the update part works
  • but view needs variadic functors (~ Applicatives) and Monoid targets
    • need a strategy for combining multiple values into one
    • problem in clojure: monoid zero value doesn’t know the context it’s in
    • so how do you deal with traversals with zero targets?

traversy

  • inspired by kmett’s haskell lenses
    • but uses building blocks from clojure/core
    • and recognizes limitations of lack of contextual type information
  • a lens in this world is a pair of fns: focus and fmap
  • in this world, the traversal laws are more what you’d call guidelines…
  • https://github.com/ctford/traversy

q:

  • what are your thoughts on teh different worlds of haskell and clojure?
    • the social difference is most clear
    • clojure is v friendly
    • haskell is friendly too, but a bit of a rough exterior sometimes

kyle kingsbury, keynote

  • I work at factual
  • I do a lot of database stuff

what is it like to be a fold?

  • hugs form a monoid!
  • ->>, map, filter, reduce
(->> coll
     (map inc)
     (filter odd?)
     (reduce +))
  • if you create intermediate sequences for these seqs, you put a lot of pressure on the GC
  • catamorphisms!
  • reducers eliminate the intermediate seqs
  • haskell calls this “stream fusion”
  • a specific case of deforestation - a fp optimization technique
  • reducers are defined for each coll type
  • trancducers have no underlying type (to confuse haskell people)
  • (mapping inc) – maps over anything
  • but still sequential

associativity

  • if f is associative, we can fold in either direction
    • you can have an arborescent fold (tree-like)
    • can more easily parallelize
    • then bring together the resultant results
  • this is already in reducers
  • because Rich invented everything and all we have to do is discover it in the core library
    • eg PersistentQueue - who knew?
  • results preserve order
    • order requires coordination
    • coordination might be hard

commutativity

  • if (= (f a b) (f b a)) we can do stuff in any order we like
  • monoids compose wonderfully
  • commutative monoids compose even better
    • they’re great for distributed systems
    • see also CRDTs (which are commutative monoids with idempotence)

tesser

  • parallel
  • unordered
  • stream fusion
  • collection independent
  • can do this in hadoop
    • using stateful mappers
    • can do reductions in the mappers
  • post-reduce phase
    • reduce over transient, or native array
    • this phase can seal it
  • post-combine phase
    • similar, after combining
  • tesser uses a map!
    • rather than multiple arity fns
{:reduce ...
 :reduce-identity ...
 :combine ...
 :combine-identity...
 ... ...}
  • unlike transducers, we defer the build phase til the end
    • so we can compose inside & out
    • allows seq API

constraints

  • identity fns must be pure
    • transducers doesn’t use pure reduction fns
      • take sets up an atom for no of elements
      • increments each time
      • doesn’t have to wrap the accumulator value
    • no-go for tesser
    • we need all the state in the accumulator
      • so we can pass it back and forth over network
  • reducers/combiners must be associative
    • and also commutative
  • algebird isn’t commutative. should we order?
  • must short-circuit (reduced x) values

can be used for

  • stats!
    • mean, variance, covariance, correlation matrices
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment