- “we don’t care about perf”
- “clojure should be fast”
- “we’re using mongo!”
- “clojure might not be fast enough, we should use type hints”
- build it, then test it
- instrument with graphite
(GET "/slow" [] (t/time! (t/timer "slow-page")
(do (Thread/sleep (+ 100 (rand-int 100))) "oy!")))
- “maybe we could use a macro” – no!
- madness. use bidi!
- use robert/hooke to instrument anything
(add-hook #'function #'hook)
- `(postwalk identity r)` to disable laziness recursively
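When a timed function returns lazy sequences, the work may only happen once something consumes the result, so a timer around the call can under-report. A minimal sketch of the `postwalk identity` trick using only `clojure.walk`; `lazy-rows` and the `realized` counter are made up for illustration:

```clojure
(require '[clojure.walk :as walk])

;; Counts how many elements have actually been computed.
(def realized (atom 0))

(defn lazy-rows
  "Hypothetical query result: a lazy seq of maps, each holding a lazy seq of tags."
  []
  (map (fn [i]
         (swap! realized inc)
         {:id i
          :tags (map (fn [t] (swap! realized inc) t) [:a :b])})
       (range 3)))

(def r (lazy-rows))            ; nothing has been computed yet
(def before @realized)

;; postwalk visits every nested form, which forces the nested lazy seqs too
(def forced (walk/postwalk identity r))
(def after @realized)
```

Wrapping the `postwalk` call inside the timed body means the measurement includes the deferred work.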
- fix 2: don’t iterate over slow things
- remove Schema
- just return fields we need
- or maybe paginate
- don’t do this prematurely!
- fix 3: tags rarely change - we can memoize
- (what about cache invalidation?)
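One cheap answer to the invalidation question is to let cached values expire. A minimal sketch, assuming a time-to-live is acceptable for tags; `memoize-ttl`, `fetch-tags` and the 60-second TTL are hypothetical, not from the talk:

```clojure
(defn memoize-ttl
  "Like memoize, but a cached value is recomputed once it is older than ttl-ms."
  [f ttl-ms]
  (let [cache (atom {})]
    (fn [& args]
      (let [now   (System/currentTimeMillis)
            entry (get @cache args)]
        (if (and entry (< (- now (:at entry)) ttl-ms))
          (:value entry)
          (let [v (apply f args)]
            (swap! cache assoc args {:value v :at now})
            v))))))

;; Stand-in for the slow tag lookup; counts how often it really runs.
(def calls (atom 0))
(defn fetch-tags [] (swap! calls inc) ["clojure" "perf"])

(def cached-tags (memoize-ttl fetch-tags 60000))
(cached-tags)
(cached-tags)   ; served from the cache, fetch-tags is not called again
```

The first caller, and the first caller after each expiry, still pays the full cost. Two concurrent callers may also both recompute, which this sketch does not guard against.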
- fix 4: queries without indices
- mongoDB handy setting:
// tablescans are now banned!
db.getSiblingDB("admin").runCommand(
{ setParameter: 1, notablescan: 1}
);
- fix 5: unrolled queries
- temptation:
(->> (all-tumblrs db)
(map (partial all-posts-for-tumble db))
(map (partial all-pics-for-post db)))
- slow in code, fast in SQL
- can roll your own profiler
- JVisualVM - free, but limited
- YourKit - free for big OS projects, or 15-day demo
- Riemann?
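A home-made profiler can be as small as a wrapper installed on a var, in the same spirit as robert/hooke but using only `clojure.core`. This is a sketch under that assumption; `instrument!`, `timings` and `slow-thing` are illustrative names:

```clojure
;; var -> vector of elapsed times in nanoseconds
(def timings (atom {}))

(defn instrument!
  "Replaces the function held by var v with a version that records its run time."
  [v]
  (alter-var-root v
    (fn [f]
      (fn [& args]
        (let [start (System/nanoTime)]
          (try
            (apply f args)
            (finally
              (swap! timings update v (fnil conj [])
                     (- (System/nanoTime) start)))))))))

(defn slow-thing [n]
  (Thread/sleep (long n))
  n)

(instrument! #'slow-thing)
(def result (slow-thing 20))
```

After a few requests, `@timings` holds the raw samples per var, which can be summarised or shipped to a metrics system.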
- fix 6: RTFM
- stencil/render-string not memoized
- use stencil/render-file instead
- fix 7: more dakka
- fix 8: async work queues
- fix 10: tweak clojure
- “clojure high perf programming” by Shantanu Kumar
- a whole 'nother issue, esp. mobile browsers
- hard to test at scale
- scale “from cheap, fast feedback, flexible” to “accurate, slow, rigid”
- don’t need to go full sophisticated immediately!
- “performance test pyramid” by analogy with test pyramid
- simulant
- don’t know, sounds like a good idea
- is memoization a harder problem?
- memoization is a cheat
- the first person still gets horrible perf
- if things change, the req which gets the change gets horrible perf
- aren’t most of your problems RTFM? aren’t you solving a cultural problem with tech?
- we shouldn’t have needed JMeter to find stencil slowness
- reading manuals goes out of the window under pressure
- what do you do when you hit a JVM limit? eg integer parsing <JDK8 had a global lock
- throw up your hands and say screw it
- it’s hard to estimate how much time you should put into perf testing
- distributed `core.async` and associated libraries
- have named “slots”
- code or data
- no method/field distinction
- q: how do you send an “object” from Ponyville to Equestria?
- separation of data and schema and behaviour
- immutable data structures: not cyclical
- transit: richer than json, cross-platform serialization
- nothing is synchronous
- but we fake it
- multiple cores and NUMA
- but
- we humans gloss over it in day-to-day life
- in computer systems, we have to think of everything as asynchronous
- akka:
Any => Unit
- ie take any value, return no response
- “I’m going to the pub”
X => LAFuture[Y]
- “bruce, I’d like a hug”
- timeouts
- Channels > actors
- because:
- backpressure (paging @RobAshton)
- separation of concerns
- fanout
- actors can be modelled with channels (not supervisors though), but not vice versa
- script:
- Spike: find the book
- Pinky: find Applejack
- S&P: Look for gems
- when S&P are done, Twilight does stuff with gems, applejack, and the book aka the gofor macro
- gofor: for comprehension, go-style
(gofor
[a (foo channel {:thing val})]
:let [b (inc a)]
[c (baz other-chan)
:timeout 45000]
(println a b c)
:error (println &err &at))
sends a message to a channel
simultaneous dispatch
- you can query services for documentation
- lift is the best thing for server push around
- and has been for 8 years
- currently http long poll
- actors server & client side
- multiple actors but only one http connection
- developers don’t have to worry about guts
- retries, multiplexing, etc
- the best plumbing “just works”
- focus on taking a shower, rather than source or destination of water
- pipes accessible but not in your face
- so you can fix them when you need to
- or call a plumber
- eg Lift Comet
- parameters for retry backoff tuning
- how to surface errors, in browser and in server
- isolate logic from transport for testing
- separate thinking about domain logic from thinking about REST calls
- fewer developer context switches
- faster dev cycles
- discover services, send messages
- @noahcampbell - “treat your servers like cattle, not pets”
- RESTful µservices
- github.com/mixradio/mr-clojure
- legacy java, php
- riak, rabbit, elasticsearch
- continuous delivery since 2010
- snowflake servers
- configuration drift
- variation of size, spec, versions
- slow provisioning (2 week lead time)
- slow deployment
- configuration confusion
- database issues & rollbacks
- no audit, so hard to know
- escaped own tin dc, migrated to aws
- dev account, prod
- naming scheme: mr-*
- command line swiss army knife
- written in go
- JVM dependency would suck :(
- also considered node & python (but rejected)
- go is cross-platform
- uses our RESTful services
- create app, find images, deploy, list boxes
- github.com/ptaoussanis/faraday
- bakes machine images
- throw away old servers and deploy new ones to upgrade
- no upgrades
- handled by new bake
- puppet for a few important things
- amazon linux base image
- mixradio base image
- service image | ad-hoc image (testing)
- shells out to packer
- uses Raynes/conch
- per-environment configuration
- mixradio/mr-tyrant
- backed by github for config storage
- app is a readonly frontend
- Raynes/tentacles
- app properties
- db conn strings
- port numbers
- deployment parameters
- capacity, security groups
- launch data
- ie userdata
- keeping infrastructure in sync
- mr-pedantic
- “puppet for cloud infrastructure”
- configuration exists
- gets compared with running env
- running env gets corrected to match config
- idempotent
- lets us roll out in another region
- backed by github for config storage
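The compare-and-correct loop can be sketched as two pure functions. This is an illustration of the idea only, not mr-pedantic's implementation; the rule maps and the names `plan` and `apply-plan` are assumptions:

```clojure
(require '[clojure.set :as set])

(defn plan
  "Compares desired config with the running environment and returns the
  changes needed to make the environment match."
  [desired running]
  {:create (set/difference (set desired) (set running))
   :delete (set/difference (set running) (set desired))})

(defn apply-plan
  "Applies a plan to a (simulated) running environment."
  [running {:keys [create delete]}]
  (-> (set running)
      (set/difference delete)
      (set/union create)))

(def desired #{{:group "mr-app" :port 8080} {:group "mr-app" :port 8081}})
(def running #{{:group "mr-app" :port 8080} {:group "mr-app" :port 22}})

(def converged   (apply-plan running (plan desired running)))
;; Idempotent: once converged, a second run has nothing left to do.
(def second-plan (plan desired converged))
```

Because the plan is derived from the difference between config and reality, pointing the same config at an empty environment (for example a new region) yields a plan that creates everything.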
- clojure for configuration
- clojail to avoid stupidity
- deployment orchestration
- mr-maestro
- asgard via APIs
- red/black deployment
- bring up new version
- healthcheck
- add to load balancer pool (but keep old version)
- when happy, remove old version
- (if unhappy, can always flip back to old version)
- asgard was good for finding what we wanted
- maestro can deploy itself
- the new servers will receive the messages which will kill the old versions
- someone else’s good idea redone on a more primitive level
- datalog queryable in-memory database
- triple store:
- <entity, attribute, value>
- database as an immutable value
- completely in-memory
- written in cljs, js bindings available
(defn create-conn [schema]
(atom (empty-db schema)
:meta {:listeners (atom {})}))
(defrecord DB
[schema
eavt aevt avet ; three indices
max-eid max-t])
(defn with [db datoms]
(-> db
(update-in [:eavt] into datoms)
(update-in [:aevt] into datoms)))
(defn transact [conn datoms]
(swap! ;...
))
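To make the index idea concrete, here is a small sketch of a triple store built on core sorted sets. It is not DataScript's implementation (which uses its own BTSet); the datoms, `index-by` and the `nil`-padded range bounds are assumptions for illustration:

```clojure
;; A datom here is a plain vector [entity attribute value].
(def datoms
  [[1 :name "Pinkie"] [1 :likes "cake"]
   [2 :name "Spike"]  [2 :likes "gems"]])

(defn index-by
  "Empty sorted set that orders datoms by the given positions,
  e.g. [1 0 2] for attribute-entity-value."
  [order]
  (sorted-set-by (fn [x y] (compare (mapv x order) (mapv y order)))))

(def eavt (into (index-by [0 1 2]) datoms))
(def aevt (into (index-by [1 0 2]) datoms))

;; Everything about entity 1: a range scan over eavt.
;; nil sorts before any other value, so [1 nil nil] is a lower bound.
(def entity-1 (subseq eavt >= [1 nil nil] < [2 nil nil]))

;; Every :name datom: seek into aevt, then read while the attribute matches.
(def names
  (take-while #(= :name (second %))
              (subseq aevt >= [nil :name nil])))
```

Each index answers a different question with a seek plus a sequential read, which is why the same datoms are stored several times in different orders.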
- impl: BTSet: B+ tree (sorted set)
- perf comparable with sorted-set:
- slower conj, but faster iterate
- binary search lookups
- fast range scans
- reverse iteration
- fast first-time creation
- clojurescript, but heavy use of js arrays and APIs
- datascript is lightweight
- 700 loc btset
- 550 loc query engine
- 1700 loc total
- 1200 loc tests
- unfortunate to be associated with the word “database”
- no networking
- query over memory
- every app has an ad-hoc state
- put everything in a database instead
- non-trivial SPA has complex state
- KV stores do not cut it
- no inherent hierarchy
- natural for any data: sparse, irregular, hierarchical, graph
- faster data retrieval from big datasets
- server sync
- undo/redo
- local caching
- audit
- datascript + react
- db update causes full re-render, top-down
- immutability makes it fast
- datomic never removes anything
- datascript will clean up old values, to ensure storage is bound
- acha-acha.co
- no server fetch on any navigation, all queries and aggregations happen on client
- considered alpha
- ephemeral? transient?
- persistent-first, but not always?
- Persistent data structures
- immutable
- permit update-like operations
- with at-most logarithmic slowdown when compared with mutable counterpart
- “fully persistent data structures”
- full branching history remains available
- transients
- mutable, but share structure with persistent data structures
- enforce (<1.7) / demand (>= 1.7?) thread isolation
- change in 1.7 removes the enforcement
- still requires a single thread to “own”
- can hand-off via safe publication, so long as single ownership remains
- can be frozen to create a persistent DS
- ephemeral data structures: another term for mutable data structures
- implementation strategies
- trees with path copying for efficient PDS ‘updates’
- a notion of ownership for subtrees to support transients
- Bagwell (2000), Ideal Hash Trees
- mutable Hash Array Mapped Tries
- contribution: traditional hash tables have an occasional expensive resizing operation
- the unlucky insertion has to wait while the whole hash table is copied
- HAMTs avoid this by using a tree structure
- demonstrate comparable performance in best case, while avoiding whole-table copy resize ops
- Hickey (2007), Clojure
- persistent hash maps based on HAMTs with path copying
- also vectors based on a modification of the idea behind HAMTs
- Hickey (2009), Clojure
- transient vectors
- transient maps added later by @cgrand
- `c.c/transient` was originally called `mutable`, and `persistent!` was originally `immutable!`
- thread isolation introduced later
- Prokopec, Bronson, Bagwell, Odersky (2011)
- Cache-Aware Lock-Free Concurrent Hash Tries
- Concurrent Tries with Efficient Non-Blocking Snapshots
- Ctries - lock-free mutable HAMTs with O(1) snapshots
- the snapshots are first-class Ctries independent of originals
- key notion: subtree ownership
- distinguished central location stores an ‘edit’ marker
- an AtomicReference
- and so do individual nodes
- the transient instance itself owns nodes with its edit marker
- other nodes might be shared
- updates check edit markers, mutate own nodes in place, copy paths otherwise
- new paths get new edit markers, subsequent updates will be in-place
- `persistent!` invalidates the transient by resetting its edit marker to nil
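The transient lifecycle can be seen with core functions alone. A minimal sketch; `build-vec` is an illustrative helper:

```clojure
(defn build-vec
  "Builds the vector [0 .. n-1], mutating a transient in place internally."
  [n]
  (loop [t (transient [])
         i 0]
    (if (< i n)
      (recur (conj! t i) (inc i))
      (persistent! t))))          ; freeze into a persistent vector

(def v (build-vec 5))

;; After persistent! the transient is invalidated, so further use throws.
(def t (transient [1 2 3]))
(def frozen (persistent! t))
(def reuse-error
  (try
    (conj! t 4)
    nil
    (catch IllegalAccessError e e)))
```

Callers of `build-vec` only ever see an immutable vector; the mutation stays local to the function that owns the transient.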
- data.avl vs JDK’s NavigableMaps
- NavigableMap: nearest neighbour queries and subsetting
- two impls: java.util.TreeMap & java.util.concurrent.ConcurrentSkipListMap
- JDK impls return subcollections as views
- modifications to the original reflected in the view (and vice versa)
- cannot add items to views outside original bounds
- inappropriate when the subcoll is to be passed on for arbitrary use
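The view behaviour is easy to observe from Clojure via interop. This sketch uses `java.util.TreeMap` and, for contrast, a core `sorted-map` (not data.avl, to keep it dependency-free):

```clojure
(import '(java.util TreeMap NavigableMap))

(def ^TreeMap tm
  (doto (TreeMap.) (.put 1 :a) (.put 2 :b) (.put 3 :c)))

;; headMap returns a view over keys <= 2, not a copy
(def ^NavigableMap view (.headMap tm 2 true))

;; a later change to the original shows up in the view
(.put tm 0 :zero)
(def view-keys (vec (.keySet view)))

;; adding a key outside the view's bounds is rejected
(def out-of-range
  (try
    (.put view 10 :x)
    nil
    (catch IllegalArgumentException e e)))

;; a persistent sorted map hands back an independent value instead
(def sm  (sorted-map 1 :a 2 :b 3 :c))
(def sub (into (sorted-map) (subseq sm <= 2)))
(def sm2 (assoc sm 0 :zero))     ; sub is unaffected
```

Note that copying the `subseq` result into a new map is linear in the size of the sub-collection, whereas the split and range operations described in these notes are logarithmic.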
- however, BSTs (including red-black and AVL trees) support join/split/range
- join: merge/concat for pairs of trees with an ordering precondition
- split: produce subcolls including keys < and > a given key
- range: extract subcoll bounded by limits
- all in logarithmic time
- in mutable setting, these modify original tree
- intention of NavigableMap is different
- that it leaves original untouched
- TreeMap is forced into view-based impl
- AVL doesn’t have this problem
- return new first-class NavigableMaps, leave original untouched
- better than Java’s own NavigableMaps in this respect
- …unless we add transients
- a transient can be stored in a j.u.TreeMap-workalike wrapper
- an independent snapshot can be produced by invalidating the transient
- the original wrapper is free to install a new edit marker immediately
- the snapshot can be subsetted to produce first-class subcolls
- these can be safely returned
- cost: must store edit markers, a little slower for updates
- no benefit for SortedMap, only NavigableMap
- map data structure with support for concurrent updates
- comparable to regular HAMTs in perf of HAMT ops
- snapshotable in O(1) time with little degradation to perf of other ops
- snapshots are completely independent first-class ctries
- now impl in pure clojure - ctries.clj (not on github yet)
- ctries: internally similar to clj PersistentHashMap
- though different range of node types
- key structure difference:
- indirection nodes
- ensure linearizability of history at a slight perf cost
- CAS-based in-place updates to tree structure
- “generation” marker used to determine subtree ownership
- with transients, establishing ownership happens at every level of the tree
- with ctries, can skip levels
- would benefit from a generalized CAS operation
- GCAS: original contribution of second ctries paper
- RDCSS: Harris, Fraser, Pratt (2002), A Practical Multi-Word Compare-and-Swap Operation
- based on descriptor objects stored at the modified location
- GCAS only creates descriptors upon failure
- at first glance, ctries could support persistent and transient APIs simultaneously
- persistent ops would take snapshot and modify snapshot
- no go because transients reuse method names from the persistent API
- instead, ctries.clj maps are “transient first”
- `persistent!` creates immutable snapshots that behave like persistent maps
- `deref` creates mutable snapshots in the form of independent ctries
- `deref` also works on immutable snapshots (returns mutable independent ctries)
- persistent ctrie-based maps to be optimized further
- Wisp, a lisp for js
- clojurescript as npm modules
- mori
- compiled clojurescript
- prefix notation
- includes quite a few clojure standard library functions
- clojure data structures
- immutablejs
- javascript
- prototype extension
- mori
- om
- love your library, not your language
- had a huge influence on javascript single-page apps
- clearly faster
- clearly fewer bugs
- omniscient
- javascript clone of om
- immutablejs/react
- immstruct
- implement cursors from om
- confident community
- es6
- libraries > framework
- http://slides.com/rrees/trojan-horsing-clojure-with-js
- do js people really understand the Om influence on the way single-page apps are going?
- two-way data binding is dying
- is wisp ready for prod?
- you know what you’re shipping
- you’ll have to test the js that comes out of it
- Glasgow-based, Arnold Clark
- stuart sierra component workflow
- juxt/jig – superseded by stuart sierra’s work this year
- github: james-henderson
- twitter: jarohen
+------------+
listens | Atom | updates
/------------| |<--------\
v +------------+ |
Widget Model
| +------------+ ^
\----------->| Channel |---------/
events +------------+ reacts to
- core.async
- lots of boilerplate
- could a library help here?
- led to clidget
- ensured DOM was updated every time atom changed
- evolved into Flow
- 2 weeks later, Om was announced
- why would any sane person carry on with Flow?!
- couldn’t shake the feeling that things could be simpler
- Om & Reagent introduce a lot of new concepts
- 100% declarative
- minimise number of new concepts
- perform “well enough” to be useful (but don’t compete with React)
- https://github.com/james-henderson/flow
- (also, facebook created a thing called “flow” too. not the same)
(:require [flow.core :as f :include-macros true])
(defn hello-world []
(f/root js/document.body
(f/el
[:p.message {::f/style {:font-weight :bold
:color "#a00"}}
"Hello world!"])))
;; dynamic
(defn counter-component [!counter]
(f/el
[:p "value is " (<< !counter)]))
- the `<<` operator continuously reads from an atom
- important for composable, maintainable code
- the other operator: `!<<`
- input
- lexing
- parsing
- code synthesis
- output
- read-string
- notions:
DynamicElement :: AppState -> (Element, DynamicElement)
DynamicValue :: AppState -> Value
(if <DynamicValue>
<DynamicElement>
<DynamicElement>)
(defmethod fc/compile-el-form :if [[_ test then else] opts]
`(build-if (fn [] ~(fc/compile-value-form test opts))
(fn [] ~(fc/compile-el-form then opts))
(fn [] ~(fc/compile-el-form else opts))))
- don’t write macros
- don’t write macros!
- (unless you know you have to)
- Get out of macro-lang as soon as you can
- changing the execution order?
- analysing a form?
- what’s in `build-if`?
- cached state (of the current value of the test)
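The advice to leave macro-land quickly can be sketched as a macro that only wraps its arguments in thunks and hands them to an ordinary function. `run-if` and `lazy-if` are hypothetical names, and this is not Flow's `build-if`:

```clojure
(defn run-if
  "Plain function doing the real work. It receives thunks, so it still
  controls which branch gets evaluated."
  [test-fn then-fn else-fn]
  (if (test-fn) (then-fn) (else-fn)))

(defmacro lazy-if
  "Hypothetical macro: delays evaluation of its arguments, nothing more."
  [test then else]
  `(run-if (fn [] ~test) (fn [] ~then) (fn [] ~else)))

;; Records which branches were actually evaluated.
(def evaluated (atom []))

(def result
  (lazy-if (pos? 1)
           (do (swap! evaluated conj :then) :yes)
           (do (swap! evaluated conj :else) :no)))
```

All the logic lives in `run-if`, which can be tested, composed and passed around like any other function; the macro is only there to change evaluation order.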
- the bottleneck is becoming me, rather than the clojurescript compiler
- stable release
- more tutorials, examples, docs
- next version: ???
- feedback please!
- give it a go!
- get involved!
- @tcoupland
- MixRadio
- graphite
- graphical frontend for viewing metrics
- carbon
- default data store for graphite
- (technically, whisper, but no matter)
- flat files on disk
- round-robin rollup
- cyanite from @pyr
- elasticsearch/cassandra backend for graphite, replacement for whisper
- core.async primer
- channels, take/put, go blocks
- thread pool executors
- YourKit agent added to cyanite
- start CPU profiling
- culprit: add-path
- doing blocking I/O inside go-loop
- don’t block the thread pool
- Single Responsibility Principle
- changed topology to read from one input, write to two separate output channels (elasticsearch + cassandra)
- but… performs even worse!
- less CPU usage, but less metrics/s and less network usage
- faster, but still not great
- 600 metrics/s
- 50k/s traffic in, 160k/s traffic out
- deserves a cup of tea :)
- eventually we find a problem:
- AssertionError: No more than 1024 pending puts are allowed on a single channel
- channel has in-built buffer
- but there’s an extra “buffer” of pending-puts (and takes)
- which has a hard limit of 1024
- we hit it
- helpfully, the error message suggests “windowed-buffer”
- which drops work on the floor
- nope nope nope!
- (@aphyr from the twitter gallery: LITTLES LAW)
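The pending-puts limit can be reproduced in a few lines. This sketch assumes `org.clojure/core.async` is on the classpath; the channel sizes and the count of 2000 are arbitrary choices for illustration:

```clojure
(require '[clojure.core.async :as a])

;; Unbuffered channel with no consumer: every put! has to queue as a pending put.
(def ch (a/chan))

(def outcome
  (try
    (dotimes [i 2000] (a/put! ch i))
    :all-accepted
    (catch AssertionError e :too-many-pending-puts)))

;; With a bounded buffer a producer can instead notice that the channel is
;; full and slow down. offer! never blocks and is falsey when there is no room.
(def bounded (a/chan 2))
(def offers (mapv #(boolean (a/offer! bounded %)) [1 2 3]))
```

Asynchronous `put!` calls with no backpressure are what pile up against the limit; a producer using parking `>!` inside a go block, or checking `offer!`, is held back by the consumer instead of dropping work.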
- core.async partition
(<! (partition 1000 chan))
- consumes messages in batches of 1000
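In current core.async the channel `partition` function is deprecated in favour of transducers, so the same batching can be sketched with `partition-all` on the channel itself. Assumes `org.clojure/core.async` is available; the batch size of 3 and the buffer size are arbitrary:

```clojure
(require '[clojure.core.async :as a])

;; Messages put on this channel come out grouped into vectors of up to 3.
(def ch (a/chan 10 (partition-all 3)))

(doseq [m (range 7)]
  (a/>!! ch m))
(a/close! ch)                       ; flushes the final partial batch

(def batches (a/<!! (a/into [] ch)))
```

A consumer then issues one insert per batch rather than one per message.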
- add-path fn is doing too much
- split into separate CSP processes
- check full path
- break up
- insert
- now:
- 2500 metrics/s
- network
- 400 k/s in
- 1500 k/s out
- are we stressing cyanite enough to see problems?
- soak testing
- generate vast quantities of data to stress cyanite
- use yourkit profiling
- observe most of what we’re doing is waiting for elasticsearch
- current elasticsearch client uses clj-http
- others could use httpkit or similar (which use java nio)
- being open source really pays off here
- we can just look inside the elasticsearch driver
- we can copy-paste it and swap out another http client
- qbits.hayt.cql – cassandra query builder
- half the system’s time is spent on `clojure.string/join`
- this isn’t so good
- I’m using the library wrong
- it’s for building nice queries
- I’m just doing inserts, I don’t care about queries
- should just do batching myself
- now
- CPU usage halved to 16% (from 30%)
- memory usage down
- 2500 metrics/s, same as before
- network
- in: 400k/s
- out: 880k/s (down from 1500k/s, due to batching)
- all our time is in the format_processor
- this looks good
- Single Responsibility Principle
- works at all the abstraction levels
- it’s always a good idea
- except when it isn’t, like all good ideas
- open source is brilliant
- but RTFM
- if their motivation doesn’t align with yours, it might not help you
- core.async
- did it deliver on its promise?
- it’s a nice way of breaking things down and following SRP
- check the JIRA before betting the business on it
- performance tuning
- it’s brilliant, do it
- go use YourKit
- BirdWatch, a twitter processing thingy
- joy of clojure
- react
- Om
- transducers
- listened to rich’s talk several times
- each time learned something new
- A farewell note to a programming language (namely scala)
- twitter streaming API
- clojure stream client
- percolation queries for elasticsearch
- sends data only to those clients who are interested via websockets
- logging, pprinting
- bah
- tail -F didn’t work, wanted multiline
- log to core.async channels with dropping-buffers
- can inspect latest state by reading from them
- composition
- micro -> macro:
- immutability, pure fns, idiomatic clojure
- core.async, component
- prod
- but we spend more of our time at the high level – where are our composition tools here?
- wish-list for experience report:
- context
- I want to know why I need this thing
- limitations in the group
- values in the group
- alternatives
- and why didn’t you choose them?
- high-level view of component
- before the details are presented. won’t understand the details without this
- interesting details
- context
- goal: archive data from kafka to S3, without any hadoop dependencies
- main app:
- nginx
- ruby webapp
- comparison service + mongo
- collector + MySQL
- all logs sent to kafka
- kafka keeps a sliding window, keeping latest n messages
- behind kafka:
- a loader
- sends data to buckets
- “blueshift” - uswitch tool that sends stuff to aws redshift
- time
- S3
- part of larger piece of work
- streaming (no longer batching)
- secor – has a dependency on hadoop
- 30 minute slots – please contact me!
- first meetup jan 2015
- lenses! <3
- `update-in`
- separates the focus of context from the applied function
- the same paths work with `get-in` and `assoc-in`
- however, `update-in` is specialised to one kind of focus
- “functors are structure-preserving maps between categories”
- Barr and Wells
- functors are functions that lift functions into a context.
- functors compose as functions
- sequence functor
(defn fsequence [f] (partial map f))
- identity functor
(defn fidentity [f] f)
- constant functor
(defn fconst [_] identity)
- “in” functor
(defn fin [k f] (fn [x] (update-in x [k] f)))
((fin :x inc) {:x 1 :y 1}) => {:x 2 :y 1}
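Because each of these functors turns a function into another function, they nest and compose like ordinary functions. A short sketch reusing `fsequence` and `fin` as defined in these notes; `bump-xs` is an illustrative name:

```clojure
(defn fsequence [f] (partial map f))
(defn fin [k f] (fn [x] (update-in x [k] f)))

;; lift inc into a sequence, then lift that into the :xs key of a map
(def bump-xs (fin :xs (fsequence inc)))
(def result (bump-xs {:xs [1 2 3] :y 1}))

;; composing lifted functions gives the same answer as lifting the composition
(def same?
  (= ((comp (fsequence inc) (fsequence inc)) [1 2])
     ((fsequence (comp inc inc)) [1 2])))
```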
- Lenses are functions that lift contextualising functions into a context.
- `update`, `put`, `view` can all be represented by one function.
- lenses compose as functions.
- I don’t think my notes can do this talk justice. it’s really good, watch the video :)
- lenses that can have more than one target
- the `update` part works
- but `view` needs variadic functors (~ Applicatives) and Monoid targets
- need a strategy for combining multiple values into one
- problem in clojure: monoid zero value doesn’t know the context it’s in
- so how do you deal with traversals with zero targets?
- inspired by kmett’s haskell lenses
- but uses building blocks from clojure/core
- and recognizes limitations of lack of contextual type information
- a lens in this world is a pair of fns: `focus` and `fmap`
- in this world, the traversal laws are more what you’d call guidelines…
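A lens as a focus/fmap pair can be sketched in a few lines. This follows the idea in the notes but is not traversy's actual API; `lens`, `view`, `modify`, `in-key`, `each` and `combine` are all hypothetical names:

```clojure
(defn lens [focus fmap] {:focus focus :fmap fmap})

(defn view
  "Returns the seq of targets the lens focuses on in x."
  [x l]
  ((:focus l) x))

(defn modify
  "Applies f to every target of the lens within x."
  [x l f]
  ((:fmap l) f x))

(defn in-key
  "Lens with exactly one target: the value under key k."
  [k]
  (lens (fn [x] [(get x k)])
        (fn [f x] (update-in x [k] f))))

(def each
  "Lens targeting every element of a vector (possibly none)."
  (lens seq
        (fn [f x] (mapv f x))))

(defn combine
  "Focuses with outer first, then with inner inside each outer target."
  [outer inner]
  (lens (fn [x] (mapcat (:focus inner) ((:focus outer) x)))
        (fn [f x] ((:fmap outer) (fn [y] ((:fmap inner) f y)) x))))

(def all-xs (combine (in-key :xs) each))

(def viewed   (view   {:xs [1 2 3]} all-xs))
(def modified (modify {:xs [1 2 3]} all-xs inc))
(def none     (view   {:xs []} all-xs))
```

Returning a seq from `view` sidesteps the zero-target problem: with no targets the result is simply empty, and the caller decides how to combine the values.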
- https://github.com/ctford/traversy
- what are your thoughts on the different worlds of haskell and clojure?
- the social difference is most clear
- clojure is v friendly
- haskell is friendly too, but a bit of a rough exterior sometimes
- I work at factual
- I do a lot of database stuff
- hugs form a monoid!
- ->>, map, filter, reduce
(->> coll
(map inc)
(filter odd?)
(reduce +))
- if you create intermediate sequences for these seqs, you put a lot of pressure on the GC
- catamorphisms!
- reducers eliminate the intermediate seqs
- haskell calls this “stream fusion”
- a specific case of deforestation - a fp optimization technique
- reducers are defined for each coll type
- transducers have no underlying type (to confuse haskell people)
- `(mapping inc)` – maps over anything
- but still sequential
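In released Clojure (1.7+) the `mapping` idea surfaced as the one-argument arities of `map`, `filter` and friends. A small sketch comparing the threaded pipeline with its fused transducer form; the input range is arbitrary:

```clojure
(def coll (range 10))

;; builds an intermediate lazy seq after map and another after filter
(def with-seqs
  (->> coll
       (map inc)
       (filter odd?)
       (reduce +)))

;; the same steps as one transducer: each element flows through
;; inc and odd? and into + without intermediate collections
(def xf (comp (map inc) (filter odd?)))
(def fused (transduce xf + coll))

;; the transducer is independent of the collection it is applied to
(def on-vector (into [] xf [1 2 3 4]))
```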
- if f is associative, we can fold in either direction
- you can have an arborescent fold (tree-like)
- can more easily parallelize
- then bring together the resultant results
- this is already in reducers
- because Rich invented everything and all we have to do is discover it in the core library
- eg PersistentQueue - who knew?
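The tree-shaped fold mentioned above is available as `fold` in `clojure.core.reducers`. A minimal sketch; the vector size is arbitrary, and `+` works as both reducing and combining function because it is associative and `(+)` returns the identity `0`:

```clojure
(require '[clojure.core.reducers :as r])

(def v (vec (range 100000)))

;; sequential, with an intermediate lazy seq
(def sequential (reduce + (map inc v)))

;; the vector is split into chunks that are reduced in parallel
;; and then combined pairwise with +
(def folded (r/fold + (r/map inc v)))
```

`fold` only parallelises over collections that can be split cheaply, such as vectors and maps; for other inputs it falls back to an ordinary reduce.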
- results preserve order
- order requires coordination
- coordination might be hard
- if `(= (f a b) (f b a))` we can do stuff in any order we like
- monoids compose wonderfully
- commutative monoids compose even better
- they’re great for distributed systems
- see also CRDTs (which are commutative monoids with idempotence)
- parallel
- unordered
- stream fusion
- collection independent
- can do this in hadoop
- using stateful mappers
- can do reductions in the mappers
- post-reduce phase
- reduce over transient, or native array
- this phase can seal it
- post-combine phase
- similar, after combining
- tesser uses a map!
- rather than multiple arity fns
{:reduce ...
:reduce-identity ...
:combine ...
:combine-identity...
... ...}
- unlike transducers, we defer the build phase til the end
- so we can compose inside & out
- allows seq API
- identity fns must be pure
- transducers don’t use pure reduction fns
- take sets up an atom for no of elements
- increments each time
- doesn’t have to wrap the accumulator value
- no-go for tesser
- we need all the state in the accumulator
- so we can pass it back and forth over network
- reducers/combiners must be associative
- and also commutative
- algebird isn’t commutative. should we order?
- must short-circuit `(reduced x)` values
- stats!
- mean, variance, covariance, correlation matrices
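A fold described as a map of functions can be sketched for the mean. The key names follow the map shown in these notes rather than tesser's real API, and `run-fold` is a hypothetical single-machine runner:

```clojure
(def mean-fold
  {:reduce-identity  (fn [] {:sum 0 :n 0})
   :reduce           (fn [acc x] (-> acc (update :sum + x) (update :n inc)))
   :combine-identity (fn [] {:sum 0 :n 0})
   :combine          (fn [a b] (merge-with + a b))
   ;; the final division is deferred until everything has been combined
   :post-combine     (fn [{:keys [sum n]}] (when (pos? n) (/ sum n)))})

(defn run-fold
  "Hypothetical runner: reduces each chunk independently, then combines
  the partial accumulators and applies the post-combine step."
  [fold chunks]
  (let [partials (map #(reduce (:reduce fold) ((:reduce-identity fold)) %)
                      chunks)
        combined (reduce (:combine fold) ((:combine-identity fold)) partials)]
    ((:post-combine fold) combined)))

(def chunks [[1 2 3] [4 5] [6]])

(def mean          (run-fold mean-fold chunks))
;; the combine step is commutative, so chunk order does not matter
(def mean-shuffled (run-fold mean-fold (reverse chunks)))
```

All state lives in the accumulator map `{:sum .. :n ..}`, which is plain data and could be serialised and sent between machines, unlike state hidden in an atom.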