Code mesh 2015 notes

Kush, an introduction to schedulers

about me

  • I work for GDS
  • Cabinet Office
  • we started by building GOV.UK
    • replaced older sites like direct gov, business link
  • we’re not just fixing websites
    • we also build and run digital services
    • working with depts across the country
    • eg: register to vote
      • saw about 200k people register at the same time
  • visibility on govt services
  • I’m not here to talk about GDS
    • go to Phil’s talk

this talk

  • it’s been hard to go to a conf in the last two years without hearing about docker (docker docker docker)
  • I want to demystify schedulers
    • some people feel this is too much too soon
    • mesos, kubernetes, etc
  • when you should use schedulers
    • and when you shouldn’t
  • I see schedulers everywhere

what is it?

  • the job of a scheduler is to fairly allocate work
    • fairness is a term of art
    • schedulers act fairly based on their defined objectives
  • example: schedule the expression E = AB + CD
    • E = (1*2) + (3*4) – explicit ordering
    • tree of expressions
      • addition is commutative, so order doesn’t matter

common scheduling problems

  1. unconstrained
  2. time-constrained
    • all ops completed within a time bound
      • at least two steps
  3. resource-constrained
    • if we only have one multiplication unit, the two multiplies must serialize (see the sketch below)
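
A toy list scheduler makes the resource-constrained case concrete. This is my own sketch in Haskell, not from the talk: each op is placed in the earliest step where its inputs are ready and a functional unit is free.

import qualified Data.Map as M

-- a toy list scheduler: `units` is how many ops may run per step
-- (simplified: a single shared kind of functional unit);
-- ops must be listed after their dependencies
type Op = (String, [String])   -- (name, dependencies)

schedule :: Int -> [Op] -> M.Map String Int
schedule units = foldl place M.empty
  where
    place done (name, deps) =
      let ready  = 1 + maximum (0 : [done M.! d | d <- deps])
          busy t = length (filter (== t) (M.elems done))
          step   = head [t | t <- [ready ..], busy t < units]
      in M.insert name step done

-- E = AB + CD with one multiplication unit takes three steps, not two:
-- schedule 1 [("ab",[]), ("cd",[]), ("e",["ab","cd"])]
--   == M.fromList [("ab",1),("cd",2),("e",3)]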

types of scheduling

  • ASAP (as soon as possible)
  • ALAP (as late as possible)
  • list scheduling
  • force-directed scheduling
    • weighting (commonly time)

why?

  • we don’t make the best use of our infrastructure resources

example: linux kernel cpu scheduler

  • decides which tasks get CPU time right now
  • challenge: make sure applications get a fair share of CPU time
    • (called a “time slice”)
  • let’s look at linux 2.4 cpu scheduler
    • task list containing all processes, and work to be completed
      • guarded by a r/w lock
      • each process has a priority
    • it had a number of shortcomings
      • wasn’t efficient
        • had to iterate through whole task list every time
      • bad in SMP environments
        • could easily pin a single CPU and leave others unused
  • was rewritten
    • remove global task list lock
    • each process gets its own queue and its own lock
  • rewritten again in 2.6
    • called the Completely Fair Scheduler
    • uses red/black tree instead of queues
  • scheduling algorithms:
    • FF SCHED_FIFO
    • RR SCHED_RR
    • TS SCHED_OTHER (_NORMAL)
  • nice
    • runs a program with modified priority
    • eg: nice -n 19 echo “hello”
  • chrt or “change real time”
    • chrt -p 16
    • can change priority but also scheduling algorithm
  • ionice - set IO scheduling priority
    • ionice -c3 -p89
      • sets process with pid 89 as an idle io process

i see schedulers

  • many queues, locks and a scheduling algorithm over the top
  • not just CPUs!
  • eg in networking (see www.lartc.org)
  • tc - traffic control
    • eg: tc qdisc add dev lo root netem delay 500ms
      • sets queuing discipline
      • network delay of 500ms (netem == network emulation)
  • eg in postgres: has something called autovacuum
    • automates execution of VACUUM and ANALYZE

container schedulers

  • Kubernetes:
type Scheduler interface {
        Schedule(
                *api.Pod,
                ...
        )
}
  • Docker swarm
  • primitives
    • CPU
    • instances
    • add your own: eg response time
  • ephemeral node
    • with a service on it
    • different nodes have different properties in the cloud
      • bad io
      • noisy neighbours
  • see @kelseyhightower’s talks too on container schedulers

others

  • baggage reclaim
    • a bunch of queues
    • a lock
    • you are the scheduling algorithm - you know what you care about, which is your bag
    • baggage reclaim is controlled by a software scheduler
  • premier league football matches
    • domain-specific rules
      • man city and man u can’t play at home on the same day
        • (infrastructure can’t cope)
      • nottingham: the Trent Bridge area
        • three stadiums next to each other
        • can’t all operate at the same time
  • people
    • pomodoro
    • work planning
  • you can choose to hand over knowledge work to schedulers (but they’re not a silver bullet)
    • can probably do better capacity planning

Amir Chaudhry: Unikernels and hyper-elastic clouds

  • who’s heard of unikernels?
  • who’s actually used them? which ones?
    • halvm
    • mirage
    • rump kernel
  • talk resources

motivation

  • software today
    • is built locally, but deployed somewhere else
    • is complex! (even though most apps are single-purpose)
  • complexity is the enemy

MirageOS

  • OCaml
    • it has a good module system

links

Matthew Podwysocki, Putting You Back in Charge of Your Data

Zach Tellman, Everything Will Flow

Queues!

  • they’re simple, right?
    • producer -> queue -> consumer
  • they’re everywhere
  • java.util.concurrent.Executor
    • thread pool
    • BlockingQueue on requests
  • java.util.concurrent.BlockingQueue
    • producer -> [notFull] [buffer] [notEmpty] -> consumer
  • java.util.concurrent.locks.Condition
    • software threads
    • hardware threads
    • more queues to schedule the threads
  • queues are everywhere!

why?

  • queues separate the “what” from the “when”

queueing theory

closed systems

  • one-at-a-time
  • shell prompt
    • command
    • output
    • command
    • output
  • single person browsing a website
    • click link
    • wait for page to load
    • click link
    • wait for page to load

open systems

  • requests come in
    • while we’re serving one, another can come in at any point

simulating open systems

  • producers: exponential distribution
    • it’s memoryless – the amount of time passed doesn’t change likelihood of producers arriving
  • consumers: pareto distribution
  • simulation examples (a toy version is sketched after this list)
    • failure mode: load increases until queueing time dominates
    • the more consumers you add, the longer it takes to fall over
      • …but the more surprising it is when it does
      • echoes of “why complex systems fail” here
  • lesson: unbounded queues are fundamentally broken
  • we’ve made assumptions about how users behave, but haven’t told them, or held them to it
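
A toy version of that simulation, as a Haskell sketch (the parameters are my assumptions, not the talk's): memoryless exponential gaps between arrivals, heavy-tailed Pareto service times, and the Lindley recursion for each job's queueing delay. At roughly 90% utilization the mean wait looks tame for a while, then the heavy tail produces enormous spikes.

import System.Random (mkStdGen, randomRs)

expGap :: Double -> Double -> Double
expGap rate u = negate (log (1 - u)) / rate            -- inverse-CDF sampling

paretoService :: Double -> Double -> Double -> Double
paretoService xm alpha u = xm * (1 - u) ** (-1 / alpha)

-- Lindley recursion: wait(n+1) = max 0 (wait n + service n - gap n)
waits :: [Double] -> [Double] -> [Double]
waits gaps services =
  scanl (\w (s, a) -> max 0 (w + s - a)) 0 (zip services gaps)

main :: IO ()
main = do
  let us1  = randomRs (0, 1) (mkStdGen 1) :: [Double]
      us2  = randomRs (0, 1) (mkStdGen 2) :: [Double]
      gaps = map (expGap 1.0) us1                  -- ~1 arrival per tick
      svcs = map (paretoService 0.3 1.5) us2       -- mean 0.9, infinite variance
      ws   = take 100000 (waits gaps svcs)
  putStrLn ("mean wait: " ++ show (sum ws / fromIntegral (length ws)))
  putStrLn ("max wait:  " ++ show (maximum ws))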

dealing with too much data

  • options:
    1. drop the data
    2. reject the data
    3. pause the data
  • Drop
    • only valid if newer data obsoletes older data, or if we just don’t care
      • eg statsd uses UDP by default
        • problem: when we’re most in need of high fidelity data, we’re least likely to have it, because UDP is best-effort and will degrade under load
  • Reject
    • eg twitter fail whale
    • often, this is the only choice available
  • Pause (aka backpressure)
    • often the only choice for a closed system, library, or subsystem
    • backpressure swims upstream
      • back to java BlockingQueue:
      • producer -> [notFull] [buffer] [notEmpty] -> consumer
        • the extra pieces are for backpressure reasons
        • you don’t even need the buffer here!
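
A minimal backpressure sketch (in Haskell, with stm's bounded TBQueue standing in for Java's BlockingQueue): the producer blocks as soon as the bounded buffer fills, so it can never outrun the slow consumer without bound.

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM
import Control.Monad (forM_)

main :: IO ()
main = do
  q <- newTBQueueIO 10                        -- bounded buffer of 10
  _ <- forkIO $ forM_ [1 :: Int ..] $ \i ->
         atomically (writeTBQueue q i)        -- blocks (retries) when full
  forM_ [1 :: Int .. 20] $ \_ -> do           -- slow consumer: 10 items/sec
    i <- atomically (readTBQueue q)
    threadDelay 100000
    print i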

unbuffered queues

  • producer -> [puts] [takes] -> consumer
  • unbuffered queues are closed systems
  • why have buffers if we don’t need them?
    • because backpressure isn’t free
      • restarting threads
      • restarting TCP traffic
      • reading from disk
      • reading from dbs
    • a buffer allows higher, more consistent throughput
      • at a cost of potentially higher latency, because things sit on the buffer waiting to be dealt with

lessons

  • we should always plan for too much data
  • backpressure
  • TCP bufferbloat - example of problem of too many composed buffers

concrete example

  • aim: a data system that takes data and moves it to a safe at-rest place
  • arbitrary volume of data -> ELB -> [thing] -> S3
    • what goes in the middle?
      • must horizontally scale
      • must be low-maintenance
      • not real-time
      • loss of data ∝ loss of utility
    • used:
      • aleph -> hadoop s3 client -> s3
    • but then:
      • S3 starts rejecting writes
      • hadoop starts accumulating writes waiting for S3 to come back
  • replaced hadoop s3 client with a durable-queue-based writer
  • still had problems:
    • still acquired data in durable-queue without bound
    • no visibility when this happened; no metrics
    • had to start writing metrics for this
  • netty’s threading model
    • our typical metrics measure from when we start actually processing the request
      • but this ignores the queueing delay!
      • threadpools are not well instrumented
  • dirigiste for better understanding of behaviour of thread pools
    • distinguish task latency from queue latency
    • better metrics
      • not complete – they can’t be
      • whole system includes client

lessons

  • be clear what “completion” is
  • be picky about your tools
    • watch for sneaky unbounded queues!
  • prefer metrics to raw performance
  • you’re never measuring the complete system

publishing

  • eg chat server
  • one-to-many queue
  • timeouts

we talked about queues

  • unbounded queues aren’t. they’re bounded by something we’ve failed to define.

Q&A

  • how do you deal with pain from low defaults (eg in haproxy?)
    • you need some documentation on what the knobs are and why you should care
    • a one-size-fits-all parameter is bound to disappoint
  • with the log-latency graphs you showed with the buffered queue example, is that a useful visibility tool for day-to-day systems we use?
    • that’s my general intuition
    • most people use graphite
    • doing distributions in graphite is hard
  • if we follow your example of using buffered queues, were there any metrics you feel we should definitely be tracking?
    • utilization & contention
      • once you get above 60-70% utilization, exhaustion gets quite likely
  • queueing theory often assumes a fixed number of consumers; can it work in an autoscaling world?
    • yes
    • maybe cloud services resemble infinite consumers?
    • can you do formal analysis of a dynamic system like this and predict exactly how it will behave? probably not
      • better to test, benchmark, see what it does under load

Evan Czaplicki, Accidentally Concurrent

  • works at Prezi on the Elm language
  • I care a lot about how proglang research from last 30 yrs can be useful day-to-day on the web
    • eg: why is immutability good?

message passing concurrency

  • I take this from concurrent ML
  • lightweight threads
  • basic primitives:
    • channel: () -> Channel a
    • send: Channel a -> a -> ()
    • recv: Channel a -> a
    • spawn: (a -> ()) -> a -> ThreadId

mutation

  • what does mutation look like when you model it as a concurrent system with these primitives?
    • type Msg a = Get | Set a
    • “two spies in the park” approach to communication
    • a new thread per variable
      • becomes unmanageable
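
A sketch of what that looks like with the primitives above (Haskell's Chan standing in for Concurrent ML channels; the names are mine): one lightweight thread owns each variable, and Get/Set are messages.

import Control.Concurrent (forkIO)
import Control.Concurrent.Chan

data Msg a = Get (Chan a)   -- carries a reply channel
           | Set a

-- each call spawns the owning thread: one thread per variable
newVar :: a -> IO (Chan (Msg a))
newVar initial = do
  inbox <- newChan
  let loop x = do
        msg <- readChan inbox
        case msg of
          Get reply -> writeChan reply x >> loop x
          Set x'    -> loop x'
  _ <- forkIO (loop initial)
  return inbox

main :: IO ()
main = do
  var   <- newVar (0 :: Int)
  writeChan var (Set 42)
  reply <- newChan
  writeChan var (Get reply)
  readChan reply >>= print    -- 42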

encapsulation

function BadGuy() {
    var x = 0;
    var y = 0;
    var health = 100;
    this.follow = function(hero) { ... };
    this.damage = function(n) { ... };
}
  • hides multiple mutable cells behind a single interface
  • introduces hierarchy
  • but: hierarchy ≠ modularity
    • reference can be passed from one code unit to another
      • introduces extra comm lines between subtrees of hierarchy
    • getters and setters on objects
    • setters mean readers have ability to bash on objects

schedulers

  • different kinds
    • cooperative
      • runs a thread until it’s done or it yields
      • used by golang, …
    • preemptive
      • you can bail on something half-way through
      • used by erlang, concurrent ML, …
  • that old thing concurrency ≠ parallelism
  • event loop
    • “a poor man’s concurrency”
    • javascript setTimeout(callback, 0); is the same as spawn callback()
  • “Reactive” Programming
    • “everything is a stream!”
      • type alias Stream a = Channel a
      • build things up with combinators
      • map: (a->b) -> Stream a -> Stream b
    • glitches
      • diamond communication shape
diamond : Stream Float -> Stream Bool
diamond stream =
  merge
    (==)
    (map sqrt stream)
    (map sqrt stream)
  • glitches cont’d
    • looked at as a concurrent system, it’s clear that this will cause glitches
      • there’s no reason to expect synchronization
    • if you thought of your streams as a concurrent system, you probably wouldn’t write it this way in the first place
  • FlatMap is weird!
    • flatMap : (a -> Stream b) -> Stream a -> Stream b
    • it is kind of like a threadpool, but really bizarrely done
    • is it even a reasonable use of flatmap?
  • dependencies between streams become a rat’s nest
    • hero depends on keyboard
      • hero’s health depends on location, bad guys, etc
    • bad guys depend on hero
      • want to chase him
    • pause screen depends on
      • hero’s health
      • keyboard, mouse
    • hero depends on pause screen
      • maybe I can apply a potion to increase health
    • this is crazy

big points

  • writing high-quality concurrent programs is hard
  • everyone is writing accidentally concurrent programs

better stuff

  • immutability
    • means no accidental communication channels
    • revealing information doesn’t change your architecture

Q&A

  • what about Rust’s approach to mutability where it just emphasizes ownership?
    • ownership and encapsulation are separate concerns

Leah Hanson, How Julia Goes Fast

what problem is julia solving?

  • julia is for scientists
    • engineers, mathematicians, finance people
    • non-professional programmers who use programming as a tool
      • least time programming for maximum benefit
  • what do they need?
    • easy to learn
    • easy to use
    • good for scripts
    • good for libraries
    • fast enough for medium to large data sets
    • fast, extensible math
      • linear algebra
      • new numeric types
  • easy and fast
  • how is it better than what they already use? (eg numpy)
    • need to learn python
    • need to add your own function that doesn’t yet exist
      • implement in python
        • too slow
        • (numpy’s things are written in C with python wrappers)
        • learning C is not a good use of our users’ time
  • fast julia code is written in julia

the big decisions

  • dynamic, interpreted, easy

background for implementation

  • JIT compilation
    • at run time, you alternate between compiling and running code
  • compiler needs to be fast
    • static compilers can be slow because they don’t contribute to runtime
    • this is no longer true for JITs
  • compiler has access to runtime information
  • julia is designed up front to be JIT compiled
    • type system
      • abstract or concrete types
        • concrete types:
          • can be instantiated
          • determine mem layout
          • types can’t be modified after creation
          • can’t have subtypes
          • single supertype
        • abstract types:
          • can’t instantiate
          • new subtypes can be added at any time
          • single supertype
          • zero or more subtypes
  • multiple dispatch
    • convenience for extensible math
    • single dispatch is method dispatch we’re familiar with
      • x.foo(y) – code impl depends on type of x
    • multiple dispatch can depend on any param:
      • foo(x,y) – code impl depends on types of x and y
    • all named fns are generic
    • each fn has one or more methods
x = ModInt(3,5)
x + 5 # ModInt + Int
5 + x # Int + ModInt

function Base.+(m::ModInt, i::Int64)
    return m + ModInt(..)
end
  • this just can’t be done nicely with single dispatch

the details of how things go fast

dispatch

  • dispatch happens a lot, so needs to be really fast
    • foo(5) #dispatch, compile, run
    • foo(6) #dispatch, run
    • foo(7) #dispatch, run
  • possible signatures:
    • foo(Int64)
    • foo(Number)
    • foo(Float64)
    • foo(Int64,Int64)
  • do we need to keep redispatching? aren’t we always going to pick foo(Int64)?
    • add a cache
    • foo(5) #cache, dispatch, compile, run
    • foo(6) #cache, run
    • foo(7) #cache, run
    • call into hashmap rather than going through a list
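
A toy model of that cache, as a Haskell sketch (not Julia's actual implementation): a miss walks the method table most-specific-first, and the result is memoized under the concrete argument type, so repeat calls skip the walk.

import qualified Data.Map as M

data Ty = Int64T | Float64T | NumberT deriving (Eq, Ord, Show)

subtypeOf :: Ty -> Ty -> Bool
subtypeOf t u = t == u || u == NumberT

methodTable :: [(Ty, String)]    -- most specific first
methodTable =
  [ (Int64T,   "foo(Int64)")
  , (Float64T, "foo(Float64)")
  , (NumberT,  "foo(Number)") ]

dispatch :: M.Map Ty String -> Ty -> (String, M.Map Ty String)
dispatch cache t =
  case M.lookup t cache of
    Just impl -> (impl, cache)                 -- hit: hashmap lookup only
    Nothing   ->
      let impl = head [i | (sig, i) <- methodTable, t `subtypeOf` sig]
      in (impl, M.insert t impl cache)         -- miss: walk the list, then cache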

generic functions

  • define Base.* in terms of +(Number,Number) and -(Number,Number)
  • C++: templates vs dynamic dispatch
    • dynamic dispatch is generic
      • single Base.* impl to cover all Number instances
    • templates do aggressive specialization
      • individual Base.* impl for each Number instance
    • tradeoff: code size vs speed
  • inlining
    • makes more optimizations available
    • not reusable
  • devirtualization
    • like writing down the IP address to avoid the DNS lookup
    • ie hardcode the method implementation rather than doing dispatch
  • why do we normally dispatch?
    • we’re avoiding dispatch as much as possible for performance
    • worth knowing why we have it in the first place
      • repl work
        • redefine a function to fix a bug, want callers to reflect changes
        • issue 265

type stability

  • non-concrete types have to be boxed
  • immutable types
    • kind of concrete type
    • once created, value can’t change
  • boxed immutable values are bad for perf
    • immutable types can’t appear to change
    • we have to copy them each time we want to change them
    • new heap allocation

macros for speed?

  • julia has lisp-style macros
  • macros are evaluated at compile time
  • macros should be used sparingly
  • Horner’s rule
    • a*x*x + b*x + c
    • skip a multiply with (a*x + b)*x + c
    • for a cubic: 6 multiplies down to 3
    • for a quartic: 10 multiplies down to 4
    • etc (O(n^2) down to O(n))
  • Horner’s rule can be implemented as a macro (a fold version is sketched after this list)
  • 4x faster than matlab
  • 3x faster than scipy
    • both of which call C/Fortran libs
  • Julia is faster than C!
    • compiled julia methods will have inlined constants, which are very optimizable
    • C would involve a run-time loop over the array of coeffs
    • LLVM can optimize the julia better than the C
    • an abstraction which makes things faster
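
For reference, Horner's rule as a plain fold (a Haskell sketch; the talk's point is that Julia's macro version unrolls this at compile time with the coefficients inlined as constants).

-- coefficients lowest-order first: horner x [c, b, a] == c + b*x + a*x^2
horner :: Num a => a -> [a] -> a
horner x = foldr (\c acc -> c + x * acc) 0

-- e.g. horner 2 [1, 2, 3] == 1 + 2*2 + 3*4 == 17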

conclusion

  • scientists like easy & fast
  • dynamic for easy (but with limits)
  • compiled for fast
  • multiple dispatch for making fast code concise

Q&A

  • what about R?
    • R is slow for things that need loops (like monte carlo)
  • does julia have support for openmp?
    • I think so?
    • you can call libs that use OpenMP
    • there’s experimental support for OpenMP within julia

John Hughes, Mary Sheeran, Why Functional Programming Matters

FP à la 1940s

  • who needs booleans?
  • a boolean just makes a choice! it can just be a function!
    • true x y = x
    • false x y = y
  • we can then define if-then-else
    • ifte bool t e = bool t e
  • who needs integers?
    • a (positive) integer just counts loop iterations
      • two f x = f (f x)
      • one f x = f x
      • zero f x = x
    • to recover a “normal” integer…
      • two (+1) 0
        • gives 2!
    • add m n f x = m f (n f x)
    • mul m n f x = m (n f) x
  • factorial!
fact n =
  ifte (iszero n)
    one
    (mul n (fact (decr n)))

This means: booleans, integers, (and other data structures) can be entirely replaced by functions!

Church encodings

Before you try this at home…

  • “cannot construct the infinite type…”
    • a recursive fact over Church numerals would need an argument of type (forall a. (a->a) -> a -> a), which plain Hindley-Milner inference can’t construct
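
The non-recursive pieces do transliterate directly into Haskell (a sketch); it's only the recursive factorial that hits the infinite-type error above.

true, false :: a -> a -> a
true  x _ = x
false _ y = y

type Church a = (a -> a) -> a -> a

zero, one, two :: Church a
zero _ x = x
one  f x = f x
two  f x = f (f x)

add, mul :: Church a -> Church a -> Church a
add m n f x = m f (n f x)
mul m n f x = m (n f) x

-- recover a "normal" integer, as in `two (+1) 0` above
toInt :: Church Int -> Int
toInt n = n (+ 1) 0     -- toInt (mul two (add one two)) == 6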

Factorial à la 1960

  • LISP!
  • This got lots of people excited!
    • esp Peter Landin (see The Next 700 Programming Languages and ISWIM)

laws

  • (maplist f (reverse l)) = (reverse (maplist f l))
    • …only if f is side-effect free

issues with conventional languages

  • John Backus, “Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs”
    • Turing award 1977; paper 1978
  • “Inherent defects at the most basic level cause them [conventional languages] to be both fat and weak”
  • defects such as:

word-at-a-time operation

  • inherited from the von Neumann machine
    • the von Neumann way of designing machines has a terrible effect on writing programs
    • don’t focus on the word-at-a-time bottleneck; focus on whole values

inability to use “combining forms”

  • what we would today call “higher order functions”
  • he called map f “αf”

lack of useful mathematical properties for reasoning about programs

  • [f1,f2,..,fn] ∘ g ≡ [f1 ∘ g, f2 ∘ g,…, fn ∘ g]
  • g ∘ [f1,f2,..,fn] ≡ [g ∘ f1, g ∘ f2,…, g ∘ fn]

Got a bit APLy:

Def IP = (/ +) ∘ (α ×) ∘ Trans

Peter Henderson

  • functional geometry
  • escher’s square limit fish
  • devised an algorithmic definition of escher’s picture
    • operators on pictures such as rot, over, above
    • higher-order operators cycle, quartet
  • then: define pictures as functions
    • p(a,b,c) is the picture at position a, with height along vector b and width along vector c
    • other combinators become easy to describe here:
      • eg over (p,q) (a,b,c) becomes p(a,b,c) ∪ q(a,b,c)
  • this functional geometry avoids Backus’s defects:
    • combining forms
    • operations on whole values
    • algebra with laws
  • and as a bonus: functions as representations
  • taking it further: Conal Elliott’s Pan

FP à la 1990s

  • Haskell: Paul Hudak got DARPA money to work on Haskell as a prototyping language
  • functions as data
    • type Region = Point -> Bool
      • predicate representation of set of points
  • reactions…
    • “too cute for its own good”
    • “higher-order functions just a trick, probably not useful in other contexts”

Lazy evaluation

  • Henderson and Morris, A lazy evaluator
  • Friedman and Wise, CONS should not evaluate its arguments
  • “The Whole Value” can be infinite!
    • the infinite list of natural numbers [0, 1, 2, 3, …]
    • all the iterations of a function
    • iterate f x = [x, f x, f (f x), ...]
  • some things can be expressed very nicely with lazy eval
    • eg newton-raphson, numerical derivative
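
The square-root example from the paper, sketched in Haskell: an infinite lazy stream of Newton-Raphson approximations, consumed by a separate limit detector.

-- producer: each approximation is the average of x and n/x
next :: Double -> Double -> Double
next n x = (x + n / x) / 2

-- consumer: walk the stream until two neighbours are within eps
within :: Double -> [Double] -> Double
within eps (a : b : rest)
  | abs (a - b) <= eps = b
  | otherwise          = within eps (b : rest)
within _ _ = error "stream ended"

squareRoot :: Double -> Double
squareRoot n = within 1e-9 (iterate (next n) (n / 2))
-- squareRoot 2 gives 1.4142135623...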

my paper

  • John Hughes, Why Functional Programming Matters
  • lazy producer-consumer pattern
    • producer: successive approximations of a numerical algorithm (eg derivative, integral)
      • consumer: limit detection
    • producer: search space
      • consumer: search strategy
  • pretty printing
    • make choices between horizontal and vertical layout
      • vertical: preserve indentation
      • horizontal: discard indentation
    • combinators: a $$ b puts a above b; a <> b puts a beside b
    • combinators obey laws:
      • <> and $$ are both associative
    • once we understand the laws, the implementation can use the laws to make the best layout

my other paper

  • Koen Claessen, John Hughes, “QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs”
  • randomly generate test cases from properties (properties based on laws)
  • then shrink failing test cases to minimal counterexamples
  • This is another lazy producer-consumer!
    • producer: space of all possible tests
    • consumer: QuickCheck search strategy
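
A minimal property in the QuickCheck style (my sketch): the maplist/reverse law from the LISP section, stated for Haskell's map.

import Test.QuickCheck

-- holds because (*2) is side-effect free
prop_mapReverse :: [Int] -> Bool
prop_mapReverse xs = map f (reverse xs) == reverse (map f xs)
  where f = (* 2)

main :: IO ()
main = quickCheck prop_mapReverse   -- +++ OK, passed 100 tests.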

more applications

  • Introduction to VLSI systems
    • was intended to revolutionize programming
    • did revolutionize hardware design
  • A VLSI circuit is not a million miles away from Escher’s square limit!
    • we used functional geometry ideas to design VLSI circuits
    • we needed to be certain we hadn’t made a mistake, as the feedback cycle from the fab was too slow (9 months)
  • use Backus’s FP to create a language that describes streaming circuits
    • then use algebraic laws of the language to optimize the circuits
    • symbolic simulator to test
    • muFP
  • hardware sorting based on doubly-recursive definition:
  • having a language of combining forms makes it much easier to reason about sorting networks
    • though there are still outstanding unsolved problems
  • Satnam Singh: Lava language http://blog.raintown.org/p/lava.html
  • Pentium float division bug
    • very costly mistake
    • they suddenly got interested in formal verification
    • forte
  • bluespec bsv
    • bluecheck
      • QuickCheck in hardware design!
      • iterative deepening and shrinking on FPGA designs!

four key ideas

  • functions as representations
  • whole values
  • simple laws
  • combining forms

Martin Thompson, A Quest for Predictable Latency

  • subtitle: Adventures in Java Concurrency
  • @mjpt777
  • latency: responding in a timely manner
  • the more things you try to juggle, the harder it becomes to predict how responsive you can be
  • if you don’t respond in a timely manner, you’re effectively unavailable

causes of blocking

  • concurrent algorithms
    • notifying completion
    • mutex
    • synchronization/rendezvous
  • systemic pauses
    • JVM Safepoints (GC etc)
    • Transparent Huge Pages (linux)
    • Hardware

concurrent algorithms

  • two approaches: locks and CAS
    • both are hard to get right!

what is latency?

  • queueing theory
    • response time is sum of:
      • enqueue time
      • latent time (time in queue)
      • dequeue time
      • service time (time being processed)
  • queueing theory focuses on latent time and service time, but sometimes enqueue and dequeue dominate!
  • relationship between utilization and response time: response time climbs steeply as utilization rises

adventures with locks and queues

  • evils of blocking
  • condition variables
    • condition variable RTT (echo service)
    • request: ping; response: pong
    • measure histogram of response times
      • don’t use averages! they lie!
    • 90th percentile: 8 µs
    • 8 µs is loads of time to talk between two threads!
      • 2 µs is enough to go between two machines on the same network!
    • but it gets worse at higher percentiles
      • max is 5525 µs!
  • bad news: this is the best case scenario!
  • context for benchmarks:
  • producers: 1, 2 or 3 (can’t have more on a quad-core machine!)
  • measuring: mean and 99th percentile
  • baseline (wait-free) vs ArrayBlockingQueue vs LinkedBlockingQueue vs ConcurrentLinkedQueue
  • burst length: 1 then 100
    • real world traffic is bursty
    • send a burst of 100; wait for 1 ack
  • other considerations:
    • Backpressure (queues should have bounded size)
    • size methods
    • flow rates
    • garbage
    • fan out
  • in the real world, I cannot use java’s built-in queues, partly because they lack features I need
    • if you call .size() on ConcurrentLinkedQueue, it walks the whole list

some alternative FIFOs

the disruptor

  • get over the garbage problem
    • preallocate a ring buffer and reuse it
  • producers claim a slot using CAS
    • once a slot is claimed, can write to the slot
    • once written, can update the cursor
  • consumer updates gating
  • no need for locks!
  • wrinkle: how do I deal with setting the cursor?
    • I can’t update cursor to n before it has reached n-1
    • thread claims slot; gets swapped out before writing to slot
      • blocks whole queue until thread swaps back in (many ms!)

update in Disruptor 3.0

  • cursor is used as CAS loop
  • add an “available” array to mark when things are written
  • made a huge difference

what if I just need a queue?

  • disruptor isn’t best used as a queue (better for coordinating graph of dependencies)
  • ManyToOneConcurrentArrayQueue (influences: Lamport + Fast Flow)
  • link?
  • tail: CAS counter
  • head: counter
  • reference ring buffer

back to the benchmarks

  • Disruptor and ManyToOneConcurrentArrayQueue behave better in low contention
  • gets worse under the bursty case
  • each over 50 µs at this stage
  • spinning CAS loops are the great equalizer

inter-process FIFOs

  • ring buffer
  • spinning CAS on tail counter
  • write message once space claimed
  • write header to indicate completion
    • (zero out memory on consumption to make this work)
  • MPSC Ring Buffer
  • latency is good but throughput takes a hit from zeroing memory

Aeron IPC

  • http://highscalability.com/blog/2014/11/17/aeron-do-we-really-need-another-messaging-system.html
  • we wanted a CRDT
    • replicated to another machine
    • may be reordered or duplicated
    • want to reassemble deterministically
  • big shared-memory file
    • tail pointer (but no head this time)
    • advance tail, write message, write header (just as above)
  • does the file really just go on forever?
    • problems with this: page faults; page cache churn; VM pressure
    • wash one, wear one, dry one
    • active file, dirty file, clean file
  • XADD instruction
    • available in JVM now
    • ring buffers can’t use this because the tail can overtake the head
    • if you move the tail beyond the end of the buffer, you can detect this
      • your responsibility is to rotate the buffer now
    • messages persist while active & dirty buffers are around
  • append-only data structures can be safe to read without locks
    • they can detect when they’ve reached the end
  • zeroing problem from before: how do we avoid the throughput problem?
    • do it in a background thread!
    • the old “dirty” becomes the new “clean”

back to the benchmarks!

  • aeron IPC is slower than ManyToOneConcurrentArrayQueue in the happy case (no burst, one producer, average time)
  • for bursty, 3 producers, 99th percentile:
    • ManyToOneConcurrentArrayQueue breaks below 50µs to 34µs
      • less “False Sharing”
      • inlined data vs Reference Array
        • less “Card Marking”
    • Aeron gets even further down to 13µs (ish?)
      • also mean is 10µs so this is phenomenal predictability
      • avoids the spinning CAS

logging is a messaging problem!

  • the abominations we have in Java for logging are disgusting!
    • the APIs, the design, etc
  • what design should we use?

where can we go next

  • C vs Java
    • Spin loops
    • modern processors do out-of-order speculation
    • with spin loops, they speculate wrong
      • Proposed: Thread.spinYieldHint()
    • data dependent loads
      • heap aggregates: objectlayout
      • stack allocation - value types
    • memory copying
      • baseline against memcpy() for differing alignment
    • queue interface
      • break conflated concerns to reduce blocking actions
      • API design mistakes
        • conflates “is something available?” with “is it empty?”
        • should not have used normal collections API for concurrent stuff

conclusion

  • AWS recently announced x1 instances
    • 2TB RAM + 100+ vcores!
    • lots of spinning wheels

Q&A

  • can you tell us more about XADD in java?
    • AtomicInteger, AtomicLong, etc

Martin Kleppmann, Transactions: Myths, Surprises and Opportunities

history

  • 1975: IBM System R
  • 2008? NoSQL
  • 2012? “NewSQL”
    • eg SQL on top of HBase
  • really a movement of transactions-or-not

ACID!

  • Härder & Reuter, 1983
  • “More mnemonic than precise” - Brewer, 2012

D for Durability

  • archive tape?
  • fsync to disk?
  • replication?

C for Consistency

A for Atomicity

  • not the same as in “atomic compare-and-swap”
  • not about concurrency
  • it’s about fault tolerance
  • rolls back writes on abort
  • better word might have been Abortability
  • aborts are super helpful because they collapse a whole bunch of error classes:
    • crashes, power failures
    • constraint violation
    • network fault
    • deadlock

I for Isolation

  • this is where we start talking about concurrency
  • = SERIALIZABLE?
  • in practice, multiple isolation levels
    • Repeatable Read
    • Snapshot Isolation
    • Read Committed
    • Read Uncommitted
  • can anyone here describe the difference off the top of their head? (one hand goes up)
  • this is a configurable property of many databases
    • though many don’t go as high as Serializable
    • and some call it Serializable but are actually weaker than this

isolation levels

preventing dirty reads and dirty writes

  • read committed prevents these anomalies
  • default in postgres, oracle, …
  • that’s enough for transactions, right?
  • what are the other weird isolation levels?

preventing read skew

  • this can happen under read committed!
  • two concurrent transactions:
      • writer: add 100 to x; subtract 100 from y; commit
      • reader: read y (before the writer commits); read x (after)
  • reader can see old value of y, but new value of x
  • Repeatable Read and Snapshot Isolation were invented to prevent this
  • Traditional RR implementation is based on locking
  • SI is used more often now to avoid locks
    • poorly named in most DBs

write skew

  • invariant: at least one doctor on call at all times
  • doctors can trade shifts
  • transaction:
    • count how many doctors are on call
    • if >=2:
      • update
    • commit
  • but: two txns operate concurrently for two different doctors:
    • both see n >= 2
    • both remove themselves from rota
    • invariant violated!
  • pattern:
    • measure
    • decide
    • write
    • commit
    • by the time the write is committed, the premise of the decision is no longer true! (a toy demonstration is sketched after this list)
  • solutions
    • two-phase locking (pessimistic)
      • shared lock on measure step
        • block entire table for writes, whenever you make a large read
      • poor for perf
      • motivation for weaker isolation levels
      • the System R team simply didn’t hold locks as long
        • this is what created their crazy isolation levels
    • H-Store (Stonebraker et al)
      • VoltDB, datomic
      • literally execute your txns in a serial order
        • trivially correct
        • as long as you make your txns fast enough!
          • otherwise your throughput will be abysmal
    • SSI (Cahill, Röhm & Fekete)
      • Postgres
      • detect conflicts & abort
        • optimistic counterpart to pessimistic 2PL
      • Idea: locks like in 2PL
        • but locks don’t block, just gather information
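
A toy demonstration of the measure/decide/write race (a hypothetical Haskell sketch, not from the talk): each thread checks the rota, then removes only itself, like two transactions that each update their own row under snapshot isolation.

import Control.Concurrent
import Control.Monad (when)
import Data.IORef

main :: IO ()
main = do
  onCall <- newIORef ["alice", "bob"]
  done   <- newEmptyMVar
  let leave me = do
        rota <- readIORef onCall                      -- measure
        when (length rota >= 2) $ do                  -- decide
          threadDelay 1000                            -- widen the race window
          atomicModifyIORef' onCall
            (\ds -> (filter (/= me) ds, ()))          -- write own "row" only
        putMVar done ()
  mapM_ (forkIO . leave) ["alice", "bob"]
  takeMVar done >> takeMVar done
  readIORef onCall >>= print   -- frequently []: nobody left on call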

what about other distributed systems?

  • microservices?
  • stream processing?
  • serializability across services?
    • atomic commitment
    • total ordering
    • consensus
      • paxos, raft, zab
  • strong connection between all of these things
    • coordination
    • failure amplification
  • geographic distribution makes this worse
  • distributed txns don’t work particularly well

without cross-service transactions

  • so where are we now?
  • compensating transactions
    • ≅ abort/rollback at app level
    • looks a bit like ACID A
  • apologies
    • detect & fix constraint violations after the fact, rather than preventing them
      • overselling beyond stock level; issue voucher instead
    • looks a bit like ACID C

every sufficiently large deployment of microservices contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of transactions

  • ordering of events
    • unfriend
    • post nastygram to remaining friends
  • friends service and posts service are separate services
  • notifications services pushing out emails etc, reading from posts and friends services
  • reordering due to delays
  • notification goes to old set of friends, not new set of friends. oops!
  • implicit causal relationships leading to violated expectations

current research

  • things between strict serializability and eventual consistency
  • causality:
    • partial order of reads/writes
    • can be maintained without global coordination
    • “consistent snapshot” in SI
      • consistent with causality
    • but also efficient?
  • isolation levels are divided into those that require coordination and those that don’t

references

  • four slides’ worth!

Q&A

  • how did we end up with this mess? naming etc
    • on the academic side, the database community and distributed systems community have historically been completely separate
      • evolved separate language for similar concepts
    • wouldn’t have many of current problems if they had talked to each other in 1980s
    • SQL standard had problems with isolation levels
      • failed to describe them in an implementation-neutral way
      • this is why Postgres uses SI but calls it REPEATABLE READ
  • also see hermitage, a tool to test what kind of anomalies different isolation levels of dbs allow

Elise Huard, FRP and Functional Game Programming

  • FRP intro
    • signals, generators, etc
player <-
  transfer2 initialPlayer
            movePlayer
            directionKey
            gameOver'

monster <-
  transfer3 initialMonster
            wanderOrHunt
            player
            randomNumber
            gameOver'

gameState = GameState <$> renderState <*> soundState

Dependencies:

  • player depends on gameOver
  • but gameOver depends on player!

Solution: introduce delay: gameOver' = delay False gameOver

generator :: Signal (SignalGen a) -> SignalGen (Signal a)

playLevel :: Signal (Bool, Bool, Bool, Bool) -- event signals
             -> LevelNumber -- pattern match on level number
             -> Score
             -> Health
             -> SignalGen (...)

cons of FRP?

  • if your game is sufficiently complex, you’ll have a lot of streams to manage
    • if you add a new stream, you might need to make every other stream tap into it
      • laborious for maintenance
  • some added complexity in handling side effects
  • performance?

pros of FRP

  • conceptually simpler
    • specifically, smaller units
  • testability
    • everything is a pure function
    • can exploit property-based testing
    • can record a user session to replay events
      • can use this to verify that a bugfix really works
      • elm does a similar thing

skepticism?

  • this is not a critical system
    • correctness is not as important as user experience

Q&A

Stephanie Weirich, Depending on Types

  • intro to stephanie:
    • working on dependent types in haskell
    • TDD (type-directed development, that is)
  • usually when people talk about dependent types, they talk about coq, agda, idris or similar
  • I’m going to talk about my favourite dependently-typed language: haskell!
    • wait, what?
    • can I really write dependently typed programs in haskell?
      • yes!
      • …with some caveats
      • we can’t port every coq/agda/idris program

Code is available at: https://github.com/sweirich/dth/tree/master/depending-on-types

the present

motivating example

  • a data structure with an invariant
  • eg a red-black tree
    • maintains balance and ordering
    • no two red nodes in a row
    • root is black
    • leaves are black
    • etc etc
  • Chris Okasaki functional pearl for inserting into a red-black tree
    • how do we know our insert fn preserves the red-black invariants?
    • can we maintain these invariants with types?
    • create a special type called RBT

creating red-black trees in agda

code

aside: you should all go to the oregon programming language summer school

data 𝐍 : Set where
  Zero : 𝐍
  Suc  : 𝐍 → 𝐍

data Color : Set where
  R : Color
  B : Color

data Tree : Color → 𝐍 → Set where
  E  : Tree B Zero
  TR : {n : 𝐍} → Tree B n → A → Tree B n → Tree R n
  TB : {n : 𝐍} {c1 c2 : Color} →
       Tree c1 n → A → Tree c2 n → Tree B (Suc n)

data RBT : Set where
  Root  : {n : 𝐍} → Tree B n → RBT

insert : RBT → A → RBT

Note how the TR constructor requires black subtrees, but the TB constructor allows any color subtrees.

Note also that RBT requires a black root.

The types have captured the invariant. RBT is a valid red-black tree by construction.

creating red-black trees in haskell

code

{-# LANGUAGE GADTs, DataKinds, KindSignatures, ExistentialQuantification, 
    TypeFamilies, StandaloneDeriving #-}

how are these different?

Haskell distinguishes types from terms; Agda does not

Types are special in Haskell:

  1. Type args are always inferred
  2. only types can be used as indices to GADTs
  3. Types are always erased before run-time

inserting in Agda

So we’ve constructed a red-black tree. How can we write our insert function now, and demonstrate it maintains the invariant?

Step one: split balance into two functions:

  • when we insert on the left, we know any potential violation is on the left
  • when we insert on the right, we know any potential violation is on the right

So we create balanceL and balanceR

We also know, when we call balanceL (or balanceR), that the root of the tree passed in is black

balanceLB : ??? → A → Tree c n → ???

-- what are the ???s
-- input: it may have two reds in a row. AlmostTree?
-- output: a valid tree with unknown root color. HiddenTree?

data HiddenTree : 𝐍 → Set where
  HR : {m : 𝐍} → Tree R m       → HiddenTree m
  HB : {m : 𝐍} → Tree B (Suc m) → HiddenTree (Suc m)

incr-if-black : Color → 𝐍 → 𝐍
incr-if-black R x = x
incr-if-black B x = Suc x

data AlmostTree : 𝐍 -> Set where
  AT : {n : 𝐍}{c1 c2 : Color} -> (c : Color) ->
       Tree c1 n -> A -> Tree c2 n -> AlmostTree (incr-if-black c n)

haskell

  • need some things to paper over type-term separation
    • singleton types for run-time type information
    • (plug for singletons library)

are these different?

Agda programs are total - ie they always terminate.

Haskell does not make this promise.

Is this ok?

Not proving things is simpler than proving things

  • Okasaki’s original version (simply typed): 12 lines
  • Haskell version translated from Agda: 49 lines
    • includes type defs & sigs
    • precise return types for balance functions
  • Haskell version from scratch (see git repo): 32 lines
    • includes type defs & sigs
    • more similar to Okasaki’s code
    • less precise return type for balance functions
      • I can’t do this in agda, because this precision is required for the totality argument

the future

extensions in progress

  • datatype promotion only works once
    • you can only promote things that are not themselves indexed
    • some agda progs have no ghc equivalent
    • theory for GHC Core
      • want to merge types and kinds together

why haskell? why not agda?

  • TDD (type-driven development)

what else do we need?

  • totality checking for GHC
  • type inference beyond HM
  • Programming support
    • auto case splitting, code completion

Q&A

  • any type-checker perf implications?
    • you can make your type-checker as slow as you want
      • Undecidable instances
    • we should think about this, yes.

Panel

Tony Hoare intro

  • the keynotes and a lot of the sessions talked about history

my first programming language proposal (1964):

  • Case Expressions
    • (ALGOL Bulletin 18.3.7 p20)
  • based on learnings from ALGOL 60
    • switch mechanism was less successful
    • conditionals much better
  • case expression
case n-2 of (a := 1 - y/z
else b := b + 1
else go to thirdcase)

translated to C:

switch (n-2) {
 case 1: a = 1 - y/z; break;
 case 2: b = b+1; break;
 case 3: goto thirdcase;
}

The break statement here is odd. Maybe they thought it was more efficient to allow control to pass through to the next case?

Anecdote: missed break statement caused outage to telephone switching system of Eastern US. “I hope programming language designers accept this as a reasonable criticism”

Alternative: arrays of exceptions

Assigned explicitly to an array.

Like assigned goto in FORTRAN or ALTER in COBOL (which changes the target of a go to statement!)

You don’t know where you’re going to, but you know where you’ve come from!

Datamation COME FROM statement article – if you’re thinking as a flowchart, it’s entirely arbitrary whether you choose transitions as GO TO or COME FROM. ;)

next programming language design: Record Handling (1965)

AB21.3.6 p39

Records. I never used the word “pointer” because I felt I was modelling relationships between real-life things. Of course you would need pointers to implement this.

Introduced references to refer to records. References must be typed.

Simula-65 did not have type-checking of pointers (or references). Simula-67 did.

ALGOL W was the first language to have this.

“A contribution to the development of Algol” Wirth and Hoare

Comm ACM 9(6): 413-432 1966

Included classes, subclasses, garbage collection

No dangling references

The billion dollar mistake!

“In order to enable references to represent partial functional relationships… a special reference null is introduced.”

panel session

  • bruce tate: why did you first create a language?
  • joe armstrong:
    • I didn’t know I was doing it!
    • the main problem I was dealing with was: how to build fault-tolerant systems
      • it turned out part of the answer was erlang
  • don syme:
    • I started F♯ because I wanted the strongly-typed paradigm from ML available in .NET development and microsoft development
    • the best motivation is to make something you want to use
  • josh:
    • I created hack at facebook
    • facebook’s FE is php
    • having a real type system reduced the scope of a lot of bugs
    • would help our devs write more efficiently
    • (bruce: so you had a product that escaped the lab a bit, and a customer base dependent on php. this was an attempt to inject type sanity to php?)
    • yes, we had half a billion to a billion users
    • we can’t stop development. we can’t rewrite everything
    • (don: facebook is extraordinary for funding that.)
  • bruce: don, you mentioned type safety as being important, but you were on the CLR
    • yes, one of the first things I did at MS research was injecting generics into the .NET CLR
    • project 7 was part of this
    • they gave MS research early access
    • we investigated what we needed for various pl paradigms
  • bruce: IBM tried this with the system object model with less success. but what about the types in the CLR?
    • the key thing was generics
    • I didn’t see the CLR as a restriction once we’d fixed it to be what we needed it to be
    • the aim of the original .NET design was to be a broad runtime for a wide array of languages
    • in practice
      • one main language C♯
      • a skin on that language: VB.NET
      • and F♯
    • we were interested in JavaScript but it didn’t stick
    • what is the common language? type system? API layer? to interoperate?
    • we see things like swagger for REST APIs as a common language
    • recurring theme: how do all these languages talk to each other?
  • bruce: joe, <type question>
    • we weren’t so much concerned about the software inside one machine, but rather the system as a whole
    • systems are intrinsically distributed
    • I didn’t care if the program was proven correct, I worried about the machine crashing
      • type systems don’t stop the machine being hit by lightning
    • I was looking at architectural problems
    • to me, dividing by zero was similar to being hit by lightning
    • I didn’t want the program to change or the way you reason about errors to change depending on the type & scale of the error
  • bruce: what features contribute to making more correct programs?
    • josh:
      • existence of a type system in PHP
      • we needed it to be usable and used
        • if it was 10% correct and used, that was better than 100% correct and not used; it’s still 10% better than raw php
      • we tried to deal with some php edge cases in the type system
        • some of them we just didn’t because they required compiler changes
    • don:
      • initial call was guided by O’Caml programming model
      • Hindley-Milner type inference
      • tuples
      • pattern matching
      • everything we did in F♯ since then has been to preserve that core
        • everything else just augments that core experience
      • type providers etc
    • joe:
      • fault tolerance
        • no errors in the system
      • continuous operation
        • should be able to continue after an error is detected
      • these were absolutely central right in the beginning
  • bruce: don, you’ve worked with some people at MS with some great teachers such as SPJ; one of his sayings is “you can’t unlaunch missiles”. F♯ is side-effectful so you can launch missiles. Is there anything in F♯ you regret and would want to take back?
    • structural comparison
      • when you define a tuple, you can compare for < = >
      • not worth it, but it’s done now
    • unsoundnesses, not necessarily in lang but in experience
      • library management software can easily introduce unsoundness into your workflow
      • if your package manager doesn’t tell you that you’re shipping without a required library, you can suddenly break
      • we did F♯ in a given context initially
        • .NET 2005
      • we picked up unsoundnesses from the ecosystem
      • it’s hard when you encourage correct, sound code in the language but the ecosystem has all these problems
      • the whole process has to be sound
  • bruce: yes, we see the same in elixir. getting the package management and dependency resolution right is hard; lots of places to make a mistake
  • same question to josh:
    • nothing I can say unequivocally “we shouldn’t have done that”
    • there are holes in the type system
      • some unexpected
      • they make sense if you are trying to convert php code
      • if you’re using a php array structure and it’s heterogeneous in certain ways, the type checker just gives up
    • I wish we thought whether we should have done this beforehand
  • bruce: joe: erlang’s been around a long time, and there’s a lot to second-guess. where might you have made a compromise?
    • loads of places
    • a lot was dictated by the machines available at the time
      • 8Mb memory
      • not inherently distributed
    • protocols not part of the language
    • much better formal checking of bounds
    • components that do well-defined jobs
    • we tend to do very big components and we can’t necessarily work out the source of a failure
    • an erlang node can have maybe millions of processes
    • would like a globally unique process id
    • email should be just sending to a process somewhere worldwide
  • bruce: do you have something you really respect in someone else’s language?
    • bruce (answering myself)
      • idris: this really changed the way I thought about productivity and correctness
      • it helped me think about why null was a bad thing
    • don:
    • it’s hard to answer, because you become sensitive to the fact that adding features pushes people out
        • some people can’t use it or understand it
        • everything’s a tradeoff, everything’s a distortion
      • cross-platform and open source is a thing I admire
      • we’re working hard to make this real for F♯
  • bruce: you’ve moved F♯ on beyond the stewardship of MS and developed the community, can you talk about that?
    • don: we put out a blog post to try to distinguish things
      • Visual F♯ vs F♯ the language
      • people are taking F♯ in all sorts of places
      • the word “F♯” should be considered a free term, like “Clojure” or “Scala”; shouldn’t have strong mental associations with MS or Visual Studio
      • this is good for MS too
      • eg Azure supports linux, go, etc
      • process of getting here has been funny
      • as a researcher, I always wanted this
      • then suddenly, MS came following in a herd to do similar things with C♯ and tooling and stuff
      • the cloud thing moves towards open technologies
      • OSS itself is a strong economic dynamic
  • bruce: anyone else?
    • josh:
      • I wish we’d launched with a better installation process
      • the tooling wasn’t open sourced when hack the language was
      • the language features are secondary if noone is using it
    • joe:
      • quickcheck is brilliant
      • smalltalk image & SHA1 checksum
        • reproducibility of builds and state
      • don’t like these things being in git and distinct from the language
      • would like to refer to modules by checksum so I know exactly what it is
    • tony:
      • since I’m not a designer of langs and never have been, I’ve never been able to look jealously at other languages
      • I quite like the separation logic concepts introduced by Peter O’Hearn in his concurrent languages
        • solution to race conditions in a shared-memory system
      • I support the view that the tools is where the action is
      • I would like to have concentrated more on the evolution of programs after first delivery
        • as an academic, we mainly wrote programs from scratch
        • wasn’t til we started teaching that we learned that most people do maintenance, not programming from scratch
      • as a theoretician, how can I help? my contribution could be to see the need for a unifying theory on which all the tools can be based, which could be applied to different languages, but could be specialized to the particular tools for each language
        • are these languages going to each have their own separate toolsets? no they are not.
        • we should make our tools adaptable to many different programming languages
        • the idea of a “compiler compiler” to generate compilers automatically
        • it’s more ambitious than that, even though we never created the compiler compiler in the first place