Codemesh 2014, London

keynote: complexity is outside the code

@jessitron and @tastapod
slides

when I started programming:

architecture was always:
- box - box - cylinder
then layers:
- ideally:
  - box - box - box - box - box - box - cylinder
- but in practice, code gets snarled up between the layers
- every time you make a change, you need to change every layer

no silver bullet

in the assembler days:
- cryptic code and brittle code were both slain by compilers
but now we have multiple orthogonal problems
- latency
- deployment
- search
- security
- caching
- monitoring
which don’t get solved in a single sweeping way, like compilers solved the problems with assembly language

soa & polyglot persistence

architecture is now multiple boxes, multiple cylinders, directed acyclic graph
most of the code is glue
- because when you break things apart, you spend more effort gluing them back together

moore’s law

Herb Sutter (2004): the free lunch is over

the happy old days vs now:

lang stdlib
- now: also python and ruby and all of github
unix utils
- brew install it all!
database
- MOAR DATABASES!
algorithms
- read every thesis!

mistake: our goal is not to write code

“IT” makes more sense than “Software Engineering”
goal is in fact: minimize lead time to business impact
- lead time: starts when problem exists
- business impact: not “done done done”
@jessitron’s geeky akka simulation of a dev team, comparing:
- a team where they’re trying to write as quickly as possible, not much slack time for tests/monitoring
- a team where they don’t build as fast as possible, and have slack to do
- she graphed two metrics for each team:
  - number of implemented features
  - number of working features

estimation

“we’re the only industry on the planet that doesn’t give a range when giving an estimate”
compare: “22 days” with “half a month to 2 months”
- deliberately vague units
starts a conversation
- delivery manager: “you’re crazy! how can I plan with that estimate”?
- I say because:
  - domain
  - tech
  - security
  - perf
- I want the DM to be just as nervous as me about this stuff
- so that they ask questions about how we can reduce uncertainty

research

research is a first-class task
“leverage” is the right word for using technology
- because “leverage” increases risk
research is an ongoing activity
- research is not the same thing as a spike
- if all you ever do is a spike, you optimize your tech choice for time-to-hello-world
your architecture diagram should reflect your uncertainty and decisions you haven’t yet made
box - ? - cylinder

kaizen

inner loop: continuous learning/improvement
outer loop: questioning the process itself
we’re spending less time coding, and more time learning about other people’s code
- this is what you need to do to exploit reuse
IBM invented this in the 1950s
- every management grade had an equivalent tech grade
- VP <-> research fellow
- fixed cost, unlimited opportunity
- Benoit Mandelbrot
- Bill Joy

things other than programming that we do

evaluation
monitoring
deploy
tests
program
databases
filesystem
network
so… you’ve got to be good at all of them
- we tried specialising, but: if your job is to deliver to QA, all you’ll do is pass QA tests
- you need the whole feedback loop to do your job
- we can’t care about everything, that just leads to hopelessness and despair
- we can’t care about nothing
- we kind of just pick a couple and hope for the best
  - and hope that enough people randomly choose a particular issue and hope that something gets done

Don Syme, the F♯ approach to relaxation

this is a new talk!

starting position

the great disputes of compsci should be struggled with
they are chances to make a better, simpler, more relaxed world as much as create opposing camps
although want to avoid:
- “mustard watches: an integrated approach to time and food”
  - spoof paper lampooning arbitrary smushing of concepts together

then (2003)

functional languages were
- isolated
- noninteroperable
- used their own vms
interop standards were a mess:
- C-calls
- COM
- CORBA
- XML

things changed:

F♯ approach (also scala, swift, clojure):

embrace industry-standard runtime layers, influence them. compromise where needed.

relaxation achieved 😎

todays FP langs are immensely interoperable while staying true to FP principles
F♯ type providers raise interop to a completely new level
(some tensions remain: some techniquees not easily implemented on industry standard VMs)

tl;dr: where we are now

F♯ is open, cross-platform, independent
accepting contribs
there is now an F♯ software foundation
- becoming a legal entity as a US nonprofit right now
- http://fsharp.org
xamarin provide F♯ tooling for android and iOS
usual packages for linux
visual F♯ tools for windows
F♯ compiler service powers many other tools
F♯ community is very self-empowered

the puzzle

pieces:
- lang
- openness
- cross-platform
- community
- ide tools
  - xamarinstudio, emacs, vim, …
- ecosystem
  - embrace an open ecosystem of libs, tools and type providers

F♯ compiler

flow of contributions:
github.com/fsharp/fsharp
- visualfsharp.codeplex.com (public git repo)
  - this is what gets packaged into the microsoft visual F♯ tools
- nuget packages
  - into emacs/xamarin tools

language design process

next release: 4.0
fsharp uservoice

F♯ Compiler Service

projects which use it:
- emacs plugin
- vim plugin
- Visual f♯ power tools
- IFSharp
- FsReveal – reveal.js-based literate programming
  - the compiler service is run over the fragments of code to get IntelliSense and suchlike

relaxation achieved:

enterprise quality + openness + community + tooling = :D

then (2003)

in 1990s FP langs were anti-OO
even today any mention of objects in Haskell circles gets an “urggh” reaction
Compiler Design with Standard ML - Chapter 9, Object Oriented Programming:
- object: to express a dislike or distatse for something

now: object + functional

embrace objects, make them fit
- particularly for interop
but don’t embrade full object orientation
features to allow merging paradigms:
- object expressions to allow expression-oriented programming
some functional features clash with object features:
- particularly currying vs method overloading

anecdata comparison between C♯ and F♯

interesting comparison:
- in C♯, nothing stops you creating circular dependencies, and you have to make special efforts to avoid it
- in F♯, it’s really hard to do this.
in theory and in practice, unmoderated intra-component cycles are a disaster
- unrestricted circularity is like crack: once people get used to it, it’s hard to wean them off
language mechanisms that enforce acyclicity are good
pattern matching is a lovely feature
- but without a way to name, abstract, and parameterize patterns, it’s a real problem in practice

enter F♯ Active Patterns

similar to Scala extractors?

let (|``Even Number``|``Odd Number``|) n =
  if n % 2 = 0 then ``Even Number`` else ``Odd Number``

let (|Const|_|) (e:FSharpExpr) = ...

relaxation achieved! pattern matching <=> abstraction

code <=> data: Type Providers

operate in a world between static and dynamic languages

sync <=> async

numbers <=> numbers-with-units

GPU <=> CPU

fsharp.org/use/gpu
F♯ quotations feature helps here

what’s ahead?

most major areas of tension have been resolved
still some areas that bug me:

repl <=> distribution and scale

conclusion

false opposites are worth struggling over
simplicity and unity are key

NoSQL is dead

Eric Redmond

naming things is hard

and NoSQL is a rubbish name
it’s possible to name things that don’t exist like fairies

my history

working in dbs since 1999 (as a researcher)
seven databases in seven weeks
a little riak book
now at basho

distributed systems

you should no more make your own distributed system than you should make your own crypto library

the thesis:

the whole talk right here:

Models which aren’t useful should die. NoSQL is not a useful model, so it should die

NoSQL is ill-defined
NoSQL is a bad classifier
What this means for you

We get people who say “I’m stuck on Mongo or Riak” – this is like saying “I’m stuck between hat or shoes”

NoSQL is ill-defined

wiki: NoSQL (often interpreted as Not Only SQL)
- but if you’re running a single postgres node, are you running NoSQL?
- Not At All SQL? NAASQL (pron. Nâsgul)
people desire NoSQL because of some (perceived or actual) shortcoming in SQL

perceived deficiencies

no horizontal scaling
- “newsql” – VoltDB
- OceanBase
no semi-structured data
- postgres: JSONB
- SQL Server 2008 supports semi-structured datatypes
- mongolike
  - mongo frontend to postgres
slower dev
- often considered a corollary to no semi-structured data
- ORMs
- just look at RoR!
can’t use modern tools (like MR/Hadoop)
- MySQL & Postgres both have hadoop connectors

actual deficiencies

terrible in representing complex graphs
high availability
require indexes for speed
- even if all you have is an id
- a key-value store doesn’t have an index for keys, that’s just what you look up by
distributed SQL is often a subset, or has caveats
- “we’re SQL98 compatible but you can’t do nested JOINs”
SQL isn’t always required or helpful
- sometimes all you need is something like S3

NoSQL is a bad classifier

not just ill-defined, actually dangerous

DB classifications

by models (Graph, doc, col, kv)
by network topology (mesh, partial-mesh, tree)
by natural distribution (rdb/graph, doc/col/kv)

models

Graph

http://en.wikipedia.org/wiki/Graph_database
easiest thing to model
- boxes and lines
easy to query (cypher)

START x=node(0)
MATCH x
RETURN x.name

stores
- neo4j (high perf, ACID)
- hypergraphDB (directed hypergraph)
- titan (distributed)
- ArangoDB (flexible modelling)
  - can also act as KV or doc
  - (wait, where does this live in our model again?)
- SparkleDB (RDF, SPARQL)
- InfinityDB (distributable, embeddable)
  - parts of your graph in a mobile phone, parts in a cloud

Key/Value stores

examples
- riak (critical data, simple operations)
  - high availability & operations
- aerospike (specialized for SSD+DRAM)
  - distributed
- redis (speed, fancy types, messaging)
  - horribly broken clustering, so only use single-node instances
- leveldb (embeddable)
message: despite sharing a data model, the use cases here are massively diverse

column store

row keys, column families
- rows are an ephemeral concept
- KKV? (row key, column family key)
eg
- BigTable
- HBase
  - ordered, sparse (BigTable based)
- Cassandra
  - CQL
    - thought experiment: if CQL expanded to full SQL, is it no longer NoSQL?
- Accumuluo
- Hypertable

document datastore

often the first thing people think of when they hear “NoSQL”
queries via field of document
normally JSON, but could be XML or whatever
eg
- CouchDB
  - embeddable, replication
- Couchbase
  - distributed, failover
- MongoDB
  - easy to program
- RethinkDB
  - created by former mongo users
  - distributed joins
  - atomic updates

this is too limited

riak + search is a document datastore
cassandra can do K/V
ArangoDB (document)
Postgres (column -> hadoop)
ElasticSearch (inverted index?)
(graphs are hard to replicate, because walking paths in a graph is difficult)
- riak used to have link-walking, they took it out because it was too much overhead

model types: revised

K/V (with or without indexing)
Graph
Other
is this enough?
- relational DBs can still do K/V and “other” reasonably well

Classify by topology?

what’s entailed by distribution?
- what are you trying to achieve?
- sharding vs replication
what does it mean to have a network partition?
- the CAP theorem
  - quick explanation:

say you’re in a bar, and three people order a beer. one goes to the toilet, the other two switch to whisky. If you ask the separated one what the last drink the group had, they can either answer: “I don’t know” or “beer”.

http://aphyr.com/posts/313-strong-consistency-models
- there are lots of types of consistency
  - strong serializable, linearizable, serializable, read-your-writes…
- not all are banned by the CAP theorem in AP systems
- you can have odd things like monotonic consistency
harvest/yield tradeoff
- you can’t guarantee 100% harvest and 100% yield
- harvest is proportion of request that completes
  - 50% of result set responds
- yield is probability that request responds at all
FLP Impossibility Proof
- safety or liveness but not both
  - safety sounds like consistency - is it?
- paxos is safe but not necessarily live
- two-phase locking makes neither guarantee
there’s loads of these issues
you can’t make a decision here based on “I need a NoSQL database”
synchronicity is not the only problem (think lightcones)
- intention is the problem
- what does a conflict mean? what does conflict resolution mean?
  - first write wins?
  - last write wins?
  - CRDTs?
- this depends on the domain

Single Node

neo4j
leveldb
hbase
couchdb

Mesh

every node speaks to every other node
no SPF
easy to operate
can limit cluster size
- chattiness scales quadratically with cluster size
eg
- riak
- HBase (if configured so)
- Couchbase
- BigCouch
- all different data models!

Partial Mesh

don’t let every node talk to every other node
can introduce complexity to operations
eg
- riak + MDC (multi-dc replication)
- HBase (if configured so)
- Cassandra
  - DC-aware

Tree/Star

consistency is easy to guarantee
but: single point of failure
leader election
these systems often assume you can only get one clean partition at a time
- three-way partitions can prevent leader election
  - lose availability, fail to gain consistency
eg
- mongo
- HBase (if configured so)
- Postgres/Mysql (more star than tree)

classify by natural distribution

hard to distribute
- graphs
- relational joins
easy to distribute
- key/values

hard, but not impossible

titan (distributed graph)
infinitydb (distributed graph)
voltDB (distributed SQL)
postgres cluster (distributed SQL)

easy, but not always there

redis (KVs are easy to distribute, but Redis Cluster sucks)
- don’t roll your own distributed system
mongo (document can distributed, but mongo tends to be tough to scale/admin)
- every time you want to double your capacity, you add three nodes
- 3 -> 9 -> 12

what else?

time series, FTS, ranges
defined schema/schemaless/opaque binary
large object storage
HA/SC, Harvest/Yield
- Riak can choose HA or Strong Consistency
  - both in the same DB! (although not on same dataset)
Message patterns (req/rep, pub/sub)
stream processing, datastream mining
dev friendliness

what does this mean for you?

future

polyglot DBs with data oriented middleware
- (this isn’t a thing, but it should be)

Categories for the Working Programmer

Jeremy Gibbons, University of Oxford
goal: show how category theory inspires better code
but you don’t really need the category theory: it all makes sense as functional programs (Haskell) too.

functions that consume lists

Two equations, indirectly defining sum:

sum :: [ Integer ] -> Integer
sum []     = 0
sum (x:xs) = x + sum xs

Programs are equations, and are built out of equations.

This pattern works for other fns too. For any given f and e, these equations uniquely define h:

h []     = e
h (x:xs) = f x (h xs)

The unique solution is called foldr f e in Haskell:

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f e []     = e
foldr f e (x:xs) = f x (foldr f e xs)

This shows that patterns can be expressed as pieces of code.

applications of foldr

sum     = foldr (+) 0
and     = foldr (and) True
decimal = foldr (\d x -> (fromInteger d + x) / 10) 0
id      = foldr (:) []
-- etc etc

What’s special about lists?

…only the special syntax. We could have our own list:

data List a = Nil | Cons a (List a)

Then we could have

foldList :: (a -> b -> b) -> b -> List a -> b
foldList f e Nil         = e
foldList f e (Cons x xs) = f x (foldList f e xs)

This works for other data types

data Tree a = Tip a | Bin (Tree a) (Tree a)

Then we could have

foldTree :: (a -> b) -> (b-> b->b) -> Tree a -> b
foldTree f g (Tip x)     = f x
foldTree f g (Bin xs ys) = g (foldTree f g xs) (foldTree f g ys)

It’s not always so obvious

Rose trees (eg for games or XML)

data Rose a = Node a [Rose a]

(aside: name Rose comes from rhododendron (greek for rose tree))

What’s the right notion of fold here?

foldRose1 :: (a -> c -> b) -> (b -> c -> c) -> c -> Rose a -> b
foldRose1 f g e (Node x ts) = f x (foldr g e (map (foldRose1 f g e) ts))

-- but suppose we might want to process list nodes in a different order?

foldRose2 :: (a -> b -> b) -> ([b] -> b) -> Rose a -> b
foldRose2 f g (Node x ts) = f x (g (map (foldRose2 f g) ts))

-- even more general:

foldRose3 :: (a -> [b] -> b) -> Rose a -> b
foldRose3 f (Node x ts) = f x (map (foldRose3 f) ts)

Which one should we use? Category theory can guide this decision.

Haskell gets folds for non-empty lists “wrong”:

foldr1, foldl1 :: (a -> a -> a) -> [a] -> a

this won’t let you define the identity function on nonempty lists

Preparing for genericity

Separate out list-specific shape from type recursion:

-- two-level type definition of list
-- ie ListS is nonrecursive, and Fix doesn't know about lists
data ListS a b = NilS | ConsS a b
data Fix s a = In (s a (Fix s a))
type List a = Fix ListS a

eg, [1,2,3] is represented thus

In (ConsS 1 (In (ConsS 2 (In (ConsS 3 (In NilS))))))

convenience fn:

out :: Fix s a -> s a (Fix s a)
out (In x) = x

This is a lot of boilerplate that you wouldn’t want to actaully write

but it’s a general way of constructing data types to help understand them

Shape is mostly opaque, just need to ‘locate’ the as and bs

bimap :: (a -> a') -> (b -> b') -> ListS a b -> ListS a' b'
bimap f g NilS        = NilS
bimap f g (ConsS a b) = ConsS (f a) (g b)

-- closely related to 'fmap' from Functor

Now we can define a more cleanly separated version of foldr on List:

foldList :: (ListS a b -> b) -> List a -> b
foldList f = f . bimap id (foldList f) . out

-- eg foldList add :: List Integer -> Integer, where

add :: ListS Integer Integer -> Integer
add NilS        = 0
add (ConsS m n) = m + n

The categorical view, in a nutshell

Think of a bifunctor S. It is also a functor in each argument separately.

An algebra for functor S A is a pair (B,f) where f :: S A B -> B

A homomorphism between (B,f) and (C,g) is a function h :: B -> C such that

h . f = g . bimap id h

-- (note bimap id == fmap, so)
h . f = g . fmap h

An algebra (B,f) is initial if there is a unique homomorphism from it to each (C,g).

eg (List Integer, In) and (Integer, add) are both algebras for ListS Integer:

In  :: ListS Integer (List Integer) -> List Integer
add :: ListS Integer Integer -> Integer

and sum :: List Integer -> Integer is a homomorphism from (List Integer, In) to (Integer, add). In fact (List Integer, In) is the initial algebra.

Theorem: for all sensible shape functors S, initial algebras exist.

This means that: whatever shape you pick for your datatype, you can define a corresponding notion of folds, which will be in some sense “most general” for that datatype.

Duality

Recall

fold :: Bifunctor s => (s a b -> b) -> Fix s a -> b
fold f = f . bimap id (fold f) . out

reverse certain arrows:

unfold :: Bifunctor s => (b -> s a b) -> b -> Fix s a
unfold f = In . bimap id (unfold f) . f

and I have a function which will generate a data structure for me. Eg iterate, repeat,…

If you pick the right abstraction, the relationship between fold and unfold is obvious. If you pick the wrong one it doesn’t make it clear:

unfoldr :: (b -> Maybe (a,b)) -> b -> [a]
unfoldr f b = case f b of
    Nothing -> []
    Just (a,b') -> a : unfoldr f b'

Conclusion

Category theory is a good organisation tool for library designers

Helps you write better code, with less mess

The maths is really quite pretty

but the Haskell makes sense on its own too.

http://patternsinfp.wordpress.com http://www.cs.ox.ac.uk/jeremy.gibbons/

TDD, as in Type-Directed Development

Clément Delagargue
clementd

thinking with types

from a map, I can get a value…
- maybe
from a string, I can parse a string…
- maybe
want to abstract away this “maybe” concept
Option[+A] type
Option[String] + String => Option[Int] = Option[Int]
and I can end up with code which is correct by construction
- our code obviously has no errors
  - (cf Tony Hoare)
tests can only demonstrate the presence of errors, not the absence
types can give mathematical certainty of properties
Types ⇔ Properties
provably > probably
but if I can’t prove interesting properties, it’s not useful. so we need:

expressive type systems

parametricity

single most important feature: parametricity or parametric polymorphism
- aka generics
- I don’t see it as a code reuse tool
- it gives you lots of expressivity, by capturing the notion of “I don’t know or care what type a is”. This gives useful guarantees. eg:

def f[A](x: A): A

There is only one possible implementation of this function: the identity function. I don’t know what type A is, so I have no way of creating a value of that type other than by using the only instance I have, x.

Similarly:

def compose[A,B,C](
  g: (B => C),
  f: (A => B)
): (A => C)

again there is only one possible implementation of this type.

This is more interesting:

def rev[A](x: List[A]): List[A]

There are multiple possible fns here (eg identity, reverse, sort all fit the signature), but I can still guarantee some things from the type alone:

Any value in the list rev(xs) must also be in xs, because rev has no other way to create an instance of A.

These are the famous theorems for free!

restrictions

There are ways to break parametricity using common language features:

no nulls

null is a value which inhabits every type. This breaks parametricity guarantees: eg with nulls I can define:

def f[A](x: A) = null

no case matching on class

def f[A](x: A) = match x {
  case (x : String) = "foo"
  case (x : Int) = 3
  case _  = x
}

no exceptions

def f[A](x: A) = throw RuntimeException

no side effects

def f[A](x: A) = {
  launchBallisticMissiles();
  x
}

types make communication easy

you should learn Haskell type syntax:

a -> a  -- function from a to a (lowercase indicates a is *any* type)

Int -> Int -- function from concrete type to concrete type

a -> b -> a   -- equivalent to:
a -> (b -> a) -- think of it as a two-arg function

-- typeclasses:
(Ord a) => [a] -> [a]
-- here Ord is a typeclass.  'a' now means "any type which has a
-- natural order"
-- we still get parametricity, but we can restrict it so it's not over all types

-- fn to remove duplicates:

Eq a => [a] -> [a]

Hoogle

Can use hoogle to search on type: hoogle “Eq a => {a} -> {a}”

Hoogle is smart enough to be more general: hoogle “{Maybe a} -> Maybe {a}” which can find:

sequence :: Monad m => [m a] -> m [a]

this fits because Maybe is a Monad, so the sequence function can be called on it.

parametricity cont’d

Recall rev:

def rev[A](x: List[A]): List[A]

The only extra thing we need beyond the types is this statment:

rev(xs ++ ys) == rev(ys) ++ rev(xs)

This kind of argument leads to property-based reasoning, and property-based testing.

Properties are perfect for ensuring you aren’t caught by edge cases.

Properties are often referred to as laws, which must hold but aren’t enforrced by the type system. So your reasoning becomes:

Types + Operations + Laws

which is another way of talking about algebras.

summary

types are
- high-level reasoning tool
- communication tool

qs

what about dependent types?
- I worry that your types become as difficult as code to work with

Implementing languages on the BEAM

Robert Virding

what IS the BEAM?

a VM to run Erlang
- not designed to run other languages
  - like the JVM in that respect
properties of the Erlang system:
- lightweight, massive concurrency
- async comms
- process isolation
- error handling
- continuous evolution of the system
- soft realtime
- immutable data
  - baked into the system
- predefined set of data types
  - tuples, lists, maps
  - that’s it
- pattern matching
  - primitives built in to the machine for doing this
- functional language
- modules
- no global data

erlang compiler

can work on files and erlang abstract code
can generate .beam files or binaries
has Core, a nice intermediate language
- can be input to the compiler
- simple and regular
- easier to compile to

internal languages

erlang -> core erlang -> kernel erlang -> beam assembler
optimization passes between
you can get anywhere in the compiler
you can even write the assembler directly (though not recommended)

core erlang

simple functional language
“normal” lexical scoping
just the basics
- no records
- no list comprehensions
supports pattern matching
most optimizations done on core
dialyzer speaks core
- sort of 🙍

example

argh wall of text
key point: Core Erlang is the nicest thing to target when implementing BEAM languages

support tools

leex - lexer generator
yecc - parser generator
syntax tools - for building erlang code
XML parsers (xmlerl)
compiler

leex

based on lex/flex but simpler
uses regexes to define tokens
generates scanning functions
- direct use
  - string/2
- for the i/o system
  - token/2, tokens/2

Definitions.

U = [A-Z]
L = [a-z]
D = [0-9]

Rules.

{L}({U}|{L}|{D}|_)* :
        {token,{atom,TokenLine,list_to_atom(TokenChars)}}.
({U}|_)({U}|{L}|{D}|_)* :
        {token,{var,TokenLine,list_to_atom(TokenChars)}}.
%[^\n]* : skip_token.

Usage:

{ok,Tokens,EndLine} = my_scan:string(Chars, StartLine)

{ok,Tokens,EndLine} = io:request(Io, {get_until,'',my_scan,tokens,[StartLine]})

yecc

basically yacc for erlang
LALR(1) parser generator
generates parse/1

Nontermanals E T F.
Terminals '+' '-' '*' '/' '(' ')' number.
Rootsymbol E.

E -> E '+' T : {'+','$1','$3'}.
E -> E '-' T : {'-','$1','$3'}.
E -> T : '$1'.
T -> T '*' F : {'*','$1','$3'}.
T -> T '/' F : {'/','$1','$3'}.
T -> F : '$1'.
F -> '(' E ')' : '$2'.
F -> number : '$1'.

conflicts

shift-reduce errors
- generally prefers shift to reduce
reduce-reduce errors
- would like to write:

expr -> pattern '=' expr ...

but ‘pattern’ and ‘expr’ aren’t distinguishable using 1 token of lookahead as LALR(1), so must write:

expr -> expr '=' expr ...

and extract pattern later.

new languages

Leonard Cohen: New Skin for the Old Ceremony
you can access all existing erlang code, OTP, etc
get a language which is as optimal as it can be for the VM

examples

elixir:
- generates erlang
lfe:
- generates core erlang

Lisp-flavoured Erlang

goals:
- provide lots of lisp goodies
  - homoiconicity and macros
- seamlessly interact with vanilla erlang
modify the language to fit with erlang/otp
- not common lisp or scheme
- no mutable data
- only erlang data types
- macros are only compile-time
- Lisp-2 (not Lisp-1)
  - ie separate namespaces for functions and values
- is as fast as vanilla erlang

core

LFE core primites are core erlang
so it’s almost a one-to-one mapping from erlang to core
exceptions:
- lisp ‘if’ goes to core ‘case’

elixir

dynamic functional language
influenced by ruby
- but not ruby on BEAM
metaprogramming capabilities via macros
- though cannot change syntax
many libraries and interfaces standardised and rewritten
compiles to erlang (rather than core erlang)

“new” datatypes

elixir has structs
we have to fake it
- use existing map datatype
- special format: key ‘__struct__’ with value of struct name
- special syntax and libs to create and recognise the new ‘type’

New languages

not just basic erlang
- different semantics
- non-erlang datatypes
- non-erlang handling of data
erlog (prolog)
luerl (lua)

erlog

can use leex
grammar needs backtracking
- cannot use yecc or any other parser generator
- hand written parser
data:
- good mapping
- structures ⇔ tuples
logical variables and unification
- they are both “global” and “mutable”
- can be unified
- changes seen everywhere
- we are cunning and use {Index}
backtracking
- erlang doesn’t have it
- use continuation-passing style
- efficient and works
- executing is not “interruptible”
  - control is in compiled code
- if we want to have better control we need an interpreter

luerl

implements standard lua 5.2
lua is
- simple, rather neat imperative lang
- dynamic lang
- lexically scoped
- mutable variables/environments
- embeddable
leex + yecc good
- one reduce-reduce conflict, easily resolved
datatypes
- mutable key-value tables
- global
need to explicitly manage global data
- global table store
  - orddict/array/ETC tables
- environment store
- environments
- tables

Qs

don’t you want to make a concurrent lua?
- that’s a good question but will take time to answer
wouldn’t it be nice to make the erlang GC available to other languages
- yes, but it’d make it harder for erlang itself

Programming and testing a distributed database

Reid Draper, @reiddraper
distributed databases have a fundamentally different architecture from single-node counterparts
need to look at:
- Communication and messaging
- correctness testing
- benchmarking
- abstraction

communication and messaging

single-node dbs don’t need to worry here!
distributed systems need to consider all sorts of things, starting with topology
“8 fallacies of distributed computing”
1. the network is reliable
2. latency is zero
3. bandwidth is infinite
4. the network is secure
5. topology doesn’t change
6. there is one administrator
7. transport cost is zero
  - both monetarily and computationally
8. the network is homogeneous

RPC

local and remote procedures are indistinguishable
procedures are familiar
let’s lift distributed programming to the same abstraction
RPC claims transparency
RFC 674
unfortunately.
- this abstraction fails
- handling failure
  - what is the remote node never responds?
  - what if the TCP conn is closed?
  - did it fail halfway through, or right before it returned to us?
    - is it safe to retry?
  - procedure call abstraction tells us nothing here
- why don’t we get a stacktrace?
  - we just never hear back
  - wasn’t this supposed to be transparent?
- what if our RPC call makes another RPC call?
- how do you serialize a file descriptor or a db connection?
a lot of these problems were known about early on:
- RFC 684 critiques procedure calls as a paradigm for distributed computing

message passing

we send a message to a remote host as a basic operation
no expectations at this level
- higher-level protocols built on top
- when I send a message, I expect a reply
erlang’s actor model embodies this

Bloom (UC Berkeley)

http://boom.cs.berkeley.edu/

bawden’s ideas

implementing distributed systems using linear naming, 1993, Bawden

correctness testing

unit testing

Typical unit testing: linear effort for the amount of coverage/confidence we need in our code.
if we want 50 tests, we need to write 5
are we going to think of edge cases like adding to MAX_INT?
nondeterminism kills unit testing too

property-based testing

QuickCheck
- haskell, erlang, scala, clojure

PULSE

builds on property-based testing
distributed system is inherently nondeterministic at runtime
want to avoid “works on my machine”
want to find race conditions
pseudorandomly generate schedules between different distributed nodes
- completely controls interleaving of operations on separate nodes

simulation testing

simulant
- records actions of test into database while test is running

benchmarking

example

imagine we have a cron job running on a single node to take a backup
- make tarball, send to S3
- uses a lot of network bandwidth
- what effect does this have on the system?
need to capture this sort of thing in your simulation

abstraction

programming languages

erlang

language-level api for distributed communication

haskell

libraries for dealing with streaming data
- powerful list processing with map/filter/fold
- can push that into stream processing as well
- conduit library treats a TCP conn as a stream:

tcpConn
$$ map parseInt
$$ CL.fold (+) 0

durability

single node is simple
can make a distributed database which is more durable than any single node in the system is

availability

in a three node system:
- is it okay to return to user before all nodes have received a write?
- one may fail, but that’s okay
  - we may have to repair later

Chad Fowler, Tiny

TDD with kent beck

TDD is about pushing complexity away
solve the easy part first, the bit you understand
push the hard part away
when you get to the hard bit, solve the easy bit of that first
etc

Keep things small

dose of dissonance
- massive, elite team
- huge, efficient codebase
- etc

Limits of memory

George Miller - The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information
long-term memory is (effectively) limitless
short-term memory is limited
- analogy with registers in processor
- “chunking” helps hold things together
Nelson Cowan
- showed that chunking was responsible for getting up to 7 ± 2
- in fact limit was more like 3 or 4
some people can do more, but stress brings them back down

code

“this is the easy one that everyone understands”
I used to wield the Refactoring book, and talk about design patterns, etc
anecdote: the 10,000-line Java Bean
- I felt I had failed
- new rule: every method can only be 5 lines long
  - every class can only have 5 methods
  - it didn’t work, but it seemed like a good idea
small:
- functions
  - in haskell, it doesn’t matter if your parameter names are called x, f, because the fn is so small you can see what’s going on
- parameter lists
- services
- actors
- applications
- data infratructure
  - “wouldn’t it be easier if every user had their own database?”
  - we used this idea to broke up our monolithic database into a bunch of small ones
- commits
  - large commits are:
    - hard to understand
    - hard to review
    - hard to revert

Test everything that could possibly break – Kent Beck

“that’s great, but my code is inherently big”
- no it isn’t
- this is the easy part
- you know how to do this

Teams

usually people say that big companies suck
- cubicle-dilbert-office space environments
what works: take big teams and split them into smaller, independent teams
“Zero to One” by Peter Thiel
Dunbar’s Number
Gore-Tex practice of deliberately capping the size of an office to 150 people
Military notion of fireteam – small (4) autonomous team
Putnam: team size can be key to a successful software project
Jeff Sutherland: Keep Team Size Under 7!
larger teams need leadership
- leadership is hard
- we don’t know how to do it well
larger teams have a communication overhead
single responsibility principle

Projects

2005/6 – worst burnout of my career
the big rewrite
they had a massive php codebase
we talked about extracting pieces
doing the strangler pattern
but we ended up doing a year long rewrite instead
we shipped it, and the customers hated it
- they sent hate mail
I should have stopped it
it self destructed
it took me years to recover from that project
- maybe I never did?

The problems with big projects:

too much change at once
- small change is easier to introduce successfully
big risk
no real reward until you get to the end
- you can’t learn anything as you go
http://ww2.cfo.com/risk-management/2014/06/why-big-projects-go-bad/
Standish CHAOS report
- small projects much more likely to succeed than large projects (76% vs 10%)
- http://www.versionone.com/assets/img/files/ChaosManifesto2013.pdf 8th page (numbered 4 on the page itself)

Should we just get better at dealing with larger things instead of using a crutch? (maybe. maybe you can be the person that can memorise 9-11 items instead)

Isn’t this all the same as decomposition, cohesion, etc? (yes.)

This conference is focussing on tools and/or techniques, which will have an incremental impact. But making things small will have a dramatic impact.

“Big” is worse than dogma, partly because it’s our default setting

References: https://www.wunderlist.com/list/136225945

Cheats & Liars: The Martial Art of Protocol Design

Pieter Hintjens
making protocols vs self-organizing around problems
statement: protocols are contracts
- who enjoys signing contracts? say for rental, loans, etc
- what about if you have no contract? it’s far worse
- contracts are a cost of doing business
- contracts are a tool for reducing conflict
- in software you need this a lot
- groups work together best when they have contracts between them
- a protocol isn’t just a TCP RFC with frames and headers etc
  - it’s a way of working together
when you do this, you have to define the parties/roles
- a protocol for pizza isn’t just “there will be pizza”
- you need to identify pizza providers, pizza consumers
  - types of ingredients, sizes, delivery time, price, temperature on arrival
  - you don’t describe the name of the chef
contracts need to be specific, but also minimalistic
cheats and liars try to bamboozle with irrelevent extra language
it’s hard to fake a pizza contract
- you can sell low-quality pizza but you’re still technically fulfilling the contract
then you get a natural market
consumers can choose between suppliers
contracts should also protect against the cheats and liars
- make it hard to cheat
- detect and punish when it happens
in the early days of the internet, there was the “embrace and extend” wars
- the big players own the standardization
- AMQP standardization was an insane amount of chaos and pain
  - people pay lip-service to standards but want to carve out their own captive markets
- large-scale standardization is subject to regulatory capture
if protocols are useful they will attract money
- which will attract predatory behaviour
- stuff the committee with your friends
- push through changes which noone understands and can’t fight
- patent it
- trademark it
  - eg Bluetooth
- hire all the people who make the protocol
- copyright it
- all of these things happened to AMQP
  - we will not bet on these protocols
  - the protocol had been killed
  - we came up with a protocol process that couldn’t be captured
ZeroMQ:
- write a spec
- if that spec is viable people might try to capture it
- we publish the spec under GPL
- if at some point the author goes insane and makes changes, you just fork it
  - fork is GPL too
- legal wrapping is designed to be capture-proof
workshop: write an RFC
- start very simple: send “hello” on one port, receive “world” on another port
- lesson: do not solve problems you don’t actually have
  - don’t solve hypothetical problems, no matter how sure you are you’ll hit them tomorrow
  - determined, applied laziness is essential
- early ZeroMQ had no versioning, framing, etc
- it was crap
- we spent no time arguing about anything
- second version added a version header
- next version added metadata
- then handshake
- then security
- very gradual
- could prove everything
- nothing was there that wasn’t desperately needed
- engineers love to create stuff
  - when we create stuff, we’re usually wrong
  - we made assumptions
  - we don’t actually need it
layering:
- secret to long-term success
- when you write a contract, it’s worth nothing
  - then you give it to people and ask them to use it
  - if it’s valuable to someone, they make their own implementation
- when your business is based on your implementation of my contract, then my contract wins
- the way you make changes to a contract is:
  - make a new contract which is a modification of the old
  - make it backwards-compatible
there’s no hurry – this takes years
you get the power to create very large, distributed systems of people
- “living systems”
- not necessarily for one business
- can work very efficiently
“I see software as a tool for destroying large, inefficient companies”
- protocols are a way to organize people to destroy their business

Q&A

protocols arise out of a problem, not out of nothing. say where you have a lot of different people doing the same thing in different ways. does that change things compared to your description? your MVP needs to convince people to switch
- one of my projects is called edgenet
  - I want everyone on the planet to use it
  - I want to start small
  - I want to prove the problem is a valid problem
    - I will not speculate
- you need to set clear rules and boundaries
- bad actors don’t like this and will stay away
  - you can’t afford verbal contracts
  - you need to have a documented process, written by someone as cynical as me
  - once a year I will ban somebody
    - you don’t just absorb damage from bad actors
    - you make it clear that it is not acceptable
in a self-organizing decentralized team, how come you get to decide who gets banned?
- you can’t. but you must own your own specific project
- there are other projects with criminals and cheats, which you can’t control
- but they just make you look better
- you should warn your friends
  - blacklist people and spread the news around
  - and it’s not your fault when there’s a conflict
  - when it doesn’t work, at a certain point
  - you need multiple people, ask around
    - if other people are having issue with the same person, don’t be afraid to take steps to ban them
  - it’s not innocent, it’s predatory maliciousness
  - punish the evil
what do you think about the upcoming MQTT and all that?
- big companies make stuff.
- then they try to push it through
- I’m running this devroom at FOSDEM on the IoT
- the MQTT guy came along and said he wanted to make a devroom on MQTT
  - I said we don’t want to promote particular products
  - turned out he was a marketing manager
- MQTT is one of these very sad, corporate, zombies
  - it’s a big, nasty, clumsy, stupid piece of shit
  - I would never build anything on that
  - go back to raw TCP, it’s better than MQTT
- most software is rubbish, so this isn’t really
- (joeerl: what is MQTT?)
  - it’s “Message Queuing T something”
  - pub-sub protocol pulled up by IBM
  - it’s marketed as the protocol of the Internet of Things
  - (joeerl: what?!)

conclusion

most of this is in the ZeroMQ Guide chapter 6 (free!)

Functional Patterns in Domain-driven design applied to Financial Models

Debasish Ghosh

what is a domain model?

wikipedia definition
Domain Driven Design, book by Eric Evans
historically - rich domain models
- class models the domain abstraction
- contains state and behaviour together
- state hidden within private access specifier
- we define “nouns first”
lean domain models
- algebraic data type models the domain abstraction
  - immutable by definition
- contains only the defining state as immmutable values
- immutability is the basis of the design here
- nothing but the bare essential definitions go in the ADT
- “verbs first”

domain model elements

rehash of evans’ book
entities & value objects
- modeled with types
behaviours
- modeled with functions
domain rules
- constraints and validations
bounded context
ubiquitous language

why functional?

ability to reason about your code
increased modularity - clean separation of state & behaviour
immutable data structures
concurrency becomes easier

mapping

problem domain:
- place order
- do trade
- process execution
solution domain:
- functions
- on types
- constrains
algebra:
- morphisms
- sets
- laws
- not concerned with implementation (right now)
- compose for larger abstractions

concrete algebraic structure: monoid

a set of elements S
a special element e
an operation ⊗ such that
- A ⊗ e = A = e ⊗ A (identity)
- (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C) (associativity)

pattern #1: algebraic API design

client places order (in some flexible format)
- transform to internal domain model entity and place for execution
- when execution comes back, trade & allocate to client accounts

We create some types and some operations on those types:

def clientOrders: ClientOrderSheet => List[Order]

def execute: Market => Account => Order => List[Execution]

def allocate: List[Account] => Execution => List[Trade]

and define certain laws that the operations must satisfy. This is the algebra of the API.

For a single algebra we can have multiple interpreters aka implementations.

We can decouple the algebra from the interpreter.

Let’s mine some patterns:

def clientOrders: ClientOrderSheet => List[Order]

def execute: Market => Account => Order => List[Execution]

def allocate: List[Account] => Execution => List[Trade]

Note that clientOrders returns List[Order] and that execute takes an Order. Also execute returns a List[Execution] and allocate takes an Execution.

This has the pattern of Kleisli arrow (in haskell type notation: Monad m => a -> m b or in scala: A => Monad[B])

Define clientOrders thus instead:

def clientOrders: Kleisli[List, ClientOrderSheet, Order]

def allocate(accounts: List[Account]): Kleisli[List, Execution, Trade]

def tradeGeneration(
  market: Market,
  broker: Account,
  clientAccounts: List[Account]) = {

  clientOrders andThen
    execute(market, broker) andThen
      allocate(clientAccounts)
}

Implementation follows the specification and thus we get the Ubiquitous Language for free!

Is “nouns first” really the right approach to DDD?

Claim: with functions and types as the binding glue we get all these patterns for free using generic abstractions

Pattern #2:

DDD has patterns like Factories & Specification
- these are just normal idioms in functional programming
- they don’t even need special names

Monadic validation pattern:

val s = for {
  _ <- validQuantity
  _ <- validValueDate
  t <- validUnitPRice
} yield t

(aside: not sure the for-notation adds much here)

Smart constructor: factory pattern (chapter 6, DDD)

validation: specification pattern (chapter 9, DDD)

FP and DDD summary

FP is focussed on finding the generic
we implement all specific patterns in terms of generic abstractions
- based on function composition

Pattern #3: functional modelling encourages parametricity

case class Trade(
  refNo: String,
  instrument: Instrument,
  ...)

def netValue: Trade => Money = //..

def valueTradesInBaseCcy(ts: List[Trade]) = {
  ts.foldLeft(Money(0,baseCcy)) { (acc,e) =>
    acc +  inBaseCcy(netValue(e))
  }
}

case class Transaction(id: String, value: Money)

def highestValueTxn(txns: List[Transaction]) = {
  ts.foldLeft(Money(0,baseCcy)) { (acc,e) =>
    acc > e.value
  }
}

Instead of using a specific collection type (List), we can use a generic Foldable type. This reduces the possibilities available to valueTradesInBaseCcy and highestValueTxn and reduces scope for error. This is parametricity working for us.

def mapReduce[F[_], A, B](as: F[A])(f: A=>B)
  (implicit fd: Foldable[F], m: Monoid[B]) = fd.foldMap(as)(f)

def valueTradesInBaseCcy(ts: List[Trade]) =
  mapReduce(ts)(netValue andThen inBaseCcy)

def highestValueTxn(txns: List[Transaction]) =
  mapReduce(txns)(_.value)

Takeaways

Move the algebra from domain-specific abstractions to more generic ones, to make our code more generic, less verbose, more robust

The program space in the mapReduce function is shrunk - so scope of error is reduced

Increases the parametricity of our program - mapReduce is now parametric in type parameters A and B

Parametricity: theorems for free! (not the first time this has been mentioned in these notes)

Q&A

when you’re learning, it’s not apparent which abstractions are available - eg I didn’t know about the kleisli arrow. how does a beginner get to that stage?
- there’s no short answer
- use the tools which are out there - eg hoogle, FP in Scala book
- typeclassopaedia
- edward kmett :)

Naming things

@ianbarber

“Names are the one and only tool you have to explain what a variable does in every place it appears, without having to scatter comments everywhere.” – Michael Mahemoff

You don’t always need to use a name:

window.setTimeout(
    function() {
        console.log("hello");
    }, 1000
);

you need names if you want to rename things:

http://mollyrocket.com/casey/stream_0029.html (zomg, fails without js)

SessionProperties.Wnode.ClientContext = 1;

Here ClientContext is the type of a timestamp. the name is wrong and misleading, although it does correspond to the implementation

what is a name?

“A ‘name’ is a symbol – usually a human-readable-string – identifying some resource, or set of resources.

The string need not be meaningful to all users, and need not be drawn from a uniform name space.”

– Schoch

Names can be mangled: __TFCCC4test1a1b1c1dfS2_FTS0_1xS1_1vFT1xSi_Si_OVS_1e1f is a symbol name in swift

Namespaces

names drawn from a namespace
a name can be valid but not be assigned (or bound)
- can also define aliases
a filesystem is a series of nested namespaces
example: haskell module HttpClient in httpclient.hs - actually in /usr/local/httpclient/httpclient.hs
name purposes:
1. name: what we want
2. address: where it is
3. route: how we get there
as you go down this list, gets less important from a human cognition point of view
- change HttpClient to Sausage - wtf?
- change httpclient.hs to sausage.hs - confusing, but less of an issue
- change /usr/local/httpclient to /home/philandstuff/sausage - not sure it would be an issue at all any more
philosophy of naming
- Wittgenstein
  - my favourite buzzfeed listicle
- idea
  1. names gain meaning through convention
    - names attach to your understanding of things
  2. sense is in propositions
  3. interesting propositions are generalizable
algorithm:
- name things after yourself
- if you need more names, add numbers to the end
example: George Foreman
- the George Foreman Grill
- children:
  - George Jr
  - George II
  - George III
  - Georgette
example: PageRank
- decent name
- often when we name after someone, we have no good way to otherwise describe it:
  - Pythagoras’ Theorem
  - Dijkstra’s Algorithm
another approach: the Ronseal Pattern (it does exactly what it says on the tin)
- “Decking stain” - it stains decking
- note: not named after ingredients
Code Complete Ch 11 – naming
- “In general, if a name refers to some aspect of computing rather than to the problem
hungarian notation
- analysis: http://blogs.msdn.com/b/oldnewthing/archive/2004/06/22/162629.aspx
- there were two approaches: the systems division and the apps division
- lpString1 (for long pointer) vs szDest (string terminated by zero)
- apps division was better – about how was to be used, rather than what it was
“the point of having proper names is to just refer to objects and not to describe them” – Searle

-- most unrestricted
haversine :: (Ord a, Floating a) => a -> a -> a -> a -> a

-- says how to use it, but not what it does
haversine :: Float -> Float -> Float -> Float -> Float

-- says what it does, but it imposes an extra cost -- how do I get a
-- Location value?
haversine :: Location -> Location -> Distance

“Zooko’s Triangle” – Human Meaningful, Secure, Decentralised - pick two
- (not quite a CAP-theorem guarantee, but a broad property)

general rules

is a name necessary?
- if it doesn’t need one, don’t give it one
has it already been named?
- don’t invent new names for things
  - (grr, hash vs map, map vs collect, fold vs reduce vs inject)
what is the context & category?
how should it be used?
how will people remember it?
- don’t make it:
  - too long
  - too similar to something else
  - unpronouncable

Ideas:

currentLoggedInUser vs me
- a shorter name is expected to be used more
- a longer name indicates you expect it to be used less
  - but also that the distinction you make with the extra length is a distinction worth making
if someone doesn’t have the name for our thing, they can’t use it
- a lot of our success is about the success for people to active use the things we create
links

questions

“a lot of naming comes down to programmer readability, and similarly to English. How do you go about naming for non-English audiences?”
- comes down to context of audience again
- have to use words that the audience are likely to use
- if you know that your audience are haskell programmers, then they will already know the words in Prelude
  - not necessarily well enough to use in English, but well enough to use in Haskell
- when you first define a name, the comments are a good place to expand
- if you’re sitting with someone who is explaining something to you, the words they use to explain it are sometimes better names than the name itself
- often it doesn’t matter if you get it wrong first time
- but in a widely-used library, it’s hard to change
- the harder it is to change, the more effort you should put in to getting it right
“is it sometimes useful to name things that you’re not using (in pattern matching)?”
- yes, sometimes they’re useful purely for documentation rather than for use
“do you abbreviate names?”
- it can depend
“what about single-letter variable names? would you follow the convention of the community, or break convention and be description”
- single-letter variable names (particularly l) are confusing
- some odd names can get written when extracting a tmp var to break down a large expr

Verifying Sateful and Side-effecting Programs using Dependent Types

@edwinbrady, St Andrews University

intro

where is st andrews?
wonderful collection of error displays
it’s pretty hard to write software that works
famous bugs:
heartbleed
- a bug so bad that a marketing department got involved
- nobody spotted it for a long time
- goto
goto fail

Idris

Idris is a Pac-man Complete functional programming langauge with dependent types.

cabal update; cabal install idris
https://github.com/edwinb/idris-demos

In this talk: Reasoning about side effects in Idris

extra-functional correctness
example: communication protocols

example: write the type, the editor automatically gives options for implementation:

vzipWith : (a -> b -> c) -> Vect n a -> Vect n b -> Vect n c
vzipWith f [] [] = []
vzipWith f (x :: xs) (y :: ys) = f x y :: vzipWith f xs ys

Since this is the only implementation, the editor can write this function for us from the type alone, including splitting the cases into separate pattern matches and making the recursive call.

(it’s hard to take notes on this to demonstrate the real interactivity offered…)

aside: with dependent types, you can put anything you want in a type and it can get very complicated.

Please don’t do that.

It’s very important that the types make sense.

side effects

The Eff type family indicates side-effects:

vadd_check : Vect n Int -> Vect m Int ->
             { [EXCEPTION String] } Eff (Vect m Int)


read_vec : { [STDIO] } Eff (length ** Vect length Int)

do_vadd : { [STDIO, EXCEPTION String] } Eff ()

Note that we only allow STDIO, not BALLISTICMISSILEIO.

This gives a program which will add two user-supplied vectors of the same length.

what is Eff?

Eff is effectively a DSL for managing what you’re allowed to do and proving that you don’t do anything forbidden.

eg: open a file: are you allowed to read it?
- only if the open was successful
- types capture success and failure and
like a Hoare triple
- precondition, command, postcondition

State machine example: door opening

CLOSED >- knock -> CLOSED >- open door -> OPEN >- close -> CLOSED

knock     : { [DOOR Closed] } Eff ()

openDoor  : { [DOOR Closed] ==> [DOOR Open] } Eff ()
closeDoor : { [DOOR Open] ==> [DOOR CLosed] } Eff ()

state transitions which handle failure:

data Jam = Opened | Jammed

openDoor : { [DOOR Closed] ==>
             {isJammed}
             [DOOR (case isJammed of
                         Opened => Open
                         Jammed => Closed)] } Eff Jam

doorProg = { [STDIO, DOOR Closed] } Eff ()
doorProg = do knock
              jam <- openDoor
              case jam of
                   Opened => closeDoor
                   Jammed => do putStrLn "NNnnnnnnngh"
                                return ()

This is getting deeply nested. Is there a better way to handle failure?

Exceptions? Problem with exceptions is that you don’t know what state you were in when you threw; so you don’t know what cleanup you need to do.

Inspired by perl’s || die syntax:

doorProg = { [STDIO, DOOR Closed] } Eff ()
doorProg = do knock
              Opened <- openDoor | Jammed => return ()
              closeDoor

If you have a precise enough type system, you have a lot of help.

Protocol actions

Similar to session types. See http://eb.host.cs.st-andrews.ac.uk/drafts/protocols-spls0614.pdf

data Actions : Type where
     DoSend  : (dest : p) -> (pkt : Type) ->
               (const: pkt -> Actions) -> Actions
     DoRecv  : (src : p) -> (pkt : Type) ->
               (const: pkt -> Actions) -> Actions
     DoRec   : Inf Actions -> Actions
     End     : Actions

echo : Protocol ['C, 'S] ()
echo = do 'C ==> 'S | String
          'S ==> 'C | String
          Done

Refine the types to ensure it really is an echo server:

echo : Protocol ['C, 'S] ()
echo = do msg <- 'C ==> 'S | String
          'S ==> 'C | Literal msg
          Done

Q&A

“could we have a maybe effect to handle the door failure case?”
- there are loads of ways to deal with this
- the key thing is that the type says what state you’re in at each point
- there is an EXCEPTION effect but you lose the information about the state you’re in

Process calculi for the working programmer

@thattommyhall
Second reference in this conf to Mac Lane’s “Categories for the Working Mathematician”

Starting from the beginning

which is of course the start of the 20th century
David Hilbert vs Kurt Gödel
- also Russell and Whitehead
- Principia Mathematica
propositional calculus
example logical proof
1. ∃v¬φ (Assumption)
2. ¬φ (Assumption)
3. ∀vφ (Assumption)
4. φ (Universal Quantifier Elimination, 3)
5. (φ&¬φ) (Tautology,2,4)
6. ∀vφ -> (φ&¬φ) (CP,5)
7. ¬∀vφ (Tautology,6)
8. ¬∀vφ (EH,7)
Hofstadter, Gödel, Escher, Bach
- more programmer-friendly syntax with nesting

lambda calculus

syntax

e ::= v | λ/v/./e/ | (e e)
Lisp syntax is inspired by lambda calculus:
- (fn [x] (+ x 1))

semantics

(\x. E) M => E{M/x}

((\x. x + 1) 2) => 2+1 => 3

((fn [x] (+ x 1)) 2)

scope issues

(\x.\y (x y) (y w))

=> (\y ((y w) y))

but the name collision here changes the meaning. We didn’t mean for the y inside (λy. ..) to refer to the same thing as in (y w)

((fn [x]
   (fn [y]
     (x y)))
 (y w))

=> (in a call-by-name version of clojure)

(fn [y]
   ((y w) y))

order of evaluation

applicative order (call-by-value)
normal order (call-by-name)

Universal Register Machines

End
Inc n m (add 1 to register n, goto m)
DecB n m p (subtract 1 from reg n, if x=0 goto m else goto p)

Actors = Lambda calculus + Message Passing

Robin Milner is responsible for a lot of this work
Pi Calculus syntax
parallel processes communicating over channels
- writes block until corresponding read, reads block until corresponding write
non-determinism
- if two writers write to same channel, nondeterministic which message a reader gets

dining philosophers

wiki http://en.wikipedia.org/wiki/Dining_philosophers_problem
solved using Pict

Communicating Sequential Processes (CSP)

syntax + semantics

Clojure core.async

CSP for clojure
one difference from pi calculus: channels are allowed a buffer
- to remove blocking from writes and reads
- though the buffer is always bounded
- (one issue with the actor model is that the mailbox is unbounded)
channels are first-class values which can be sent down channels

from concurrent to distributed

mobile pict
join calculus

deterministic parallelism

LVars (Lindsey Kuper)
- RICON

practical uses

Spi-calculus (crypto)
biology (simulations of cells using pi calculus)
modelling telephony protocols

failure of formal methods?

“the world just does not suffer significantly from the kind of problem that our research was originally intended to solve” – Hoare

who’s doing this?

INRIA JoCaml
Microsoft Research (c-omega?)
F♯ tryjoinads
Me (maybe…)
David Pollak has been talking about making clojure core.async distributed

conclusion

not rocket science
some of this stuff may be the future
it’s also interesting

QuickCheck Evolution

John Hughes @rjmh

QuickCheck runthrough

why is testing hard?
say you have n features:
- O(n) tests
but some bugs appear only when features used together:
- pairs of features
- O(n^2) tests
etc
you can’t test everything
where do you stop?

Don’t write tests!

Generate them

QuickCheck

1999 - Koen Claessen and John Hughes, for HAskell
2006 - Quviq founded marketing Erlang version
many extensions
finding deep bugs for ericsson, volvo, basho, etc

example

Binary search trees:

to_list(leaf) ->
    [];
to_list(to_list({node,L,X,R}) ->
  to_list(L) ++ [X] ++ to_list(R).

member(_,leaf) ->
    false;
member(X,{node,L,Y,R}) ->
    if X==Y ->
            true;
       X<Y ->
            member(X,L);
       X>Y ->
            member(X,R)
    end.

prop_member() ->
    ?FORALL({X,T},{nat(),tree()},
            member(X,T) == lists:member(X,to_list(T))).

?FORALL asserts a property is true for all natural numbers, and trees. We then generate test cases fitting this pattern

shrinking

Every time I run the test, I get a different, potentially large case; but shrinking reduces it down to a much smaller and more manageable case.

keeping the example

If you have a failing test case, you may want to keep the failing testcase (esp if it appears less frequently), to ensure that you really have fixed the bug.

Enter: QuickCheck CI for testing github projects

demo with https://github.com/rjmh/trees
testing
coverage
- often directs your attention straight to where the bug is
- coverage data is per-test
  - the bug can only be in the green lines, since the red lines weren’t executed
failing tests are remembered between builds
build doesn’t go green until all previously failing cases are passing

state machine testing - example

test the process registry
- register(Name,Pid)
- unregister(Name)
- spawn() - to create pids for test data
what’s different?
- these functions change the state of the registry
we won’t find bugs
- instead: reverse engineer preconditions

state machine models

test case: sequences of API calls
for each call, model how the state has changed
can use postconditions on each call
when a test fails, shrink the sequence of calls

specification of register:

register_pre(S) ->
    S#state.pids /= [].

register_args(S) ->
    [name(),elements(S#state.pids)].

regsiter(Name,Pid) ->
    erlang:register(Name,Pid).

register_next(S,_,[Name,Pid]) ->
    S#state{regs=S# ...

can conveniently specify the indended behaviour of stateful systems
testing in practice involves
- reverse engineering the actual specification
  - documentation is always wrong
  - specification is always incomplete
- finding and correcting bugs

automotive: CAN bus testing

3000 pages of spec
20,000 lines of QuickCheck, testing
1,000,000 LoC, 6 suppliers
200 problems, of which
- 100 problems in the standard itself

does this approach scale? yes!

problem: bugs are plural

QuickCheck finds one bug at a time
- not much point running quickcheck again
  - it will just shrink down to the same testcase
- fix the bug
- only then do you find the next bug
what if fixing the bugs aren’t so easy?
- in the CAN bus case, we didn’t have the source code
- would send a bug report, and get a new version maybe 3 weeks later
in practice:
- unfix the spec to allow the bug
- if we’ve modelled the bug we’ve found correctly, quickcheck will now find the next bug
- feels icky
“More bugs”
- compute “bug signatures” from failures already found
- focus testing effor on areas not yet known to be buggy
in practice:
- move from finding 1 bug, we’d find 20+ bugs in a single run of QuickCheck

want to try it out?

Q&A

“is quickcheck-ci erlang only?”
- you write the properties in erlang
- there’s a C project we’re testing using it
- “will you support quickcheck for haskell?”
  - that might be fun
  - there’s nothing happening along those lines

Panel: Two steps forward, one step back

Andreas Stefik
- professor at university nevada las vegas
- research how proglangs impact human beings
- mostly work on Quorum these days
Chad Fowler
- used to be a ruby guy
- usually use:
  - ruby
  - haskell
  - clojure
  - scala
  - go
  - node
  - erlang
  - objective c
Jessica Kerr
- 15 years of java
- scala then clojure
- I play with ruby in order to speak at ruby conferences
Steve Vinoski
- work for basho
- lately a lot in C (working on Erlang VM)
- long time C++ (started 1988)
- whole bunch of languages
Viktor Klang
- work for TypeSafe
- mostly scala
- bit of java, bash, applescript

static vs dynamic typing

jess: I think Stefik settled that yesterday :)
- ie that static typing improves productivity at some point between half a year and 3 years
  - although we have no data in that interval
- use of generics
Chad: what does static typing mean?
- stefik: can mean:
  - interesting haskell syntax
  - generics with crazy wildcards
  - most developers think it means “I have types on my method decls”
    - this property is a strong predictor (for what?)
  - implementing generics have a tenfold decrease in productivity
    - using generics gives a slight bump
joeerl: currently we use a lot of C and a lot of erlang
- in my experience, 50x more errors in the C than on the erlang
- I’d like to see the division between GC (java) and non-GC (C) langs
- microsoft became rich by shipping buggy software early
  - will static types deliver
- Stefik:
  - I didn’t make a sweeping statement
  - we ran formal peer-reviewed RCTs
    - with human users
    - we published the data and the methodology
  - it didn’t say “static typing is always better”
  - joeerl: what langs were used in that study
  - Stefik: we did 8 studies
  - joeerl: can you infer results from one language to another
  - jess: there’s a lot more difference between C and erlang
  - Stefik: we didn’t test C in the syntax studies
    - there might be any number of causes of bugs in a language
    - memory leaks
    - if you’re saying there’s many things besides type systems that can affect your productivity, that’s true
  - Chad: there’s also strong vs weak typing
  - steve: types don’t save me across the network
    - types don’t save me from locking and concurrency
    - other things affect productivity and correctness
  - Stefik: that’s true of course
    - we don’t know much about these other things that affect productivity
    - there aren’t many studies - theres one from UT Austin which showed eightfold increase in bugs with threads+locks vs STM
    - this isn’t surprising but it’s a huge productivity impact
    - but we’ve also shown that static vs dynamic typing is a productivity impact
rvirding: how much is the typing, and how much is being forced to write down the type?
- stefik: the physical manifestation (the fact that it’s in the file) matters
  - we’ve tested this
  - the type needs to be in the method declaration
  - not universal across programmers:
  - 92% of novices screw up types in method decls
  - people with more experience get a productivity boost
Chad: a little over 10 years ago, there was a big movement for dynamic movement against static typing
- ruby was the big upcoming
- strong dynamic typing vs weak static typing
- now, we’re all pushing for static typing and category theory
- projects still suck just as bad, software still sucks just as bad
viktor: there’s a difference between structural and nominal types
- particularly with concurrency
- you don’t need to ship the defintion of the types around with structural types
Stefik: my wish is that people start collecting more and more evidence on this stuff
- pendulum swings are good
- but if we hone down what works and what doesn’t, when we define a new language, we can read 100 papers on productivity
jess: everybody can’t read every paper! (reference to @aphyr debate recently)
- steve: you just don’t want to reinvent the wheel
- jess: how do you find that previous work?
- steve: leslie lamport’s still around! you can ask him
  - reid draper gave a talk here and included a thing on RPC vs messaging
    - had good mentions of RFCs from 1970s (see above)
  - we have google now
  - you don’t have to sit in the library
- jess: to google it you have to have the name already!
- stefik: I’m talking about people defining a new language
  - you simply can’t read all the papers
  - the number of papers published in a month is probably more than anyone can read in a lifetime
- viktor: getting up to speed with where the researcher was when they wrote the paper is a significant effort
steve: take all the good papers and put them on stackoverflow
- people would read them then!
stefik: there would be lawsuits
- it’s horribly stupid
- institutions are judged not on quality of papers, but number they produce
- when they are produced
  - the publishing company gets paid for the research
  - they contribute nothing to any part of the process
  - they provide an “upload” button and you click “submit”
jess: we’ve discovered this thing called “open source”

distributed system for akka

francesco: viktor, how do you approach the work you’re doing, and how do you combine new ideas with old ideas?
- viktor: what helps me the most is to start from scratch
- esp for distributed systems, don’t start with computers
- first think “how would we solve this problem with people”?
- the more I learn about fundamental problems, the more I see them everywhere
- not just in computing
- queueing theory
- a lot of people never peer through the base level
chad: do you think anyone knows what the list of fundamentals to learn is?
stefik: there are two groups of scholars in the US trying to adjust the compsci education
- Exploring Computer Science (ECS) & Computer Science Principles
jess: fundamentals are different per person
- say someone was doing a talk with queueing theory?
- how many people have a local group where such a presentation would be accepted?

open access

jeremy gibbons: can you get hold of ACM papers?
- does the (in general) paywall stop you accessing the paywall?
- do you find ways round it? do you google the title and find the copy on the author’s homepage?
philandstuff: the biggest barrier for me is pdf as a format needs to die

other sources

jess: there are other sources from papers – eg open source
- how do you find how other people solved a problem before?
audience: papers have keywords and get indexed
- search by keywords
jess: I have written programs to write tests and I didn’t know what I was doing was called property-based testing
- how can we educate each other?
viktor: more negative experience reports
- why don’t things work?
joeerl: Chad’s slides showed 60% of software projects fail (for some value of fail)
- I’ve been close to some of those
- the employer never admitted the failure
  - it was a success, but due to market changes it had been down-prioritized
Chad: host of the interesting failures aren’t technical
steve: can you make “failmesh”?
- francesco: QCon london once had a “majestic failures” track
  - C.A.R Hoare and the “billion dollar mistake” - the null pointer
viktor: I would say a mars lander feet/metres problem was a technical error
- chad: this is the airport security approach
  - one person had a bomb in their shoe, so everyone takes of their shoes
- jess: a type system like F♯’s with units would have fixed that
  - people problem behind our couch database issue
rjmh: from CHAOS report, among successful projects, less than half of the requirements got implemented
- are we doing better nowadays?
- maybe agile methodology means we never discover the half of the requirements we failed to implement
- stefik: I’ve heard the CHAOS report cited a million times
  - it’s a for-profit group
  - surveys to managers but not the technical developers
  - measuring failures is pretty hard
- chad: everyone knows the standish thing is bullshitty
  - the one I showed is 2013
  - it seems worth showing to me is that it’s a confirmation of what we all already know
- stefik: I show it to my class too, I’m just reminding that it’s bullshitty
- rvirding: how many failures are because the project itself was stupid?
- stefik: that’s exactly it!
audience: in 2012 RBS had a major failure
- people weren’t able to withdraw cash for a weekend
- was that a technical failure or a process failure?
  - steve: you can look at that as a people failure
    - they didn’t have a process in place to verify code
    - in that case, it’s similar: who put these processes together, did they practice them?
    - you’re suppose to practice doing upgrades, having outages
    - if you don’t do it, how can you have confidence?
- jess: both etsy and netflix at goto amsterdam:
  - dealing with failure:
    - don’t blame anyone
    - you can fix it at any of 18 levels
      - code
      - testing
reiddraper: we have poor role models
- eg the 25-year-old male programmer who writes an iphone app
- not the person whos been here 40 years and learns slowly
- how can we fix that?
- jess: pictures of joeerl in a purple velvet jacket

philandstuff/codemesh2014.org

Codemesh 2014, London

keynote: complexity is outside the code

when I started programming:

no silver bullet

soa & polyglot persistence

moore’s law

the happy old days vs now:

mistake: our goal is not to write code

estimation

research

kaizen

things other than programming that we do

Don Syme, the F♯ approach to relaxation

starting position

then (2003)

things changed:

relaxation achieved 😎

tl;dr: where we are now

the puzzle

F♯ compiler

language design process

F♯ Compiler Service

relaxation achieved:

then (2003)

now: object + functional

anecdata comparison between C♯ and F♯

enter F♯ Active Patterns

code <=> data: Type Providers

sync <=> async

numbers <=> numbers-with-units

GPU <=> CPU

what’s ahead?

repl <=> distribution and scale

conclusion

NoSQL is dead

naming things is hard

my history

distributed systems

the thesis:

NoSQL is ill-defined

perceived deficiencies

actual deficiencies

NoSQL is a bad classifier

DB classifications

models

Graph

Key/Value stores

column store

document datastore

this is too limited

model types: revised

Classify by topology?

Single Node

Mesh

Partial Mesh

Tree/Star

classify by natural distribution

hard, but not impossible

easy, but not always there

what else?

what does this mean for you?

future

Categories for the Working Programmer

functions that consume lists

applications of foldr

What’s special about lists?

It’s not always so obvious

Preparing for genericity

The categorical view, in a nutshell

Duality

Conclusion

TDD, as in Type-Directed Development

thinking with types

expressive type systems

parametricity

restrictions

no nulls

no case matching on class

no exceptions