jonthebeach

Infrastructure as Code with Terraform

Mitchell Hashimoto - @mitchellh

The problem - history

From bash scripts to virtualization, which empowers automation
Virtualization as a service: the cloud
Cloud architectures are super powerful but complex -> Capability breeds complexity
Terraform (TF) describes the infrastructure as code

Benefits

Code as source of knowledge AND truth (since you execute it to generate the infra).
Text, versionable, collaborative
Simpler than CloudFormation (CF)
It is not just about the cloud: it has many providers (GitHub, PagerDuty, Datadog...). E.g. HashiCorp onboarding consists of adding a module for the new employee that grants access to everything.

Terraform internals

Keep It Simple example: few public interface methods, simple architecture
Clear, easy to understand extension points

A Functional approach to reactive microservice architecture

Jose Escanciano, Patrick Di Loreto @patricknoir

Their framework - https://github.com/patricknoir/reactive-system
Reactive principles: elastic, responsive, resilient, message-driven
Microservices pros (the cons were not mentioned :-/)
DDD as a tool to tackle business logic with Divide&Conquer
- Splitting the domain into Bounded Contexts allows segregating them into different microservices
- DDD elements for modeling: Value Objects (VO), Entities, Bounded Contexts (BCs), Ubiquitous Language...
How FP (pure functions) helps build multithreaded stateful services
- state monad: function(input, state) returns (state, sideEffect)
CQRS and how they model it in their framework (processors)
Actor model, distsys and Akka
Code example
- Apps, services, topics
- No time to see it running
It's not ready for production :-|
If you are into Scala+Akka and building a brand new clustered service, take a look at this framework's abstractions
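
The state-monad note above, a pure function from (input, state) to (state, side effect), can be sketched as follows. This is a minimal illustration in Python, not the reactive-system framework's API: `Deposit`, `handle` and `run` are hypothetical names, and the side effect is modelled as a plain string that describes what should happen instead of doing it.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Deposit:
    """Hypothetical input message."""
    amount: int


# A side effect is described as data (here a string) instead of being executed.
Effect = str
Handler = Callable[[Deposit, int], Tuple[int, Effect]]


def handle(msg: Deposit, balance: int) -> Tuple[int, Effect]:
    """Pure transition: no mutation, no I/O, same inputs give the same outputs."""
    new_balance = balance + msg.amount
    return new_balance, f"notify: balance is now {new_balance}"


def run(handler: Handler, initial: int, inputs: List[Deposit]) -> Tuple[int, List[Effect]]:
    """Thread the state through the handler, one input at a time."""
    state, effects = initial, []
    for msg in inputs:
        state, effect = handler(msg, state)
        effects.append(effect)
    return state, effects


final_state, pending_effects = run(handle, 0, [Deposit(10), Deposit(5)])
```

Because `handle` never touches shared mutable state, only the single loop in `run` owns the state, which is what makes this style attractive for multithreaded stateful services.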

Lessons learned building a big data analytics engine, from proprietary to open source

@joelbrunger Alvaro Santamaria @dofideas

data viz lessons

viz is not about the frontend but about how you architect the backend to display what's needed
information extraction is lightweight: data is processed in the backend and sent to the frontend
- control maps and heat maps over showing the values directly
- std deviation in the form of ellipses
- box plots are better than histograms
- sequences of box plots are even better
hold state
- materialized views with averages, counts, tdigest (statistical distributions)
- dimensions and groups
  - 1 group: a KPI
  - several: graph
  - lots: time series
delivering (information from back to front)
- req/response vs reactive
- indexes
  - top K (sort + limit)
  - time series: index by time
pipeline
- join/aggregate
- enrich
- filter
- extract
- materialized view (in-memory)
- index
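
The "hold state" and "indexes" notes above (an in-memory materialized view with counts and averages per group, plus a top K query as sort + limit) can be sketched roughly as below. The names `Stats`, `build_view` and `top_k` and the sample data are assumptions made for this illustration, not anything shown in the talk.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Stats:
    """Running aggregate kept per group, updated incrementally."""
    count: int = 0
    total: float = 0.0

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0


def build_view(events: Iterable[Tuple[str, float]]) -> Dict[str, Stats]:
    """Materialized view: one Stats entry per group (dimension value)."""
    view: Dict[str, Stats] = defaultdict(Stats)
    for group, value in events:
        view[group].add(value)
    return view


def top_k(view: Dict[str, Stats], k: int) -> List[Tuple[str, float]]:
    """Top K groups by average, equivalent to sort descending + limit k."""
    ranked = heapq.nlargest(k, view.items(), key=lambda item: item[1].average)
    return [(group, stats.average) for group, stats in ranked]


# Hypothetical (group, value) events arriving from the pipeline.
events = [("madrid", 10.0), ("malaga", 30.0), ("madrid", 20.0), ("sevilla", 5.0)]
view = build_view(events)
best = top_k(view, 2)
```

Only the small aggregated result (`best`) would need to travel to the frontend, which matches the point that the heavy lifting belongs in the backend.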

mapr

products for: data-store, analytics, ML, database, streams
- https://mapr.com/products/
describes distributed arch for ML models processing based on a dist log like Kafka + pub/sub
Asked about Feature prep: "it will be eventually automated"

Automerge: Making servers optional for real-time collaboration

Martin Kleppmann

examples:
- git as an example of a distributed system with replication (git push), applying changes (pull + merge), conflicts etc
- google docs (real time, collaborative editing)
- distributed databases
replicated state diverges between nodes/collaborators for some time and gets merged back
examples by type
- text manipulation (GDocs)
- set manipulation
- counter (needs to capture the operations - increment - not just to merge the resulting states of the nodes)
Operational Transformation algos: there are many implementations and papers, and most of the ones that don't use a central server are wrong (conflict resolution does not work and the states of the nodes diverge, so consensus is not reached).
Side note
- blockchain is also about dist consensus (what's the next block in the chain - pick-one strategy)
- in contrast, for collaboration we want a pick-all strategy to integrate all changes
CRDT-based algos
- RGA is the CRDT for text editing
- others
Using the Isabelle proof assistant, they built CRDTs, applied strong eventual consistency theory over a modelled network and proved that their software is correct. Automerge is the JS implementation. It's still a research project and not production ready.
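
The counter example above (capture the increment operations instead of merging the resulting values) can be sketched as a tiny operation-based replicated counter. This is an illustrative toy, not Automerge's implementation or any published CRDT verbatim: the `Counter` class and the (replica id, sequence number) operation identifiers are assumptions of the sketch.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

# An operation is identified by (replica id, local sequence number).
Op = Tuple[str, int]


@dataclass
class Counter:
    replica_id: str
    ops: Set[Op] = field(default_factory=set)
    _seq: int = 0

    def increment(self) -> None:
        """Record the operation itself, not just the new total."""
        self._seq += 1
        self.ops.add((self.replica_id, self._seq))

    def merge(self, other: "Counter") -> None:
        """Set union is commutative, associative and idempotent,
        so replicas converge regardless of merge order or repetition."""
        self.ops |= other.ops

    @property
    def value(self) -> int:
        return len(self.ops)


alice, bob = Counter("alice"), Counter("bob")
alice.increment()      # alice sees 1
bob.increment()
bob.increment()        # bob sees 2

# Merging only the values would be wrong: max(1, 2) loses an increment,
# and 1 + 2 double counts as soon as the same state is merged twice.
alice.merge(bob)
bob.merge(alice)
alice.merge(bob)       # merging again changes nothing
```

Both replicas end up with the same three operations, which is the "pick-all" behaviour wanted for collaboration.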

automerge

data/storage layer (does not do networking)
trellis: p2p trello-based example, using WebRTC
- https://github.com/automerge/mpl lib for WebRTC
automerge's operation to generate the new state captures the mutations done to the old state and transforms them into a log of operations that will be used later for conflict resolution when concurrent changes occur.
has sensible defaults regarding concurrent modifications
- e.g. if the same word is removed by 2 collaborators, it should be removed from the final state

Distributed Transactions are dead, long live distributed transaction!

Sergey Bykov @sergeybykov

Basic example of ACID: Atomicity, Consistency, Isolation, Durability
DBMSs resolve this at local level
In clustered databases you need Distributed Transactions
- CAP theorem, latency & throughput
CQRS: append/write events to the Event Store (append-only log), and denormalize them in another datastore to be queried.
- this solves the D in ACID but not A, C or I
eventual consistency
- Google's Spanner paper: building software based on eventual consistency is hard. Spanner claims to be "effectively" CA, because the reliability of Google's network makes partitions much less probable.
- CockroachDB: "similar", but without atomic clocks and on an unreliable network.
- CosmosDB: different consistency options
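
The CQRS note above (append events to an append-only Event Store, then denormalize them into another datastore for queries) could look roughly like this. It is a minimal in-memory sketch: `EventStore`, `BalanceView` and the account events are hypothetical names, and a real system would persist the log and update the view asynchronously.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    account: str
    delta: int


class EventStore:
    """Write side: an append-only log, events are never updated in place."""

    def __init__(self) -> None:
        self._log: List[Event] = []

    def append(self, event: Event) -> int:
        self._log.append(event)
        return len(self._log) - 1  # offset of the appended event

    def read_from(self, offset: int) -> List[Event]:
        return self._log[offset:]


class BalanceView:
    """Read side: a denormalized view built by consuming the log."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}
        self.next_offset = 0

    def catch_up(self, store: EventStore) -> None:
        for event in store.read_from(self.next_offset):
            self.balances[event.account] = self.balances.get(event.account, 0) + event.delta
            self.next_offset += 1


store = EventStore()
view = BalanceView()
store.append(Event("acc-1", 100))
store.append(Event("acc-1", -30))
stale = dict(view.balances)  # the read model has not caught up yet
view.catch_up(store)
```

The gap between `stale` and the caught-up view is the eventual consistency mentioned above: the log gives durability, but nothing here provides atomicity or isolation across several accounts.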

MS Orleans

made to build distsys in the cloud, implementing the actor model in .NET
grains instead of actors
- they manage their own state
- multiple storage systems
- no coordination -> scalability
bank transaction example
- happy path: all good
- error paths: disk errors, network errors, reboots...
how people solve this
- give an ID to the request (idempotency)
- record completions of operations against actors
- add retries on errors and hope for the best
- Martin Kleppmann: microservices do poor man's distributed transactions
what we really want: distributed transactions
- example of a Task (async Future in .NET) that makes a transaction and looks like a local operation.
how does this work?
- implements a paper about dist transactions in actor model implementations
- 2PC diagram
- transaction manager (SPOF) and agents
- open source
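
The 2PC note above (a transaction manager coordinating agents) can be illustrated with the bank transfer example. This is a toy, single-process sketch in Python and not the Orleans API: `Account`, `transfer` and the method names are assumptions, and it ignores durability, timeouts, crashes between the two phases, and concurrent transactions on the same account.

```python
from typing import Dict, List, Tuple


class Account:
    """Participant ("agent") that can tentatively reserve a change."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._pending: Dict[str, int] = {}

    def prepare(self, tx_id: str, delta: int) -> bool:
        # Phase 1: vote yes only if the change could be applied.
        if self.balance + delta < 0:
            return False
        self._pending[tx_id] = delta
        return True

    def commit(self, tx_id: str) -> None:
        # Phase 2 (all voted yes): apply the reserved change.
        self.balance += self._pending.pop(tx_id)

    def abort(self, tx_id: str) -> None:
        # Phase 2 (someone voted no): drop the reservation.
        self._pending.pop(tx_id, None)


def transfer(tx_id: str, source: Account, target: Account, amount: int) -> bool:
    """Transaction manager: commit everywhere or nowhere."""
    steps: List[Tuple[Account, int]] = [(source, -amount), (target, amount)]
    prepared: List[Account] = []
    for account, delta in steps:
        if not account.prepare(tx_id, delta):
            for participant in prepared:
                participant.abort(tx_id)
            return False
        prepared.append(account)
    for account in prepared:
        account.commit(tx_id)
    return True


a, b = Account(100), Account(0)
ok = transfer("tx-1", a, b, 60)         # both participants vote yes
rejected = transfer("tx-2", a, b, 500)  # source votes no, nothing changes
```

The caller only sees `transfer(...)` returning success or failure, which is the "looks like a local operation" property; the coordinator in this sketch is also the single point of failure noted above.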

Big Data in a neurophysiology research lab… what?

Max Novelli

Motor and sensory functions with a prosthetic limb need a brain-computer interface
Data powering this system is gathered from various places: raw neural activity, nerve activity, kinematics, control signals, events, prosthetic signals, forces, torques, notes, videos, images, etc. Quite a bit of data!
Data vs Metadata: data are just raw values, and metadata are labels for those raw values. Metadata gives meaning to the data itself
Managing this unstructured data becomes challenging over time, with more people working on the same data, etc. It is a similar problem to a typical warehouse, but with some intricacies: proprietary formats, frequent human manipulation, almost no automation
Continuous Ingestion + Continuous Curation needed
They added a software layer on top of it (Matlab MDB), but then they started having scalability and manageability issues: too much data, code base upgrades, unoptimized queries, inflexible architecture, etc.
Brainstorm for solutions:
- People working with data are not programmers, they want data, not a program
- Platform constraints (Windows)
- Mini coding
- Structured queries (SQL)
- And so on and so forth: this leads to a big data approach
Big data, four Vs: volume (size on disk), velocity (real-time, batch), variety (format) and veracity (quality and validation)
Novel concepts (for this specific industry):
- Curation (be able to add more information, better tags, labels, to existing data)
Biggest challenges:
- Walking the users through how to make use of the data, making sure they know how to use the tool, etc.
- Most of the time is spent babysitting the person using the tool
Existing tools don't fit the kind of workflow people in this industry have. A custom solution adapted to this workflow was adopted really quickly. A traditional big data technology would have been too disruptive.
In a research environment it is really hard to find common ground in terms of technology: it is more or less anarchy, with a lot of ad-hoc developments, etc.

Designing Events-first Microservices

Jonas Bonér @jboner

Microservices should be adopted for organizational reasons, and they make companies enter the world of distributed systems.
Beware of Microliths! If you have temporal coupling (e.g. ms1 calls ms2 and waits for its response) then you are doing it wrong.
Events-first DDD: OO was about finding the structures of the system too early in the design process. Event-based systems can help with that. Instead of Nouns, focus on Verbs, which are often materialized as events. Define what's happening instead of who's provoking what's happening.
Events:
- are facts that are immutable
- something that happened, past tense: ProductShipped
- how to find the facts? Event Storming
- can be ignored, but not deleted
  - GDPR! But new facts can invalidate existing facts.
- vs commands: the latter
  - have intent
  - are directed
  - are imperative: ShipProduct
  - have an addressable destination (one or many, but we know who they are)
- vs reactions: the latter represent side effects
diagram of event arch: command -> processing -> events -> event bus -> subscribers react to events | and eventual consistency in the middle of it
CRUD is fine for isolated data, but if you have microservices (which have their own datastores) how do you do JOINs?
Distributed transactions, 2PC? Use them carefully, strong consistency should not be the default because it costs availability.
evolution from CRUD
- 1. CRUD of 2 services + Event Streams to a Materialized View where you can JOIN. The source of truth is the services' DBs.
- Jim Gray: Update-in-place (overwriting) is an epic fail, use appending of changes.
- Pat Helland: the truth is the log. The database is a cache of a subset of the log.
- 2. Event Sourced Services
  - happy path: command -> event -> event log -> subscribe + update component -> run side effects
- 3. CQRS to separate how reads and writes are modelled because they have different constraints (scalability, consistency, availability... eventual consistency).
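
The command -> event -> event log -> subscribers flow described in this talk can be sketched as follows. It is a synchronous, in-process toy rather than any real event bus: `EventBus`, `ShippingService` and the command name `ShipProduct` are choices made for this sketch, while `ProductShipped` is the past-tense event name from the notes.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class ShipProduct:
    """Command: imperative, has intent, addressed to one service."""
    order_id: str


@dataclass(frozen=True)
class ProductShipped:
    """Event: an immutable fact, named in the past tense."""
    order_id: str


class EventBus:
    def __init__(self) -> None:
        self.log: List[ProductShipped] = []  # append-only event log
        self._subscribers: List[Callable[[ProductShipped], None]] = []

    def subscribe(self, handler: Callable[[ProductShipped], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: ProductShipped) -> None:
        self.log.append(event)               # record the fact first
        for handler in self._subscribers:    # then let subscribers react
            handler(event)


class ShippingService:
    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        self._shipped: Set[str] = set()

    def handle(self, command: ShipProduct) -> bool:
        """A command can be rejected; an emitted event cannot be taken back."""
        if command.order_id in self._shipped:
            return False
        self._shipped.add(command.order_id)
        self._bus.publish(ProductShipped(command.order_id))
        return True


bus = EventBus()
emails: List[str] = []
bus.subscribe(lambda event: emails.append(f"order {event.order_id} shipped"))

service = ShippingService(bus)
accepted = service.handle(ShipProduct("o-1"))
duplicate = service.handle(ShipProduct("o-1"))

# "The database is a cache of a subset of the log": rebuild a view by replaying it.
shipped_view = {event.order_id for event in bus.log}
```

In a real deployment the bus would deliver asynchronously, so subscribers lag behind the log and the system is eventually consistent.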

Good ideas we forgot

Joe Armstrong - @joeerl

Some ideas have been forgotten over 50 years of programming
There are lots of options and advertising, and little time to try how things really are
Principles are important when talking about software systems. If you violate any of them, unpredictable things can happen.
- observation: how to describe I/O, computations, connections, events
- isolation: one system does not affect others, to do so you pass a message
  - gives fault-tolerance, scalability, security
- composition
- causality: A -[msgs]-> B
  - time: A does not know about B after the messages are sent
- physics: program and data must be in the same space/time

ideas

flow-based programming
- https://en.wikipedia.org/wiki/Flow-based_programming
- https://github.com/flowbased/flowbased.org/wiki/Definition
pipes => composability
Linda tuple spaces: declarative message passing, adding the data to a shared "blackboard" so you don't need to know the receivers; they read the blackboard to gather messages.

so there is no shared state, no locks, no futures, no promises... no bullshit
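
The Linda tuple space idea above can be sketched as a small blackboard with pattern matching. This is a simplified, non-blocking, single-process illustration: in Linda the read operations block until a match appears, `take` stands in for Linda's `in` (a reserved word in Python), and using `None` as a wildcard is an assumption of this sketch.

```python
from typing import List, Optional, Tuple

Pattern = Tuple[object, ...]


class TupleSpace:
    def __init__(self) -> None:
        self._tuples: List[tuple] = []

    def out(self, item: tuple) -> None:
        """Post a tuple on the blackboard; the sender names no receiver."""
        self._tuples.append(item)

    @staticmethod
    def _matches(pattern: Pattern, item: tuple) -> bool:
        # None in the pattern matches any value in that position.
        return len(pattern) == len(item) and all(
            p is None or p == v for p, v in zip(pattern, item)
        )

    def rd(self, pattern: Pattern) -> Optional[tuple]:
        """Read the first matching tuple without removing it."""
        for item in self._tuples:
            if self._matches(pattern, item):
                return item
        return None

    def take(self, pattern: Pattern) -> Optional[tuple]:
        """Read the first matching tuple and remove it."""
        item = self.rd(pattern)
        if item is not None:
            self._tuples.remove(item)
        return item


space = TupleSpace()
space.out(("job", 1, "resize image"))
space.out(("job", 2, "send email"))

peeked = space.rd(("job", None, None))    # still on the blackboard
first = space.take(("job", None, None))   # removed by whoever takes it
second = space.take(("job", 2, None))
nothing = space.take(("job", None, None))
```

Producers and consumers only agree on the shape of the tuples, not on each other's identity.
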
hypertext
- the web/HTML is not really hypertext in the theoretical sense
- https://en.wikipedia.org/wiki/Project_Xanadu#Original_17_rules
what next?
- do https://en.wikipedia.org/wiki/Precision_medicine and save my life (joke)
Q/A:
- session types are something new that could work http://simonjf.com/2016/05/28/session-type-implementations.html

Real production use: Reactive design for the manufacturing industry

Roland Kuhn @rolandkuhn

Asynchronous Programming with Kotlin

Hadi Hariri @hhariri

async (non-blocking) programming models
- multithreading
  - creation is expensive (OS level)
  - shared mutable state is dangerous
- callbacks
  - error handling is complex
- promises/futures
- Rx (reactive extensions): observables and subscriptions
  - but then everything is an observable stream
- kotlin coroutines
- golang goroutines
  - diff vs kotlin. https://stackoverflow.com/a/46865213/547956

kotlin coroutines

coroutines are suspendable
- launch / suspend
they are FSMs with CPS: callbacks are handled for you using an FSM
async/await are functions, not language keywords
launch returns a new Job, on which you can call join to wait for it
channels for communicating between coroutines
- patterns: fan-in, fan-out...
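
The launch/join and channel notes above can be illustrated with a rough analogue in Python's `asyncio`. This is not Kotlin and not the kotlinx.coroutines API: `asyncio.create_task` plays the role of `launch`, awaiting the task plays the role of `join`, and `asyncio.Queue` stands in for a channel in a fan-out pattern with two workers. The function names are hypothetical.

```python
import asyncio
from typing import List


async def producer(channel: asyncio.Queue, items: List[int]) -> None:
    for item in items:
        # Suspends this coroutine (without blocking the thread) while the queue is full.
        await channel.put(item)


async def worker(name: str, channel: asyncio.Queue, results: List[str]) -> None:
    while True:
        item = await channel.get()  # suspension point
        results.append(f"{name} handled {item}")
        channel.task_done()


async def main() -> List[str]:
    channel: asyncio.Queue = asyncio.Queue(maxsize=2)
    results: List[str] = []
    # Fan-out: several workers read from the same channel.
    workers = [asyncio.create_task(worker(f"w{i}", channel, results)) for i in range(2)]
    job = asyncio.create_task(producer(channel, [1, 2, 3, 4]))
    await job             # comparable to calling join on the Job
    await channel.join()  # wait until every queued item has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


results = asyncio.run(main())
```

Which worker handles which item is not guaranteed, only that each item is handled exactly once.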

General Purpose Big Data Systems are eating the world: Tool consolidation -- is it inevitable?

Holden Karau @holdenkarau

Co-author of Learning Spark, High Performance Spark
Apache Beam main contributor
Former Spark contributor
Every time there is a new piece of big data technology we often see many different specific implementations of the concepts, which often eventually consolidate down to a few viable options, and then frequently end up getting rolled into another larger project.
Abstracting over all the data processors is impossible
Beam is still WIP, with lots of problems in the way
Beam backends: Google CP + Flink
