@mserranom
Last active May 29, 2018 07:39

jonthebeach

Infrastructure as Code with Terraform

Mitchell Hashimoto - @mitchellh

The problem - history

  • Moving from bash scripts to virtualization empowers automation
  • Virtualization as a service: the cloud
  • Cloud architectures are super powerful but complex -> Capability breeds complexity
  • Terraform (TF) describes the infrastructure as code

Benefits

  • Code as source of knowledge AND truth (since you execute it to generate the infra).
  • Text, versionable, collaborative
  • Simpler than CloudFormation (CF)
  • It is not just about the cloud: it has many providers (GitHub, PagerDuty, DataDog...). E.g. HashiCorp onboarding consists of adding a module for the new employee that grants access to everything.

Terraform internals

  • Keep It Simple example: few public interface methods, simple architecture
  • Clear, easy to understand extension points

A Functional approach to reactive microservice architecture

Jose Escanciano, Patrick Di Loreto @patricknoir

  • Their framework - https://github.com/patricknoir/reactive-system
  • Reactive principles: elastic, responsive, resilient, message-driven
  • Microservices pros (does not say the cons :-/)
  • DDD as a tool to tackle business logic with Divide & Conquer
    • Splitting the domain into Bounded Contexts allows segregating them into different microservices
    • DDD elements for modeling: VOs, Entities, BCs, Ubiquitous Language...
  • How FP (pure functions) helps build multithreaded stateful services
    • state monad: function(input, state) returns (state, sideEffect)
  • CQRS and how they model it in their framework (processors)
  • Actor model, distsys and Akka
  • Code example
    • Apps, services, topics
    • No time to see it running
  • It's not ready for production :-|
  • if you are into Scala+Akka and building a brand new clustered service, take a look at this framework's abstractions
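
The state-monad bullet above can be sketched as a pure transition function. This is a minimal illustration, not the framework's API: `Account`, `deposit` and `run` are hypothetical names, and side effects are described as plain strings rather than executed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Account:
    """Immutable state (hypothetical example domain)."""
    balance: int


def deposit(amount: int, state: Account) -> Tuple[Account, List[str]]:
    """Pure transition: (input, state) -> (new state, effects described as data)."""
    new_state = Account(state.balance + amount)
    return new_state, [f"emit Deposited({amount})"]


def run(inputs: List[int], state: Account) -> Tuple[Account, List[str]]:
    """Thread the state through a sequence of inputs, collecting the effects."""
    effects: List[str] = []
    for amount in inputs:
        state, new_effects = deposit(amount, state)
        effects.extend(new_effects)
    return state, effects


final_state, effects = run([10, 5], Account(0))
```

Because `deposit` never mutates anything, the only place that holds mutable state is the caller that threads it through, which is what makes this style easier to reason about in a multithreaded service.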

Lessons learned building a big data analytics engine, from proprietary to open source

@joelbrunger Alvaro Santamaria @dofideas

data viz lessons

  • viz is not about the frontend but about how you architect the backend to allow displaying what's needed
  • information extraction is lightweight, processed in the backend and sent to the frontend
    • control maps and heat maps over putting the values directly
    • std deviation in the form of ellipses
    • box plots are better than histograms
    • sequences of box plots are even better
  • hold state
    • materialized views with averages, counts, tdigest (statistical distributions)
    • dimensions and groups
      • 1 group: a KPI
      • several: graph
      • lots: time series
  • delivering (information from back to front)
    • req/response vs reactive
    • indexes
      • top K (sort + limit)
      • time series: index by time
  • pipeline
    • join/aggregate
    • enrich
    • filter
    • extract
    • materialized view (in-memory)
    • index
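
A few of the pipeline stages above (filter, enrich, in-memory materialized view, top-K index) can be sketched as follows. This is only an illustration under assumed names: the event shape, `pipeline`, `MaterializedView` and the sample data are all hypothetical, and the order of the stages is simplified.

```python
import heapq
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Event = Dict[str, Any]


def pipeline(events: Iterable[Event], regions: Dict[str, str]) -> Iterator[Event]:
    """Filter out events without a value, then enrich each one with its region."""
    for event in events:
        if event["value"] is None:  # filter
            continue
        yield {**event, "region": regions.get(event["sensor"], "unknown")}  # enrich


class MaterializedView:
    """In-memory view holding a count and a running total per group."""

    def __init__(self) -> None:
        self.count: Dict[str, int] = defaultdict(int)
        self.total: Dict[str, float] = defaultdict(float)

    def update(self, event: Event) -> None:
        self.count[event["region"]] += 1
        self.total[event["region"]] += event["value"]

    def average(self, group: str) -> float:
        return self.total[group] / self.count[group]

    def top_k(self, k: int) -> List[Tuple[str, float]]:
        """Top K groups by average (equivalent to sort + limit)."""
        averages = ((group, self.average(group)) for group in self.count)
        return heapq.nlargest(k, averages, key=lambda pair: pair[1])


raw_events = [
    {"sensor": "a", "value": 2.0},
    {"sensor": "b", "value": 10.0},
    {"sensor": "a", "value": 4.0},
    {"sensor": "c", "value": None},
]
view = MaterializedView()
for enriched in pipeline(raw_events, {"a": "north", "b": "south"}):
    view.update(enriched)
```

The frontend then only needs the small aggregated result (for example `view.top_k(1)`), not the raw events.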

mapr

  • products for: data-store, analytics, ML, database, streams
  • describes distributed arch for ML models processing based on a dist log like Kafka + pub/sub
  • Asked about Feature prep: "it will be eventually automated"

Automerge: Making servers optional for real-time collaboration

Martin Kleppmann

  • examples:
    • git as an example of a distributed system with replication (git push), applying changes (pull + merge), conflicts etc
    • google docs (real time, collaborative editing)
    • distributed databases
  • replicated state diverges between nodes/collaborators for some time and gets merged back
  • examples by type
    • text manipulation (GDocs)
    • set manipulation
    • counter (needs to capture the operations - increment - not just to merge the resulting states of the nodes)
  • Operational Transform algos: there are many implementations and papers, most of them wrong (conflict resolution does not work and states between nodes diverge - consensus is not reached) if they don't use a central server.
  • Side note
    • blockchain is also about dist consensus (what's the next block in the chain - pick-one strategy)
    • in contrast, for collaboration we want a pick-all strategy to integrate all changes
  • CRDT-based algos
    • RGA is the CRDT for text editing
    • others
  • using the Isabelle proof assistant, they built CRDTs, applied strong eventual consistency theory over a modelled network and proved that their software is correct. Automerge is the JS implementation. It's still a research project and not production ready.
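
To illustrate the counter case from the list above: a grow-only counter (G-Counter) is a well-known CRDT that keeps one increment total per replica, so a merge never loses increments. This is a generic sketch and not necessarily what was shown in the talk; the class and node names are hypothetical.

```python
from typing import Dict


class GCounter:
    """Grow-only counter: each replica only ever bumps its own entry."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.counts: Dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        """Take the per-replica maximum; commutative, associative, idempotent."""
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


a, b = GCounter("a"), GCounter("b")
a.increment()
a.increment()
b.increment()

# Naively merging the two totals (2 and 1) by picking one would lose increments.
naive = max(a.value(), b.value())

a.merge(b)
b.merge(a)
```

After the merges both replicas report 3, whereas the naive pick-one merge of the totals reports 2.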

automerge

  • data/storage layer (does not do networking)
  • trellis: p2p trello-based example, using WebRTC
  • automerge's operation to generate the new state captures the mutations done to the old state and transforms them into a log of operations that will be used later for conflict resolution when concurrent changes occur.
  • has sensible defaults regarding concurrent modifications
    • e.g. if a word is removed by 2 collaborators concurrently, it should be removed from the final state
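
The idea of turning a state change into a log of operations can be sketched like this. It is not Automerge's API (Automerge is a JS library); `diff_to_ops` and `apply_ops` are hypothetical helpers, the document is reduced to a set of words, and only the concurrent-removal case from the notes is covered.

```python
from typing import FrozenSet, List, Tuple

Op = Tuple[str, str]  # ("add" | "remove", word)


def diff_to_ops(old: FrozenSet[str], new: FrozenSet[str]) -> List[Op]:
    """Capture the mutations between two states as a log of operations."""
    removed = [("remove", word) for word in sorted(old - new)]
    added = [("add", word) for word in sorted(new - old)]
    return removed + added


def apply_ops(state: FrozenSet[str], ops: List[Op]) -> FrozenSet[str]:
    """Replay a log on top of a state; removing a missing word is a no-op."""
    words = set(state)
    for kind, word in ops:
        if kind == "add":
            words.add(word)
        else:
            words.discard(word)
    return frozenset(words)


base = frozenset({"hello", "cruel", "world"})
alice = frozenset({"hello", "world", "again"})  # removed "cruel", added "again"
bob = frozenset({"hello", "world"})             # also removed "cruel"

alice_ops = diff_to_ops(base, alice)
bob_ops = diff_to_ops(base, bob)
merged = apply_ops(base, alice_ops + bob_ops)
```

Both collaborators removed the same word, so the merged state simply does not contain it, and the result is the same whichever log is replayed first.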

Distributed Transactions are dead, long live distributed transactions!

Sergey Bykov @sergeybykov

  • Basic example of ACID: Atomicity, Consistency, Isolation, Durability
  • DBMSs resolve this at local level
  • In clustered databases you need Distributed Transactions
    • CAP theorem, latency & throughput
  • CQRS: append/write events to the Event Store (append-only log), and denormalize them in another datastore to be queried.
    • this solves D from ACID but not A, C or I
  • eventual consistency
    • Google's Spanner paper: building software based on eventual consistency is hard. Spanner claims to be "effectively" CA, because the reliability of Google's network makes partitions much less likely.
    • CockroachDB: "similar", but without atomic clocks and on an unreliable network.
    • CosmosDB: different consistency options

MS Orleans

  • made to build distsys in the cloud, implementing the actor model in .NET
  • grains instead of actors
    • they manage their own state
    • multiple storage systems
    • no coordination -> scalability
  • bank transaction example
    • happy path: all good
    • error paths: disk errors, network errors, reboots...
  • how people solve this
    • give an ID to the request (idempotency)
    • record completions of operations against actors
    • add retries on errors and hope for the best
    • Martin Kleppmann: microservices do poor man's distributed transactions
  • what we really want: distributed transactions
    • example of a Task (async Future in .NET) that makes a transaction and looks like a local operation.
  • how does this work?
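
The "request ID + recorded completions + retries" workaround from the list above can be sketched as follows. This is not Orleans code: `Account`, `LostAck`, `make_flaky_sender` and `call_with_retries` are hypothetical, and the network failure is simulated by dropping the acknowledgement after the work is done.

```python
from typing import Callable, Set


class LostAck(Exception):
    """Simulated network error: the work happened but the reply was lost."""


class Account:
    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._completed: Set[str] = set()  # IDs of requests already applied

    def deposit(self, request_id: str, amount: int) -> bool:
        """Apply the deposit at most once per request ID."""
        if request_id in self._completed:
            return False
        self.balance += amount
        self._completed.add(request_id)
        return True


def make_flaky_sender(account: Account, failures: int) -> Callable[[str, int], None]:
    """Build a sender whose first `failures` calls lose the acknowledgement."""
    remaining = [failures]

    def send(request_id: str, amount: int) -> None:
        account.deposit(request_id, amount)  # the receiver does the work
        if remaining[0] > 0:
            remaining[0] -= 1
            raise LostAck()

    return send


def call_with_retries(send: Callable[[str, int], None], request_id: str,
                      amount: int, attempts: int = 3) -> int:
    """Retry with the same request ID; return the number of attempts used."""
    for attempt in range(1, attempts + 1):
        try:
            send(request_id, amount)
            return attempt
        except LostAck:
            continue
    raise LostAck(f"gave up after {attempts} attempts")


account = Account(100)
attempts_used = call_with_retries(make_flaky_sender(account, 2), "req-1", 50)
```

The deposit is applied once even though it was sent three times. Note that this only protects a single actor; it does not make a transfer between two accounts atomic, which is the gap distributed transactions are meant to close.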

Big Data in a neurophysiology research lab… what?

Max Novelli

  • Restoring motor and sensory functions with a prosthetic limb needs a brain-computer interface
  • The data powering this system is gathered from various places: raw neural activity, nerve activity, kinematics, control signals, events, prosthetic signals, forces, torques, notes, videos, images, etc. Quite a bit of data!
  • Data vs Metadata: data are just raw values, and metadata are labels for those raw values. Metadata gives meaning to the data itself
  • Managing this unstructured data becomes challenging over time, with more people working on the same data, etc. It is a similar problem to a typical warehouse, but with some intricacies: proprietary formats, frequent human manipulation, almost no automation
  • Continuous Ingestion + Continuous Curation needed
  • They added a software layer on top of it (Matlab MDB), but then they started having scalability and manageability issues: too much data, code base upgrades, unoptimized queries, inflexible architecture, etc.
  • Brainstorm for solutions:
    • People working with data are not programmers, they want data, not a program
    • Platform constraints (Windows)
    • Minimal coding
    • Structured queries (SQL)
    • And so on and so forth: this leads to a big data approach
  • Big data, four Vs: volume (size on disk), velocity (real-time, batch), variety (format) and veracity (quality and validation)
  • Novel concepts (for this specific industry):
    • Curation (be able to add more information, better tags, labels, to existing data)
  • Biggest challenges:
    • Walking the users through how to make use of the data, making sure they know how to use the tool, etc.
    • Most of the time is spent babysitting the person using the tool
  • Existing tools don't fit the kind of workflow people in this industry have. A custom solution adapted to this workflow was adopted really quickly. A traditional big data technology would have been too disruptive.
  • In a research environment it is really hard to find common ground in terms of technology; it is more or less anarchy, with a lot of ad-hoc developments, etc.

Designing Events-first Microservices

Jonas Bonér @jboner

  • Microservices should be adopted for organizational reasons, and they push companies into distributed systems.
  • Beware of Microliths! If you have temporal coupling (e.g. ms1 calls ms2 and waits for its response) then you are doing it wrong.
  • Events-first DDD: OO was about finding the structures of the system too early in the design process. Event-based systems can help with that. Instead of nouns, focus on verbs, which are often materialized as events. Define what's happening instead of who's causing it.
  • Events:
    • are facts that are immutable
    • something that happened, past tense: ProductShipped
    • how to find the facts? Event Storming
    • can be ignored, but not deleted
      • GDPR! But new facts can invalidate existing facts.
    • vs commands: the latter
      • have intent
      • are directed
      • are imperative: ShipProduct
      • have an addressable destination (one or many, but we know who they are)
    • vs reactions: the latter represent side effects
  • diagram of event arch: command -> processing -> events -> event bus -> subscribers react to events | and eventual consistency in the middle of it
  • CRUD is fine for isolated data, but if you have microservices (which have their own datastores) how do you do JOINs?
  • Distributed transactions, 2PC? Use it carefully, strong consistency should not be the default because of availability.
  • evolution from CRUD
    • 1: CRUD of 2 services + Event Streams to a Materialized View where you can JOIN. The source of truth is the services' DBs.
    • Jim Gray: Update-in-place (overwriting) is an epic fail, use appending of changes.
    • Pat Helland: the truth is the log. The database is a cache of a subset of the log.
    • 2: Event Sourced Services
      • happy path: command -> event -> event log -> subscribe + update component -> run side effects
    • 3: CQRS to separate how reads and writes are modelled because they have different constraints (scalability, consistency, availability... eventual consistency).
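
The command -> event -> event log -> subscribers flow described in this talk can be sketched like this. Apart from the `ProductShipped` event name, everything here is an illustrative assumption (`ShipProduct`, `EventBus`, `handle`), and delivery is synchronous for simplicity, whereas a real bus would deliver asynchronously (hence eventual consistency).

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class ShipProduct:
    """Command: imperative, carries intent, addressed to one handler."""
    order_id: str


@dataclass(frozen=True)
class ProductShipped:
    """Event: an immutable fact, named in the past tense."""
    order_id: str


class EventBus:
    def __init__(self) -> None:
        self.log: List[ProductShipped] = []  # append-only source of truth
        self._subscribers: List[Callable[[ProductShipped], None]] = []

    def subscribe(self, handler: Callable[[ProductShipped], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: ProductShipped) -> None:
        self.log.append(event)
        for handler in self._subscribers:
            handler(event)


def handle(command: ShipProduct, bus: EventBus) -> None:
    """Process the command and record what happened as an event."""
    bus.publish(ProductShipped(command.order_id))


shipped_orders: Set[str] = set()   # a component updating its own view
notifications: List[str] = []      # a reaction, i.e. a side effect

bus = EventBus()
bus.subscribe(lambda event: shipped_orders.add(event.order_id))
bus.subscribe(lambda event: notifications.append(f"order {event.order_id} shipped"))
handle(ShipProduct("o-1"), bus)
```

The publisher does not know who reacts to `ProductShipped`, which is what removes the temporal coupling between services.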

Good ideas we forgot

Joe Armstrong - @joeerl

  • Some ideas have been forgotten over 50 years of programming
  • There are lots of options and advertising, and little time to try how things really are
  • Principles are important when talking about software systems. If you violate any of them, unpredictable things can happen.
    • observation: how to describe I/O, computations, connections, events
    • isolation: one system does not affect others, to do so you pass a message
      • gives fault-tolerance, scalability, security
    • composition
    • causality: A -[msgs]-> B
      • time: A does not know about B after the messages are sent
    • physics: program and data must be in the same space/time

ideas

Real production use: Reactive design for the manufacturing industry

Roland Kuhn @rolandkuhn

Asynchronous Programming with Kotlin

Hadi Hariri @hhariri

  • async (non-blocking) programming models
    • multithreading
      • creation is expensive (OS level)
      • shared mutable state is dangerous
    • callbacks
      • error handling is complex
    • promises/futures
    • Rx (reactive extensions): observables and subscriptions
      • but then everything is an observable stream
    • kotlin coroutines
    • Go goroutines

kotlin coroutines

  • coroutines are suspendable
    • launch / suspend
  • they are FSM with CPS: callbacks handled for you using a FSM
  • async/await are functions, not language keywords
  • launch returns a new job, on which you can call join to wait for it
  • channels for communicating between coroutines
    • patterns: fan-in, fan-out...
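
A rough analogy of launch/join and channel fan-out using Python's asyncio (not Kotlin's API): `asyncio.create_task` plays the role of launch, awaiting the tasks plays the role of join, and `asyncio.Queue` stands in for a channel. Unlike in Kotlin, `async` and `await` are keywords in Python. The function names are hypothetical.

```python
import asyncio
from typing import List, Optional


async def producer(queue: "asyncio.Queue[Optional[int]]", items: List[int],
                   workers: int) -> None:
    """Send the items, then one None sentinel per worker to signal the end."""
    for item in items:
        await queue.put(item)
    for _ in range(workers):
        await queue.put(None)


async def worker(queue: "asyncio.Queue[Optional[int]]", results: List[int]) -> None:
    """Fan-out consumer: suspends on get() without blocking a thread."""
    while True:
        item = await queue.get()
        if item is None:
            return
        await asyncio.sleep(0)  # suspension point, lets other tasks run
        results.append(item * item)


async def main(items: List[int], workers: int = 2) -> List[int]:
    queue: "asyncio.Queue[Optional[int]]" = asyncio.Queue()
    results: List[int] = []
    # "launch": start the workers as concurrent tasks
    tasks = [asyncio.create_task(worker(queue, results)) for _ in range(workers)]
    await producer(queue, items, workers)
    # "join": wait for every worker to finish
    await asyncio.gather(*tasks)
    return sorted(results)


squares = asyncio.run(main([1, 2, 3, 4]))
```

Which worker handles which item is not deterministic in general, so the results are sorted before being returned.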

General Purpose Big Data Systems are eating the world: Tool consolidation -- is it inevitable?

Holden Karau @holdenkarau

  • Co-author of Learning Spark, High Performance Spark
  • Apache Beam main contributor
  • Former Spark contributor
  • Every time there is a new piece of big data technology we often see many different specific implementations of the concepts, which often eventually consolidate down to a few viable options, and then frequently end up getting rolled into part of another larger project.
  • Abstracting over all the data processors is impossible
  • Beam still WIP, lots of problems in the way
  • Beam backends: Google CP + Flink