@toff63
Created December 22, 2013 20:37
Confraria dos arquitetos: Stream vs CEP
Kafka: A high-throughput distributed messaging system.
Publish-subscribe messaging as a distributed commit log
Fast: hundreds of megabytes of reads and writes per second from thousands of clients.
Scalable: designed to be a central data backbone for large organizations. Data streams are partitioned and spread over a cluster of machines.
Durable: messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact.
Distributed by design: cluster-centric design offering strong durability and fault-tolerance guarantees.
Intro:
Publish-subscribe messaging solution on top of TCP.
Each topic is partitioned, and each partition is replicated across several brokers. Consumers keep track of where they are in each partition (a message offset).
For each partition there is a leader node and follower nodes. The leader handles reads and writes; if the leader fails, a follower takes over.
Producers choose which partition they publish to (see the producer/consumer sketch after this list).
Very high throughput, ordering guarantees within a partition, and persistent message storage.
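A minimal sketch of these mechanics, using the newer Kafka Java client rather than the 2013-era API; the broker address, the topic name "events", and the key/value strings are assumptions for illustration. The record key determines the partition (so one key's messages stay ordered), and the consumer tracks its position as a per-partition offset:

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class KafkaSketch {
        public static void main(String[] args) {
            // Producer: the record key decides the partition; messages with the
            // same key land in the same partition, which preserves their order.
            Properties p = new Properties();
            p.put("bootstrap.servers", "localhost:9092");
            p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
                producer.send(new ProducerRecord<>("events", "user-42", "page-view"));
            }

            // Consumer: every record carries its partition and offset; the consumer
            // (or its group) tracks the offset to know where it is in the log.
            Properties c = new Properties();
            c.put("bootstrap.servers", "localhost:9092");
            c.put("group.id", "demo-group");
            c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
                consumer.subscribe(Collections.singletonList("events"));
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            r.partition(), r.offset(), r.key(), r.value());
                }
            }
        }
    }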
NSQ: realtime distributed messaging platform
Can handle 90k+ messages per second
Features:
support distributed topologies with no SPOF
horizontally scalable (no brokers, seamlessly add more nodes to the cluster)
low-latency push based message delivery (performance)
combination load-balanced and multicast style message routing
excel at both streaming (high-throughput) and job oriented (low-throughput) workloads
primarily in-memory (beyond a high-water mark messages are transparently kept on disk)
runtime discovery service for consumers to find producers (nsqlookupd)
transport layer security (TLS)
data format agnostic
few dependencies (easy to deploy) and a sane, bounded, default configuration
simple TCP protocol supporting client libraries in any language
HTTP interface for stats, admin actions, and producers (no client library needed to publish; see the sketch after this list)
integrates with statsd for realtime instrumentation
robust cluster administration interface (nsqadmin)
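The HTTP interface means any HTTP client can publish without a client library. A minimal sketch in Java, assuming a local nsqd on its default HTTP port (4151) and an example topic name "events":

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class NsqHttpPublish {
        public static void main(String[] args) throws Exception {
            // POST the raw message body to nsqd's /pub endpoint; the topic is a
            // query parameter. Host, port, and topic here are assumptions.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://127.0.0.1:4151/pub?topic=events"))
                    .POST(HttpRequest.BodyPublishers.ofString("hello nsq"))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }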
Tradeoffs:
messages are NOT durable
messages are delivered at least once: it is the client's responsibility to perform idempotent operations or de-dupe (see the sketch after this list).
messages received are UN-ORDERED
consumers eventually find all topic producers
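Because delivery is at-least-once, a consumer either makes its handler idempotent or remembers recently seen message IDs and drops redeliveries. A sketch of the second approach; the LRU cache size and the handle/process method names are illustrative, not part of any NSQ client API:

    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.Set;

    public class Deduper {
        // Bounded LRU set of message IDs that have already been processed.
        private final Set<String> seen = Collections.newSetFromMap(
                new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                        return size() > 100_000; // cap memory; old IDs age out
                    }
                });

        public void handle(String messageId, String body) {
            if (!seen.add(messageId)) {
                return; // duplicate redelivery: acknowledge without reprocessing
            }
            process(body);
        }

        private void process(String body) {
            // application logic goes here
        }
    }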
Samza: distributed stream processing framework. Uses Kafka for messaging and Hadoop YARN for fault tolerance, processor isolation, security and resource management.
Provides:
* Simple API: a simple callback-based "process message" API (see the sketch after this list).
* Managed state: manages snapshotting and restoration of stream processor state.
* Fault tolerance: works with YARN to restart stream processors when a machine fails.
* Durability: uses Kafka for message ordering and durability.
* Scalability: partitioned and distributed at every level. Kafka provides partitioned, ordered messages; YARN provides a distributed environment for Samza containers to run in.
* Pluggable: Samza provides an API to run with other messaging systems (i.e. not Kafka) and other execution environments (i.e. not YARN).
* Processor isolation: provided by YARN.
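The callback-based API looks roughly like the sketch below, written against Samza's StreamTask interface; the output system/stream names and the filtering logic are made up for illustration:

    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.system.OutgoingMessageEnvelope;
    import org.apache.samza.system.SystemStream;
    import org.apache.samza.task.MessageCollector;
    import org.apache.samza.task.StreamTask;
    import org.apache.samza.task.TaskCoordinator;

    public class FilterTask implements StreamTask {
        // Output goes back to Kafka on a stream named "filtered-events" (example names).
        private static final SystemStream OUTPUT = new SystemStream("kafka", "filtered-events");

        // Samza calls process() once per incoming message on the task's partitions.
        @Override
        public void process(IncomingMessageEnvelope envelope,
                            MessageCollector collector,
                            TaskCoordinator coordinator) {
            String message = (String) envelope.getMessage();
            if (message != null && !message.isEmpty()) {
                collector.send(new OutgoingMessageEnvelope(OUTPUT, message));
            }
        }
    }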
Storm: distributed and fault-tolerant realtime computation.
It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
Scalable
Guarantees no data loss
Easy to set up and maintain
Fault tolerant: in case of failure, Storm reassigns the work. Computations can run forever.
Language agnostic
Topology:
Nimbus: assigns work to the nodes and distributes the code.
Each node runs a Supervisor that creates workers on demand to process the work assigned by Nimbus.
Zookeeper is the glue between Nimbus and the Supervisors and keeps the state of the cluster; if a Nimbus or a Supervisor dies, a new one can be started and pick the state back up from Zookeeper.
A stream is an unbounded sequence of tuples.
Spout: a source of streams, e.g. a tweet stream or a messaging queue.
Bolt: consumes any number of streams (from spouts or other bolts), processes the events, and possibly emits new streams (a minimal topology sketch follows).
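A minimal word-count topology wiring a spout to a bolt, sketched against the 0.9-era backtype.storm API (matching the gist's date; newer releases use org.apache.storm). The random-word spout stands in for a real source such as a tweet stream:

    import backtype.storm.Config;
    import backtype.storm.LocalCluster;
    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Random;

    public class WordCountTopology {

        // Spout: an unbounded source of tuples (random words stand in for a real feed).
        public static class WordSpout extends BaseRichSpout {
            private SpoutOutputCollector collector;
            private final String[] words = {"storm", "kafka", "samza", "nsq"};
            private final Random random = new Random();

            public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
                this.collector = collector;
            }

            public void nextTuple() {
                collector.emit(new Values(words[random.nextInt(words.length)]));
            }

            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("word"));
            }
        }

        // Bolt: consumes the spout's stream and keeps a running count per word.
        public static class CountBolt extends BaseBasicBolt {
            private final Map<String, Integer> counts = new HashMap<String, Integer>();

            public void execute(Tuple tuple, BasicOutputCollector collector) {
                String word = tuple.getStringByField("word");
                Integer current = counts.get(word);
                counts.put(word, current == null ? 1 : current + 1);
            }

            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                // terminal bolt: emits no new stream
            }
        }

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("words", new WordSpout(), 2);
            builder.setBolt("count", new CountBolt(), 4)
                   .fieldsGrouping("words", new Fields("word")); // same word -> same bolt task
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("word-count", new Config(), builder.createTopology());
            Thread.sleep(10000); // let it run briefly in-process, then stop
            cluster.shutdown();
        }
    }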