Created
December 22, 2013 20:37
-
-
Save toff63/8088126 to your computer and use it in GitHub Desktop.
Confraria dos arquitetos:Stream vs CEP
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Kafka: A high-throughput distributed messaging system. | |
| Publish-subscribe messaging as a distributed commit log | |
| Fast: hundreds of megabytes of reads and writes per second from thousands of clients | |
| Scalable: designed to be a central data backbone for large organization. Data streams are partitionned and spread over a cluster of machines | |
| Durable: Message persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact. | |
| Distributed by design: cluster centric design offerring strong durability and fault tolerance guarantees | |
| Intro: | |
| publish-subscribe messaging solution on top of tcp. | |
| Each topic is partitionned and replicated on several patition. Consumer handles where they are in the partition (message counter). | |
| For each partition there is a leader node and follower nodes. Leader handle reads and writes. If the leader fails a follower will take over. | |
| Producer choose on wich partition they will publish. | |
| Very high throughput. Facility to ensure ordering. Message storage. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Real time distributed messaging platform | |
| Can handle 90k+ messages per seconds | |
| Features | |
| support distributed topologies with no SPOF | |
| horizontally scalable (no brokers, seamlessly add more nodes to the cluster) | |
| low-latency push based message delivery (performance) | |
| combination load-balanced and multicast style message routing | |
| excel at both streaming (high-throughput) and job oriented (low-throughput) workloads | |
| primarily in-memory (beyond a high-water mark messages are transparently kept on disk) | |
| runtime discovery service for consumers to find producers (nsqlookupd) | |
| transport layer security (TLS) | |
| data format agnostic | |
| few dependencies (easy to deploy) and a sane, bounded, default configuration | |
| simple TCP protocol supporting client libraries in any language | |
| HTTP interface for stats, admin actions, and producers (no client library needed to publish) | |
| integrates with statsd for realtime instrumentation | |
| robust cluster administration interface (nsqadmin) | |
| Tradeoff: | |
| messages are NOT durable | |
| messages are delivered at least once: client’s responsibility to perform idempotent operations or de-dupe. | |
| messages received are UN-ORDERED | |
| consumers eventually find all topic producers |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Distributed stream processing framework. Uses Kafka for messaging and Hadoop YARN for fault tolerance, processor isolation, security and ressource management. | |
| Provide: | |
| * simple API. Simple callback based ""process message" API. | |
| * managed state: Manages snapshot and restoration of stream processor state. | |
| * fault tolerance: uses YARN to restart the stream | |
| * durability: Kafka for message ordering and durability | |
| * scalability: partitionning and distributed at every level. Kafka for partitionned ordered messages. YARN for distributed environment for samza container to run in. | |
| * Pluggable: samza provides an API to run with other messaging system (ie not Kafka) and other execution environment (ie not YARN) | |
| * Process isolation: provdied by YARN |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Distributed and fault-tolerant realtime computation | |
| It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate. | |
| Scalable | |
| Guarantees no data loss | |
| Easy to set up and maintain | |
| Fault tolerant: in case of failure Storm will reassign the work. Computation can work forever. | |
| Language agnostic | |
| Topology: | |
| Nimbus: assign work to nodes. It distribute code. | |
| Each node has a Superviser that will create worker on demand to process work assigned by Nimbus | |
| Zookeeper make the glue between Nimbus and Supervisors. It also keep the state of the cluster. If you kill a Nimbus or a Supervisor, Zookeeper will create a new one. | |
| Stream is an unbounded sequence of tuples. | |
| Spout: stream source like tweet stream, messaging queue | |
| Bolt: consumes any number of Spout to process events and possibly emit a new stream |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment