Created May 11, 2017 11:04
Kafka vs Flume:
Kafka is general-purpose messaging infrastructure with producer and consumer APIs, so it can feed any system or sink.
It is an enterprise messaging system for connecting arbitrary systems, not just Hadoop; Flume is designed primarily for Hadoop ingestion.
Kafka is asynchronous: producers and consumers work at their own pace, so, for example, no data is lost if a consumer is down for a while.
A consumer can come back up later and resume where it left off; it can even pull from an older offset if needed.
Overflow events simply accumulate in Kafka's persistent, disk-backed buffer.
Event spikes are easily absorbed; Kafka can handle on the order of 100K events per second.
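The producer/consumer decoupling above can be sketched with a minimal in-memory append-only log, a toy stand-in for a Kafka topic partition (class and method names here are illustrative, not the real client API):

```python
class Log:
    """Append-only log: a toy stand-in for a Kafka topic partition."""
    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the newly written record

    def read_from(self, offset):
        """Return all records at or after `offset` (a consumer pull)."""
        return self.records[offset:]

# The producer keeps writing while the consumer is "down" ...
log = Log()
for i in range(5):
    log.append(f"event-{i}")

# ... and the consumer later resumes from its last committed offset,
# so nothing written during the outage is lost.
committed_offset = 2
missed = log.read_from(committed_offset)
print(missed)  # ['event-2', 'event-3', 'event-4']
```

The point is that the log, not the consumer, owns the data: consumption is just a read at an offset, so a late or restarted consumer replays whatever it missed.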
High durability / fault tolerance:
Kafka supports replication across brokers.
Flume does support a durable file channel, but data sitting in the channel (not yet delivered to the sink) is unavailable until the agent comes back up.
If the server hosting the file channel crashes, that data cannot be recovered, which pushes you toward expensive RAID/SAN storage.
Kafka supports both synchronous and asynchronous replication, so commodity hardware is good enough.
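In the real client this synchronous-vs-asynchronous trade-off is exposed through the producer's standard `acks` setting (a minimal producer-properties fragment; broker/topic settings such as replication factor are assumed to be configured separately):

```
# Fire-and-forget: do not wait for any broker acknowledgement (fastest, weakest durability)
acks=0

# Wait for the partition leader only (asynchronous replication to followers)
acks=1

# Wait until all in-sync replicas have the record (synchronous replication, strongest durability)
acks=all
```

With `acks=all`, a record is not considered written until every in-sync replica has it, which is what lets commodity disks stand in for RAID/SAN-class hardware.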
High scalability:
Consumers can be added on the fly with no downtime and no impact on performance.
In Flume, the pipeline topology has to change: the channel must be replicated for the new sink, which means some downtime.
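Kafka's on-the-fly scaling works because topic partitions are simply redistributed across the consumer group when a member joins. A toy round-robin assignment sketch (illustrative only; the broker's actual group-rebalance protocol is more involved):

```python
def assign(partitions, consumers):
    """Spread partitions round-robin across the current group members."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = list(range(6))

# Two consumers share six partitions ...
print(assign(partitions, ["c1", "c2"]))
# {'c1': [0, 2, 4], 'c2': [1, 3, 5]}

# ... adding a third just triggers a reassignment; no pipeline change,
# no channel replication, no downtime.
print(assign(partitions, ["c1", "c2", "c3"]))
# {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}
```

Contrast this with Flume, where a new sink requires editing the agent configuration and restarting it.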