Last active
July 22, 2021 11:26
Kafka Fundamentals Notes
What is this gist about?
-- Apache Kafka Fundamentals
Why do we need Apache Kafka?
-- The amount of data produced in the world every day is huge. A commonly cited estimate is 2.5 quintillion bytes (about 2.5 exabytes) per day.
-- Because so much data is generated every day, we need some kind of queuing system that can move and process this data between our systems.
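Types of Queuing Systems?
-- Point-to-point (P2P): each message is consumed by exactly one receiver.
-- Publisher-Subscriber: each message is delivered to every subscriber of a topic.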
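The difference between the two queuing models can be sketched in a few lines of plain Python. This is only an in-memory illustration, not how Kafka is implemented; the class names PointToPointQueue and PubSubTopic are made up for this example.

```python
from collections import deque


class PointToPointQueue:
    """Hypothetical P2P queue: each message goes to exactly one receiver."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Taking the message removes it, so no other receiver can see it.
        return self._messages.popleft() if self._messages else None


class PubSubTopic:
    """Hypothetical pub-sub topic: every subscriber gets its own copy."""

    def __init__(self):
        self._inboxes = {}

    def subscribe(self, name):
        # Each subscriber gets a private inbox list.
        self._inboxes[name] = []
        return self._inboxes[name]

    def publish(self, message):
        for inbox in self._inboxes.values():
            inbox.append(message)
```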
What is Apache Kafka?
-- Kafka is a distributed, reliable and performant streaming platform.
-- Kafka works on the Publisher-Subscriber model.
-- Kafka can handle continuous streams of data.
-- Kafka supports the transfer of large volumes of data or requests between systems.
-- Kafka stores the data that is published. Consuming a message does not remove it.
What is ZooKeeper?
-- Cluster management system for Kafka.
-- Also acts as an orchestrator for Kafka.
-- A group of ZooKeeper servers running together is called a ZooKeeper ensemble.
-- ZooKeeper is needed to -
---- Elect the controller broker, which in turn elects partition leaders
---- Resolve deadlock issues
Kafka Cluster - Collection of brokers.
Broker - Independent instance of the Kafka service. Each broker typically runs on its own machine or VM. Any broker can act as a bootstrap server, the initial contact point a client uses to discover the rest of the cluster.
Topic - Named stream of messages, based on a commit log architecture. Multiple topics can be created in a Kafka cluster.
Partitions - Topics are divided into partitions to allow parallel processing. Messages are stored in each partition with an incrementing offset.
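To make the commit log idea concrete, here is a minimal sketch of a single partition as an append-only list, where the offset is simply the position of a message in the log. The Partition class and the message strings are illustrative assumptions, not Kafka's actual storage code.

```python
class Partition:
    """Hypothetical in-memory partition: an append-only log of messages."""

    def __init__(self):
        self._log = []

    def append(self, message):
        # The offset of a new message is its index in the log.
        offset = len(self._log)
        self._log.append(message)
        return offset

    def read(self, offset, max_messages=10):
        # Reading does not remove anything; consumers just track an offset.
        return self._log[offset:offset + max_messages]


partition = Partition()
first = partition.append("order-created")
second = partition.append("order-paid")
```

A reader that has already handled offset 0 would call partition.read(1) to continue from the next message.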
What are the guarantees that Kafka provides?
-- Ordering is guaranteed within a partition, not across the partitions of a topic.
-- By default, data is retained for 7 days. This is configurable.
-- Partition to broker assignment is automatic.
What is a Producer?
-- System that generates the data.
-- Uses the producer API to write data to a Kafka cluster.
-- Can attach a key to each message. With the default partitioner, messages with the same key are sent to the same partition.
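How a key maps to a partition can be sketched as hash-then-modulo. Kafka's Java client uses a murmur2 hash for keyed messages; the sketch below swaps in CRC32 from the Python standard library purely for illustration, and choose_partition is a hypothetical helper name.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Pick a partition for a key (CRC32 stand-in, not Kafka's murmur2)."""
    # The same key always hashes to the same value, so it always lands
    # on the same partition as long as num_partitions does not change.
    return zlib.crc32(key) % num_partitions
```

Because the result depends on num_partitions, adding partitions to a topic can change where a given key lands.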
What are Acknowledgements?
-- Response that the producer waits for to confirm that the data it produced has been safely stored in the Kafka cluster.
-- 3 modes of acknowledgement (the acks setting).
---- 0 - Fire and forget. The producer does not wait for any acknowledgement.
---- 1 - Get an acknowledgement from the partition leader only.
---- All - Get an acknowledgement once the leader and all in-sync replicas (ISR) have the data.
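The three modes can be summarized as a small decision function. This is a simplified model under stated assumptions: is_acknowledged is a hypothetical helper, replicas are represented as sets of broker names, and broker-side settings such as min.insync.replicas are ignored.

```python
def is_acknowledged(acks, leader_stored, followers_stored, in_sync_followers):
    """Return True if the producer would treat the write as acknowledged.

    acks: "0", "1" or "all".
    leader_stored: whether the partition leader has stored the message.
    followers_stored: set of followers that have stored the message.
    in_sync_followers: set of followers currently in the ISR.
    """
    if acks == "0":
        # Fire and forget: the producer never waits for a response.
        return True
    if acks == "1":
        # Only the leader has to confirm the write.
        return leader_stored
    if acks == "all":
        # The leader and every in-sync follower must have the message.
        return leader_stored and in_sync_followers <= followers_stored
    raise ValueError(f"unknown acks value: {acks!r}")
```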
What is a consumer?
-- System that reads data from topics.
-- Multiple consumer groups can consume the same topic or partition, each group independently of the others.
What are consumer groups?
-- Group of consumers that work together to achieve a common goal.
-- The goal can be, for example, to save the data to a DB or to perform an alerting operation.
-- Within a group, each partition is read by only one consumer.
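Inside a consumer group, the partitions of a topic are split among the group's members. A minimal round-robin sketch of that split is shown below; assign_partitions is a hypothetical helper and is not necessarily the assignment strategy a real cluster is configured with.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        # Each partition goes to exactly one consumer in the group.
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


two_consumers = assign_partitions([0, 1, 2, 3], ["c1", "c2"])
three_consumers = assign_partitions([0, 1], ["c1", "c2", "c3"])
```

In the second call there are more consumers than partitions, so one consumer ends up with an empty list and stays idle.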
What are Delivery Semantics?
-- Determined by when consumers mark a message in a topic as consumed (commit its offset) relative to processing it.
-- 3 modes of delivery semantics -
---- At most once
---- At least once
---- Exactly once
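Two commonly described delivery semantics, at-most-once and at-least-once, differ only in whether the offset is committed before or after a message is processed. The sketch below simulates a consumer that crashes between those two steps; consume, commit_first and crash_at are hypothetical names used only for this illustration.

```python
def consume(messages, committed_offset, commit_first, crash_at=None):
    """Process messages starting at committed_offset.

    commit_first=True  commits, then processes (at-most-once).
    commit_first=False processes, then commits (at-least-once).
    crash_at simulates a crash between the two steps at that offset.
    Returns (processed_messages, committed_offset).
    """
    processed = []
    for offset in range(committed_offset, len(messages)):
        if commit_first:
            committed_offset = offset + 1
            if offset == crash_at:
                # Committed but never processed: the message is lost.
                return processed, committed_offset
            processed.append(messages[offset])
        else:
            processed.append(messages[offset])
            if offset == crash_at:
                # Processed but never committed: it will be re-read.
                return processed, committed_offset
            committed_offset = offset + 1
    return processed, committed_offset


messages = ["m0", "m1", "m2"]

# At-most-once: crash at offset 1, then restart from the committed offset.
amo_first, amo_offset = consume(messages, 0, commit_first=True, crash_at=1)
amo_second, _ = consume(messages, amo_offset, commit_first=True)

# At-least-once: same crash, same restart.
alo_first, alo_offset = consume(messages, 0, commit_first=False, crash_at=1)
alo_second, _ = consume(messages, alo_offset, commit_first=False)
```

In the at-most-once run, m1 is never processed. In the at-least-once run, m1 is processed twice, which is why at-least-once consumers usually need idempotent processing.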
Replication Factor -
-- Number of copies of each topic partition that are kept in the cluster, on different brokers.
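As a rough illustration of what a replication factor means, the sketch below places each partition's copies on consecutive brokers. The assign_replicas helper and its simple rotation scheme are assumptions made for this example; Kafka's real placement logic is more involved.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place replication_factor copies of each partition on distinct brokers."""
    if replication_factor > len(brokers):
        # Copies must live on different brokers, so this cannot be satisfied.
        raise ValueError("replication factor exceeds number of brokers")
    return {
        partition: [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
        for partition in range(num_partitions)
    }


placement = assign_replicas(num_partitions=3, brokers=[1, 2, 3], replication_factor=2)
```

With a replication factor of 2, each partition can survive the loss of one of the brokers that holds it.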
Offset - Incremental integer ID assigned to each message within a partition.