@okram999
Last active November 12, 2018 19:00
Kafka:
Topics/Partitions/Offsets:
* Think of it as a database, which has streams of data
* Can have many topics, identified by name
* Topics are split into partitions
-- partitions are ordered
-- each msg within a partition gets an incremental id called the OFFSET
-- Order of msgs is only guaranteed within a partition, NOT across partitions
-- Default data retention is 7 days
-- Data is immutable inside the stream (cannot be changed)
-- Data is assigned to a partition of a topic randomly if you don't provide a key
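The bullets above can be sketched as a toy model (illustrative only, not Kafka's actual storage engine): a topic is a set of append-only partition logs, and each appended message gets the next incremental offset within its partition.

```python
import random

class Topic:
    """Toy topic: a list of ordered, append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, message, partition=None):
        if partition is None:
            # no key/partition given: pick a partition at random
            partition = random.randrange(len(self.partitions))
        log = self.partitions[partition]
        offset = len(log)          # incremental id within this partition only
        log.append(message)        # append-only: existing entries never change
        return partition, offset

topic = Topic("first_topic", 3)
print(topic.append("hello", partition=0))   # (0, 0)
print(topic.append("world", partition=0))   # (0, 1)
```

Note how offsets restart at 0 in every partition, which is why ordering is only meaningful within a single partition.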
BROKER:
=============
A Kafka cluster consists of multiple brokers
each broker has an ID
Connect to 1 broker and you are connected to the entire cluster (bootstrap)
Topic Replication Factor:
==============================
This is for high availability
Topics need to have replication
https://photos.google.com/photo/AF1QipOfp_u-yCbw6WFbYZAgu6KV7cl85RVjZjQGdscL
At any point, there is only one LEADER serving data for a partition.
Therefore each partition has one leader and ISRs (in-sync replicas)
Zookeeper decides the leader and the ISRs
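A minimal sketch of the leader/ISR idea for one partition (the real election, done via Zookeeper, is far more involved; broker names here are made up):

```python
class PartitionReplicas:
    """Toy model: one leader serves all reads/writes; ISRs stand by."""

    def __init__(self, replica_brokers):
        self.leader = replica_brokers[0]        # only the leader serves data
        self.isr = list(replica_brokers[1:])    # in-sync replicas

    def on_leader_failure(self):
        # promote an in-sync replica so the partition stays available
        self.leader = self.isr.pop(0)

part = PartitionReplicas(["broker-101", "broker-102"])
print(part.leader)        # broker-101
part.on_leader_failure()
print(part.leader)        # broker-102
```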
PRODUCERS:
===============================
Producers write data
They know which broker and partition to write to
If a broker fails, producers recover by themselves
Producers load balance, sending data to multiple brokers by themselves
Producers can choose to receive data acknowledgements
https://photos.google.com/photo/AF1QipNRVHCAI4I55fzR6ye3WusSArG0d_ZnVpDse3A2
Producers Message Keys:
If key=null, data is sent round robin across the partitions, and therefore across the brokers
If a key is included, msg ordering is preserved for that key -- achieved by always sending the same key to the same partition
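The two cases above can be sketched as follows. Kafka's default partitioner hashes the key bytes with murmur2; plain Python `hash()` stands in here, and the key name is made up:

```python
import itertools

NUM_PARTITIONS = 3
round_robin = itertools.cycle(range(NUM_PARTITIONS))

def choose_partition(key):
    if key is None:
        # no key: spread messages round robin across partitions
        return next(round_robin)
    # with a key: hash it, so the same key always lands on the same
    # partition (Kafka really uses murmur2; hash() stands in here)
    return hash(key) % NUM_PARTITIONS

# same key -> same partition -> ordering preserved for that key
assert choose_partition("truck_42") == choose_partition("truck_42")
print([choose_partition(None) for _ in range(4)])  # [0, 1, 2, 0]
```

Because keyed messages always map to one partition, per-key ordering follows directly from the per-partition ordering guarantee above.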
CONSUMERS:
=======================
Reads data from the topics
If a broker fails, consumers know how to recover
Data will be read in order WITHIN each partition
CONSUMER GROUP:
==============================
* each consumer in a group reads data from an exclusive set of partitions
* if there are more consumers than partitions, some consumers will be inactive
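A toy version of group assignment (Kafka's actual assignment strategies are pluggable and more sophisticated; this only illustrates exclusivity and inactive consumers):

```python
def assign(partitions, consumers):
    """Give each partition to exactly one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        # deal partitions out like cards: each goes to one consumer only
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

print(assign([0, 1, 2], ["c1", "c2"]))     # {'c1': [0, 2], 'c2': [1]}
print(assign([0, 1], ["c1", "c2", "c3"]))  # c3 gets no partition: inactive
```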
CONSUMER OFFSET:
===============================
# Kafka stores the offsets at which a consumer group has been reading
# in a topic called __consumer_offsets
# When a consumer in a group reads data, it commits the offset
# this helps when a consumer dies and resumes operation
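The commit/resume cycle can be sketched like this (a dict stands in for the __consumer_offsets topic; group and partition names are made up):

```python
committed = {}   # stands in for the __consumer_offsets topic

def consume(log, group, partition):
    """Read from the last committed offset, committing as we go."""
    start = committed.get((group, partition), 0)
    for offset in range(start, len(log)):
        msg = log[offset]
        # ... process msg here ...
        committed[(group, partition)] = offset + 1   # commit the offset
    return len(log) - start   # number of new msgs read

log = ["a", "b", "c"]
print(consume(log, "grp1", 0))   # 3  (fresh group: reads everything)
log.append("d")
print(consume(log, "grp1", 0))   # 1  (resumes at offset 3, reads only "d")
```

A restarted consumer calling `consume` again picks up exactly where the dead one left off, which is the point of committing offsets.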
DELIVERY SEMANTICS FOR CONSUMERS
================================
Ways to commit to the CONSUMER offsets
1. At most once: offsets are committed as soon as the msg is received (if processing then fails, the msg is lost)
2. At least once: **Preferred
--- offsets are committed after the msg is processed
--- This can result in duplicate processing - so make sure the system is idempotent
3. Exactly once: only for Kafka-to-Kafka workflows (e.g. with the Kafka Streams API)
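An idempotent consumer for at-least-once delivery can be as simple as deduplicating on a message id (a sketch; the ids and payloads are made up):

```python
seen = set()     # ids already processed
results = []

def process(msg_id, payload):
    """Idempotent processing: a redelivered msg has no extra effect."""
    if msg_id in seen:        # duplicate redelivery after a crash: skip
        return
    seen.add(msg_id)
    results.append(payload)   # the actual side effect happens once

process(1, "order-created")
process(2, "order-paid")
process(1, "order-created")   # redelivered: ignored
print(results)                # ['order-created', 'order-paid']
```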
Kafka Broker Discovery:
===================================
Every broker is a BOOTSTRAP server
you only need to connect to one broker
Each broker knows which broker has which topics and which partitions (they have the metadata)
Zookeeper:
* Manages the brokers
* Selects the leaders and ISR
* sends notifications to Kafka in case of changes (new topic, broker up/down, topic deleted, etc.)
* Zookeeper runs with an odd # of servers
* Zookeeper has a leader (handles writes); the rest are followers (handle reads)
Kafka Guarantees:
# Msgs are appended to a topic partition in the order they are sent
# Consumers read msgs in the order stored in a topic partition
Linux - Summary
Download and Setup Java 8 JDK:
sudo apt install openjdk-8-jdk
Download & Extract the Kafka binaries from https://kafka.apache.org/downloads
Try Kafka commands using bin/kafka-topics.sh (for example)
Edit PATH to include Kafka (in ~/.bashrc for example) PATH="$PATH:/your/path/to/your/kafka/bin"
Edit Zookeeper & Kafka configs using a text editor
zookeeper.properties: dataDir=/your/path/to/data/zookeeper
server.properties: log.dirs=/your/path/to/data/kafka
Start Zookeeper in one terminal window: zookeeper-server-start.sh config/zookeeper.properties
Start Kafka in another terminal window: kafka-server-start.sh config/server.properties
Kafka CLI:
=============
$ kafka-topics.sh --zookeeper 127.0.0.1:2181 --topic first_topic --partitions 3 --replication-factor 1 --create
$ kafka-topics.sh --zookeeper 127.0.0.1:2181 --topic first_topic --describe
$ kafka-topics.sh --zookeeper 127.0.0.1:2181 --topic second_topic --delete