Last active
November 12, 2018 19:00
Kafka
Kafka:
Topics/Partitions/Offsets:
* Think of a topic as something like a database table, holding a stream of data
* A cluster can have many topics, identified by name
* Topics are split into partitions
-- each partition is ordered
-- each msg within a partition gets an incremental id called its OFFSET
-- Order of msgs is only guaranteed within a partition, NOT across partitions
-- Default data retention is 7 days
-- Data is immutable once written to a partition (cannot be changed)
-- If you don't provide a key, you do not control which partition of the topic a msg is assigned to
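The partition/offset idea above can be sketched with a toy in-memory model. This is only an illustration, not how Kafka stores data; the `Topic` class and its method names are made up for this note.

```python
class Topic:
    """Toy model of a topic: a fixed number of append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        # One independent, ordered log per partition.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, value):
        """Append a msg to one partition and return its offset there."""
        log = self.partitions[partition]
        log.append(value)
        return len(log) - 1

    def read(self, partition, offset):
        """Read the msg stored at a given offset of a partition."""
        return self.partitions[partition][offset]


topic = Topic("first_topic", num_partitions=3)
offsets = [topic.append(0, "a"), topic.append(0, "b"), topic.append(1, "c")]
print(offsets)  # [0, 1, 0]
```

Offsets restart at 0 in every partition, which is why ordering only means something within a single partition.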
BROKER:
=============
A Kafka cluster consists of multiple brokers
Each broker has an ID
Connect to 1 broker (the bootstrap broker) and you are connected to the entire cluster
Topic Replication Factor:
==============================
This is for high availability
Topics need a replication factor > 1, so that each partition is copied to more than one broker
https://photos.google.com/photo/AF1QipOfp_u-yCbw6WFbYZAgu6KV7cl85RVjZjQGdscL
At any point, there is only one LEADER serving data for a partition.
Therefore each partition has one leader and one or more ISRs (in-sync replicas)
Zookeeper - used to decide the leader and ISRs
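A minimal sketch of the leader/ISR idea, assuming a simplified rule where the next in-sync replica takes over when the leader's broker goes down. The class name and broker IDs are hypothetical, and real leader election is more involved than this.

```python
class PartitionReplicas:
    """Toy model of one partition replicated across several brokers."""

    def __init__(self, broker_ids):
        self.replicas = list(broker_ids)  # brokers holding a copy
        self.isr = list(broker_ids)       # replicas currently in sync
        self.leader = self.isr[0]         # only the leader serves data

    def broker_down(self, broker_id):
        """Drop a failed broker from the ISR; promote a new leader if needed."""
        if broker_id in self.isr:
            self.isr.remove(broker_id)
        if self.leader == broker_id:
            self.leader = self.isr[0] if self.isr else None


p = PartitionReplicas([101, 102, 103])  # replication factor 3
p.broker_down(101)
print(p.leader, p.isr)  # 102 [102, 103]
```

With a replication factor of 1 the same failure would leave the partition with no leader at all, which is the availability argument made above.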
PRODUCERS:
===============================
Producers write data to topics
They know which broker and partition to write to
If a broker fails, producers recover automatically
Producers load balance by themselves, spreading data across multiple brokers
Producers can choose to receive acknowledgements of data writes
https://photos.google.com/photo/AF1QipNRVHCAI4I55fzR6ye3WusSArG0d_ZnVpDse3A2
Producers Message Keys:
If key=null - data is sent round robin across the partitions, and therefore across the brokers
Include a key if you need msg ordering -- msgs with the same key always go to the same partition
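The key rule can be sketched as below. This is an illustration only: `ToyPartitioner` is a made-up name, and `zlib.crc32` stands in for the hash function of the real client, which is different.

```python
import itertools
import zlib


class ToyPartitioner:
    """Pick a partition: round robin without a key, hash of the key otherwise."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition_for(self, key):
        if key is None:
            # No key: spread msgs evenly over all partitions.
            return next(self._round_robin)
        # Same key -> same hash -> same partition, so per-key order is kept.
        # (crc32 is an assumption for this sketch, not the client's real hash.)
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


partitioner = ToyPartitioner(num_partitions=3)
no_key = [partitioner.partition_for(None) for _ in range(4)]
print(no_key)  # [0, 1, 2, 0]
print(partitioner.partition_for("truck_1") == partitioner.partition_for("truck_1"))  # True
```

Note that the key-to-partition mapping depends on the number of partitions, so it only stays stable while that number does not change.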
CONSUMERS:
=======================
Consumers read data from topics
If a broker fails, consumers know how to recover
Data is read in order WITHIN each partition
CONSUMER GROUP:
==============================
* each consumer in a group reads from its own exclusive set of partitions
* if there are more consumers than partitions, some consumers will be inactive
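A rough sketch of how partitions could be shared out inside one group. The function name is hypothetical and the simple round-robin rule is an assumption; Kafka's own assignment strategies differ in detail.

```python
def assign_partitions(partitions, consumers):
    """Give every partition to exactly one consumer of the group."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


two = assign_partitions([0, 1, 2], ["c1", "c2"])
four = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])
print(two)   # {'c1': [0, 2], 'c2': [1]}
print(four)  # {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
```

In the second call `c4` gets nothing to read, which is the "inactive consumer" case from the list above.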
CONSUMER OFFSET:
===============================
# Kafka stores the offsets up to which each consumer group has read
# they are stored in a topic called __consumer_offsets
# When a consumer in a group has read data, it commits the offset
# this lets a consumer that dies resume from where it left off
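A toy version of commit-and-resume, assuming the committed value is the next offset to read. The dict below merely stands in for __consumer_offsets; the function names and the group name are invented for this sketch.

```python
# (group, topic, partition) -> next offset to read
committed = {}


def poll(log, group, topic, partition, max_records):
    """Return (offset, msg) pairs starting at the group's committed offset."""
    start = committed.get((group, topic, partition), 0)
    return list(enumerate(log[start:start + max_records], start=start))


def commit(group, topic, partition, last_offset):
    """Record that everything up to and including last_offset is done."""
    committed[(group, topic, partition)] = last_offset + 1


log = ["m0", "m1", "m2", "m3"]
first = poll(log, "my_group", "first_topic", 0, max_records=2)
commit("my_group", "first_topic", 0, last_offset=first[-1][0])

# The consumer dies here; a replacement in the same group picks up the work.
resumed = poll(log, "my_group", "first_topic", 0, max_records=2)
print(first)    # [(0, 'm0'), (1, 'm1')]
print(resumed)  # [(2, 'm2'), (3, 'm3')]
```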
DELIVERY SEMANTICS FOR CONSUMERS
================================
Ways to commit the CONSUMER offsets
1. At most once: offsets are committed as soon as the msg is received
--- if processing then fails, the msg is lost (it will not be read again)
2. At least once: **Preferred
--- offsets are committed after the msg is processed
--- This can result in duplicate processing - so make sure the processing is idempotent
3. Exactly once
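The difference between the first two options comes down to the order of "commit" and "process". A small simulation, assuming a single crash that happens between those two steps (`run_consumer` and its parameters are made up for this sketch):

```python
def run_consumer(log, mode, crash_at):
    """Consume `log`, crashing once between the two steps at offset `crash_at`."""
    committed = 0      # next offset to read, as stored for the group
    processed = []
    crashed = False
    while committed < len(log):
        offset = committed
        if mode == "at_most_once":
            steps = ["commit", "process"]
        else:  # "at_least_once"
            steps = ["process", "commit"]
        for i, step in enumerate(steps):
            if step == "commit":
                committed = offset + 1
            else:
                processed.append(log[offset])
            if i == 0 and offset == crash_at and not crashed:
                crashed = True
                break  # crash; the restart resumes from `committed`
    return processed


log = ["a", "b", "c"]
print(run_consumer(log, "at_most_once", crash_at=1))   # ['a', 'c']
print(run_consumer(log, "at_least_once", crash_at=1))  # ['a', 'b', 'b', 'c']
```

With at most once, "b" is lost; with at least once, "b" is processed twice, which is harmless only if the processing is idempotent.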
Kafka Broker Discovery:
===================================
Every broker is a BOOTSTRAP server
you only need to connect to one broker
Each broker knows about all the brokers, topics and partitions (they hold the cluster metadata)
Zookeeper:
* Manages the brokers
* Helps select the leaders and ISRs
* sends notifications to Kafka in case of changes (new topic, broker up/down, topic deleted, etc.)
* Zookeeper runs with an odd # of servers
* Zookeeper has a leader (handles writes); the rest are followers (handle reads)
Kafka Guarantees:
# Msgs are appended to a topic partition in the order they are sent
# Consumers read msgs in the order stored in a topic partition
Linux - Summary
Download and Setup Java 8 JDK:
sudo apt install openjdk-8-jdk
Download & Extract the Kafka binaries from https://kafka.apache.org/downloads
Try Kafka commands using bin/kafka-topics.sh (for example)
Edit PATH to include Kafka (in ~/.bashrc for example) PATH="$PATH:/your/path/to/your/kafka/bin"
Edit Zookeeper & Kafka configs using a text editor
zookeeper.properties: dataDir=/your/path/to/data/zookeeper
server.properties: log.dirs=/your/path/to/data/kafka
Start Zookeeper in one terminal window: zookeeper-server-start.sh config/zookeeper.properties
Start Kafka in another terminal window: kafka-server-start.sh config/server.properties
Kafka CLI:
=============
$ kafka-topics.bat --zookeeper 127.0.0.1:2181 --topic first_topic --partitions 3 --replication-factor 1 --create
$ kafka-topics.bat --zookeeper 127.0.0.1:2181 --topic first_topic --describe
$ kafka-topics.bat --zookeeper 127.0.0.1:2181 --topic second_topic --delete