Dev setup for Kafka and Spark Streaming

This note covers a local setup for rapidly prototyping a MongoDB -> Kafka -> Spark Streaming pipeline.

PySpark is used for fast turnaround. The PySpark Kafka connector used here consumes through ZooKeeper, so the topic has to be created through ZooKeeper, where the consumer offsets are also stored.


$> brew install zookeeper kafka apache-spark jupyter
$> brew services start zookeeper
$> brew services start kafka

Create Kafka topic

$> kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 5 --topic checklist

Check that the partitions were created.

$> kafka-topics --describe --zookeeper localhost:2181 --topic checklist 

Topic:checklist	PartitionCount:5	ReplicationFactor:1	Configs:
	Topic: checklist	Partition: 0	Leader: 0	Replicas: 0	Isr: 0
	Topic: checklist	Partition: 1	Leader: 0	Replicas: 0	Isr: 0
	Topic: checklist	Partition: 2	Leader: 0	Replicas: 0	Isr: 0
	Topic: checklist	Partition: 3	Leader: 0	Replicas: 0	Isr: 0
	Topic: checklist	Partition: 4	Leader: 0	Replicas: 0	Isr: 0
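
The check above can also be scripted. The following is a minimal sketch, assuming the --describe output keeps the "PartitionCount:" and "Partition:" labels shown here (other Kafka versions may format them differently). The function names check_describe and partitions_ok are hypothetical helpers, not part of any Kafka tooling.

```python
import re

# Sample text mirroring the --describe output in this note.
SAMPLE = "Topic:checklist\tPartitionCount:5\tReplicationFactor:1\tConfigs:\n" + "".join(
    "\tTopic: checklist\tPartition: %d\tLeader: 0\tReplicas: 0\tIsr: 0\n" % i
    for i in range(5)
)


def check_describe(text):
    """Return (declared partition count or None, sorted list of listed partition ids)."""
    declared = re.search(r"PartitionCount:\s*(\d+)", text)
    # "Partition:" does not match inside "PartitionCount:", so only
    # the per-partition rows are collected here.
    listed = sorted(int(p) for p in re.findall(r"\bPartition:\s*(\d+)", text))
    return (int(declared.group(1)) if declared else None, listed)


def partitions_ok(text):
    """True when the listed partition ids are exactly 0..declared-1."""
    declared, listed = check_describe(text)
    return declared is not None and listed == list(range(declared))
```

Piping the real command output into partitions_ok (for example via subprocess) would give a quick pass/fail after topic creation.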

Status of consumer group

Once the Spark Streaming code runs, it creates a new consumer group (checklist-group in this example).
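
For reference, a job of this kind might look like the sketch below. It is not the code used for this note. It assumes Spark 2.x, where the receiver-based pyspark.streaming.kafka module is still available (it was removed in Spark 3.0), and it reuses the topic and group names from this note. The app name, batch interval, and the run function are illustrative assumptions.

```python
# Sketch only: receiver-based (ZooKeeper) Kafka stream in PySpark.
# Assumes Spark 2.x; pyspark.streaming.kafka does not exist in Spark 3.0+.

ZK_QUORUM = "localhost:2181"   # local ZooKeeper
GROUP_ID = "checklist-group"   # consumer group name used in this note
TOPICS = {"checklist": 5}      # topic -> number of consumer threads


def run(batch_seconds=5):
    """Start a streaming job that prints the message count of each batch."""
    # Imported lazily so this file still loads where pyspark is not installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="checklist-prototype")  # hypothetical app name
    ssc = StreamingContext(sc, batch_seconds)
    # Records arrive as (key, value) pairs; keep only the value.
    stream = KafkaUtils.createStream(ssc, ZK_QUORUM, GROUP_ID, TOPICS)
    stream.map(lambda kv: kv[1]).count().pprint()
    ssc.start()
    ssc.awaitTermination()
```

To try it, call run() at the end of the script and submit it with spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:<your Spark version>. The Scala suffix 2.11 is an assumption and depends on the Spark build.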

$> kafka-consumer-groups --zookeeper localhost:2181 --describe --group checklist-group

Note: This will only show information about consumers that use ZooKeeper (not those using the Java consumer API).

checklist                 0          25094           25094           0          checklist-group_Rydeen.local-1517642255383-8ec43f30
checklist                 1          25387           25387           0          checklist-group_Rydeen.local-1517642255383-8ec43f30
checklist                 2          24989           24989           0          checklist-group_Rydeen.local-1517642255383-8ec43f30
checklist                 3          24887           24887           0          checklist-group_Rydeen.local-1517642255383-8ec43f30
checklist                 4          25138           25138           0          checklist-group_Rydeen.local-1517642255383-8ec43f30
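
The rows above have no header. Assuming the columns are topic, partition, current offset, log end offset, lag, and consumer ID (the usual layout of this tool's output, but an assumption here), the lag per partition is the log end offset minus the current offset. A small sketch for totalling it from pasted output follows. parse_offsets and total_lag are hypothetical helper names.

```python
# Two rows copied from the output in this note.
SAMPLE = """\
checklist                 0          25094           25094           0          checklist-group_Rydeen.local-1517642255383-8ec43f30
checklist                 1          25387           25387           0          checklist-group_Rydeen.local-1517642255383-8ec43f30
"""


def parse_offsets(text):
    """Return one dict per partition row found in `text`."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Assumed order: topic, partition, current offset, log end offset, lag, ...
        if len(parts) < 5 or not all(p.isdigit() for p in parts[1:4]):
            continue  # skip headers, notes, and rows with unknown offsets
        current, log_end = int(parts[2]), int(parts[3])
        rows.append({
            "topic": parts[0],
            "partition": int(parts[1]),
            "current": current,
            "log_end": log_end,
            "lag": log_end - current,  # recomputed rather than read from column 5
        })
    return rows


def total_lag(rows):
    """Sum the lag over all parsed partitions."""
    return sum(row["lag"] for row in rows)
```

A total of zero, as in the output above, means the consumer group has caught up on every partition.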

Reset consumer offset

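$> kafka-consumer-groups --bootstrap-server localhost:9092 --reset-offsets --group checklist-group --all-topics --to-earliest

Note: Without --execute this only prints the offsets it would set; append --execute to apply them. The group must have no active consumers while it is reset. Because this form goes through --bootstrap-server, it acts on offsets stored in Kafka, not on the ZooKeeper-stored offsets described above.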

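Purging Kafka topics

Temporarily set a short retention on the topic so the broker deletes the existing messages (retention.ms=1000 here is an example value), wait for the old segments to be removed, then delete the override.

$> kafka-configs --zookeeper localhost:2181 --alter --entity-type topics --entity-name checklist --add-config retention.ms=1000

$> kafka-configs --zookeeper localhost:2181 --alter --entity-type topics --entity-name checklist --delete-config retention.ms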
