Create a Kafka cluster on AWS; you can use the default broker size for this POC. EC2 instances are used for the publisher and subscriber of Kafka events.
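For reference, here is a rough sketch of creating such a cluster with the AWS CLI's MSK commands. The cluster name, Kafka version, subnets, and security group below are placeholders (not from the original setup), so adjust them to your environment:

aws kafka create-cluster \
  --cluster-name poc-kafka \
  --kafka-version "2.8.1" \
  --number-of-broker-nodes 3 \
  --broker-node-group-info '{
      "InstanceType": "kafka.m5.large",
      "ClientSubnets": ["subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333"],
      "SecurityGroups": ["sg-0123456789abcdef0"]
    }'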
If you accidentally landed on this page looking for the original (Franz) Kafka, however unlikely that may seem, I have nothing to offer you except this wonderful quote.
By believing passionately in something that still does not exist, we create it. The nonexistent is whatever we have not sufficiently desired. -- Franz Kafka
If you came here for Apache Kafka, I assume you already know what it is. For the uninitiated, here is a fun illustrated introduction to Kafka, aptly named Gently Down The Stream. But don't stop there; read a more serious introduction here.
With that out of the way, let's get straight to the point of this post. We recently had the opportunity to deploy self-managed Kafka clusters at scale for multiple customers as part of migrating them from on-premise / AWS environments to Google Cloud Platform. This post is a how-to guide on deploying a highly available Kafka cluster on GCP.
#!/usr/bin/env bash
# Author: Kel Graham
# Date: 2019-12-04
# Purpose: Let's use Kafka CLI tools to get topic counts.
# This should be usable on any Kafka install since it uses
# only the kafka-consumer-groups utility and some standard
# unix tools to sum up.
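
# NOTE: the body below is only a sketch of one possible implementation,
# not the original script. It sums the LOG-END-OFFSET column per topic;
# the column positions are read from the header row since they differ
# between Kafka versions. Bootstrap server and group id are arguments.
BOOTSTRAP="${1:?usage: $0 <bootstrap-server:port> <group-id>}"
GROUP="${2:?usage: $0 <bootstrap-server:port> <group-id>}"

kafka-consumer-groups --bootstrap-server "$BOOTSTRAP" --group "$GROUP" --describe 2>/dev/null |
  awk '/TOPIC/ && /LOG-END-OFFSET/ {                  # locate columns in the header row
         for (i = 1; i <= NF; i++) { if ($i == "TOPIC") t = i; if ($i == "LOG-END-OFFSET") o = i }
         next
       }
       t && o && $o ~ /^[0-9]+$/ { sum[$t] += $o }    # accumulate log-end offsets per topic
       END { for (topic in sum) printf "%s\t%d\n", topic, sum[topic] }'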
I've been working with Apache Kafka for over 7 years. I inevitably find myself doing the same set of activities while I'm developing or working with someone else's system. Here's a set of Kafka productivity hacks for doing a few things way faster than you're probably doing them now. 🔥
Kafka 0.11.0.0 (Confluent 3.3.0) added support for manipulating consumer group offsets via the kafka-consumer-groups CLI command.
kafka-consumer-groups --bootstrap-server <kafkahost:port> --group <group_id> --describe
Note the values under "CURRENT-OFFSET" and "LOG-END-OFFSET". "CURRENT-OFFSET" is the consumer group's current position in each partition, while "LOG-END-OFFSET" is the offset of the latest message written to that partition.
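The same tool can also rewrite those offsets. A quick sketch, with the broker, group, and topic left as placeholders (the consumer group must be inactive while you reset it); run with --dry-run first to preview the change, then --execute to commit it:

kafka-consumer-groups --bootstrap-server <kafkahost:port> --group <group_id> --topic <topic> --reset-offsets --to-earliest --dry-run
kafka-consumer-groups --bootstrap-server <kafkahost:port> --group <group_id> --topic <topic> --reset-offsets --to-earliest --execute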
## Consumer Throughput: Single consumer thread, no compression

bin/kafka-consumer-perf-test.sh --topic benchmark-3-3-none \
  --zookeeper kafka-zk-1:2181,kafka-zk-2:2181,kafka-zk-3:2181 \
  --messages 15000000 \
  --threads 1

## Consumer Throughput: Three consumer threads, no compression
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<template>
    <description>This template generates messages and puts them on a Kafka topic. Another processor then
        gets the messages from Kafka and puts them on HDFS.
    </description>
    <name>Kerberized Kafka and HDFS</name>
    <snippet>
        <connections>
            <id>2b93ffcd-0698-44a9-86f6-ce0ea6fc4145</id>
            <parentGroupId>3bdd324d-db87-4a21-8149-f88d7a46741e</parentGroupId>
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.
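To see that in action against a running broker, here is a minimal publish/subscribe round trip using the console tools that ship with Kafka. The broker address and topic name are placeholders, and older releases take --broker-list on the producer instead of --bootstrap-server:

# Terminal 1: publish messages to the topic (one per line typed on stdin)
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test

# Terminal 2: subscribe and replay the topic from the beginning
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning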
Producer

Setup

bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test-rep-one --partitions 6 --replication-factor 1
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test --partitions 6 --replication-factor 3

Single thread, no replication

# Positional args: topic, number of records, record size (bytes), target throughput in records/sec (-1 = no throttling); remaining args are producer properties.
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
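On recent Kafka releases the same test is normally run through the kafka-producer-perf-test.sh wrapper rather than kafka-run-class.sh. A rough equivalent of the command above, with the broker address left as a placeholder:

bin/kafka-producer-perf-test.sh --topic test7 --num-records 50000000 --record-size 100 --throughput -1 \
  --producer-props acks=1 bootstrap.servers=<broker:9092> buffer.memory=67108864 batch.size=8196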