Create a Kafka cluster on AWS; you can use the default broker size for this POC. EC2 instances are used for the publisher and subscriber of Kafka events.
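For reference, here is a rough sketch of creating such a cluster with the AWS CLI's MSK commands. The cluster name, Kafka version, subnets, and security group below are placeholders (not from the original setup), so adjust them to your environment:

aws kafka create-cluster \
  --cluster-name poc-kafka \
  --kafka-version "2.8.1" \
  --number-of-broker-nodes 3 \
  --broker-node-group-info '{
      "InstanceType": "kafka.m5.large",
      "ClientSubnets": ["subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333"],
      "SecurityGroups": ["sg-0123456789abcdef0"]
    }'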
If you accidentally landed on this page looking for the original (Franz) Kafka, however unlikely that may seem, I have nothing to offer you except this wonderful quote.
By believing passionately in something that still does not exist, we create it. The nonexistent is whatever we have not sufficiently desired. -- Franz Kafka
If you came here for Apache Kafka, I assume you already know what it is. For the uninitiated, here is a fun illustrated introduction to Kafka, aptly named Gently Down The Stream. But don't stop there; read a more serious introduction here.
With that out of the way, let's get straight to the point of this post. We recently had the opportunity to deploy self-managed Kafka clusters at scale for multiple customers as part of migrating them from on-premise / AWS environments to Google Cloud Platform. This post is a how-to guide on deploying a highly available Kafka cluster on GCP.
#!/usr/bin/env bash
# Author: Kel Graham
# Date: 2019-12-04
# Purpose: Let's use Kafka CLI tools to get topic counts.
# This should be usable on any Kafka install since it uses
# only the kafka-consumer-groups utility and some standard
# unix tools to sum up.
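
# NOTE: the body below is only a sketch of one possible implementation,
# not the original script. It sums the LOG-END-OFFSET column per topic;
# the column positions are read from the header row since they differ
# between Kafka versions. Bootstrap server and group id are arguments.
BOOTSTRAP="${1:?usage: $0 <bootstrap-server:port> <group-id>}"
GROUP="${2:?usage: $0 <bootstrap-server:port> <group-id>}"

kafka-consumer-groups --bootstrap-server "$BOOTSTRAP" --group "$GROUP" --describe 2>/dev/null |
  awk '/TOPIC/ && /LOG-END-OFFSET/ {                  # locate columns in the header row
         for (i = 1; i <= NF; i++) { if ($i == "TOPIC") t = i; if ($i == "LOG-END-OFFSET") o = i }
         next
       }
       t && o && $o ~ /^[0-9]+$/ { sum[$t] += $o }    # accumulate log-end offsets per topic
       END { for (topic in sum) printf "%s\t%d\n", topic, sum[topic] }'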
I've been working with Apache Kafka for over 7 years. I inevitably find myself doing the same set of activities while I'm developing or working with someone else's system. Here's a set of Kafka productivity hacks for doing a few things way faster than you're probably doing them now. 🔥
Kafka 0.11.0.0 (Confluent 3.3.0) added support for manipulating consumer group offsets via the kafka-consumer-groups CLI command.
kafka-consumer-groups --bootstrap-server <kafkahost:port> --group <group_id> --describe
Note the values under "CURRENT-OFFSET" and "LOG-END-OFFSET". "CURRENT-OFFSET" is the consumer group's current position in each partition, while "LOG-END-OFFSET" is the offset of the latest message written to that partition.
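The same tool can also rewrite those offsets. A quick sketch, with the broker, group, and topic left as placeholders (the consumer group must be inactive while you reset it); run with --dry-run first to preview the change, then --execute to commit it:

kafka-consumer-groups --bootstrap-server <kafkahost:port> --group <group_id> --topic <topic> --reset-offsets --to-earliest --dry-run
kafka-consumer-groups --bootstrap-server <kafkahost:port> --group <group_id> --topic <topic> --reset-offsets --to-earliest --execute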
## Consumer Throughput: Single consumer thread, no compression

bin/kafka-consumer-perf-test.sh --topic benchmark-3-3-none \
  --zookeeper kafka-zk-1:2181,kafka-zk-2:2181,kafka-zk-3:2181 \
  --messages 15000000 \
  --threads 1

## Consumer Throughput: Three consumer threads, no compression
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<template>
    <description>This template generates messages and puts them on a Kafka topic. Another processor then
        gets the messages from Kafka and puts them on HDFS.
    </description>
    <name>Kerberized Kafka and HDFS</name>
    <snippet>
        <connections>
            <id>2b93ffcd-0698-44a9-86f6-ce0ea6fc4145</id>
            <parentGroupId>3bdd324d-db87-4a21-8149-f88d7a46741e</parentGroupId>
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.
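To see that in action against a running broker, here is a minimal publish/subscribe round trip using the console tools that ship with Kafka. The broker address and topic name are placeholders, and older releases take --broker-list on the producer instead of --bootstrap-server:

# Terminal 1: publish messages to the topic (one per line typed on stdin)
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test

# Terminal 2: subscribe and replay the topic from the beginning
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning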
Producer

Setup

bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test-rep-one --partitions 6 --replication-factor 1
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test --partitions 6 --replication-factor 3

Single thread, no replication

# Positional args: topic, number of records, record size (bytes), target throughput in records/sec (-1 = no throttling); remaining args are producer properties.
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
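On recent Kafka releases the same test is normally run through the kafka-producer-perf-test.sh wrapper rather than kafka-run-class.sh. A rough equivalent of the command above, with the broker address left as a placeholder:

bin/kafka-producer-perf-test.sh --topic test7 --num-records 50000000 --record-size 100 --throughput -1 \
  --producer-props acks=1 bootstrap.servers=<broker:9092> buffer.memory=67108864 batch.size=8196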