Summarized notes from the presentation

Brick-by-Brick: Exploring the Elements of Apache Kafka®

Speaker: Danica Fine
YouTube Link: YouTube


Kafka Fundamentals

  • Apache Kafka is a distributed event streaming platform that enables real-time data applications. It supports:
    • Building systems that are reactive, accurate, loosely coupled, and resilient.
    • Various data-streaming patterns, such as:
      • Publish/Subscribe
      • Queuing
      • Broadcasting
      • Batch processing

Events in Kafka (the most important concept)

  • Events in Kafka are records of something that has happened, defined by:
    • A timestamp and a description of what occurred.
    • Examples: Adding an item to an online cart or tracking the location of a ship.
  • Immutability: Because an event describes something that already happened, it can't be changed after it's written.

Kafka Topics

  • Kafka topics are logs, not queues.
    • Events persist even after being read by consumers.
    • Events in topics are ordered, immutable, and assigned a monotonically increasing offset.
  • Durability: Topics are append-only logs that can store data indefinitely, with configurable cleanup policies based on time or size.
  • Partitions: Topics are divided into partitions, which are durable logs as well.
    • Partitions allow for scalability, but require careful configuration to balance performance.

Kafka Brokers and Replication

  • Brokers are the nodes in a Kafka cluster.
    • Partitions are distributed across brokers to ensure even load.
    • Replication ensures that partitions have multiple copies across different brokers for fault tolerance.

Writing Data to Kafka

  • Producer: When writing to Kafka, the required fields are:
    • Topic: Defines where the event is written.
    • Value: The event or message itself.
  • Kafka supports producers in multiple programming languages.
  • Serialization: Kafka only processes data in byte format, so data needs to be serialized (e.g., using Avro, JSON, or Protobuf).
  • Partitioning: Producers either assign a partition explicitly or rely on the default partitioner, which routes records with the same key to the same partition.

Reading Data from Kafka

  • Consumers can start reading from the:
    • Earliest event,
    • Most recent event,
    • Or from a specific offset or timestamp.
  • Offset tracking: Consumers commit their processed offset back to Kafka to ensure they can resume where they left off in case of failure.
  • Consumer groups: Multiple consumers can share a group to process a topic in parallel; each partition is assigned to exactly one consumer in the group.

Kafka Ecosystem

  • Schema Registry: Manages and maintains schemas across topics, supporting schema evolution and compatibility.

  • Kafka Connect:

    • A framework that connects Kafka with other data systems (sources and sinks).
    • Offers low-code/no-code options for easy integration.
    • Can be used to move data from Kafka to other systems or into Kafka.
  • Kafka Streams:

    • A Java/Scala library for stream processing with built-in support for stateful processing (a minimal example follows this list).
    • ksqlDB: A SQL-based interface for stream processing built on Kafka Streams, making stream processing more accessible.
  • Other Frameworks: Kafka can integrate with tools like Apache Spark and Apache Flink for stream processing that may offer advantages in certain use cases (e.g., different languages or features).


Kafka Applications

  • Financial Services: Used for fraud detection and real-time financial processing.
  • IoT and Manufacturing: Real-time event processing for tracking devices and systems.
  • Inventory Management: Kafka can manage real-time inventory data for supply chain optimization.

Speaker: https://linktr.ee/thedanicafine
