Dennis Federico (DennisFederico)

Event Deduplication in Kafka Stream processing using ksqlDB (SIMPLIFIED)

Introduction

Event deduplication with ksqlDB: emit the very first message for each event and filter out the remaining duplicates that arrive within a tumbling window or a session window.
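
The two window types decide which occurrences count as "duplicates of each other". As a rough illustration (not ksqlDB's internals), the sketch below assigns timestamps to windows: a tumbling window is a fixed, non-overlapping slot, while a session window stays open as long as events keep arriving within an inactivity gap. The names `tumbling_window_start` and `SessionTracker` are hypothetical, and the session logic assumes per-key events arrive in timestamp order, so session merging for late events is not modeled.

```python
from typing import Dict


def tumbling_window_start(ts_ms: int, size_ms: int) -> int:
    """Return the start of the fixed, non-overlapping window containing ts_ms."""
    return ts_ms - (ts_ms % size_ms)


class SessionTracker:
    """Assign per-key session windows, assuming events arrive in timestamp order.

    A new session starts when the gap since the key's previous event exceeds gap_ms.
    Out-of-order arrivals, which would force sessions to merge, are not handled.
    """

    def __init__(self, gap_ms: int) -> None:
        self.gap_ms = gap_ms
        self.last_seen: Dict[str, int] = {}
        self.session_start: Dict[str, int] = {}

    def session_for(self, key: str, ts_ms: int) -> int:
        """Record an event and return the start timestamp of its session."""
        last = self.last_seen.get(key)
        if last is None or ts_ms - last > self.gap_ms:
            self.session_start[key] = ts_ms  # inactivity gap exceeded: open a new session
        self.last_seen[key] = ts_ms
        return self.session_start[key]


if __name__ == "__main__":
    print(tumbling_window_start(125_000, 60_000))  # 120000
    tracker = SessionTracker(gap_ms=30_000)
    print([tracker.session_for("a", ts) for ts in (0, 10_000, 50_000)])  # [0, 0, 50000]
```

With a one-minute tumbling window, an event at 125 s lands in the window starting at 120 s. With a 30 s session gap, events at 0 s and 10 s share a session, while the event at 50 s starts a new one because 40 s passed without activity.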

The ID to deduplicate on is the eventId field inside the record payload, so the first step is to re-key the stream by this field.
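
Re-keying matters because Kafka assigns records to partitions by key: once eventId is the key, every copy of the same event lands in the same partition, so a single task sees all of them and can count them. The sketch below shows only the key extraction step, assuming JSON-encoded values; `rekey_by_event_id` is a hypothetical helper, and dropping records that lack an eventId is an assumption, not something the original specifies.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def rekey_by_event_id(
    records: Iterable[Tuple[Optional[str], str]],
) -> Iterator[Tuple[str, str]]:
    """Yield (eventId, value) pairs, replacing each record's original key.

    Values are assumed to be JSON objects; records without an eventId are
    skipped (assumption) because they could not be grouped for deduplication.
    """
    for _original_key, value in records:
        event_id = json.loads(value).get("eventId")
        if event_id is None:
            continue
        yield str(event_id), value


if __name__ == "__main__":
    incoming = [
        (None, '{"eventId": "e-1", "amount": 10}'),
        ("k-7", '{"eventId": "e-1", "amount": 10}'),
        ("k-8", '{"amount": 5}'),
    ]
    for key, value in rekey_by_event_id(incoming):
        print(key, value)
```

Both copies of `e-1` now share the same key regardless of how they were keyed on the input topic.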

Deduplication works by counting how many times each eventId appears in the stream within a window and emitting only the event for which that count equals 1, i.e. the first occurrence in the window.
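
The counting rule can be simulated outside ksqlDB to see what gets emitted. This is a minimal in-memory sketch for tumbling windows only, not the ksqlDB query itself: a counter per (eventId, window) is incremented on every record, much like a windowed `COUNT(*)` that emits each change, and a record passes only when its counter reaches 1. `dedup_first_in_window` is a hypothetical name, and unlike a real stream processor this sketch never expires old window state.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

Event = Tuple[str, int, str]  # (eventId, timestamp in ms, payload)


def dedup_first_in_window(events: Iterable[Event], size_ms: int) -> List[Event]:
    """Keep only the first occurrence of each eventId per tumbling window.

    The counter map is never pruned here, whereas a real stream processor
    discards state for windows past their retention period.
    """
    counts: DefaultDict[Tuple[str, int], int] = defaultdict(int)
    emitted: List[Event] = []
    for event_id, ts_ms, payload in events:
        window_start = ts_ms - (ts_ms % size_ms)
        counts[(event_id, window_start)] += 1
        # Only the update that brings the count to 1 is forwarded downstream.
        if counts[(event_id, window_start)] == 1:
            emitted.append((event_id, ts_ms, payload))
    return emitted


if __name__ == "__main__":
    stream = [
        ("a", 1_000, "p1"),
        ("a", 2_000, "p2"),   # duplicate of "a" in the same window: dropped
        ("b", 2_500, "p3"),
        ("a", 61_000, "p4"),  # "a" again, but in the next one-minute window
    ]
    print(dedup_first_in_window(stream, size_ms=60_000))
    # [('a', 1000, 'p1'), ('b', 2500, 'p3'), ('a', 61000, 'p4')]
```

Note that `p4` is emitted even though it repeats eventId `a`: it falls into the next tumbling window, so its counter starts again at 1. Duplicates are only suppressed when they share a window, which is why the window size (or session gap) should cover the expected spread of duplicate arrivals.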

DennisFederico / Private-Public-Private.md
Last active December 2, 2024 23:36
Private-Public-Private Cluster Linking

Cluster Link with Public Jump Cluster

This pattern applies to Azure and GCP cross-region replication between two private clusters in different cloud regions. See Private-Public-Private.

This requires two cluster links: one from the source private cluster to the public "jump" cluster, and a second one from the public cluster to the destination private cluster. A cluster link is commonly hosted on the destination cluster of the data, but on the first leg of the replication flow the public (destination) cluster cannot open a connection to the private (source) cluster, so that cluster link must be source-initiated.
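
The reason for the two legs and their different initiation modes can be expressed as a small reachability check. The model below is a simplification for illustration only, with hypothetical names (`Cluster`, `can_connect`, `link_initiator`) and an assumed rule: a public cluster is reachable from anywhere, while a private cluster is reachable only from inside its own network. Under that rule the default (destination-initiated) link is used when possible, a source-initiated link is the fallback, and a direct private-to-private link across regions has no valid initiator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    network: str   # e.g. the VNet/VPC the cluster lives in (assumed label)
    private: bool  # True when only reachable from inside its own network


def can_connect(client: Cluster, server: Cluster) -> bool:
    """Assumed rule: public clusters accept anyone, private ones only same-network clients."""
    return not server.private or client.network == server.network


def link_initiator(source: Cluster, destination: Cluster) -> str:
    """Return which side must open the connection for a link source -> destination."""
    if can_connect(destination, source):
        return "destination"  # default: the destination connects to the source
    if can_connect(source, destination):
        return "source"  # source-initiated: the source opens an outbound connection
    raise ValueError(f"no direct link possible from {source.name} to {destination.name}")


if __name__ == "__main__":
    src = Cluster("source", "vnet-region-a", private=True)
    jump = Cluster("jump", "internet", private=False)
    dst = Cluster("destination", "vnet-region-b", private=True)
    print("leg 1:", link_initiator(src, jump))  # source
    print("leg 2:", link_initiator(jump, dst))  # destination
    try:
        link_initiator(src, dst)
    except ValueError as err:
        print("direct:", err)
```

The first leg comes out source-initiated and the second destination-initiated, matching the two cluster links described above.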

Assume an active-passive DR scenario in which data is replicated between two private clusters through a public cluster placed in between. Let's define our clusters.