
Streaming Producer: Partition Assignment and Batching

Goal

Allow developers to enqueue events to be efficiently published without being burdened with managing batches or needing a deep understanding of Event Hubs partitioning. Publishing behavior should be understandable and performance consistent regardless of the pattern of events being enqueued.

Constraints

  • The order of events when published must match the order that they were enqueued, unless developers specifically opt-out of ordering to achieve higher throughput via concurrency.

  • If any event in a batch uses a partition key, all events in the batch must use the same partition key, and the batch must be published to the Event Hubs gateway.

  • If a batch of events is published to the Event Hubs gateway with the intent of automatic distribution to partitions, all events in the batch will be automatically distributed; no event can specify a specific partition or use a partition key.

  • If a batch of events is published to a specific partition, all events in the batch will be committed to that partition; no event can specify a different partition, use a partition key, or request automatic distribution elsewhere.

  • Idempotent retries can be used only for event batches published to a specific partition; events using a partition key or requesting automatic distribution cannot be published idempotently.

Assumptions

  • There is a demand from developers for the streaming model of publishing; the Streaming Producer will be a core type for general publishing needs and should not be positioned for special-case scenarios.

  • The majority of developers will use the default options.

Developer Scenario

Assume that a streaming producer is publishing to an Event Hub with 4 partitions. The following sequence of events is enqueued:

"Event 1" : { PartitionId: "",  PartitionKey: "" }
"Event 2" : { PartitionId: "",  PartitionKey: "omgrichardwhy?" }
"Event 3" : { PartitionId: "",  PartitionKey: "" }
"Event 4" : { PartitionId: "",  PartitionKey: "" }
"Event 5" : { PartitionId: "",  PartitionKey: "omgrichardwhy?" }
"Event 6" : { PartitionId: "3", PartitionKey: "" }
"Event 7" : { PartitionId: "",  PartitionKey: "thiskeyisawesome" }
"Event 8" : { PartitionId: "",  PartitionKey: "" }
"Event 9" : { PartitionId: "",  PartitionKey: "" }

What are your expectations for what batches would be published?

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

Option: Producer Assigned Partitions

The streaming producer assumes responsibility for assigning partitions to all events as they are enqueued. This includes round-robin assignment and the hashing of partition keys.
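
A minimal sketch of what client-side assignment could look like is shown below. The CRC32-based hash and the round-robin bookkeeping are illustrative stand-ins only; they are not the hashing algorithm used by the Event Hubs service, which is one of the drawbacks noted later.

# Illustrative sketch of client-side partition assignment at enqueue time; the
# hash shown here is NOT the Event Hubs service algorithm, only a stand-in.
import zlib
from itertools import cycle

class PartitionAssigner:
    def __init__(self, partition_ids):
        self._partition_ids = list(partition_ids)
        self._round_robin = cycle(self._partition_ids)

    def assign(self, partition_id=None, partition_key=None):
        # An explicitly requested partition always wins.
        if partition_id:
            return partition_id

        # A partition key is hashed so that the same key maps to the same
        # partition for this producer and this set of partitions.
        if partition_key:
            index = zlib.crc32(partition_key.encode("utf-8")) % len(self._partition_ids)
            return self._partition_ids[index]

        # Otherwise, distribute events round-robin across all partitions.
        return next(self._round_robin)

assigner = PartitionAssigner(["0", "1", "2", "3"])
assigner.assign(partition_key="omgrichardwhy?")   # the same key always yields the same partition
assigner.assign()                                 # round-robin assignment
assigner.assign(partition_id="3")                 # explicit partition is honored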

Outcome

Batches are formed efficiently by partition and are built and published concurrently.

Assume that the key "omgrichardwhy?" maps to partition 2 and "thiskeyisawesome" maps to partition 1. The resulting batches published would be:

BATCH
{
  "Partition 0"
  [
    "Event 1",
    "Event 9"
  ]
}

BATCH
{
  "Partition 1"
  [
    "Event 3",
    "Event 7"
  ]
}

BATCH
{
  "Partition 2"
  [
    "Event 2",
    "Event 4",
    "Event 5"
  ]
}

BATCH
{
  "Partition 3"
  [
    "Event 6",
    "Event 8"
  ]
}

Benefits

  • Because the client is partitioning events as they are enqueued, every event has a specific partition assigned. The producer is able to manage each partition independently while maintaining the expected ordering of events.

  • The producer is able to build and publish batches for each partition concurrently, ensuring the best possible throughput.

  • Batches never have to be published prematurely to ensure the ordering of events.

  • Idempotent retries can be used for all partitions.

Drawbacks

  • The mapping of partition keys to partitions may differ from the service behavior; even if we replicate the service logic, it may drift over time.

  • Unless we include hashing in all of our producer types and use the same algorithm across languages, the mapping of partition keys to partitions may differ between types or languages.

  • The client library has no way to detect if a partition becomes unavailable. If the client is responsible for applying round-robin and partition key hashing, it is possible that events would be published to an unavailable partition. This may impact availability.

Challenges

  • The Event Hubs guarantee of "events with the same partition key will always go to the same partition" would need to be maintained. (ref)

Open Questions

  • What is the scope of the guarantee of "events with the same partition key will always go to the same partition"? Does that apply to an individual request? Something wider? Wouldn't dynamically adding partitions cause the hashing to change?

  • Is there a way that we can set the partition key on an event being published directly to a partition so that it is reflected in the metadata when the event is received?

  • Should we add the hashing logic to the EventHubProducerClient as well to ensure that partition keys are consistently mapped to the same partition?

Option: Service Partitioning (heterogeneous batches)

The existing service constraints are loosened, and it is possible to publish heterogeneous batches to the gateway for partition assignment. This would require supporting events for automatic routing, events with partition keys, and events for a specific partition within the same batch.

For idempotent retries to be possible, the gateway would also need to support idempotent semantics in the same manner as a partition.
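
To make the shape of a heterogeneous batch concrete, a small conceptual model is sketched below; each event carries its own routing hint and the gateway would resolve the partition per event. The RoutedEvent type is illustrative only and does not correspond to an existing wire format or client type.

# Conceptual model of a heterogeneous batch under the loosened constraints; the
# RoutedEvent type is illustrative only, not an existing Event Hubs construct.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RoutedEvent:
    body: str
    partition_id: Optional[str] = None    # explicit partition, if requested
    partition_key: Optional[str] = None   # key for the gateway to hash, if provided

# A single batch mixing automatic routing, a partition key, and a direct partition.
heterogeneous_batch: List[RoutedEvent] = [
    RoutedEvent("Event 1"),
    RoutedEvent("Event 2", partition_key="omgrichardwhy?"),
    RoutedEvent("Event 6", partition_id="3"),
]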

Outcome

Batches could be formed efficiently without the need to segment events. To maintain the correct order of events, publishing would need to take place sequentially.

Assuming that there were no size considerations, the resulting batch for the example events would be:

BATCH
{
  "Event 1",
  "Event 2",
  "Event 3",
  "Event 4",
  "Event 5",
  "Event 6",
  "Event 7",
  "Event 8",
  "Event 9"
}

Benefits

  • The existing Event Hubs service logic for automatic assignment and partition key mapping would be used, ensuring consistency across event publishers.

  • The Event Hubs guarantee of "events with the same partition key will always go to the same partition" would continue to be fully controlled by the Event Hubs service and not require SDK releases to tweak.

  • The producer is agnostic to automatic assignment and partition keys while still constructing batches efficiently.

  • Publishing behavior is simplified and easy to reason about for developers.

Drawbacks

  • The producer would have to build and publish batches sequentially in order to maintain the correct order of events.

  • The set of changes to the Event Hubs service is non-trivial and is likely to have a negative impact on performance characteristics.

  • The Event Hubs gateway would take on more responsibility and would be highly likely to use more resources. This would be expected to have a negative impact on operating costs and ROI.

  • The service changes would need to be scheduled, implemented, and tested, which would increase the time-to-market and impose an opportunity cost on the Azure Messaging team.

Option: Service Partitioning (existing constraints)

The streaming producer batches events in a form appropriate for the Event Hubs gateway, allowing the service to hold responsibility for hashing partition keys and routing events with no partition assignment.
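
The grouping behavior this implies is sketched below: consecutive events are batched together only while they share the same routing, and a new batch is started whenever the routing changes so that ordering is preserved. The logic is illustrative pseudologic rather than the actual producer implementation.

# Illustrative grouping under the existing constraints: a batch may target only
# automatic assignment, a single partition key, or a single partition, so a new
# batch starts whenever the routing of the next event differs from the last.
def group_into_batches(events):
    batches = []
    current_routing, current_batch = None, []

    for event in events:
        routing = (event.get("PartitionId") or None, event.get("PartitionKey") or None)

        if current_batch and routing != current_routing:
            batches.append((current_routing, current_batch))
            current_batch = []

        current_routing = routing
        current_batch.append(event["Name"])

    if current_batch:
        batches.append((current_routing, current_batch))

    return batches

events = [
    {"Name": "Event 1", "PartitionId": "", "PartitionKey": ""},
    {"Name": "Event 2", "PartitionId": "", "PartitionKey": "omgrichardwhy?"},
    {"Name": "Event 3", "PartitionId": "", "PartitionKey": ""},
    {"Name": "Event 4", "PartitionId": "", "PartitionKey": ""},
]

# Produces: [((None, None), ["Event 1"]),
#            ((None, "omgrichardwhy?"), ["Event 2"]),
#            ((None, None), ["Event 3", "Event 4"])]
# Each batch must then be published one at a time to preserve ordering.
batches = group_into_batches(events)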

Outcome

Batches are formed and published inefficiently due to constraints for partition keys. Each batch must be built and published sequentially to maintain the correct ordering of events.

The resulting batches would be:

BATCH
{
  "Automatic Assignment"
  [
    "Event 1"
  ]
}

BATCH
{
  "Partition Key: omgrichardwhy?"
  [
    "Event 2"
  ]
}

BATCH
{
  "Automatic Assignment"
  [
    "Event 3",
    "Event 4"
  ]
}

BATCH
{
  "Partition Key: omgrichardwhy?"
  [
    "Event 5"
  ]
}

BATCH
{
  "Partition: 3"
  [
    "Event 6"
  ]
}

BATCH
{
  "Partition Key: thiskeyisawesome"
  [
    "Event 7"
  ]
}

BATCH
{
  "Automatic Assignment"
  [
    "Event 8",
    "Event 9"
  ]
}

Benefits

  • The existing Event Hubs service logic for automatic assignment and partition key mapping would be used, ensuring consistency across event publishers.

  • The Event Hubs guarantee of "events with the same partition key will always go to the same partition" would continue to be fully controlled by the Event Hubs service and not require SDK releases to tweak.

Drawbacks

  • Developers who wish to publish events efficiently would need to have a solid working understanding of how Event Hubs partitioning and publishing constraints work.

  • Because the client relies on the Event Hubs gateway for automatic assignment and partition key support, it cannot batch efficiently without ignoring the ordering of events.

  • Publishing can be highly inefficient, with batches containing only a single event. This is most likely, and most impactful, when partition keys are in use.

  • Publishing must be sequential to maintain the order of events.

  • Idempotent retries cannot be used when events are enqueued with a partition key or to be automatically assigned. When idempotent retries are enabled, either the producer would need to enforce that all events have an explicit partition, or publishing behavior would differ based on how an event was enqueued.

Challenges

  • The entire developer experience; the streaming producer would perform poorly for common developer scenarios. Using it effectively would require enough knowledge that it would be as difficult to use as the EventHubProducerClient, if not more so, negating much of its value in the ecosystem.

Option: Single Publishing Semantic (specified upfront)

The streaming producer could take arguments at construction to declare the type of publishing desired, allowing only one of: "Automatic Assignment", "Partition Key", or "Direct Partition". This would allow the producer to make assumptions about usage and optimize for the most common cases.
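
A minimal sketch of what declaring the semantic at construction could look like is shown below; the enum, function, and validation rules are hypothetical placeholders rather than a proposed API surface.

# Hypothetical sketch of declaring a single publishing semantic at construction;
# the names and validation shown are placeholders, not an actual client API.
from enum import Enum

class PublishingMode(Enum):
    AUTOMATIC_ASSIGNMENT = "automatic"
    PARTITION_KEY = "partition_key"
    DIRECT_PARTITION = "direct_partition"

def validate_enqueue(mode, partition_id=None, partition_key=None):
    # Events that conflict with the declared mode would be rejected at enqueue time.
    if mode is PublishingMode.AUTOMATIC_ASSIGNMENT and (partition_id or partition_key):
        raise ValueError("This producer only accepts automatically assigned events.")
    if mode is PublishingMode.PARTITION_KEY and not partition_key:
        raise ValueError("This producer requires a partition key for each event.")
    if mode is PublishingMode.DIRECT_PARTITION and not partition_id:
        raise ValueError("This producer requires an explicit partition for each event.")

validate_enqueue(PublishingMode.PARTITION_KEY, partition_key="omgrichardwhy?")    # accepted
# validate_enqueue(PublishingMode.AUTOMATIC_ASSIGNMENT, partition_id="3")         # would raise ValueError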

Outcome

The developer scenario would not be supported as described earlier. The outcome would vary depending on how the producer was constructed.

If we assume that "Automatic Assignment" was selected, the resulting batch would be:

BATCH
{
  "Event 1",
  "Event 3",
  "Event 4",
  "Event 6",
  "Event 8",
  "Event 9"
}

However, if we assume that "Partition Key" was selected, the resulting batches would be:

BATCH
{
  "Partition Key: omgrichardwhy?"
  [
    "Event 2",
    "Event 5"
  ]
}

BATCH
{
  "Partition Key: thiskeyisawesome"
  [
    "Event 7"
  ]
}

Note that in each case, only a subset of events from the scenario could be published.

Benefits

  • The "Automatic Assignment" and "Direct Partition" cases are well-supported; batches can be created and published efficiently.

  • The existing Event Hubs service logic for automatic assignment and partition key mapping would be used, ensuring consistency across event publishers.

  • The Event Hubs guarantee of "events with the same partition key will always go to the same partition" would continue to be fully controlled by the Event Hubs service and not require SDK releases to tweak.

Drawbacks

  • Developers who wish to publish events would need to have at least a cursory understanding of how Event Hubs partitioning and publishing constraints work.

  • The development experience is degraded, requiring more thought about how to enqueue events and, potentially, the use of multiple streaming producers concurrently.

  • The "Partition Key" case is not improved at all over the "Service Partitioning" approach previously discussed; the client relies on the Event Hubs gateway and cannot batch efficiently and maintain the ordering of events.

  • Publishing can be highly inefficient with batches containing a single event.

  • Publishing must be sequential to maintain the order of events.

  • Idempotent retries cannot be used when events are enqueued with a partition key or to be automatically assigned. When idempotent retries are enabled, either the producer would need to enforce that all events have an explicit partition, or publishing behavior would differ based on how an event was enqueued.

Challenges

  • The entire developer experience; the streaming producer would be as difficult to use as the EventHubProducerClient, if not more so, negating much of its value in the ecosystem.

What About Kafka's Streaming Producer Model?

The Kafka producer send API accepts an optional partition key that is hashed for partition assignment if provided. If not provided, a partition is automatically assigned using a round-robin approach. (see: Kafka Producer API)

The Kafka producer performs partition assignment for a record when its send method is called. Each batch is associated with a specific partition at the time it is created and is filled as records are enqueued.
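
For comparison, a minimal example using the kafka-python client is shown below; when a key is supplied it is hashed to select the partition, and when it is omitted the client assigns a partition itself.

# Minimal kafka-python example for comparison; records that share a key are hashed
# to the same partition, while records without a key are assigned by the client.
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")

# Key provided: the client hashes the key to choose the partition.
producer.send("my-topic", key=b"thiskeyisawesome", value=b"Event 7")

# No key: the client assigns the partition automatically.
producer.send("my-topic", value=b"Event 1")

producer.flush()
producer.close()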

References:
