Jesse Squire jsquire

Schema Registry: JSON Serializer

Azure Schema Registry currently supports the Avro schema type and is adding support for JSON Schema. The Azure SDK library for Schema Registry offers an Avro serializer that is integrated with the Schema Registry client. To provide parity between the schema formats, a JSON Schema serializer is needed, offering a developer experience consistent with the Avro serializer.

Business impact

Contributes to the Kafka/Confluent compete story for Event Hubs. (marketing "checkbox" feature)
Contributes to the Kafka interoperability story for Event Hubs, focusing on cross-product producing and consuming scenarios. (marketing "checkbox" feature)
Enables a consistent developer experience with Schema Registry across different schema formats, reducing support costs and special-case documentation needs.

Azure SDK Repository Automation Rules

Goal

This document attempts to define a set of logical rules based on the existing Fabric Bot implementation. Rules are described based on their logical association; in some cases, these may need to be expanded into multiple physical rules due to how GitHub events and/or Actions work. Likewise, concepts and flow are intentionally kept abstract and are not intended to describe a specific implementation. The data integrations are an example of this: multiple data items may derive from a single source, but they constitute different logical concepts.

Nomenclature

Trigger: An event in GitHub to which automation should respond.
Target: An item in GitHub that can trigger an event. Generally, this will be an issue or pull request.

Text Analytics: Client API Evolution Thoughts

Currently, the TextAnalytics client offers a set of bespoke methods for each operation that developers may wish to perform. Each method is named after its operation, with the intent of making operations discoverable when browsing code completion lists and of grouping related methods together.

This pattern has worked well with the Text Analytics REST API, which offered roughly seven core analysis skills. As Text Analytics moves to the Unified Language API, it is expected that the number of skills offered will grow steadily. This growth may cause the number of bespoke methods to become burdensome for developers, necessitating an evolution of the client API.

Things to know before reading

The names used in this document are intended for illustration only. The names for Text Analytics skills are placeholders to simulate volume and do not reflect the actual service API.

Text Analytics: Unified Language REST API Support

Historically, Text Analytics has existed as a dedicated REST service in Azure, managed and evolved independently from other Cognitive Services offerings. As a result, developers working directly with the REST APIs have had to learn a unique location, structure, and usage pattern for each service, despite the services sharing the goal of analyzing language-related aspects of documents.

Going forward, the Cognitive Services teams are consolidating REST APIs with related functionality into a single REST service. In the case of Text Analytics, service functionality is moving to a new unified language service. The API of the new unified language service introduces changes both structurally and behaviorally, making it incompatible with the API offered by the stand-alone Text Analytics REST service.

In order to support the unified language service, the client libraries will need to determine an approach able to accommodate the new REST API without in

Event Hubs: Checkpoint Store Proposal

Although storage operations are a key requirement for extending EventProcessor<T>, no abstraction exists for them. Developers wishing to extend the processor must implement the storage operations required by its abstract members and are responsible for ensuring that the implementation is production-ready. They must also infer which storage operations the application needs that the processor itself does not use (such as writing the checkpoints that the processor consumes) and provide implementations for them. This places a burden on developers and introduces a barrier to entry for extending EventProcessor<T>.

Things to know before reading

The names used in this document are intended for illustration only. Some names are not ideal and will need to be refined during discussions.
Some details not related to the high-level concept are not illustrated; the scope of this document is limited to the high-level shape and paradigms for the feature area.
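A minimal sketch of what a storage abstraction for the processor could look like, assuming an abstract type that gathers roughly the operations a processor depends on: listing and updating checkpoints, and listing and claiming partition ownership. `CheckpointStore`, `InMemoryCheckpointStore`, `Checkpoint`, and `Ownership` are hypothetical names, not types from the Event Hubs client library. The in-memory implementation hints at the detail developers currently have to get right on their own, such as optimistic concurrency when claiming ownership.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shapes for illustration; these are not types from Azure.Messaging.EventHubs.
public record Checkpoint(string PartitionId, long Offset);
public record Ownership(string PartitionId, string OwnerId, string Version);

// Hypothetical abstraction gathering the storage operations a processor depends on.
public abstract class CheckpointStore
{
    public abstract Task<IEnumerable<Checkpoint>> ListCheckpointsAsync(CancellationToken cancellationToken = default);
    public abstract Task UpdateCheckpointAsync(Checkpoint checkpoint, CancellationToken cancellationToken = default);
    public abstract Task<IEnumerable<Ownership>> ListOwnershipAsync(CancellationToken cancellationToken = default);

    // Returns only the claims that succeeded.
    public abstract Task<IEnumerable<Ownership>> ClaimOwnershipAsync(IEnumerable<Ownership> desired, CancellationToken cancellationToken = default);
}

// In-memory implementation: fine for tests, not for production, since all state is lost on restart.
public class InMemoryCheckpointStore : CheckpointStore
{
    private readonly ConcurrentDictionary<string, Checkpoint> _checkpoints = new();
    private readonly ConcurrentDictionary<string, Ownership> _ownership = new();

    public override Task<IEnumerable<Checkpoint>> ListCheckpointsAsync(CancellationToken cancellationToken = default) =>
        Task.FromResult<IEnumerable<Checkpoint>>(_checkpoints.Values.ToList());

    public override Task UpdateCheckpointAsync(Checkpoint checkpoint, CancellationToken cancellationToken = default)
    {
        _checkpoints[checkpoint.PartitionId] = checkpoint;
        return Task.CompletedTask;
    }

    public override Task<IEnumerable<Ownership>> ListOwnershipAsync(CancellationToken cancellationToken = default) =>
        Task.FromResult<IEnumerable<Ownership>>(_ownership.Values.ToList());

    public override Task<IEnumerable<Ownership>> ClaimOwnershipAsync(IEnumerable<Ownership> desired, CancellationToken cancellationToken = default)
    {
        var claimed = new List<Ownership>();

        foreach (var request in desired)
        {
            // Each successful claim stamps a new version, similar to an ETag.
            var updated = request with { Version = Guid.NewGuid().ToString() };

            // Optimistic concurrency: a null version means "never owned"; otherwise the caller
            // must present the version it last observed, or the claim is rejected.
            var success = request.Version is null
                ? _ownership.TryAdd(request.PartitionId, updated)
                : _ownership.TryGetValue(request.PartitionId, out var current)
                    && current.Version == request.Version
                    && _ownership.TryUpdate(request.PartitionId, updated, current);

            if (success)
            {
                claimed.Add(updated);
            }
        }

        return Task.FromResult<IEnumerable<Ownership>>(claimed);
    }
}
```

A second processor that presents a stale or missing version is refused, which is what keeps two processors from owning the same partition at once.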

Event Hubs: Next Steps for Idempotent Publishing in the `EventHubProducerClient`

The "idempotent publishing" feature in the Event Hubs client library was introduced as a means to help reduce the potential for duplication when publishing events using the EventHubProducerClient. It appeared in several beta packages, starting in September 2020, and was last available in March 2021.

Primarily driven by a desire for parity with Kafka, the feature was built on service infrastructure created for the Event Hubs Kafka head. Its API and user experience are heavily influenced by Kafka's approach, which is centered around their buffered producer model. As a result, the feature is not well-suited to the EventHubProducerClient and has significant potential for causing customer confusion and providing a poor development experience.

The major concerns are:

The concept of “idempotent” in this context does not meet developer expectations.
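To make the first concern more concrete, the following is a deliberately simplified model of sequence-number-based duplicate detection in the style of Kafka's idempotent producer. It is an assumption for illustration, not the Event Hubs service implementation, and it omits details such as rejecting sequence gaps; `PartitionLog` and its members are hypothetical. The model discards a retried publish that reuses a sequence number, but keeps the same payload when the application publishes it again, which may not match what developers expect "idempotent" to mean.

```csharp
using System.Collections.Generic;

// Simplified model of per-producer sequence tracking on a single partition.
// Hypothetical; not how the Event Hubs service is implemented.
public class PartitionLog
{
    // Highest sequence number accepted so far, keyed by producer identifier.
    private readonly Dictionary<long, int> _lastSequenceByProducer = new();
    private readonly List<string> _accepted = new();

    public IReadOnlyList<string> Accepted => _accepted;

    // Returns true if the event was appended, false if it was discarded as a duplicate.
    public bool Append(long producerId, int sequenceNumber, string body)
    {
        if (_lastSequenceByProducer.TryGetValue(producerId, out var last) && sequenceNumber <= last)
        {
            // Same or older sequence number: treated as a retry of something already stored.
            return false;
        }

        _lastSequenceByProducer[producerId] = sequenceNumber;
        _accepted.Add(body);
        return true;
    }
}
```

Only the sequence number is compared, never the payload, so deduplication covers retries of one publish operation rather than repeated publishes of the same data.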

Scenario

This scenario is based on this question on Stack Overflow, slightly modified to better illustrate the failure cases.

A WinForms application is collecting telemetry from some number of IoT devices deployed on company delivery trucks. Collection is followed by some light data transformation, after which the telemetry is published to Event Hubs.

Because telemetry flows from the devices consistently, the application performs collection and processing in parallel. To maximize throughput, the application prioritizes batch density: it does not account for slowdowns in telemetry collection that might warrant flushing a partial batch, and flushes a partial batch only when the queue of collected telemetry is completely empty.
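The publishing loop described in this scenario can be sketched as follows, assuming a hypothetical `TelemetryBatch` that stands in for `EventDataBatch` and limits batches by event count rather than by size in bytes. Adding a batch to the `sent` list stands in for a call such as `EventHubProducerClient.SendAsync`, so the sketch runs without an Event Hubs namespace. `TelemetryBatch` and `BatchDensityPublisher` are illustrative names only.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for a size-limited batch; a real EventDataBatch limits by bytes, not count.
public class TelemetryBatch
{
    private readonly List<string> _items = new();
    private readonly int _maxCount;

    public TelemetryBatch(int maxCount)
    {
        if (maxCount < 1) throw new ArgumentOutOfRangeException(nameof(maxCount));
        _maxCount = maxCount;
    }

    public int Count => _items.Count;
    public IReadOnlyList<string> Items => _items;

    public bool TryAdd(string item)
    {
        if (_items.Count >= _maxCount) return false;
        _items.Add(item);
        return true;
    }
}

public static class BatchDensityPublisher
{
    // Drains whatever is currently queued, favoring full batches.
    // A partial batch is sent only once the queue has been emptied.
    public static void DrainAndPublish(ConcurrentQueue<string> queue, int maxBatchCount, List<IReadOnlyList<string>> sent)
    {
        var batch = new TelemetryBatch(maxBatchCount);

        while (queue.TryDequeue(out var telemetry))
        {
            if (!batch.TryAdd(telemetry))
            {
                // The batch is full: "send" it and start a new batch with the current item.
                sent.Add(batch.Items);
                batch = new TelemetryBatch(maxBatchCount);
                batch.TryAdd(telemetry);
            }
        }

        // The queue is empty, so flush the partially filled batch rather than wait for more telemetry.
        if (batch.Count > 0)
        {
            sent.Add(batch.Items);
        }
    }
}
```

With five queued items and a maximum of two per batch, the loop sends batches of two, two, and one; the last is the partial flush that happens when the queue runs dry.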

Event Publishing Code

Streaming Producer: Partition Assignment and Batching

Goal

Allow developers to enqueue events to be efficiently published without being burdened with managing batches or needing a deep understanding of Event Hubs partitioning. Publishing behavior should be understandable and performance consistent regardless of the pattern of events being enqueued.

Constraints

The order of events when published must match the order in which they were enqueued, unless developers specifically opt out of ordering to achieve higher throughput through concurrency.
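One way to honor this ordering constraint while hiding partitioning from developers is to keep a FIFO queue per partition and map each event to a partition deterministically. The sketch below assumes that events with a partition key always map to the same partition and that events without one are assigned round-robin. `PartitionAssigner` is hypothetical, and its FNV-1a hash is a stand-in chosen only because it is stable across processes; it is not the hash the Event Hubs service uses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of partition assignment that preserves enqueue order per partition.
public class PartitionAssigner
{
    private readonly Queue<string>[] _pending;
    private int _nextRoundRobin;

    public PartitionAssigner(int partitionCount)
    {
        if (partitionCount < 1) throw new ArgumentOutOfRangeException(nameof(partitionCount));

        _pending = new Queue<string>[partitionCount];
        for (var i = 0; i < partitionCount; i++)
        {
            _pending[i] = new Queue<string>();
        }
    }

    // Returns the partition the event was assigned to.
    public int Enqueue(string body, string partitionKey = null)
    {
        int partition;

        if (partitionKey is null)
        {
            // Unkeyed events rotate across partitions to spread load.
            partition = _nextRoundRobin;
            _nextRoundRobin = (_nextRoundRobin + 1) % _pending.Length;
        }
        else
        {
            // Keyed events always land on the same partition, so their relative order is kept.
            partition = StableHash(partitionKey) % _pending.Length;
        }

        _pending[partition].Enqueue(body);
        return partition;
    }

    public IReadOnlyCollection<string> Pending(int partition) => _pending[partition];

    // FNV-1a over UTF-16 code units. string.GetHashCode is randomized per process in .NET Core,
    // so a stable stand-in is used instead. Not the service's hashing algorithm.
    private static int StableHash(string key)
    {
        unchecked
        {
            uint hash = 2166136261u;
            foreach (var c in key)
            {
                hash = (hash ^ (uint)c) * 16777619u;
            }
            return (int)(hash & 0x7FFFFFFFu);
        }
    }
}
```

Ordering here is per partition: events sharing a key stay in enqueue order within their queue, while queues for different partitions could be published concurrently without reordering any single partition.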

	private static BufferListStream Serialize(AmqpMessage message, int bufferSizeBytes = 4096)
	{
		var buffers = new List<ArraySegment<byte>>();
		var more = true;

		while (more)
		{
			var messageBuffers = message.GetPayload(bufferSizeBytes, out more);

			if (messageBuffers != null)

	// This is the convenience layer that customers interact with.
	// This should look almost exactly like the current API:
	// https://apiview.dev/Assemblies/Review/2d2a87eaa70a43b8860f1c4e7135494b
	//
	// Because it isn't interacting with the swagger types, we probably need to clone the
	// current implementation and use it as a template.
	//
	public class TextAnalyticsClient
	{
		TransportClient _transport = this.serviceVersion switch
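The `serviceVersion switch` fragment above is cut off, and a C# field initializer cannot reference `this` in any case. A minimal sketch of the same idea, assuming the client selects a transport for either the stand-alone Text Analytics REST API or the unified language REST API based on the requested service version, makes that choice in the constructor instead. `ServiceVersion`, `ITransportClient`, the two transport classes, and `TextAnalyticsClientSketch` are hypothetical names, and the returned strings are placeholders rather than real service routes.

```csharp
using System;

// Hypothetical types for illustration; these are not the Azure.AI.TextAnalytics API.
public enum ServiceVersion { Standalone, UnifiedLanguage }

public interface ITransportClient
{
    string DescribeRoute(string skill);
}

// Would talk to the stand-alone Text Analytics REST API.
public class StandaloneTransport : ITransportClient
{
    public string DescribeRoute(string skill) => $"standalone:{skill}";
}

// Would talk to the unified language REST API, which has a different request shape.
public class UnifiedLanguageTransport : ITransportClient
{
    public string DescribeRoute(string skill) => $"unified:{skill}";
}

public class TextAnalyticsClientSketch
{
    private readonly ITransportClient _transport;

    public TextAnalyticsClientSketch(ServiceVersion serviceVersion)
    {
        // Selecting the transport once, in the constructor, keeps the public methods
        // identical regardless of which REST API sits underneath.
        _transport = serviceVersion switch
        {
            ServiceVersion.Standalone => new StandaloneTransport(),
            ServiceVersion.UnifiedLanguage => new UnifiedLanguageTransport(),
            _ => throw new ArgumentOutOfRangeException(nameof(serviceVersion))
        };
    }

    // A convenience method whose signature does not change with the service version.
    public string DetectLanguageRoute() => _transport.DescribeRoute("LanguageDetection");
}
```

The design choice being illustrated is that the convenience layer owns the public surface while the version-specific transport absorbs the structural differences between the two REST APIs.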
