OpenTelemetry Tracing APIs vs .NET Activity/DiagnosticSource

Tracing API Comparison

A distributed trace is a set of events, triggered as a result of a single logical operation, consolidated across various components of an application. A distributed trace contains events that cross process, network and security boundaries. A distributed trace may be initiated when someone presses a button to start an action on a website - in this example, the trace will represent calls made between the downstream services that handled the chain of requests initiated by this button being pressed.

Contract difference

The OpenTelemetry Tracing API is a strict contract that covers the tracing signal only (not debugging or profiling). The contract is the same for all kinds of libraries and tracing backends and includes several related concepts:

span creation, required and optional properties
sampling
exporting
noop behavior in the absence of a tracing implementation
extra information that certain types of spans should include (e.g. spans for HTTP calls).

The DiagnosticSource contract is loose. It lets the library and the listener decide together when Activities should be created, which information should be passed, how to sample, how to convert rich payloads into exportable telemetry events, etc. DiagnosticSource can be used for basic tracing or for deeper profiling.
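To make the listener half of this loose contract concrete, here is a minimal sketch of a listener that opts in to a source named "test" and records the events it receives. The `CallbackObserver<T>` and `ListenerSketch` types are hypothetical helpers written for this example; only `DiagnosticListener`, `Activity` and their members come from System.Diagnostics.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical IObserver<T> adapter so the sketch needs no extra packages.
sealed class CallbackObserver<T> : IObserver<T>
{
    private readonly Action<T> onNext;
    public CallbackObserver(Action<T> onNext) => this.onNext = onNext;
    public void OnNext(T value) => onNext(value);
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}

static class ListenerSketch
{
    // Runs a library-style operation under a listener and returns the event names seen.
    public static List<string> Run()
    {
        var seen = new List<string>();

        // Listener side: watch for the source named "test" and enable "my activity".
        using IDisposable all = DiagnosticListener.AllListeners.Subscribe(
            new CallbackObserver<DiagnosticListener>(source =>
            {
                if (source.Name == "test")
                {
                    source.Subscribe(
                        new CallbackObserver<KeyValuePair<string, object?>>(evt => seen.Add(evt.Key)),
                        name => name == "my activity");
                }
            }));

        // Library side: create the Activity only if some listener asked for it.
        using var library = new DiagnosticListener("test");
        if (library.IsEnabled("my activity"))
        {
            var activity = new Activity("my activity");
            library.StartActivity(activity, null); // writes "my activity.Start"
            library.StopActivity(activity, null);  // writes "my activity.Stop"
        }

        return seen;
    }

    static void Main() => Console.WriteLine(string.Join(", ", Run()));
}
```

Everything beyond the event names (what the payload contains, whether to sample, how to export) is left to an agreement between this listener and the library, which is the looseness described above.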

Activity is inspired by OpenTracing (one of OpenTelemetry's predecessors) and is very similar to Span.

OpenTelemetry	.NET Analog	Comments
Tracer	DiagnosticSource	Somewhat similar. Tracer is used to create spans and carries common components (exporter, sampler, etc.). DiagnosticSource notifies the listener about Activities and allows the listener to control their flow.
Span	Activity	Similar. Represents a logical operation.
SpanContext	Properties on Activity (TraceId, SpanId, etc.)	Context that is propagated over the wire
SpanBuilder	Set* Methods on the Activity	Helper that configures span/activity properties
Sampler	-	Configurable algorithm that makes the sampling decision for a span when it is started
Exporter	-	Configurable pipeline to deliver data from the user process to the tracing backend of choice
Propagation formats	Activity.Id implements HTTP propagation; users/libs can leverage ActivityTraceId/SpanId for binary propagation	Protocol-specific encoding for context
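As an illustration of the last row, the sketch below starts an Activity in the W3C id format and returns its Id, which then has the traceparent shape (version, trace id, span id, flags). It assumes a runtime where `Activity.SetIdFormat` is available (.NET Core 3.0 or later); `W3CIdSketch` is a hypothetical name used only here.

```csharp
using System;
using System.Diagnostics;

static class W3CIdSketch
{
    // Starts an Activity in W3C format and returns its Id.
    public static string StartAndGetId()
    {
        var activity = new Activity("outgoing call")
            .SetIdFormat(ActivityIdFormat.W3C) // request W3C ids for this Activity only
            .Start();
        try
        {
            // With the W3C format, Id is "00-<32 hex trace id>-<16 hex span id>-<2 hex flags>".
            return activity.Id;
        }
        finally
        {
            activity.Stop();
        }
    }

    static void Main() => Console.WriteLine(StartAndGetId());
}
```

With the older hierarchical format the Id has a different shape, so a library that writes it into a `traceparent` header would need to check `activity.IdFormat` first.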

Detailed Activity vs Span delta

Activity	Span	Comments	Importance
OperationName	Name	Different. For Activity it is DiagnosticSource event prefix (like HttpIn), for OT this is more specific to the event like http path	High, this is used in every instrumentation and listener, Azure SDK uses it
Current	tracer.CurrentSpan
Parent	-	Reference to parent if it is in-proc
ParentId	-	Unique parent Activity Id (encoded for HTTP propagation)
Id	-	Unique Activity Id encoded for HTTP propagation
RootId	TraceId	Legacy Id that is common to all logical operations in the same trace
TraceStateString	Span.SpanContext.Tracestate	Tracestate as per W3C trace-context spec
Tags	Attributes	Key-value pairs that augment Activity/Span, like the HTTP URL. Difference: Activity supports string keys and string values only; OT supports string keys and string, long, double, or bool values	Low
Baggage	DistributedContext API (not even tracing)	Context that propagates to other services	High. 1P users have high interest in this
-	Status	The result of operation: Success, failure, failure kind	High, this is used in every instrumentation and listener, Azure SDK uses it
-	Links	Represent relationships to multiple other span trees (useful in batch processing scenarios)	High. Azure SDK (EventHub, ServiceBus) needs it; Azure services need it for batching scenarios
-	Events	Additional events that happen in scope of span (receiving chunk of data, or attaching a log message)	Low
-	Kind	Useful for UX: server for incoming requests, client for outgoing, internal for logical operations.	High, this is used in every instrumentation and listener, Azure SDK uses it
TraceId	Context.TraceId	Same. Trace Id as per W3C trace-context spec
SpanId	Context.SpanId	Same. Span Id as per W3C trace-context spec
ParentSpanId	ParentSpanId	Same. Span Id of the parent Span/Activity
Recorded	IsRecordingEvents	Same. Indicates if Activity/Span is sampled in or out
ActivityTraceFlags	Context.TraceOptions	Same. Trace flags as per W3C trace-context spec
Duration	Duration	Same
StartTimeUtc	StartTime	Same
APIs to control Id format	-

Key difference in behavior

Notifications

OpenTelemetry: When a span ends, it is automatically scheduled for exporting. The library does not need to call anything else.
Activity/DiagnosticSource: It is the library's responsibility to accompany each Activity start/stop with a DiagnosticSource event.

Noop vs tracing

OpenTelemetry: Library code does not know or care whether the user enabled instrumentation. OpenTelemetry is a noop if the user did not bring the implementation package, and enables tracing if the user did bring it.
Activity/DiagnosticSource: It is the library's responsibility to check whether there is a listener, whether the event is enabled, whether this request is interesting, etc., and to behave differently in each case.

Sampling

OpenTelemetry: When a span starts, OpenTelemetry makes the sampling decision. The sampler is configurable by the user or the library. The library may check the span's sampling decision and augment the span with more information only if it is sampled in. OpenTelemetry does some internal optimizations for not-sampled spans (e.g. they are not sent for exporting). Typical sampling algorithms are part of the contract and give a consistent decision for every span in the same trace.
Activity/DiagnosticSource: The sampling flag is available on Activity, but there is no other contract (when, who, how, etc.). Setting/updating it and making sampling decisions is the listener's responsibility.
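Since the Activity side defines only the flag, a listener has to bring its own algorithm. The sketch below shows one hypothetical rate-based decision derived from the trace id, so that every Activity in the same trace gets the same answer; `SamplingSketch`, `ShouldSample` and `ApplyDecision` are illustrative names, not part of any library, and the Activity is assumed to use the W3C id format.

```csharp
using System;
using System.Diagnostics;

static class SamplingSketch
{
    // Hypothetical decision: map the first 8 hex digits of the trace id to a bucket 0..99.
    public static bool ShouldSample(ActivityTraceId traceId, int percent)
    {
        uint bucket = Convert.ToUInt32(traceId.ToHexString().Substring(0, 8), 16) % 100;
        return bucket < percent;
    }

    // What a listener might do when it handles the Start event of an Activity.
    public static void ApplyDecision(Activity activity, int percent)
    {
        if (ShouldSample(activity.TraceId, percent))
        {
            activity.ActivityTraceFlags |= ActivityTraceFlags.Recorded;
        }
        else
        {
            activity.ActivityTraceFlags &= ~ActivityTraceFlags.Recorded;
        }
    }

    static void Main()
    {
        var activity = new Activity("operation").SetIdFormat(ActivityIdFormat.W3C).Start();
        ApplyDecision(activity, 50);
        Console.WriteLine(activity.Recorded);
        activity.Stop();
    }
}
```

Nothing in the Activity API stops two listeners from applying different rules to the same Activity, which is the missing part of the contract noted above.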

Implicit vs explicit context propagation

OpenTelemetry: implicit propagation is a choice. By default, propagation is explicit.
Activity/DiagnosticSource: propagation is always implicit. There is no choice (unless someone wants to hack it).
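A minimal sketch of what implicit propagation means on the Activity side: the child below is never handed a context object, yet it picks up its parent from `Activity.Current`, which flows across `await`. `ImplicitContextSketch` is a hypothetical name; the parent is passed to the helper only so the sketch can verify the link.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

static class ImplicitContextSketch
{
    public static async Task<bool> ChildSeesParentAsync()
    {
        var parent = new Activity("parent").Start(); // becomes Activity.Current
        try
        {
            await Task.Delay(1); // Activity.Current flows across the await
            return await RunChildAsync(parent);
        }
        finally
        {
            parent.Stop();
        }
    }

    private static async Task<bool> RunChildAsync(Activity expectedParent)
    {
        await Task.Yield();
        // No parent is specified here; Start() uses Activity.Current.
        var child = new Activity("child").Start();
        bool linked = ReferenceEquals(child.Parent, expectedParent);
        child.Stop();
        return linked;
    }

    static async Task Main() => Console.WriteLine(await ChildSeesParentAsync());
}
```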

Augmenting spans

OpenTelemetry: The library defines which information to provide and sets attributes on the span as key-value pairs with string keys and string, long, bool, or double values. It should follow common conventions for well-known things like HTTP, gRPC or DB calls. No rich payloads are involved.
Activity/DiagnosticSource: The library gives the listener everything it knows (request/response payloads) and leaves it up to the listener to extract what it needs. The library can also set Tags on the Activity to stamp information that everyone needs, and leave it up to the listener to decide whether it needs anything else.

Suggested areas to focus on

1. [P0] Strict vs loose contract

See the Contract difference section for more details. Is DiagnosticSource a good way to instrument a library? (It seems Activity is.) How strict do we want the contract to be?

2. [P0] OT Span vs .NET Activity

Activity and Span represent the same concept. Can we avoid having OT Spans in the first place?

3. [P1] OT APIs usability and perf

Review the OT APIs (at least those that should survive after p1 and p2), influence good design, and leave room for perf improvements now.

4. [P1] Extend Activity APIs

decouple Activity.OperationName from DiagnosticSource events
add Activity.Status
add Links

5. [P1] Activity Baggage vs OT Distributed Context

decouple additional context propagation from Activity

6. [P2] Other Activity vs Span API diff

Events
long, bool, double attributes

7. [P2] Metrics API vs .NET Event Counters

1_Tracing_Examples.md

Basic scenario example

Span

Let's assume a library creates this span. The library depends on the OpenTelemetry.Abstractions package.

private static ITracer tracer = Tracing.Tracer;

public static void BasicSpan()
{
    var span = tracer
        .SpanBuilder("my span") // set span name and other properties
        .StartSpan(); 
     
    using (var scope = tracer.WithSpan(span))
    {
        // do stuff
        Console.WriteLine(tracer.CurrentSpan.Context.TraceId);
    }
    
    span.End();
}

Basic Activity with comparable features

private static readonly DiagnosticListener diagnosticListener = new DiagnosticListener("test");

public static void BasicActivityWithEvents()
{
    Activity activity = null;
    if (diagnosticListener.IsEnabled())
    {
        if (diagnosticListener.IsEnabled("my activity"))
        {
            activity = new Activity("my activity");
            diagnosticListener.StartActivity(activity, null);
        }

        // do stuff

        if (activity != null)
        {
            diagnosticListener.StopActivity(activity, null);
        }
    }
    else
    {
        // do stuff
    }
}

Exporting spans

Exporting API is subject to change.

static void Main(string[] args)
{
    Tracing.SpanExporter.RegisterHandler("ConsoleExporter", new ConsoleExporter());

    BasicSpan();
}

class ConsoleExporter : IHandler
{
    public Task ExportAsync(IEnumerable<SpanData> spanDataList)
    {
        foreach (var span in spanDataList)
        {
            Console.WriteLine($"[{span.StartTimestamp:o}] Exporting span={span.Name}, duration={span.EndTimestamp - span.StartTimestamp}, status={span.Status} with context traceId={span.Context.TraceId} spanId={span.Context.SpanId} parentId={span.ParentSpanId}");
        }

        return Task.CompletedTask;
    }
}

Typical instrumentation

Let's add some attributes, propagate context over the wire and set status.

Span

public static void TypicalSpan()
{
    var span = tracer
        .SpanBuilder("my span")
        .SetSpanKind(SpanKind.Client)
        .StartSpan();

    using (var _ = tracer.WithSpan(span))
    {
        if (span.IsRecordingEvents)
        {
            span.SetAttribute("component", "example");
            span.SetAttribute("target", "my-service");
        }

        try
        {
            // the context is invalid when tracing is a noop, so skip injection
            if (span.Context.IsValid)
            {
                tracer.TextFormat.Inject(
                    span.Context,
                    message,
                    (msg, headerName, headerValue) => msg[headerName] = headerValue);
            }
            
            // send message
        }
        catch (Exception e)
        {
            span.Status = Status.Unknown.WithDescription(e.ToString());
            throw;
        }
        finally
        {
            span.End();
        }
    }
}

Activity

public static void TypicalActivity()
{
    var diagnosticListener = new DiagnosticListener("test");

    Activity activity = null;
    if (diagnosticListener.IsEnabled())
    {
        if (diagnosticListener.IsEnabled("my activity"))
        {
            activity = new Activity("my activity");
            activity.AddTag("component", "example");
            activity.AddTag("target", "my-service");
            diagnosticListener.StartActivity(activity, new {Message = message});
        }

        bool result = true;
        try
        {
            if (activity != null)
            {
                // Activity.Id is a valid traceparent value only when the W3C id format is in use
                message["traceparent"] = activity.Id;
            }
            // send message
        }
        catch (Exception e)
        {
            result = false;
            diagnosticListener.Write("exception", new {Exception = e, Message = message});
            throw;
        }
        finally
        {
            if (activity != null)
            {
                diagnosticListener.StopActivity(activity, new {Message = message, Result = result});
            }
        }
    }
    else
    {
        // do stuff
    }
}

Links

static async Task ReadAndProcessAsync(PartitionReceiver eventHubReceiver)
{
    var tracer = Tracing.Tracer;

    while (true)
    {
        IEnumerable<EventData> messages = await eventHubReceiver.ReceiveAsync(5);

        var builder = tracer.SpanBuilder("process message");
        foreach (EventData message in messages)
        {
            builder.AddLink(message.ExtractActivity());
        }

        var span = builder.StartSpan();
        foreach (EventData message in messages)
        {
            // process messages
        }
        span.End();
    }
}

2_Other_OT_APIs.md

Metrics API

OpenTelemetry can record either raw measurements or metrics with a predefined aggregation and set of labels.

Recording raw measurements through the OpenTelemetry API defers to the end user the decision about which aggregation algorithm to apply to the metric, as well as the definition of labels (dimensions). Client libraries like gRPC will use it to record raw measurements such as "server_latency" or "received_bytes". The end user then decides what type of aggregated values should be collected from these raw measurements. It may be a simple average or an elaborate histogram calculation.

Recording metrics with a predefined aggregation through the OpenTelemetry API is no less important. It allows collecting values like CPU and memory usage, or simple metrics like "queue length".

Raw metrics collection is similar to .NET PerformanceCounter or EventCounter.
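For comparison, here is a minimal sketch of the .NET side: an EventCounter receives raw values from the library, and summary statistics are computed per reporting interval chosen by whoever listens. The event source name "Sample-Queue", the counter name "queue-length" and the `QueueEventSource` type are hypothetical and exist only for this example.

```csharp
using System;
using System.Diagnostics.Tracing;

[EventSource(Name = "Sample-Queue")]
sealed class QueueEventSource : EventSource
{
    public static readonly QueueEventSource Log = new QueueEventSource();

    private readonly EventCounter queueLength;

    private QueueEventSource()
    {
        // The counter is attached to this event source and reported through it.
        queueLength = new EventCounter("queue-length", this);
    }

    // The library writes raw values; it does not pick the reporting interval.
    public void ReportQueueLength(int length) => queueLength.WriteMetric((float)length);

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            queueLength.Dispose();
        }
        base.Dispose(disposing);
    }
}

static class EventCounterSketch
{
    static void Main()
    {
        QueueEventSource.Log.ReportQueueLength(3);
        QueueEventSource.Log.ReportQueueLength(7);
        Console.WriteLine(QueueEventSource.Log.Name);
    }
}
```

Unlike the OpenTelemetry raw measurements described above, this sketch carries no labels (dimensions), so the comparison only covers the deferred-aggregation aspect.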

Distributed Context API

Labels other telemetry (metrics, traces) with user-defined context that flows across process boundaries.

This is similar to Activity.Baggage, but it is not tied to tracing, i.e. it could be used with metrics only. Another analog is ILogger scopes, but distributed.
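To show the Activity.Baggage behavior that this comparison refers to, the sketch below sets a baggage item on a parent Activity and reads it back from a child in the same process. The key "tenant", the value "contoso" and the `BaggageSketch` type are hypothetical; sending baggage to another service is up to the instrumentation and is not shown.

```csharp
using System;
using System.Diagnostics;

static class BaggageSketch
{
    public static string ReadFromChild()
    {
        // Baggage is attached to the parent Activity...
        var parent = new Activity("parent").AddBaggage("tenant", "contoso").Start();
        // ...and is visible from any child started under it.
        var child = new Activity("child").Start();
        try
        {
            return child.GetBaggageItem("tenant");
        }
        finally
        {
            child.Stop();
            parent.Stop();
        }
    }

    static void Main() => Console.WriteLine(ReadFromChild());
}
```

Because the baggage lives on the Activity, it cannot be used without starting one, which is the coupling the Distributed Context API avoids.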

Resources API

Resource captures information about the entity for which telemetry is recorded. For example, metrics exposed by a Kubernetes container can be linked to a resource that specifies the cluster, namespace, pod, and container name.

Resource may capture an entire hierarchy of entity identification. It may describe the host in the cloud and specific container or an application running in the process.

Logging API

Future. Probably does not make much sense in .NET.

vanwx commented Jul 17, 2023

Thank you ❤️
