A distributed trace is a set of events, triggered as a result of a single logical operation, consolidated across various components of an application. A distributed trace contains events that cross process, network and security boundaries. A distributed trace may be initiated when someone presses a button to start an action on a website - in this example, the trace will represent calls made between the downstream services that handled the chain of requests initiated by this button being pressed.
The OpenTelemetry Tracing API is a strict contract that enables the tracing signal (not debugging or profiling). This contract is the same for all kinds of libraries and tracing backends and includes several related concepts:
- span creation, required and optional properties
- sampling
- exporting
- noop behavior in the absence of a tracing implementation
- extra information that certain types of spans should include (e.g. spans for HTTP calls).
The DiagnosticSource contract is loose. It lets the library and the listener decide together when Activities should be created, which information should be passed, how to sample, how to convert rich payloads into exportable telemetry events, etc. DiagnosticSource can be used for basic tracing or for deeper profiling.
Activity is inspired by OpenTracing (one of OpenTelemetry's parent projects) and is very similar to Span.
OpenTelemetry | .NET Analog | Comments |
---|---|---|
Tracer | DiagnosticSource | Somewhat similar. Tracer is used to create spans and carries common components (exporter, sampler, etc.). DiagnosticSource notifies the listener about Activities and allows it to control their flow. |
Span | Activity | Similar. Represents a logical operation. |
SpanContext | Properties on Activity (TraceId, SpanId, etc.) | Context that is propagated over the wire |
SpanBuilder | Set* methods on Activity | Helper that configures span/activity properties |
Sampler | - | Configurable algorithm that makes the sampling decision on a span when it is started |
Exporter | - | Configurable pipeline that delivers data from the user process to the tracing backend of choice |
Propagation formats | Activity.Id implements HTTP propagation; users/libs can leverage ActivityTraceId/ActivitySpanId for binary propagation | Protocol-specific encoding for context |

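
To make the propagation row more concrete, here is a minimal sketch of how `Activity.Id` can act as the HTTP propagation value when the W3C format is selected. The class and method names (`TraceParentDemo`, `StartOutgoing`, `StartIncoming`) and the operation names are illustrative assumptions; only the `System.Diagnostics.Activity` members are real APIs, and the actual header handling is left out.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of any library.
public static class TraceParentDemo
{
    // Client side: start an Activity in W3C format. Its Id has the shape
    // "00-<32 hex trace id>-<16 hex span id>-<2 hex flags>" and is the value
    // a library would send as the outgoing propagation header.
    public static Activity StartOutgoing()
    {
        var activity = new Activity("HttpOut");      // illustrative operation name
        activity.SetIdFormat(ActivityIdFormat.W3C);  // opt in explicitly for this Activity
        return activity.Start();
    }

    // Server side: continue the trace from the received header value.
    public static Activity StartIncoming(string receivedId)
    {
        var activity = new Activity("HttpIn");       // illustrative operation name
        activity.SetParentId(receivedId);            // a W3C parent id selects the W3C format for the child
        return activity.Start();
    }
}
```

Calling `StartOutgoing()`, passing its `Id` to `StartIncoming(...)`, and comparing `TraceId` and `ParentSpanId` on the result shows that both Activities belong to the same trace. Binary protocols could carry the `ActivityTraceId` and `ActivitySpanId` values instead of the string form.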

Activity | Span | Comments | Importance |
---|---|---|---|
OperationName | Name | Different. For Activity it is the DiagnosticSource event prefix (like HttpIn); for OT it is more specific to the event, like the HTTP path | High: this is used in every instrumentation and listener, and Azure SDK uses it |
Current | tracer.CurrentSpan | | |
Parent | - | Reference to the parent if it is in-proc | |
ParentId | - | Unique parent Activity Id (encoded for HTTP propagation) | |
Id | - | Unique Activity Id encoded for HTTP propagation | |
RootId | TraceId | Legacy Id that is common to all logical operations in the same trace | |
TraceStateString | Span.SpanContext.Tracestate | Tracestate as per W3C trace-context spec | |
Tags | Attributes | Key-value pairs that augment the Activity/Span, like the HTTP URL. Difference: Activity supports string keys and string values; OT supports string keys and string, long, double, or bool values | Low |
Baggage | DistributedContext API (not even tracing) | Context that propagates to other services | High. 1P users have high interest in this |
- | Status | The result of the operation: success, failure, failure kind | High: this is used in every instrumentation and listener, and Azure SDK uses it |
- | Links | Represent relationships to multiple other span trees (useful in batch processing scenarios) | High: Azure SDK (EventHub, ServiceBus) needs it, and Azure services need it for batching scenarios |
- | Events | Additional events that happen in the scope of a span (receiving a chunk of data, or attaching a log message) | Low |
- | Kind | Useful for UX: server for incoming requests, client for outgoing requests, internal for logical operations | High: this is used in every instrumentation and listener, and Azure SDK uses it |
TraceId | Context.TraceId | Same. Trace Id as per W3C trace-context spec | |
SpanId | Context.SpanId | Same. Span Id as per W3C trace-context spec | |
ParentSpanId | ParentSpanId | Same. Span Id of the parent Span/Activity | |
Recorded | IsRecordingEvents | Same. Indicates whether the Activity/Span is sampled in or out | |
ActivityTraceFlags | Context.TraceOptions | Same. Trace flags as per W3C trace-context spec | |
Duration | Duration | Same | |
StartTimeUtc | StartTime | Same | |
APIs to control Id format | - | | |

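
Several rows of this table can be exercised together in a short sketch: the APIs that control the Id format, Tags, Baggage, and the implicit flow of `Activity.Current` to a child. The class name `ActivityPropertiesDemo`, the operation names, and the tag and baggage values are assumptions made for illustration; note also that the two static Id format properties are process-wide settings.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical demo class, not part of any library.
public static class ActivityPropertiesDemo
{
    public static async Task<(string ParentTraceId, string ChildTraceId, string ChildBaggage, bool ParentLinked)> RunAsync()
    {
        // "APIs to control Id format": process-wide switches on Activity.
        Activity.DefaultIdFormat = ActivityIdFormat.W3C;
        Activity.ForceDefaultIdFormat = true;

        var parent = new Activity("HttpIn");                      // illustrative operation name
        parent.AddTag("http.url", "https://example.com/orders");  // Tags: string key, string value
        parent.AddBaggage("tenant", "contoso");                   // Baggage: visible to child Activities
        parent.Start();                                           // parent becomes Activity.Current

        var result = await Task.Run(() =>
        {
            // No explicit parent is passed: the child picks up Activity.Current,
            // which flowed here together with the async execution context.
            var child = new Activity("HttpOut").Start();
            var snapshot = (
                parent.TraceId.ToHexString(),
                child.TraceId.ToHexString(),
                child.GetBaggageItem("tenant"),
                ReferenceEquals(child.Parent, parent));
            child.Stop();
            return snapshot;
        });

        parent.Stop();
        return result;
    }
}
```

In this sketch the child shares the parent's trace id, can read the parent's baggage item, and references the parent through `Parent`, all without the code passing the parent around explicitly.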
- OpenTelemetry: When a span ends, it is automatically scheduled for exporting. The library does not need to call anything else.
- Activity/DiagnosticSource: It is the library's responsibility to accompany each Activity start/stop with a DiagnosticSource event.
- OpenTelemetry: Library code does not know or care whether the user enabled instrumentation. OpenTelemetry is a noop if the user did not bring the implementation package, and it enables tracing if the user did.
- Activity/DiagnosticSource: It is the library's responsibility to check whether there is a listener, whether the event is enabled, whether this request is interesting, etc., and to behave accordingly.
- OpenTelemetry: When a span starts, OpenTelemetry makes the sampling decision. The sampler is configurable by the user or the library. The library may check the span's sampling decision and augment the span with more information only if it is sampled in. OpenTelemetry does some internal optimizations for spans that are not sampled (e.g. they are not sent for exporting). Typical sampling algorithms are part of the contract and are consistent for every span in the same trace.
- Activity/DiagnosticSource: The sampling flag is available on Activity, but there is no other contract (when, who, how, etc.). Setting or updating it and making sampling decisions is the listener's responsibility.
- OpenTelemetry: Implicit propagation is a choice. By default, propagation is explicit.
- Activity/DiagnosticSource: Propagation is always implicit. There is no choice (unless someone wants to hack it).
- OpenTelemetry: The library defines which information to provide and sets attributes on the span as key-value pairs with string keys and string, long, bool, or double values. It should follow common conventions for well-known things like HTTP, gRPC, or DB calls. No rich payloads are involved.
- Activity/DiagnosticSource: The library gives the listener everything it knows (request and response payloads) and leaves it up to the listener to extract what it needs. The library can also set Tags on the Activity to stamp information that everyone needs, and leave it up to the listener to decide whether it needs anything else.
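
The Activity/DiagnosticSource side of the points above can be sketched as follows. This is only one possible shape under stated assumptions: `MyLibrary`, `RecordingListener`, `DiagnosticSourceDemo`, the event name, the payload properties, and the "skip /health" rule are all hypothetical, while the `DiagnosticListener` and `Activity` calls are the real `System.Diagnostics` APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical instrumented library.
public static class MyLibrary
{
    private const string OperationName = "MyLibrary.ProcessRequest";
    private static readonly DiagnosticListener Source = new DiagnosticListener("MyLibrary");

    public static void ProcessRequest(string url)
    {
        // The library itself checks for a listener and lets it filter on the request.
        if (!Source.IsEnabled() || !Source.IsEnabled(OperationName, url))
        {
            DoWork(url);
            return;
        }

        var activity = new Activity(OperationName);
        activity.AddTag("http.url", url);                 // information everyone needs
        // Starts the Activity and writes "<OperationName>.Start" with a rich payload.
        Source.StartActivity(activity, new { Url = url });
        try { DoWork(url); }
        finally
        {
            // Writes "<OperationName>.Stop" and stops the Activity.
            Source.StopActivity(activity, new { Url = url });
        }
    }

    private static void DoWork(string url) { /* the real operation would go here */ }
}

// Hypothetical listener: decides what is interesting and makes the sampling decision.
public sealed class RecordingListener : IObserver<DiagnosticListener>, IObserver<KeyValuePair<string, object>>
{
    public List<string> Events { get; } = new List<string>();

    public void OnNext(DiagnosticListener listener)
    {
        // A real listener would keep the returned subscription and dispose it later.
        if (listener.Name == "MyLibrary")
            listener.Subscribe(this, (name, arg1, arg2) => !(arg1 is string url) || !url.EndsWith("/health"));
    }

    public void OnNext(KeyValuePair<string, object> evt)
    {
        // Sampling is the listener's job: here every observed Activity is marked as recorded.
        if (evt.Key.EndsWith(".Start") && Activity.Current != null)
            Activity.Current.ActivityTraceFlags |= ActivityTraceFlags.Recorded;
        Events.Add(evt.Key);
    }

    public void OnCompleted() { }
    public void OnError(Exception error) { }
}

public static class DiagnosticSourceDemo
{
    public static List<string> Run()
    {
        var listener = new RecordingListener();
        using (DiagnosticListener.AllListeners.Subscribe(listener))
        {
            MyLibrary.ProcessRequest("https://example.com/orders");
            MyLibrary.ProcessRequest("https://example.com/health"); // filtered out by the listener
        }
        return listener.Events;
    }
}
```

In this sketch `DiagnosticSourceDemo.Run()` observes a Start and a Stop event for the first request and nothing for the second. All of the enablement checks, event writes, and sampling logic live in library and listener code, which is the looseness that the OpenTelemetry contract removes.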
See the Contract difference section for more details. Is DiagnosticSource a good way to instrument a library? (It seems Activity is.) How strict do we want the contract to be?
Activity and Span are the same. Can we avoid having OT Spans in the first place?
Review OT APIs (at least those that should survive after p1 and p2) and influence good design and room for perf improvements now:
- decouple Activity.OperationName from DiagnosticSource events
- add Activity.Status
- add Links
- decouple additional context propagation from Activity
- add Events
- support long, bool, and double attribute values