This doc compares the capabilities of popular telemetry sampling systems. The dimensions compared are:
Temporal resolution: The time window over which the limit is enforced. E.g., limit the number of...
- Spans per second
- Spans per calendar month
Degree of limiting: In a steady state with spans created at a rate R span/s that is greater than the desired limit,
- hard limiting: throughput = limit
- soft limiting: E[throughput] = limit
Horizontally scalable: Is the desired limit enforced per-sampler, or is it a global limit?
- Yes: Global
- No: Per-sampler
Responsiveness: How quickly does the system return to steady state when perturbed (i.e., when R changes)?
Supports statistical estimation: Modifies span metadata such that post hoc analysis can compute unbiased estimates from the data ("count the spans").
- Yes
- No
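For a concrete sense of "count the spans" (using the `SampleRate` convention Honeycomb uses, described later in this doc): if each kept span records the N of its 1-in-N sampling decision, summing those Ns over the kept spans gives an unbiased estimate of how many spans were originally produced. A minimal sketch:

```go
// "Count the spans": each kept span carries the N of its 1-in-N sampling
// decision (e.g., Honeycomb's SampleRate attribute), so summing the Ns over
// kept spans gives an unbiased estimate of the original span count.
package main

import "fmt"

func estimateOriginalSpanCount(sampleRates []int) int {
	total := 0
	for _, n := range sampleRates {
		total += n // each kept span stands in for n original spans on average
	}
	return total
}

func main() {
	kept := []int{10, 10, 1, 100} // SampleRate values on four kept spans
	fmt.Println(estimateOriginalSpanCount(kept)) // estimate: 121 spans created
}
```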
- supports estimation: No
- limiting:
  - temporal resolution: Spans per second
  - degree of limiting: Hard
  - horizontally scalable: No
  - responsiveness: < 1 s (token buckets are replenished each second)
The tailsampling processor implements a `ratelimiting` policy (src) equivalent to a token bucket with a capacity of `spans_per_second` tokens, replenished every second. Sampling a trace costs `trace.SpanCount` tokens. Support for updating span p-values has been requested in #7962.
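As a rough sketch (illustrative names, not the processor's actual code), the `ratelimiting` decision described above amounts to a per-second token bucket where admitting a trace costs its span count:

```go
// Hypothetical sketch of a ratelimiting-style decision: a token bucket with
// capacity spans_per_second, refilled once per second; sampling a trace
// costs trace.SpanCount tokens. Names are illustrative only.
package main

import "fmt"

type tokenBucket struct {
	capacity int64 // corresponds to spans_per_second
	tokens   int64
}

// refill runs once per second and tops the bucket back up to capacity.
func (b *tokenBucket) refill() {
	b.tokens = b.capacity
}

// admit reports whether a trace with the given span count is sampled,
// deducting that many tokens when it is.
func (b *tokenBucket) admit(spanCount int64) bool {
	if b.tokens >= spanCount {
		b.tokens -= spanCount
		return true
	}
	return false
}

func main() {
	b := &tokenBucket{capacity: 100}
	b.refill()
	fmt.Println(b.admit(40)) // true: 60 tokens remain
	fmt.Println(b.admit(70)) // false: not enough tokens until the next refill
}
```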
It also has a `composite` policy, which is characterized by a sequence of sub-policies, each of which is subject to individual token-bucket limiting. Each bucket's capacity is computed as a share of an overall `max_total_spans_per_second`, but otherwise the decisions are identical to those made by `ratelimiting` (src).
It introduces the concept of allocating "bandwidth" (span throughput) to different families of traces. See the design doc linked from open-telemetry/opentelemetry-collector-contrib#1410.
If there is more than one otelcol instance in the system, then to guarantee complete traces you must ensure that all spans of a given trace are routed to the same otelcol instance. One way to do that is with the loadbalancing exporter.
References
- open-telemetry/opentelemetry-collector-contrib#4758
- aggregate processor, described in https://grafana.com/blog/2020/06/18/how-grafana-labs-enables-horizontally-scalable-tail-sampling-in-the-opentelemetry-collector/
- Issue associated with the loadbalancing exporter
- Issue associated with the tailsampling processor's `composite` policy
- supports estimation: No
- limiting (`sampler.type == 'ratelimiting'`):
  - temporal resolution: Traces per second
  - degree of limiting: Hard
  - horizontally scalable: No
  - responsiveness: < 1 s (token buckets are replenished each second)
- limiting (`SAMPLING_CONFIG_TYPE == 'adaptive'`):
  - temporal resolution: Traces per second
  - degree of limiting: Soft (typically) or none (if data is generated at a high enough volume for `--sampling.min-sampling-probability` to overtake `--sampling.target-samples-per-second`)
  - horizontally scalable: Yes
  - responsiveness: Configurable (at most jaeger-client's polling interval + jaeger-collector's `--sampling.calculation-interval`)
Jaeger SDKs (jaeger-client) get sampling policy in various ways:
- local: hardcoded `AlwaysOn`, `AlwaysOff`, `probability` (static p), or `ratelimiting` (token bucket; parameter: maximum samples per second). No stratification.
- remote, `file`: per-stratum `probability` or `ratelimiting`. jaeger-collector reloads the policy from the filesystem or a URL; clients poll jaeger-agent, which proxies requests to jaeger-collector.
- remote, `adaptive`: each stratum has a target throughput plus some minimums. jaeger-collector maintains the policy based on the spans it has received; clients poll jaeger-agent, which proxies requests to jaeger-collector.
- The first two options use local memory for `ratelimiting`. The third option has cluster-level coordination.
- Spans are stratified by a list of priority-ordered rules: (Service name, Span name) > Span name default > (Service name) > global default.
- In `adaptive`, many jaeger-collectors write strata statistics to shared storage. From this data, every jaeger-collector can independently calculate the whole-system stats needed to adjust sampling probabilities. A collector reads statistics (from a configurable number of epochs back; 1 by default), combines them to get whole-cluster strata stats, and recalculates new per-stratum sampling probabilities (a simplified sketch follows this list). Defaults:
  - stratum sampling probability: initial (1 in 1,000), minimum (1 in 100,000)
  - stratum throughput: target (1 /s), minimum (1 /min)
- Because collectors receive spans, clients don't need to explicitly send statistics themselves (contrast with X-Ray, whose sampling and collection APIs are independent).
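The simplified sketch mentioned above: a hypothetical, stripped-down version of the per-stratum recalculation (the real jaeger-collector algorithm is more involved, e.g., it dampens changes between epochs). The flag names and defaults are the ones already cited: `--sampling.target-samples-per-second` and `--sampling.min-sampling-probability`.

```go
// Simplified sketch of adaptive per-stratum probability adjustment: scale the
// previous probability so expected sampled throughput hits the target, then
// clamp to the configured floor. Not jaeger-collector's actual algorithm.
package main

import "fmt"

const (
	targetTPS = 1.0     // --sampling.target-samples-per-second (default 1/s)
	minProb   = 0.00001 // --sampling.min-sampling-probability (1 in 100,000)
)

// recalc takes the probability used last epoch and the whole-cluster sampled
// throughput observed for the stratum (samples/s, combined across collectors).
func recalc(oldProb, observedSampledTPS float64) float64 {
	if observedSampledTPS == 0 {
		return oldProb // nothing observed; leave the probability unchanged
	}
	newProb := oldProb * targetTPS / observedSampledTPS
	if newProb < minProb {
		newProb = minProb
	}
	if newProb > 1 {
		newProb = 1
	}
	return newProb
}

func main() {
	// A stratum sampled 20 traces/s at p=0.001; to hit 1/s, p drops to 0.00005.
	fmt.Println(recalc(0.001, 20))
}
```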
- supports estimation: No
- limiting:
  - temporal resolution: Traces per second
  - degree of limiting: Soft
  - horizontally scalable: Yes
  - responsiveness: < 10 s (token buckets are replenished via GetSamplingTargets requests, which occur every 10 s by default)
Each actor performing sampling sends statistics to a central API describing how many spans it has seen in a period. At least two SDKs (Java, Go) have contrib `Sampler` implementations that obtain sampling configuration from AWS X-Ray. Like Jaeger's `adaptive` remote sampling, X-Ray serves advisory sampling policies to clients. An X-Ray-based sampling system behaves as follows (on average):
- Define a rule as a triple: a predicate over span attributes, a token bucket (e.g.), and a number in [0, 1] called the rule's fixed rate.
- Define the global sampling policy as an ordered collection of rules.
- Given a root span in need of a sampling decision (sketched in code after this list),
- Match the span to the first rule whose predicate it satisfies.
- If the token bucket contains at least 1 token, deduct 1 token from the bucket and sample the span and its descendants.
- Else, sample with probability equal to the matched rule's fixed rate.
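A hedged Go sketch of that decision (hypothetical types; not the contrib `Sampler` implementations): walk the ordered rules, take the first matching one, try its token bucket, and fall back to the fixed rate.

```go
// Illustrative X-Ray-style decision for a root span: first matching rule wins;
// spend a reservoir (token bucket) token if available, otherwise sample at the
// rule's fixed rate. The decision covers the span and its descendants.
package main

import (
	"fmt"
	"math/rand"
)

type rule struct {
	matches   func(attrs map[string]string) bool // predicate over span attributes
	tokens    float64                            // per-rule token bucket ("reservoir")
	fixedRate float64                            // number in [0, 1]
}

func sample(rules []*rule, attrs map[string]string) bool {
	for _, r := range rules {
		if !r.matches(attrs) {
			continue
		}
		if r.tokens >= 1 { // a token is available: deduct it and sample
			r.tokens--
			return true
		}
		return rand.Float64() < r.fixedRate // otherwise, sample at the fixed rate
	}
	return false // no rule matched (X-Ray itself ends with a catch-all default rule)
}

func main() {
	rules := []*rule{{
		matches:   func(a map[string]string) bool { return a["service.name"] == "checkout" },
		tokens:    1,
		fixedRate: 0.05,
	}}
	attrs := map[string]string{"service.name": "checkout"}
	fmt.Println(sample(rules, attrs)) // true: consumes the reservoir's only token
	fmt.Println(sample(rules, attrs)) // thereafter, sampled with probability 0.05
}
```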
Docs refer to "reservoirs", which are per-rule token buckets: https://github.com/open-telemetry/opentelemetry-java-contrib/blob/42818333e243682bb50e510f4f91381016f61f71/aws-xray/src/main/java/io/opentelemetry/contrib/awsxray/SamplingRuleApplier.java#L272. Actors doing sampling are dynamically allotted portions of the desired reservoir size (token bucket capacity), called `ReservoirQuota`, in the GetSamplingTargets API response (docs).
- supports estimation: Yes, via span attribute `SampleRate` (value = N in "1-in-N"; feature request to support p-value here)
- limiting (`EMADynamicSampler`):
  - temporal resolution: Spans per second
  - degree of limiting: Soft
  - horizontally scalable: No (limiting is per Refinery node)
  - responsiveness: Configurable as `AdjustmentInterval`
- limiting (`TotalThroughputSampler`):
  - temporal resolution: Spans per second
  - degree of limiting: Hard
  - horizontally scalable: No (limiting is per Refinery node)
  - responsiveness: Configurable as `ClearFrequencySec`
Refinery scales horizontally by forwarding spans to the appropriate peer as necessary. The node that ought to handle a given trace is determined via consistent hashing of the trace ID (src). Peers are either discovered via Redis or specified in Refinery's configuration file (docs).
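A minimal sketch of that trace-to-peer assignment (illustrative only; Refinery's actual peer management and hashing details differ): every node hashes the trace ID onto the same ring of peers, so all spans of a trace get forwarded to one owner.

```go
// Consistent-hashing sketch: map a trace ID onto a sorted ring of peer hashes
// and pick the first peer at or after it (wrapping around). Every node that
// runs this against the same peer list picks the same owner for a trace.
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

type ring struct {
	hashes []uint32          // sorted peer hashes
	peers  map[uint32]string // peer hash -> peer address
}

func newRing(peerAddrs []string) *ring {
	r := &ring{peers: map[uint32]string{}}
	for _, p := range peerAddrs {
		h := hash32(p)
		r.hashes = append(r.hashes, h)
		r.peers[h] = p
	}
	sort.Slice(r.hashes, func(i, j int) bool { return r.hashes[i] < r.hashes[j] })
	return r
}

// ownerOf returns the peer responsible for the given trace ID.
func (r *ring) ownerOf(traceID string) string {
	h := hash32(traceID)
	i := sort.Search(len(r.hashes), func(j int) bool { return r.hashes[j] >= h })
	if i == len(r.hashes) {
		i = 0 // wrap around the ring
	}
	return r.peers[r.hashes[i]]
}

func hash32(s string) uint32 {
	f := fnv.New32a()
	f.Write([]byte(s))
	return f.Sum32()
}

func main() {
	r := newRing([]string{"refinery-0:8081", "refinery-1:8081", "refinery-2:8081"})
	fmt.Println(r.ownerOf("4bf92f3577b34da6a3ce929d0e0e4736")) // same answer on every node
}
```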
Not set-it-and-forget-it: as a system's rate of telemetry production grows over time, either `GoalSampleRate` or the Honeycomb events-per-month quota will need to be adjusted.
- limiting: Support all of: spans per second, spans per month, GB per month (approximated)
- degree of limiting: Soft is ok
- horizontally scalable: Yes
- Prioritize tail sampling in Collector over head sampling in SDK
- Strive for a configuration that is "set it and forget it" (notwithstanding ad hoc changes to aid in investigation or incident response)
Ah, I didn't know this. Apologies. Haven't used Jaeger myself so all I'm piecing together is primarily from docs. Where is it stored?
The `probabilistic` sampler stores its sampling probability (0..1) on span tag `sampler.param`. With the `adaptive` configuration, do the sampling configs served by jaeger-collectors direct jaeger-clients to use `probabilistic` samplers?