otap-dataflow processes telemetry data (logs, metrics, traces) through a pipeline of three stages:
Receiver ──> Processor ──> Exporter ──> Remote destination
(ingests) (transforms) (sends out)
Each stage is connected by a bounded queue — like a fixed-size conveyor belt. When a downstream stage can't keep up, the queue fills up, and the upstream stage is forced to slow down. This is backpressure — the system's way of protecting itself from being overwhelmed.
There is no central traffic controller. Instead, each queue independently enforces its own capacity limit, and pressure ripples backward naturally through the pipeline.
Every connection between pipeline stages uses a queue with a fixed maximum size (default: 128 messages). This is configurable per deployment.
- Queue not full: Data flows through immediately.
- Queue full: The sender pauses until space opens up. No data is dropped — the sender simply waits.
- Queue closed (shutdown): Remaining data is drained before the pipeline stops.
This means the system self-regulates: if a remote destination slows down, the exporter slows down, its input queue fills up, the processor slows down, its input queue fills up, and ultimately the receiver slows down its ingestion rate.
Receivers accept incoming telemetry from external sources (gRPC, HTTP, syslog, etc.) and push it into the pipeline.
Key backpressure behaviors:
| Receiver | What happens when the pipeline is full? |
|---|---|
| OTLP (gRPC/HTTP) | Automatically limits how many requests it accepts at once, matched to pipeline capacity. Excess requests get HTTP 503 (Service Unavailable). |
| OTAP (Arrow streaming) | Caps concurrent requests at 1000 by default. |
| Topic | Tracks how long it's blocked waiting for pipeline space. Logs warnings when blocked > 500ms. |
| Syslog | Batches messages (up to 100 per batch, or every 100ms) before sending downstream. |
The OTLP receiver is the most sophisticated — it auto-tunes its admission rate to match the pipeline's capacity, so it never accepts more data than the pipeline can handle.
Processors sit in the middle. Some are stateless pass-throughs (filtering, routing), while others accumulate data internally.
Processors with internal buffering:
| Processor | What it buffers | How it handles overload |
|---|---|---|
| Batch | Accumulates items until a size or time threshold is met (default: flush every 200ms). | Rejects (NACKs) incoming data when its internal tracking slots are full (1024 inbound / 512 outbound by default). |
| Durable Buffer | Persists data to disk (write-ahead log) before forwarding. Default: up to 10 GB. | When disk is full, either rejects new data (default) or drops the oldest data to make room. Retries failed downstream sends with exponential backoff (1s to 30s). |
| Temporal Reaggregation | Accumulates metrics over a time window (default: 60 seconds). | Tells the engine to stop sending it data when its tracking capacity is nearly full. Resumes when capacity frees up. |
| Fanout | Tracks messages sent to multiple outputs simultaneously. | Tells the engine to stop sending data when 10,000 messages are in-flight. Resumes as acknowledgments arrive. |
| Retry | No internal buffer — just reschedules failed messages. | Exponential backoff: waits 5s, then 7.5s, then 11.25s, etc., up to 30s per attempt and 5 minutes total. |
Stateless processors (Filter, Attributes, Debug, Content Router, Signal Type Router, Log Sampling, Transform) have no internal buffering — they process each message and immediately pass it along.
Exporters send data to remote destinations. They are intentionally lightweight — they handle concurrent dispatch but rely on upstream processors (especially the Durable Buffer) for retry and durability.
| Exporter | Concurrent sends | What happens when the remote is slow? |
|---|---|---|
| OTLP gRPC | Up to 5 RPCs at once (configurable). | Stops accepting new data from the pipeline until an in-flight RPC completes. Failures are NACKed upstream for retry. |
| OTLP HTTP | Up to 5 requests at once (configurable). | Same as gRPC. Classifies errors as retryable (429, 502, 503, 504) or permanent. |
| OTAP (Arrow streaming) | 64-message internal queue per signal type. | Reconnects with exponential backoff (10ms to 10s) on connection failure. |
| Azure Monitor | Up to 16 concurrent uploads (hardcoded). | Pauses intake when at capacity or when authentication token is unavailable. Batches data into compressed payloads (up to 1 MB compressed). |
| Topic | Configurable. | Either blocks until the topic queue has space, or drops the newest message and NACKs upstream. |
By default, when a receiver accepts data from a client, it responds with "success" as soon as the data is placed onto the internal pipeline queue. If something fails downstream (e.g., the exporter can't reach the remote destination), the client never finds out — from its perspective, the send succeeded.
The wait_for_result setting (default: false) changes this behavior:
| Setting | Client sees... | Trade-off |
|---|---|---|
wait_for_result: false |
Immediate success once data is queued internally. Downstream errors are invisible. | Lower latency, higher throughput. Client can't react to failures. |
wait_for_result: true |
Success or failure based on the next pipeline stage's response. | Higher latency (client waits for processing), but the client can retry on failure. |
Important nuance: wait_for_result: true only waits for the immediate downstream component — not the entire pipeline. If the next stage is a Durable Buffer that ACKs after writing to disk, the client gets a response as soon as the disk write succeeds, even if the final export hasn't happened yet.
- OTLP receiver — Yes (independently for gRPC and HTTP)
- OTAP receiver — Yes
- Topic receiver — No
- Syslog receiver — No (connectionless protocol, fire-and-forget by nature)
When wait_for_result is enabled, each client request holds a concurrency slot for the entire processing duration (not just the time to enqueue). This means:
- Under heavy load, slots fill up sooner
- Excess requests are rejected earlier (HTTP 503 / gRPC RESOURCE_EXHAUSTED)
- Clients see backpressure directly and can implement their own retry logic
When disabled, slots turn over quickly (just the enqueue time), so the receiver appears to have more capacity — but failures are hidden from the client.
When the pipeline is shutting down, receivers with in-flight wait_for_result requests:
- Stop accepting new requests
- Keep the connection open for in-flight requests to complete
- If the shutdown deadline expires before all requests complete, remaining clients receive an explicit error (not a hang)
Every piece of data that enters the pipeline is tracked with an acknowledgment (ACK/NACK) system:
- ACK: "I successfully processed/sent this data." Flows backward through the pipeline so each stage knows it can release its reference.
- NACK (transient): "I failed, but it's worth retrying." Upstream processors (like the Durable Buffer or Retry processor) will re-attempt delivery.
- NACK (permanent): "This data is rejected and should not be retried." The data is dropped.
This means data loss only happens in explicitly configured scenarios (e.g., the Durable Buffer's "drop oldest" policy when disk is full).
When the pipeline shuts down:
- Receivers stop accepting new data first — they drain any in-progress requests.
- Only after all receivers confirm they're drained, processors and exporters receive the shutdown signal.
- Processors and exporters drain their remaining queued data before stopping, with a configurable deadline.
This two-phase approach ensures data in the pipeline is flushed rather than dropped.
| Setting | What it controls | Default |
|---|---|---|
policies.channel_capacity.pdata |
Queue size between pipeline stages | 128 messages |
policies.channel_capacity.control.node |
Control message queue per node | 256 |
Batch processor max_batch_duration |
How long to wait before flushing a partial batch | 200ms |
Durable Buffer retention_size_cap |
Maximum disk storage for buffered data | 10 GB |
Durable Buffer size_cap_policy |
What to do when disk is full: backpressure or drop_oldest |
backpressure |
Durable Buffer poll_interval |
How often to check for data to forward | 100ms |
OTLP Receiver max_concurrent_requests |
Max simultaneous inbound requests (0 = auto-tune) | 0 (auto) |
Receiver wait_for_result |
Whether to hold the client request open until the next stage ACKs/NACKs | false |
Exporter max_in_flight |
Max concurrent outbound requests | 5 (gRPC/HTTP) |
-
No data is silently dropped. Backpressure causes senders to slow down, not discard data. Data loss only occurs through explicit policy (e.g.,
drop_oldest). -
The system is self-regulating. If any stage slows down, pressure automatically propagates backward to the source, slowing ingestion to match throughput.
-
Exporters are thin; durability lives in processors. Exporters focus on efficient dispatch. The Durable Buffer processor provides disk-backed persistence and retry logic.
-
The OTLP receiver is the smartest ingress point. It auto-tunes its admission rate to match downstream capacity, rejecting excess requests at the transport layer before they consume memory.
-
Everything is configurable. Queue sizes, concurrency limits, retry intervals, and capacity policies can all be tuned per deployment.