Adrian Cole codefromthecrypt

I've had many people ask me questions about OpenTracing, often in relation to OpenZipkin. I've seen assertions that it is vendor-neutral and the cure for lock-in. This post is not a sanctioned, polished or otherwise muted view; rather, it is what I personally think about what OpenTracing is and is not, and what it does and does not help with. Scroll to the very end if this is too long. Feel free to comment if I made any factual mistakes or you just have something to add.

So, what is OpenTracing?

OpenTracing is documentation and library interfaces for distributed tracing instrumentation. To be "OpenTracing" requires bundling its interfaces in your work, so that others can use it to time distributed operations with the same library.

So, who is it for?

OpenTracing interfaces are targeted at authors of instrumentation libraries, and those who want to collaborate with traces created by them. For example, something started a trace somewhere and I add a notable event to that trace. Structured logging was recently added to O

I have heard a number of APMs create "spans" (distributed tracing lingo for an operation) and aggregate them for reasons like latency metrics.

In a way, Zipkin does this. The ever popular service dependency diagram is an aggregated view of parent/child links between services with the number of calls between them added for color.

The biggest issue with using a tracing api to back metrics is that most of the time, tracing is sampled (like 1 out of 1000). Sampling is done to reduce costs or prevent a surge of traffic from taking out the

I've been party to some pretty interesting conversations around Zipkin's ID format and UUIDs. They seem worth sharing, since some of this stuff is neat, if geeky.

So, we love UUIDs! Many people use UUID format to pass around things unlikely to ever clash. These are used for long-term retrieval and other handy things. A lot of correlation IDs are UUIDs, too.

What about distributed tracing? Trace IDs should be UUIDs, too! To this I answer... maybe?
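To make the size difference concrete, here is a minimal sketch comparing a random UUID with a 64-bit random ID rendered as 16 lower-hex characters, which is a shape Zipkin trace IDs commonly take. The class and method names are made up for illustration; they are not from any tracing library.

```java
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative only: names here are hypothetical, not a tracing library's API. */
final class TraceIdSketch {

  /** Returns a random 64-bit ID as 16 lower-hex characters. */
  static String newId64() {
    return toLowerHex(ThreadLocalRandom.current().nextLong());
  }

  /** Encodes a long as exactly 16 lower-hex characters, zero-padded on the left. */
  static String toLowerHex(long value) {
    return String.format("%016x", value);
  }

  /** Renders a 128-bit UUID as 32 lower-hex characters (the canonical form minus hyphens). */
  static String uuidAsHex(UUID uuid) {
    return toLowerHex(uuid.getMostSignificantBits())
        + toLowerHex(uuid.getLeastSignificantBits());
  }

  public static void main(String[] args) {
    UUID uuid = UUID.randomUUID();
    System.out.println("uuid, canonical (36 chars): " + uuid);
    System.out.println("uuid, lower-hex (32 chars): " + uuidAsHex(uuid));
    System.out.println("64-bit id       (16 chars): " + newId64());
  }
}
```

A UUID carries twice the bits of a 64-bit ID and, in canonical form, more than twice the characters, which matters when the ID rides along on every request.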

Firstly, there are a lot of systems inspired by the Dapper paper, which hints how

The number one problem to solve in tracing is propagation: at the very least, carrying trace identifiers from one side of your process to the other. Usually, some thread-local state is involved with this. In most libraries apart from OpenTracing, I'd say, this is separate functionality from timing things. For example, you put variables in scope with one API and you do timing with another. It might look like this.

span = startSpan();
try (Scope scope = propagation.scope(span.ids())){
  // work here can see the span ids via something like Trace.ids() 
} finally {
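The snippet above is cut off, so here is a self-contained sketch of the same split between propagation and timing. Everything in it (`CurrentIds`, `Scope`, `TimedSpan`) is a hypothetical stand-in, not the API of Brave, OpenTracing or any other library: one class puts IDs in thread-local scope, and a different one does the timing.

```java
/** Illustrative only: every type here is a hypothetical stand-in, not a real tracing API. */
final class PropagationSketch {

  /** A scope that closes without a checked exception, so it fits try-with-resources. */
  interface Scope extends AutoCloseable {
    @Override void close();
  }

  /** Propagation: holds the current span IDs for this thread. */
  static final class CurrentIds {
    private static final ThreadLocal<String> IDS = new ThreadLocal<>();

    /** Returns the IDs in scope on this thread, or null if none. */
    static String get() {
      return IDS.get();
    }

    /** Puts IDs in scope; closing the returned scope restores the previous value. */
    static Scope scope(String ids) {
      String previous = IDS.get();
      IDS.set(ids);
      return () -> IDS.set(previous);
    }
  }

  /** Timing: knows nothing about thread-local state. */
  static final class TimedSpan {
    final String ids;
    final long startNanos = System.nanoTime();
    long durationNanos = -1; // stays -1 until finish() is called

    TimedSpan(String ids) {
      this.ids = ids;
    }

    void finish() {
      durationNanos = System.nanoTime() - startNanos;
    }
  }

  /** Runs the work with the span's IDs in scope, then finishes the span. */
  static TimedSpan trace(String ids, Runnable work) {
    TimedSpan span = new TimedSpan(ids);
    try (Scope scope = CurrentIds.scope(span.ids)) {
      work.run(); // work here can read the IDs via CurrentIds.get()
    } finally {
      span.finish(); // timing ends whether or not the work threw
    }
    return span;
  }

  public static void main(String[] args) {
    TimedSpan span = trace("463ac35c9f6413ad/a2fb4a1d1a96d312",
        () -> System.out.println("in scope: " + CurrentIds.get()));
    System.out.println("after scope: " + CurrentIds.get());
    System.out.println("took " + span.durationNanos + "ns");
  }
}
```

Because the scope restores the previous value on close, nested scopes unwind correctly and nothing leaks onto the thread once the work is done.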

	I made this with https://textik.com in an attempt to explain how service naming works. Since English is
	a second language for many coders, diagrams help. In this case, we achieved understanding in <20 minutes!


	The tracer on Server A starts a trace since it doesn't read
	headers or anything from the browser's request. In Zipkin
	this has annotation "sr" with the serviceName "loginService"
	-\
	-\ Server A has role loginService, so its tracer is named that.
	Browser -\+--------------+

	/**
	 * This is an example of a one-way or "messaging span", which is possible by use of the {@link
	 * Span#flush()} operator.
	 *
	 * <p>Note that this uses a span as a Kafka key, not because it is recommended, rather as it is
	 * convenient for demonstration, since Kafka doesn't have message properties.
	 *
	 * <p>See https://github.com/openzipkin/zipkin/issues/1243
	 */
	public class KafkaExampleIT {

	┌──────────────────────────────┐ ┌──────────────────────────────┐
	│ Tracer: Web │ │ Tracer: App │
	│ │ │ │
	└──────────────────────────────┘ │ └──────────────────────────────┘
	────────────────────────────────────┴───────────────────────────────────

	┌ ─ ─ ─ ─ ─ ─ ─ ─ ┐ ┌─────────────────────────────┐
	(1) │(1) Web tracer starts a trace│
	│ sr │ │ since no headers exist │
	─────────▶ └─────────────────────────────┘

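	HttpServletRequest httpRequest = (HttpServletRequest) request;
	// Read any trace identifiers or sampling flags the caller sent in the request
	TraceContextOrSamplingFlags contextOrFlags = contextExtractor.extract(httpRequest);
	// Join the caller's trace when a context was extracted; otherwise start a new trace
	Span span = contextOrFlags.context() != null
	    ? tracer.joinSpan(contextOrFlags.context())
	    : tracer.newTrace(contextOrFlags.samplingFlags());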
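To show the join-or-start decision without a servlet container, here is a stdlib-only sketch. It assumes B3-style header names (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-Sampled`) and models headers as a plain `Map`; `ExtractSketch` and its nested `Context` are hypothetical and much simpler than the `TraceContextOrSamplingFlags` type used above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative only: a simplified stand-in for extract-then-join-or-start logic. */
final class ExtractSketch {

  /** The identifiers a span carries; sampled is null when no decision was sent. */
  static final class Context {
    final String traceId;
    final String spanId;
    final Boolean sampled;
    final boolean joined; // true when the IDs came from the caller's headers

    Context(String traceId, String spanId, Boolean sampled, boolean joined) {
      this.traceId = traceId;
      this.spanId = spanId;
      this.sampled = sampled;
      this.joined = joined;
    }
  }

  /** Reads the sampling decision, or returns null if the caller sent none. */
  static Boolean sampled(Map<String, String> headers) {
    String value = headers.get("X-B3-Sampled");
    if (value == null) return null;
    return value.equals("1") || value.equalsIgnoreCase("true");
  }

  /**
   * Joins the caller's trace when both IDs are present, otherwise starts a new one.
   * Real HTTP header names are case-insensitive; this sketch does an exact-case lookup.
   */
  static Context joinOrStart(Map<String, String> headers) {
    String traceId = headers.get("X-B3-TraceId");
    String spanId = headers.get("X-B3-SpanId");
    Boolean sampled = sampled(headers);
    if (traceId != null && spanId != null) {
      return new Context(traceId, spanId, sampled, true);
    }
    // Assumption of this sketch: a new root span reuses one random 64-bit ID for both fields.
    String newId = String.format("%016x", ThreadLocalRandom.current().nextLong());
    return new Context(newId, newId, sampled, false);
  }

  public static void main(String[] args) {
    Map<String, String> headers = new HashMap<>();
    headers.put("X-B3-TraceId", "463ac35c9f6413ad");
    headers.put("X-B3-SpanId", "a2fb4a1d1a96d312");
    Context joined = joinOrStart(headers);
    System.out.println("joined=" + joined.joined + " traceId=" + joined.traceId);

    Context started = joinOrStart(new HashMap<>());
    System.out.println("joined=" + started.joined + " traceId=" + started.traceId);
  }
}
```

Note that the sampling decision is carried forward in both branches: a caller can send only a sampling flag, with no IDs, and the new trace should still honor it.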

	# substitute 2017-01-18 with the correct date, and foo and bar with the two service names you're interested in
	curl -s localhost:9200/zipkin-2017-01-18/span/_search -d'{
	  "query": {
	    "bool": {
	      "should": [
	        {
	          "nested": {
	            "path": "annotations",
	            "query": {
	              "bool": {

	# this returns all of the annotation values, not just the http.url
	# for more.. https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
	curl -s 'localhost:9200/zipkin-*/span/_search' -d'{
	  "_source": false,
	  "aggs": {
	    "binaryAnnotations": {
	      "nested": {
	        "path": "binaryAnnotations"
	      },
	      "aggs": {