I've had many people ask me questions about OpenTracing, often in relation to OpenZipkin. I've seen assertions about how it is vendor neutral and is the lock-in cure. This post is not a sanctioned, polished or otherwise muted view, rather what I personally think about what it is and is not, and what it helps and does not help with. Scroll to the very end if this is too long. Feel free to add a comment if I made any factual mistakes or you just have something to add.
OpenTracing is documentation and library interfaces for distributed tracing instrumentation. To be "OpenTracing" requires bundling its interfaces in your work, so that others can use it to time distributed operations with the same library.
OpenTracing interfaces are targeted to authors of instrumentation libraries, and those who want to collaborate with traces created by them. For example, something started a trace somewhere and I add a notable event to that trace. Structured logging was recently added to OpenTracing, which might increase the potential user base. For example, some use tools that automatically put trace identifiers into their logging context, and when they log messages, they can be co-presented with a distributed trace. Some of these users might switch to using OpenTracing's logging api directly.
You wouldn't usually hear about a library api interface. I have never seen anything like this before. We often hear more about interop-driven api+implementation efforts, such as containerd, http/2, grpc, OpenStack. We've been taught that microservices shouldn't require sharing libraries. So.. why are we hearing about OpenTracing?
Lightstep is the majority contributor to OpenTracing efforts, with Uber as a close second. I wouldn't say they've bet their startup on OpenTracing, but they are the number one reason you hear about it.
Ben is more than your typical Silicon Valley startup founder. First, what is typical? Typical is an intense interest in their cause, usually with a high level of credibility, and an endless stream of meetings to align their work with their mission. Ben is this and more. Ben's credibility is obvious in distributed tracing as he's the primary author of Dapper, the paper many tracing systems were based on or are affected by. I sometimes wonder how he has time to be the lead of Lightstep considering it seems he's dedicating dozens of hours weekly to OpenTracing either directly or via conferences. He was the primary driver of OpenTracing into the Cloud Native Computing Foundation, and even did a keynote at their recent conference.
Priyanka is also more than your average DevRel (in fact I suspect that's only a small fraction of what she does). She treats the OpenTracing library api as a product, which is why you not only see vigor in community engagement, but also things like blogs on various angles of people interacting with it. I don't know precisely who was behind the fancy new web site, but I suspect Priyanka was involved in that, too. If something in the community runs well, it is likely Priyanka's fault directly or indirectly.
While Uber doesn't do as much marketing of OpenTracing, they do some and are very important to it as a chief implementor and character witness. Uber cofounding OpenTracing and making their apis "opentracing first" answers the practical side of the credibility story. Having a big brand support you is really helpful when trying to get people to pay attention.
An increasing number of people are doing work on OpenTracing, but there's no doubt the lion's share of its initial work came from Lightstep and Uber.
Lightstep alone has more staff affecting OpenTracing adoption than many other projects have in their entire team. If you look at github and follow the commits, or follow what's under blogs or press releases, you'll notice that when there's a burden of work, a Lightstep staffer is often involved. For example, the Finagle to OpenTracing spike. Even though Lyft envoy is currently pinned to Lightstep, you can expect it will be Lightstep who do the brunt of the work to make that OpenTracing compatible.
Uber started working on tracing last year in Zipkin, led by at least Yuri and Kike. Requiring a standard tracing library makes a lot of sense in larger companies like Uber who make their own frameworks. This interest led to them cofounding OpenTracing in late December/early January. Meanwhile, Uber developed their own tracing system in house (Jaeger), which focuses on library api parity with the wire format. Along the way (judging by github), they expanded the team to several people. Jaeger is being slowly open sourced now.
There are a number of others who have become involved over time. This blog would be too long to cover each type of contribution. That number is increasing and may eventually change the power balance of OpenTracing to make Lightstep and Uber less influential.
Zipkin doesn't care what libraries you use to instrument your apps as long as the data comes out ok. OpenTracing requires sharing client libraries. The scope of Zipkin includes the tracing system and making sure it works soup-to-nuts. Its interface with tracers is its data format, which includes annotations and tags, and little more. OpenTracing's tracer scope is wider, particularly after Uber added logging into its feature scope. Zipkin publishes a propagation and data format, so that interop can be explained and tested between tracers and systems. OpenTracing leaves interop out of scope, although systems like Uber's have compatibility tests between their tracers.
I was involved in OpenTracing at the beginning, and dedicated at least a hundred hours to it at the start of the year. I couldn't afford to spend that much time, and had to scale back or risk killing Zipkin by starvation. Mick from The Last Pickle took over after that, but now Bas van Beek is the primary bridge between OpenTracing and Zipkin. Uber was originally using Zipkin, but they made their own thing (Jaeger), so are less involved now. There are others with co-interest between the two as well.
zipkin-go-opentracing is the best example of something working, and has a primary goal of being compatible with Zipkin. OpenTracing to Zipkin is the same process as writing a normal zipkin tracer, except you have a translation concern: you adjust or drop data until it fits into Zipkin's model. Almost always, you use the same propagation (X-B3) headers.
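To make the propagation part concrete, here's a minimal sketch in Python of writing and reading B3 headers. The header names come from Zipkin's B3 format; everything else (the `inject` and `extract` function names, and the plain dict used as a trace context) is made up for illustration and is not how zipkin-go-opentracing or any other tracer is actually written.

```python
# Header names from Zipkin's B3 propagation format.
TRACE_ID = "X-B3-TraceId"
SPAN_ID = "X-B3-SpanId"
PARENT_SPAN_ID = "X-B3-ParentSpanId"
SAMPLED = "X-B3-Sampled"


def inject(context, headers):
    """Write a trace context (a plain dict, by assumption) into outgoing headers."""
    headers[TRACE_ID] = context["trace_id"]
    headers[SPAN_ID] = context["span_id"]
    if context.get("parent_id"):
        headers[PARENT_SPAN_ID] = context["parent_id"]
    if context.get("sampled") is not None:
        headers[SAMPLED] = "1" if context["sampled"] else "0"
    return headers


def extract(headers):
    """Read a trace context from incoming headers, or None if there isn't one."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    trace_id = lowered.get(TRACE_ID.lower())
    span_id = lowered.get(SPAN_ID.lower())
    if not trace_id or not span_id:
        return None
    sampled = lowered.get(SAMPLED.lower())
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "parent_id": lowered.get(PARENT_SPAN_ID.lower()),
        # An absent header means "no decision yet", which is not the same as "0".
        "sampled": None if sampled is None else sampled in ("1", "true"),
    }
```

Because both sides agree on these header names, a service instrumented through an OpenTracing bridge and one using a plain Zipkin tracer can join the same trace without sharing any library.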
I have a few opinions on OpenTracing marketing. I'll use direct quotes from slides to guide them under the "Tracing instrumentation has been too hard" section. I also believe it is too hard, but I think critical points aren't made obvious, and can unintentionally mislead. I'm commenting on all points not because I think they are equally important, just for completeness.
OpenTracing is only a partial solution to vendor lock-in, and in some ways it proliferates it. For example, look carefully at blogs and you'll find that many instrumentation efforts require vendor-specific libraries to access trace data. The combination of things being in and out of scope will require some sort of library lock-in even when OpenTracing is involved. Moreover wire and data interop is out-of-scope. Without consulting the vendor, you cannot know if switching from even one tracer version to another will be a compatible change. So, if lock-in is indeed unacceptable, we need to adjust the approach.
While I agree that "monkey patching" can be difficult to scale, vendors do this routinely with agents (at least in Java). For example, naver pinpoint and instana have agents that implicitly instrument code. In doing so, they can instrument multiple versions of the same library, sometimes in a single class. I'm a fan of explicitness, but there are uses of "monkey patching" (for a certain definition of it), and their cons are well understood.
The gripe I have with this is that the more you expose how you instrument something, the more likely you are to have a library conflict when you change that. In other words, explicit instrumentation is good, but we have to remember that when we increase visibility and audience of an api, changing it becomes far more expensive and difficult.
You can get to common semantics without requiring specific library signatures. For example, the largest value OpenTracing has delivered, in my opinion, is defining some common terms and having people polish them. Long story short.. tracing semantics? win regardless! Library available to apply those semantics? Win! Implying you need to share a very new library across your entire infrastructure to make tracing easy? hmmm
That said, we have to remember we are dealing with developers here. Many times, these handoffs are internal libraries even when popular external libraries exist. There will never be one tracing api, and there certainly hasn't been one OpenTracing api. OpenTracing will have a version 2, except very likely it will come much sooner than for example log4j2 did. When that happens, there will be some handoff needed to translate even between OpenTracing versions. This is the brown field problem we have when we require sharing libraries.
There are some very real issues now with OpenTracing's design and how hand-off is approached. You can be compatible with opentracing while black-holing data. The formerly ubiquitous "annotation" which was a simple string associated with a timestamp has been removed from OpenTracing's api, replaced with a more complex key/value map logging api. It has become a leaky abstraction which will certainly affect handoff even when OpenTracing is used on both sides. For example, you need to look at side-effects to understand what will get into the system at all.
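Here's a rough sketch in Python of what that translation looks like when a key/value log record has to land in a system that only understands annotations (a string plus a timestamp). The function name `log_to_annotation`, the special treatment of an "event" key, and the flattening rule are all assumptions of mine for illustration; real bridges make their own choices, which is exactly the problem.

```python
def log_to_annotation(timestamp_micros, fields):
    """Turn a key/value log record into a (timestamp, string) annotation.

    Returns None when nothing fits, which is the "black hole" case.
    """
    # Assumption: a record holding only an "event" string is an old-style annotation.
    event = fields.get("event")
    if isinstance(event, str) and len(fields) == 1:
        return (timestamp_micros, event)
    # Adjust: flatten scalar fields into one string, skipping nested values.
    scalars = {
        k: v for k, v in fields.items()
        if isinstance(v, (str, int, float, bool))
    }
    if not scalars:
        return None  # only nested or unknown types: the data is silently dropped
    flat = " ".join(f"{k}={scalars[k]}" for k in sorted(scalars))
    return (timestamp_micros, flat)
```

Note that the caller gets no error in the dropped case. The instrumentation compiled and ran against the api, yet whether the data shows up depends on a side-effect you can only discover by reading the tracer.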
Moreover, and more to personal experiences, handoff can be helped by translation guides. For example, there's no material in OpenTracing to show how to move from existing systems that predate OpenTracing, such as HTrace, dapper or Zipkin to the OpenTracing api set. This puts burden on other projects who may not have been involved in OpenTracing's decisions or actively disagree with them. Freedom and Responsibility tells me that if we create an api and highly market it, we owe some debt to be responsible for matters like these.
I like the energy and docs around OpenTracing. Most folks are really keen, and engaged. My major gripes are around how expensive the project is in terms of downstream work created by confusion and expanding scope. This will lead to even more incompatible tracing libraries if not watched carefully.
I've sort of discussed above what I feel is difficult about the messaging of OpenTracing. The summary is: it creates a larger demand for OpenTracing, but it also increases confusion in some places, particularly around interop. I ack that it is hard to be precise with messaging to vastly different audience types, but I think it can do better.
The simplest features of OpenTracing, starting and finishing a span, are well defined and reasonably portable. So are the inject/extract parts, which incidentally I think are really a separate utility. The signatures on Span are voluminous (in my opinion) and lead to a poor experience. For example, there are 3 ways of dealing with a map: you can log it, you can add it as tags, or you can add it as baggage. People routinely do this differently, and the impact is bad data you eventually find in your system. Many blackhole the data as they don't know what to do with it. This problem is possible because the OpenTracing api isn't a lowest common denominator, or even Dapper, it is a combination of a few related apis.
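To illustrate the three ways of dealing with a map, here's a toy span in Python. `ToySpan` is a stand-in I wrote for this post, not the real library: its method names loosely mirror the OpenTracing Python api, but the storage and behavior are simplified assumptions.

```python
class ToySpan:
    """Hypothetical stand-in for a span; not the real OpenTracing class."""

    def __init__(self, operation_name):
        self.operation_name = operation_name
        self.tags = {}      # describe the span as a whole
        self.logs = []      # timestamped events within the span
        self.baggage = {}   # propagated in-band to downstream services

    def set_tag(self, key, value):
        self.tags[key] = value
        return self

    def log_kv(self, key_values, timestamp=None):
        self.logs.append((timestamp, dict(key_values)))
        return self

    def set_baggage_item(self, key, value):
        self.baggage[key] = str(value)
        return self


settings = {"user.tier": "gold", "region": "eu-west"}

# Team A records the map as tags.
span_a = ToySpan("checkout")
for key, value in settings.items():
    span_a.set_tag(key, value)

# Team B records the same map as a single log record.
span_b = ToySpan("checkout")
span_b.log_kv(settings, timestamp=1480000000.0)

# Team C records it as baggage, so it also travels with every downstream call.
span_c = ToySpan("checkout")
for key, value in settings.items():
    span_c.set_baggage_item(key, value)
```

All three are valid uses of the api, but the same information ends up in three different places, with different costs and different query behavior in whatever system receives it.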
OpenTracing's api is a work in progress, but since very early on it has had a logging agenda. Even if some features are similar to logging, formalizing logging, especially structured logging, is too much scope for a standard base api. In very new codebases, it makes sense, but in brown field people use their own logging apis. These apis are pretty stable and have established correlation practice. This is a cheap setup. Routing logging through the tracing subsystem dramatically changes the way things have to work both internally and in the end system. It is not cheap, as it requires re-engineering.
I feel it is better to have a simpler, testable, implementable api which includes "start, annotate, tag, finish" and doesn't require support of nested or "any" types. Extensions could be used for sophisticated things or when people literally want to expose their tracing api as a logging api. Those people would be signing up for the expense of integrating such a solution.
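As a sketch of how small such an api could be, here's one possible shape in Python. This is purely my own illustration: `MinimalSpan`, `start` and the injectable `clock` parameter are hypothetical names, and no existing library is being described.

```python
import time


class MinimalSpan:
    """Hypothetical span limited to annotate, tag and finish, strings only."""

    def __init__(self, name, clock=time.time):
        self._clock = clock
        self.name = name
        self.start = clock()
        self.end = None
        self.annotations = []  # list of (timestamp, string)
        self.tags = {}         # string -> string

    def annotate(self, value):
        self._require_str(value)
        self.annotations.append((self._clock(), value))
        return self

    def tag(self, key, value):
        self._require_str(key)
        self._require_str(value)
        self.tags[key] = value
        return self

    def finish(self):
        if self.end is None:  # finishing twice keeps the first end time
            self.end = self._clock()
        return self

    @staticmethod
    def _require_str(value):
        # No nested or "any" types: reject loudly instead of dropping silently.
        if not isinstance(value, str):
            raise TypeError("only strings are accepted, got %s" % type(value).__name__)


def start(name, clock=time.time):
    """Start a span; the clock is injectable so behavior can be tested."""
    return MinimalSpan(name, clock)
```

With only strings in play, every implementation can be checked against the same small set of cases, and anything richer becomes an explicit extension that someone chooses to pay for.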
Marketing is very powerful, and as more is poured into the brand, the more careful we should be with it. For example, I would like to see marketing be more direct/honest about interop. I would like to see some notion of compatibility or interop required for branding. I'd like to see a smaller, cheaper scope. This should reduce the accidental burden OpenTracing creates.
Gripes mentioned, I'd like to see OpenTracing continue! The documentation and website are far beyond anything I've seen for a library definition. The gitter, events and kind people behind them are a pleasant part of my life, and particularly so for newcomers. Even if nothing else changes, I think we are better off that OpenTracing has happened than if it didn't.