@codefromthecrypt
codefromthecrypt / opentracing-zipkin.md
Last active October 27, 2021 01:44
My ramble on OpenTracing (with a side of Zipkin)

I've had many people ask me questions about OpenTracing, often in relation to OpenZipkin. I've seen assertions about how it is vendor neutral and is the lock-in cure. This post is not a sanctioned, polished or otherwise muted view; rather, it is what I personally think about what OpenTracing is and is not, and what it does and does not help with. Scroll to the very end if this is too long. Feel free to comment if I made any factual mistakes or you just want to add something.

So, what is OpenTracing?

OpenTracing is documentation and library interfaces for distributed tracing instrumentation. To be "OpenTracing" requires bundling its interfaces in your work, so that others can use it to time distributed operations with the same library.

So, who is it for?

OpenTracing interfaces are targeted at authors of instrumentation libraries, and those who want to collaborate with traces created by them. For example, something started a trace somewhere and I add a notable event to that trace. Structured logging was recently added to O

codefromthecrypt / textikflow.txt
Created January 2, 2017 07:39
example tracing flow made with textik
I made this with https://textik.com in an attempt to explain how service naming works. Since English is a
second language for many coders, diagrams help. In this case, we achieved understanding in <20 minutes!
The tracer on Server A starts a trace since it doesn't read
headers or anything from the browser's request. In zipkin
this has annotation "sr" with the serviceName "loginService"
-\
-\ Server A has role loginService, so its tracer is named that.
Browser -\+--------------+
codefromthecrypt / kakfa-oneway.java
Last active July 12, 2018 19:56
Kafka one-way with Brave
/**
* This is an example of a one-way or "messaging span", which is possible by use of the {@link
* Span#flush()} operator.
*
* <p>Note that this uses a span as a kafka key, not because it is recommended, rather as it is
* convenient for demonstration, since kafka doesn't have message properties.
*
* <p>See https://github.com/openzipkin/zipkin/issues/1243
*/
public class KafkaExampleIT {
codefromthecrypt / normal-instrumentation.txt
Last active January 11, 2017 05:02
Normal case of Zipkin instrumentation
┌──────────────────────────────┐ ┌──────────────────────────────┐
│ Tracer: Web │ │ Tracer: App │
│ │ │ │
└──────────────────────────────┘ │ └──────────────────────────────┘
────────────────────────────────────┴───────────────────────────────────
┌ ─ ─ ─ ─ ─ ─ ─ ─ ┐ ┌─────────────────────────────┐
(1) │(1) Web tracer starts a trace│
│ sr │ │ since no headers exist │
─────────▶ └─────────────────────────────┘
HttpServletRequest httpRequest = (HttpServletRequest) request;
TraceContextOrSamplingFlags contextOrFlags = contextExtractor.extract(httpRequest);
Span span = contextOrFlags.context() != null
    ? tracer.joinSpan(contextOrFlags.context())
    : tracer.newTrace(contextOrFlags.samplingFlags());
codefromthecrypt / search-for-two.sh
Created February 16, 2017 19:38
searching for multiple service names
# replace 2017-01-18 with the correct date, and foo and bar with the two service names you're interested in
curl -s localhost:9200/zipkin-2017-01-18/span/_search -d'{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "annotations",
"query": {
"bool": {
codefromthecrypt / gist:7180c278b62e8f6a216a2aea45d08fc9
Created March 3, 2017 08:54
aggregation for binary annotations
# this returns all of the annotation values, not just the http.url
# for more.. https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
curl -s 'localhost:9200/zipkin-*/span/_search' -d'{
"_source": false,
"aggs": {
"binaryAnnotations": {
"nested": {
"path": "binaryAnnotations"
},
"aggs" : {
codefromthecrypt / metrics-from-sampled.md
Created July 2, 2017 18:54
When you derive metrics from sampled traces

I have heard that a number of APMs create "spans" (distributed tracing lingo for an operation) and aggregate them for purposes like latency metrics.

In a way, Zipkin does this. The ever-popular service dependency diagram is an aggregated view of parent/child links between services, with the number of calls between them added for color.
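
To make that kind of aggregation concrete, here is a minimal Java sketch. `SimpleSpan` and `DependencyLinks` are hypothetical names invented for this example; they are not Zipkin's model or its actual linker, and real spans carry far more than an ID, a parent ID and a service name. The sketch assumes all spans are already in memory and simply counts parent/child pairs that cross a service boundary.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: counts calls between services from parent/child span links. */
final class DependencyLinks {

  /** Hypothetical, simplified span shape (an assumption, not Zipkin's model). */
  static final class SimpleSpan {
    final String id;
    final String parentId; // null when the span is the root of its trace
    final String serviceName;

    SimpleSpan(String id, String parentId, String serviceName) {
      this.id = id;
      this.parentId = parentId;
      this.serviceName = serviceName;
    }
  }

  /** Returns call counts keyed by "parentService->childService". */
  static Map<String, Long> count(List<SimpleSpan> spans) {
    // First pass: remember which service each span ID belongs to.
    Map<String, String> serviceById = new HashMap<>();
    for (SimpleSpan span : spans) {
      serviceById.put(span.id, span.serviceName);
    }

    // Second pass: count links whose parent lives in a different service.
    Map<String, Long> callCounts = new TreeMap<>(); // sorted keys give stable output
    for (SimpleSpan span : spans) {
      if (span.parentId == null) continue; // roots have no caller
      String parentService = serviceById.get(span.parentId);
      if (parentService == null) continue; // parent was not reported
      if (parentService.equals(span.serviceName)) continue; // local child, not a remote call
      callCounts.merge(parentService + "->" + span.serviceName, 1L, Long::sum);
    }
    return callCounts;
  }

  public static void main(String[] args) {
    List<SimpleSpan> spans = new ArrayList<>();
    spans.add(new SimpleSpan("a", null, "web"));
    spans.add(new SimpleSpan("b", "a", "app"));
    spans.add(new SimpleSpan("c", "a", "app"));
    spans.add(new SimpleSpan("d", "b", "db"));
    System.out.println(count(spans)); // prints {app->db=1, web->app=2}
  }
}
```

Note that the counts only cover the spans that were actually reported, so they describe recorded traffic rather than all traffic.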

The biggest issue with using a tracing api to back metrics is that most of the time, tracing is sampled (like 1 out of 1000). Sampling is done to reduce costs or prevent a surge of traffic from taking out the

codefromthecrypt / uuid.md
Created July 18, 2017 11:02
UUID is a 128bit ID like a square is a rectangle

I've been party to some pretty interesting conversations around Zipkin's ID format and UUIDs. They seem worth sharing, since some of this stuff is neat, if geeky.

So, we love UUIDs! Many people use UUID format to pass around things unlikely to ever clash. These are used for long-term retrieval and other handy things. A lot of correlation IDs are UUIDs, too.

What about distributed tracing? Trace IDs should be UUID, too! To this I answer.. maybe?
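
As a small illustration of the "square is a rectangle" point, the Java sketch below renders a UUID's 128 bits as 32 lower-hex characters, which is one common way to write a 128-bit ID. `UuidTraceIds` and `toLowerHex` are names made up for this example, and treating the result as a trace ID is an assumption for demonstration rather than a recommendation. The reverse direction is where "maybe" comes in: a UUID reserves some of its bits for a version and a variant, so an arbitrary 128-bit value is not necessarily a well-formed UUID.

```java
import java.util.UUID;

/** Illustrative only: every UUID fits in 128 bits, so it can be written as 32 hex characters. */
final class UuidTraceIds {

  /** Renders the UUID's two 64-bit halves as 32 lower-hex characters, without dashes. */
  static String toLowerHex(UUID uuid) {
    // %016x prints a long as unsigned, zero-padded, 16-character lower hex.
    return String.format("%016x%016x",
        uuid.getMostSignificantBits(), uuid.getLeastSignificantBits());
  }

  public static void main(String[] args) {
    UUID uuid = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
    System.out.println(toLowerHex(uuid)); // prints 123e4567e89b12d3a456426614174000

    // A random (version 4) UUID still spends some of its bits on version and variant.
    UUID random = UUID.randomUUID();
    System.out.println(random.version()); // prints 4
  }
}
```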

Firstly, there are a lot of systems inspired by the Dapper paper, which hints how

codefromthecrypt / auto-trace-trouble.md
Last active September 21, 2017 21:53
Auto-close tracing can be troublesome

The number one problem to solve in tracing is propagation: at the very least, carrying trace identifiers from one side of your process to the other. Usually, some thread-local state is involved in this. In, I'd say, most libraries apart from OpenTracing, propagation is a separate function from timing things. For example, you put variables in scope with one API and you do timing with another. It might look like this.

span = startSpan();
try (Scope scope = propagation.scope(span.ids())){
  // work here can see the span ids via something like Trace.ids() 
} finally {