Adrian Cole codefromthecrypt

Load testing example of cassandra3 storage

Hypothesis: a recent change to the code caused a problem which would be visible under load as dropped spans.

Validation approach: create a lot of load and check whether any spans are dropped.

Conclusion: The hypothesis isn't supported. There could be a different explanation for the dropped spans, possibly data-related.

Steps

Changed brave-webmvc-example to make a lot of separate requests instead of buffering.
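The modified example code isn't included in this gist. As a rough stand-in, assuming a Zipkin server on the default port 9411 that accepts JSON at its v2 span endpoint `/api/v2/spans`, here is a small Java program that posts every span in its own request instead of batching them. The class and method names (`SeparateRequestLoad`, `separatePayloads`, `bufferedPayload`, `randomId`) are made up for illustration; this is not what brave-webmvc-example actually does.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical load generator: one HTTP request per span instead of one buffered batch.
class SeparateRequestLoad {
  // Minimal Zipkin v2 span JSON; the name, duration and service are placeholders.
  static String spanJson(String traceId, String id) {
    long timestampMicros = System.currentTimeMillis() * 1000L; // Zipkin timestamps are epoch microseconds
    return "{\"traceId\":\"" + traceId + "\",\"id\":\"" + id + "\",\"name\":\"get\","
        + "\"timestamp\":" + timestampMicros + ",\"duration\":207000,"
        + "\"localEndpoint\":{\"serviceName\":\"frontend\"}}";
  }

  // Wraps each span in its own JSON array, so each one becomes a separate POST body.
  static List<String> separatePayloads(List<String> spans) {
    List<String> payloads = new ArrayList<>();
    for (String span : spans) payloads.add("[" + span + "]");
    return payloads;
  }

  // The buffered alternative: every span in a single JSON array.
  static String bufferedPayload(List<String> spans) {
    return "[" + String.join(",", spans) + "]";
  }

  // 64-bit ID rendered as 16 lowercase hex characters.
  static String randomId() {
    return String.format("%016x", ThreadLocalRandom.current().nextLong());
  }

  public static void main(String[] args) throws Exception {
    URI endpoint = URI.create(args.length > 0 ? args[0] : "http://localhost:9411/api/v2/spans");
    int count = 10_000;
    List<String> spans = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      String id = randomId();
      spans.add(spanJson(id, id)); // root span: trace ID and span ID are the same
    }
    HttpClient client = HttpClient.newHttpClient();
    int failures = 0;
    for (String payload : separatePayloads(spans)) {
      HttpRequest request = HttpRequest.newBuilder(endpoint)
          .header("Content-Type", "application/json")
          .POST(HttpRequest.BodyPublishers.ofString(payload))
          .build();
      HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
      if (response.statusCode() / 100 != 2) failures++; // count anything outside 2xx
    }
    System.out.println("sent " + count + " requests, " + failures + " non-2xx responses");
  }
}
```

Run it with the endpoint as the only argument (for example `java SeparateRequestLoad.java http://192.168.99.100:9411/api/v2/spans`) and watch the collector's dropped-messages metric. Requests go out one at a time here; running several copies in parallel is the simplest way to push harder.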

This is a reaction post to Misunderstanding "Open Tracing" for the Enterprise by @jkowall

These are opinions with citations (imagine that!). This is not Wikipedia, sorry. I didn't run this by anyone, Jonah or otherwise. I do not represent OpenTracing or OpenCensus (or my employer, or whatever you might think) in this view. I will give critical thoughts on both from a technical point of view, as most of the complaints people brought to me were about a lack of detail. I do have "a dog in the race", but it isn't what you might think. Yes, I'm the primary maintainer of Zipkin, but my goal is not to disparage anything; rather, it is to keep the community healthy, with the options that exist, and free from the suffering caused, in my opinion, by a complete lack of technical understanding of what things do. This is particularly dangerous in interop, and I'll get to that.

I have experience with both tools. I was involved in the beginning of OpenTracing, though I left within months. I still mainta

The number one problem to solve in tracing is propagation: at the very least, carrying trace identifiers from one side of your process to the other. Usually, some thread-local state is involved. In most libraries apart from OpenTracing, I'd say, this is separate functionality from timing things. For example, you put variables in scope with one API and you do timing with another. It might look like this:

Span span = startSpan();
try (Scope scope = propagation.scope(span.ids())) {
  // work here can see the span ids via something like Trace.ids()
} finally {
  span.finish(); // timing is done with a separate call from scoping
}

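To make that split concrete, here is a minimal, self-contained Java sketch in which propagation (a thread-local "current IDs" slot with a closeable scope) and timing (a span that only records its duration) are two unrelated types. Every name here (`TraceIds`, `Propagation`, `TimedSpan`, `PropagationDemo`) is hypothetical and not the API of Brave, OpenTracing, or any other library.

```java
// Demo first, so `java PropagationDemo.java` finds main in the first class.
class PropagationDemo {
  public static void main(String[] args) {
    TimedSpan span = new TimedSpan(new TraceIds("4d1e00c0db9010db", "86154a4ba6e91385"));
    try (Propagation.Scope scope = Propagation.scope(span.ids)) {
      // work here can read the IDs without being handed the span
      System.out.println("in scope: " + Propagation.current());
    } finally {
      span.finish(); // timing is a separate call from scoping
    }
    System.out.println("after scope: " + Propagation.current()); // null again
    System.out.println("duration ns: " + span.durationNanos());
  }
}

// Hypothetical IDs that propagation carries; not a real library type.
final class TraceIds {
  final String traceId;
  final String spanId;

  TraceIds(String traceId, String spanId) {
    this.traceId = traceId;
    this.spanId = spanId;
  }

  @Override public String toString() { return traceId + "/" + spanId; }
}

// Propagation only: which IDs are "current" on this thread.
final class Propagation {
  private static final ThreadLocal<TraceIds> CURRENT = new ThreadLocal<>();

  // A scope must be closed; close() is narrowed to throw no checked exception.
  interface Scope extends AutoCloseable {
    @Override void close();
  }

  // Returns the IDs in scope on this thread, or null when nothing is in scope.
  static TraceIds current() { return CURRENT.get(); }

  // Puts ids in scope; closing the returned Scope restores the previous value.
  static Scope scope(TraceIds ids) {
    TraceIds previous = CURRENT.get();
    CURRENT.set(ids);
    return () -> CURRENT.set(previous);
  }
}

// Timing only: knows its IDs and duration, never touches the thread local.
final class TimedSpan {
  final TraceIds ids;
  private final long startNanos = System.nanoTime();
  private long durationNanos = -1; // -1 until finish() is called

  TimedSpan(TraceIds ids) { this.ids = ids; }

  void finish() { durationNanos = System.nanoTime() - startNanos; }

  long durationNanos() { return durationNanos; }
}
```

Closing a scope restores whatever IDs were in scope before it, so nested scopes unwind correctly, while `finish()` only deals with time and never touches the thread local.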
I've been party to some pretty interesting conversations around Zipkin's ID format and UUIDs. They seem worth sharing, since some of this stuff is neat, if geeky.

So, we love UUIDs! Many people use UUID format to pass around things unlikely to ever clash. These are used for long-term retrieval and other handy things. A lot of correlation IDs are UUIDs, too.

What about distributed tracing? Trace IDs should be UUIDs, too! To this I answer... maybe?

Firstly, there are a lot of systems inspired by the Dapper paper, which hints how
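For reference, a 128-bit Zipkin trace ID such as `5aab74dbb904746bb33447baae403ed6` is written as 32 lowercase hex characters, while a UUID carries the same 128 bits as 36 characters with dashes. A random (version 4) UUID also fixes 6 of those bits for the version and variant, leaving 122 random bits. This JDK-only Java sketch (class and method names are illustrative) shows both layouts side by side.

```java
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

class TraceIdFormats {
  // A random 128-bit trace ID in Zipkin's layout: 32 lowercase hex characters, no dashes.
  static String random128BitTraceId() {
    ThreadLocalRandom random = ThreadLocalRandom.current();
    return String.format("%016x%016x", random.nextLong(), random.nextLong());
  }

  // The same 128 bits of a UUID, rendered in that 32-hex layout.
  static String fromUuid(UUID uuid) {
    return String.format("%016x%016x", uuid.getMostSignificantBits(), uuid.getLeastSignificantBits());
  }

  public static void main(String[] args) {
    UUID uuid = UUID.randomUUID();
    System.out.println(uuid);                  // 36 characters, including 4 dashes
    System.out.println(fromUuid(uuid));        // 32 hex characters
    System.out.println(uuid.version());        // 4: version and variant bits are fixed, 122 bits are random
    System.out.println(random128BitTraceId()); // all 128 bits random
  }
}
```

Converting between the two is just a matter of dropping or adding dashes; the interesting differences are the fixed bits and the string length on the wire.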

I have heard of a number of APMs that create "spans" (distributed tracing lingo for an operation) and aggregate them for purposes like latency metrics.

In a way, Zipkin does this. The ever-popular service dependency diagram is an aggregated view of parent/child links between services, with the number of calls between them added for color.
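The aggregation itself is simple to sketch. Assuming each span carries its trace ID, its own ID, its parent's ID, and a service name, you can count one call per span whose parent belongs to a different service. This is a simplified, hypothetical Java version; Zipkin's real dependency linker handles more cases (for example, it looks at span kinds).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DependencyLinks {
  // Only the fields link counting needs; a hypothetical shape, not Zipkin's model.
  static final class Span {
    final String traceId, id, parentId, serviceName;

    Span(String traceId, String id, String parentId, String serviceName) {
      this.traceId = traceId;
      this.id = id;
      this.parentId = parentId;
      this.serviceName = serviceName;
    }
  }

  // Counts calls per "parent service -> child service" edge across all traces.
  static Map<String, Long> link(List<Span> spans) {
    Map<String, String> serviceBySpan = new HashMap<>(); // key: traceId + "/" + span ID
    for (Span s : spans) serviceBySpan.put(s.traceId + "/" + s.id, s.serviceName);

    Map<String, Long> callCounts = new TreeMap<>(); // sorted for stable output
    for (Span s : spans) {
      if (s.parentId == null) continue; // root span: no caller to link from
      String parentService = serviceBySpan.get(s.traceId + "/" + s.parentId);
      // skip unknown parents and hops that stay inside one service
      if (parentService == null || parentService.equals(s.serviceName)) continue;
      callCounts.merge(parentService + " -> " + s.serviceName, 1L, Long::sum);
    }
    return callCounts;
  }

  public static void main(String[] args) {
    List<Span> spans = List.of(
        new Span("t1", "a", null, "frontend"),
        new Span("t1", "b", "a", "backend"),
        new Span("t2", "c", null, "frontend"),
        new Span("t2", "d", "c", "backend"),
        new Span("t2", "e", "d", "db"));
    System.out.println(link(spans)); // prints {backend -> db=1, frontend -> backend=2}
  }
}
```

Note that the call counts only reflect the traces that were actually recorded, which is where sampling starts to matter.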

The biggest issue with using a tracing API to back metrics is that, most of the time, tracing is sampled (say, 1 out of 1000). Sampling is done to reduce costs or prevent a surge of traffic from taking out the
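A small Java sketch shows the arithmetic. It assumes a hypothetical sampler that keeps a trace when its ID is a multiple of 1000, so the decision is made once per trace rather than per span. Real samplers typically work on random or hashed IDs; sequential IDs are used here only so the output is predictable. With 1,000,000 requests only 1,000 traces are recorded, so any metric built from them has to be scaled back up, and a slow or failed request that wasn't sampled never shows up at all.

```java
class SampledCounts {
  static final int ONE_IN = 1000; // "1 out of 1000"

  // Hypothetical per-trace sampler: keep a trace when its ID is a multiple of ONE_IN.
  // floorMod keeps the check correct for negative IDs too.
  static boolean sampled(long traceId) {
    return Math.floorMod(traceId, ONE_IN) == 0;
  }

  // How many of `requests` consecutive trace IDs (starting at 0) get recorded.
  static long countSampled(long requests) {
    long recorded = 0;
    for (long traceId = 0; traceId < requests; traceId++) {
      if (sampled(traceId)) recorded++;
    }
    return recorded;
  }

  public static void main(String[] args) {
    long requests = 1_000_000;
    long recorded = countSampled(requests);
    // Scaling back up recovers the count, but an unsampled slow request
    // (for example trace ID 123456) is simply absent from the data.
    System.out.println("requests=" + requests + " sampled=" + recorded
        + " estimate=" + recorded * ONE_IN + " trace123456Recorded=" + sampled(123_456));
    // prints: requests=1000000 sampled=1000 estimate=1000000 trace123456Recorded=false
  }
}
```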

	92c92
	< "expr": "rate(zipkin_collector_messages{instance=~\"$instances\"}[1m])\n",
	---
	> "expr": "rate(zipkin_collector_messages_total{instance=~\"$instances\"}[1m])\n",
	100c100
	< "expr": "rate(zipkin_collector_messages_dropped{instance=~\"$instances\"}[1m])\n",
	---
	> "expr": "rate(zipkin_collector_messages_dropped_total{instance=~\"$instances\"}[1m])\n",
	103c103
	< "metric": "{{instance}} {{transport}} dropped messages",
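Those dashboard queries graph `rate()` of the collector's message and dropped-message counters, which is one way to check for dropped spans during a load test. The arithmetic behind a single data point can be sketched in Java as below. It is a simplification of Prometheus's `rate()` (which also extrapolates over the whole range); the class and method names are hypothetical, and the only reset handling is treating a decrease as a restart from zero.

```java
class CounterRates {
  // Per-second increase of a counter between two scrapes taken `seconds` apart.
  // Simplified: a lower value is treated as a restart from zero; no extrapolation.
  static double ratePerSecond(double previous, double current, double seconds) {
    double increase = current >= previous ? current - previous : current;
    return increase / seconds;
  }

  // Share of collector messages dropped in the same window (0 when nothing arrived).
  static double droppedRatio(double messagesPerSecond, double droppedPerSecond) {
    return messagesPerSecond == 0 ? 0 : droppedPerSecond / messagesPerSecond;
  }

  public static void main(String[] args) {
    // Hypothetical scrapes one minute apart, mirroring the [1m] window in the dashboard.
    double messages = ratePerSecond(12_000, 12_060, 60); // 1.0 per second
    double dropped = ratePerSecond(3, 9, 60);            // 0.1 per second
    System.out.println("dropped ratio = " + droppedRatio(messages, dropped)); // prints: dropped ratio = 0.1
  }
}
```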

	[
	{
	"traceId": "5aab74dbb904746bb33447baae403ed6",
	"parentId": "05e3ac9a4f6e3b90",
	"id": "e457b5a2e4d86bd1",
	"kind": "CONSUMER",
	"name": "next-message",
	"timestamp": 1521186011929043,
	"duration": 14,
	"localEndpoint": {

	while true; do curl -s http://192.168.99.100:9411/api/v2/spans -H'Content-Type: application/json' -H 'Expect:' -d'[{
	"traceId": "4d1e00c0db9010db86154a4ba6e91385",
	"parentId": "86154a4ba6e91385",
	"id": "4d1e00c0db9010db",
	"kind": "CLIENT",
	"name": "get",
	"timestamp": 1472470996199000,
	"duration": 207000,
	"localEndpoint": {
	"serviceName": "frontend",

	# this returns all of the annotation values, not just the http.url
	# for more.. https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
	curl -s 'localhost:9200/zipkin-*/span/_search' -d'{
	"_source": false,
	"aggs": {
	"binaryAnnotations": {
	"nested": {
	"path": "binaryAnnotations"
	},
	"aggs" : {

	# replace 2017-01-18 with the correct date, and foo and bar with the names of the two services you're interested in
	curl -s localhost:9200/zipkin-2017-01-18/span/_search -d'{
	"query": {
	"bool": {
	"should": [
	{
	"nested": {
	"path": "annotations",
	"query": {
	"bool": {