@codefromthecrypt
codefromthecrypt / troubleshooting_cassandra.md
Created January 27, 2019 12:38
troubleshooting cassandra

Load testing example of cassandra3 storage

Hypothesis: a recent change to the code caused a problem which would be visible under load as dropped spans.

Validation approach: create a lot of load and check whether any spans are dropped.

Conclusion: The hypothesis isn't supported. There could be a different explanation for the dropped spans, possibly one related to the data itself.

Steps

Changed brave-webmvc-example to make a lot of separate requests instead of buffering.

codefromthecrypt / differences.diff
Created April 29, 2018 08:57
differences in grafana due to metrics changes
92c92
< "expr": "rate(zipkin_collector_messages{instance=~\"$instances\"}[1m])\n",
---
> "expr": "rate(zipkin_collector_messages_total{instance=~\"$instances\"}[1m])\n",
100c100
< "expr": "rate(zipkin_collector_messages_dropped{instance=~\"$instances\"}[1m])\n",
---
> "expr": "rate(zipkin_collector_messages_dropped_total{instance=~\"$instances\"}[1m])\n",
103c103
< "metric": "{{instance}} {{transport}} dropped messages",
[
{
"traceId": "5aab74dbb904746bb33447baae403ed6",
"parentId": "05e3ac9a4f6e3b90",
"id": "e457b5a2e4d86bd1",
"kind": "CONSUMER",
"name": "next-message",
"timestamp": 1521186011929043,
"duration": 14,
"localEndpoint": {
codefromthecrypt / my-misunderstandings.md
Last active October 31, 2020 12:39
My take on: Misunderstanding "Open Tracing" for the Enterprise

This is a reaction post to Misunderstanding "Open Tracing" for the Enterprise by @jkowall

These are opinions with citations (imagine that!). This is not Wikipedia, sorry. I didn't run this by anyone, Jonah or otherwise. I do not represent OpenTracing or OpenCensus (or my employer, or whatever you might think) in this view. I will give critical thoughts on both from a technical view, as people mostly complained to me about a lack of details. I do have "a dog in the race", but it isn't what you might think. Yes, I'm the primary maintainer of Zipkin, but my goal is not to disparage anything. Rather, it is to keep the community healthy, with options that exist, and free from the suffering caused, in my opinion, by a complete lack of technical view on what things do. This is particularly dangerous in interop, and I'll get to that.

I have experience with both tools. Though I left within months, I was implicated in the beginning of OpenTracing. I still mainta

codefromthecrypt / create-some-load.sh
Created October 11, 2017 07:28
create some load against your docker server
while true; do curl -s http://192.168.99.100:9411/api/v2/spans -H'Content-Type: application/json' -H 'Expect:' -d'[{
"traceId": "4d1e00c0db9010db86154a4ba6e91385",
"parentId": "86154a4ba6e91385",
"id": "4d1e00c0db9010db",
"kind": "CLIENT",
"name": "get",
"timestamp": 1472470996199000,
"duration": 207000,
"localEndpoint": {
"serviceName": "frontend",
codefromthecrypt / auto-trace-trouble.md
Last active September 21, 2017 21:53
Auto-close tracing can be troublesome

The number one problem to solve in tracing is propagation, or at least carrying trace identifiers from one side of your process to the other. Usually, some thread-local state is involved. In, I'd say, most libraries apart from OpenTracing, this is separate functionality from timing things. For example, you put variables in scope with one API and you do timing with another. It might look like this.

span = startSpan();
try (Scope scope = propagation.scope(span.ids())){
  // work here can see the span ids via something like Trace.ids() 
} finally {
codefromthecrypt / uuid.md
Created July 18, 2017 11:02
UUID is a 128bit ID like a square is a rectangle

I've been party to some pretty interesting conversations around Zipkin's ID format and UUIDs. Seems interesting to share, since some of this stuff is neat, if geeky.

So, we love UUIDs! Many people use UUID format to pass around things unlikely to ever clash. These are used for long-term retrieval and other handy things. A lot of correlation IDs are UUIDs, too.

What about distributed tracing? Trace IDs should be UUIDs, too! To this I answer... maybe?
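
To make the "square is a rectangle" idea concrete, here is a small Python sketch using only the standard `uuid` module. The helper names are assumptions made for illustration. It shows that a UUID can always be rendered as a 32-character lower-hex 128-bit ID, while an arbitrary 128-bit ID can be parsed into a UUID-shaped value without necessarily carrying the RFC 4122 variant bits.

```python
import uuid


def uuid_to_trace_id(value: uuid.UUID) -> str:
    """Render a UUID as 32 lower-hex characters (128 bits, no dashes)."""
    return value.hex


def trace_id_to_uuid(trace_id: str) -> uuid.UUID:
    """Parse any 32-hex-character ID into a UUID-shaped value.

    The constructor does not check version or variant bits, so the result
    is not guaranteed to be a well-formed RFC 4122 UUID.
    """
    return uuid.UUID(hex=trace_id)


def has_rfc4122_variant(value: uuid.UUID) -> bool:
    """True when the variant bits match the RFC 4122 layout."""
    return value.variant == uuid.RFC_4122


random_uuid = uuid.uuid4()
as_trace_id = uuid_to_trace_id(random_uuid)   # every UUID fits in 128 bits
all_ones = trace_id_to_uuid("f" * 32)         # a 128-bit ID that is not an RFC 4122 UUID
```

Every UUID round-trips through the 32-hex form, but the reverse direction only preserves the bits, not the UUID semantics.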

Firstly, there are a lot of systems inspired by the dapper paper which hints how

codefromthecrypt / metrics-from-sampled.md
Created July 2, 2017 18:54
When you derive metrics from sampled traces

I have heard that a number of APMs create "spans" (distributed tracing lingo for an operation) and aggregate them for purposes like latency metrics.

In a way, Zipkin does this. The ever popular service dependency diagram is an aggregated view of parent/child links between services with the number of calls between them added for color.
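
As a rough illustration of that kind of aggregation, the Python sketch below counts calls between services by following parent/child links in a flat list of spans. The span shape (`id`, `parentId`, `serviceName` at the top level) and the function name `dependency_links` are simplifying assumptions for this example. It is not how Zipkin itself computes the diagram.

```python
from collections import Counter


def dependency_links(spans):
    """Count parent -> child calls between services from a flat list of spans.

    Each span is a dict with "id", an optional "parentId", and "serviceName"
    (a simplified, assumed shape; real span data carries more fields).
    """
    service_by_span_id = {s["id"]: s["serviceName"] for s in spans}
    calls = Counter()
    for span in spans:
        parent_service = service_by_span_id.get(span.get("parentId"))
        child_service = span["serviceName"]
        # Skip root spans, spans whose parent is missing, and in-service calls.
        if parent_service is None or parent_service == child_service:
            continue
        calls[(parent_service, child_service)] += 1
    return calls


# Made-up sample data: one frontend span calling backend twice, backend calling db once.
spans = [
    {"id": "a", "serviceName": "frontend"},
    {"id": "b", "parentId": "a", "serviceName": "backend"},
    {"id": "c", "parentId": "a", "serviceName": "backend"},
    {"id": "d", "parentId": "b", "serviceName": "db"},
]
links = dependency_links(spans)
```

Note that the counts only reflect the spans that were actually collected, which matters once sampling enters the picture.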

The biggest issue with using a tracing API to back metrics is that most of the time, tracing is sampled (like 1 out of 1000). Sampling is done to reduce costs or prevent a surge of traffic from taking out the

codefromthecrypt / gist:7180c278b62e8f6a216a2aea45d08fc9
Created March 3, 2017 08:54
aggregation for binary annotations
# this returns all of the annotation values, not just the http.url
# for more.. https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
curl -s 'localhost:9200/zipkin-*/span/_search' -d'{
"_source": false,
"aggs": {
"binaryAnnotations": {
"nested": {
"path": "binaryAnnotations"
},
"aggs" : {
codefromthecrypt / search-for-two.sh
Created February 16, 2017 19:38
searching for multiple service names
# substitute 2017-01-18 with the correct date, and foo and bar with the two service names you're interested in
curl -s localhost:9200/zipkin-2017-01-18/span/_search -d'{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "annotations",
"query": {
"bool": {