This is a short guide on how to work with .otlp.jsondn.gz files that you'll find in the artifacts directory of TeamCity runs after a run of the TestWorkload test. You'll need two tools:
- DuckDB
- Go
These files are produced by the OTLPFileExporter. Its documentation will tell you all about the format.
At the time of writing, there's no off-the-shelf tooling that works with OTel's file protocol. You can technically use zcat and curl to send these archives into an OTLP processor, but the work to set that up vastly outweighs its usefulness. Your best bet is Jaeger's OTLP endpoint. However, the OTLP project made a breaking change to their .proto specs in an earlier version: the binary encodings are backwards compatible, but the JSON format had a key renamed. You'll have to do some post-processing to get this data into Jaeger. The most practical route is to use the version of the OTel SDK vendored in CRDB to unmarshal the JSON, re-serialize it as binary protobuf, and send it to Jaeger's gRPC OTLP endpoint. I don't have a script for doing so because I never found it to be useful.
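If you do need to go down that road, here's a rough, untested sketch of the conversion. It makes a few assumptions: it uses the standalone go.opentelemetry.io/collector/pdata packages rather than whatever CRDB vendors (pin a version whose JSON field names match your files), it assumes the OTLP gRPC endpoint is listening on localhost:4317, and it reads decompressed NDJSON on stdin (e.g. piped through zcat).

```go
// otlp-to-jaeger reads newline-delimited OTLP/JSON trace payloads on stdin,
// re-encodes each line as binary protobuf, and exports it over OTLP gRPC.
// Sketch only: pin the pdata module to a version whose JSON key names match
// the files you're converting.
package main

import (
	"bufio"
	"context"
	"log"
	"os"

	"go.opentelemetry.io/collector/pdata/ptrace"
	"go.opentelemetry.io/collector/pdata/ptrace/ptraceotlp"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Assumes Jaeger (or any collector) is listening for OTLP gRPC here.
	conn, err := grpc.Dial("localhost:4317", grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	client := ptraceotlp.NewGRPCClient(conn)

	var unmarshaler ptrace.JSONUnmarshaler
	scanner := bufio.NewScanner(os.Stdin)
	// Span batches can be large; bump the scanner's per-line limit.
	scanner.Buffer(make([]byte, 0, 1024*1024), 64*1024*1024)

	for scanner.Scan() {
		traces, err := unmarshaler.UnmarshalTraces(scanner.Bytes())
		if err != nil {
			log.Fatalf("unmarshal: %v", err)
		}
		// The binary (protobuf) re-serialization happens inside the gRPC export.
		if _, err := client.Export(context.Background(), ptraceotlp.NewExportRequestFromTraces(traces)); err != nil {
			log.Fatalf("export: %v", err)
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}
```

You'd run it with something like zcat trace.otlp.jsondn.gz | go run ./otlp-to-jaeger (the archive name here is a placeholder).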
The OTLP format is pretty heavily nested, as it's meant to be sent as batches over the wire. This makes it pretty annoying to work with. otlp-duck.fish is a fish script that will unnest the data into a single spans table, which is much nicer to poke around in. DuckDB has built-in support for decompressing and reading gzipped, newline-delimited JSON files; just make sure you're using a recent version. I chose DuckDB because it's fast, supports ephemeral databases, and deals with nested data better than anything else I've tried. If it's not your jam, you can just as easily pipe the data into anything else with zcat and/or jq, as in the example below.
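For example, to pretty-print the raw payloads (the archive name is just a placeholder):

zcat trace.otlp.jsondn.gz | jq . | less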
Next we come to the Go script within this gist. It was an attempt to automatically reproduce an error discovered by the RSW. It currently uses the PID attribute to group queries by connection. As noted by Rafi, this is incorrect and another attribute will have to be added.
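To get a feel for that grouping once the data is loaded (see the next step), you can count spans per pid in DuckDB with something like:

SELECT attributes->>'pid' AS pid, count(*) AS spans FROM spans GROUP BY 1 ORDER BY 2 DESC;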
To start, you'll want to get DuckDB up and running with your data loaded into the spans table using the otlp-duck.fish script.
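Once it's loaded, a quick sanity check inside the DuckDB shell doesn't hurt:

DESCRIBE spans;
SELECT count(*) FROM spans;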
From there, we'll make a CSV "script" that will be replayed.
I've been using this line of SQL to produce said script. It can probably be improved.
SELECT attributes->'pid', "end", regexp_replace(attributes->>'sql', '\s+', ' ', 'gs'), [x->'Value'->>'StringValue' for x in (attributes->'args'->'values')::JSON[]], error FROM spans
By default DuckDB uses the duckbox output format and truncates anything over 40 rows. Run these "dot" commands before the query above to dump its results to a CSV file instead:
.mode csv
.out script.csv
Great! We now have a script to try to replay. You'll need two terminal panes next: one to run the Go script and the other to run a cockroach demo cluster. Here's the command for running a demo cluster:
./cockroach-short demo --listening-url-file ./demo-url --nodes 3 --demo-locality=region=us-east1,az=1:region=us-east2,az=1:region=us-east3,az=1 --empty
I've been using demo because it can emulate multi-region, which is what TestWorkload does. --listening-url-file just writes the connection URL to a file, which makes scripting easier. --empty does what it sounds like and doesn't load any demo data; again, this is what TestWorkload does/expects.
Before running the replay script, we'll have to create the correct database (in the demo cluster's SQL shell):
CREATE DATABASE schemachange;
And finally, let's run our replay script (assuming you've go install'ed it):
sql-replay script.csv (cat demo-url | sed 's/defaultdb/schemachange/')
Note: sed is used to ensure that we connect to the schemachange database instead of defaultdb, which can otherwise cause intermittent errors during the replay.
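The command above uses fish's command substitution; if you're running it from bash instead, the equivalent would be:

sql-replay script.csv "$(sed 's/defaultdb/schemachange/' demo-url)"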
As discussed, this doesn't work great right now. There's a lot of non-determinism within the workload and the connection grouping isn't perfect either, though it gets further than you'd expect most of the time. With a bit of work and some leasing hackery to control schemachange jobs, we can probably get repros to work as expected!