@the-frey
Forked from joeabbey/README.md
Created January 26, 2022 10:54
Measuring Osmosis Epoch

Introduction

Osmosis is an automated market maker (AMM) for interchain assets. Over the past 7 months, adoption has continued to accelerate, with nearly $1.5B in TVL as of the time of writing. The AMM supports 33 unique assets and continues to add more as new chains join IBC.

Osmosis differs from other Cosmos chains in its implementation of an epochs module. The epochs module hooks the incentives and mint keepers to distribute various rewards once a day. With the growth of the network and the increase in incentivized pools, the time to compute the epoch block and produce a NewHeight has grown to roughly 20 minutes.

New users come to Osmosis every day and stay for its ease of use, access to many new assets, and incredible speed. The epoch block takes new users by surprise and can be a negative experience. With more AMMs arriving in the IBC ecosystem and giving users a wider range of choices, the need to reduce the impact of the daily epoch grows.

The impact of the epoch block has undergone a round of intense discussion and has been well analyzed in the past. The goal of this document is to provide additional data.

In the following sections, I'll describe the procedure for measuring the epoch, analyze the data, and provide some thoughts on where we can go next.

Understanding Prevotes

To measure the timing of a block's creation, we need to understand a bit about how blocks are created. All Tendermint-based chains rely on a Byzantine consensus algorithm, run by peers operating as validators, to determine the next block. The round-based protocol follows state transitions to produce a NewHeight:

                         +-------------------------------------+
                         v                                     |(Wait til `CommitTime+timeoutCommit`)
                   +-----------+                         +-----+-----+
      +----------> |  Propose  +--------------+          | NewHeight |
      |            +-----------+              |          +-----------+
      |                                       |                ^
      |(Else, after timeoutPrecommit)         v                |
+-----+-----+                           +-----------+          |
| Precommit |  <------------------------+  Prevote  |          |
+-----+-----+                           +-----------+          |
      |(When +2/3 Precommits for block found)                  |
      v                                                        |
+--------------------------------------------------------------------+
|  Commit                                                            |
|                                                                    |
|  * Set CommitTime = now;                                           |
|  * Wait for block, then stage/save/commit block;                   |
+--------------------------------------------------------------------+

Prevotes include a timestamp which can be used to measure the arrival of votes. Let's take a look at a snippet of the prevotes of a roundset for block 2834022. To do this, we'll need to capture the output of consensus_state over time. I'll be using the following script to capture unique steps per block. The script runs once every 0.1 seconds and gathers up the data, fine enough granularity for a 6-second block arrival.

#!/bin/bash

# Snapshot the node's consensus state and name the file by height/round/step.
NOW_JSON="$(date +%s.%N).json"
curl -s localhost:26657/consensus_state > "${NOW_JSON}"
HRS="$(jq -r '.result.round_state["height/round/step"]' "${NOW_JSON}" | tr '/' '_')"
mv "${NOW_JSON}" "${HRS}.json"
echo "${HRS}"
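To reach that 0.1-second cadence, the script can be wrapped in a driver loop. The sketch below is self-contained: `date` stands in for the capture script, and `capture.sh` is my naming assumption, not a file from the original setup.

```shell
# Invoke the capture step roughly ten times per second. The stand-in command
# keeps the sketch runnable; substitute ./capture.sh (hypothetical filename
# for the script above) and loop until interrupted in real use.
for _ in 1 2 3; do
  date +%s.%N   # stand-in for ./capture.sh
  sleep 0.1
done
```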

Next, to extract the prevotes we'll need to traverse the JSON document. For [block 2834022](https://www.mintscan.io/osmosis/blocks/2834022), I'll use the last step, which should have the most complete information and is stored in 2834022_0_6.json. Using jq we can easily extract the prevotes:

jq -r '.result.round_state.height_vote_set[0].prevotes[]' 2834022_0_6.json

Here's a snippet of that data.

Vote{0:CB5A63B91E8F 2834022/00/SIGNED_MSG_TYPE_PREVOTE(Prevote) AD67900B1097 E919E6BD75BD @ 2022-01-17T11:55:03.532273386Z}
Vote{1:16A169951A87 2834022/00/SIGNED_MSG_TYPE_PREVOTE(Prevote) AD67900B1097 FF18B06CC26E @ 2022-01-17T11:55:03.544551492Z}
Vote{2:9D0281786872 2834022/00/SIGNED_MSG_TYPE_PREVOTE(Prevote) AD67900B1097 B70890682857 @ 2022-01-17T11:55:03.00362945Z}
Vote{3:66B69666EBF7 2834022/00/SIGNED_MSG_TYPE_PREVOTE(Prevote) AD67900B1097 8F66A22BAA82 @ 2022-01-17T11:55:03.385849207Z}
Vote{4:76F706AE73A8 2834022/00/SIGNED_MSG_TYPE_PREVOTE(Prevote) AD67900B1097 6DA36F8194AD @ 2022-01-17T11:55:03.352144029Z}
Vote{5:03C016AB7EC3 2834022/00/SIGNED_MSG_TYPE_PREVOTE(Prevote) AD67900B1097 E5364A2E784C @ 2022-01-17T11:55:03.321100618Z}
Vote{6:6239A498C22D 2834022/00/SIGNED_MSG_TYPE_PREVOTE(Prevote) AD67900B1097 8156037496DC @ 2022-01-17T11:55:03.316470356Z}
Vote{7:844290531EE5 2834022/00/SIGNED_MSG_TYPE_PREVOTE(Prevote) AD67900B1097 5C3E87DFC638 @ 2022-01-17T11:55:03.392949049Z}

Each of these lines is generated by the Vote String() method. Let's break one down:

Field                             Value
ValidatorIndex                    0
Fingerprint of validator address  CB5A63B91E8F
Height                            2834022
Round                             00
Type                              SIGNED_MSG_TYPE_PREVOTE
Type string                       Prevote
Fingerprint of block hash         AD67900B1097
Fingerprint of signature          E919E6BD75BD
Canonical time of vote timestamp  2022-01-17T11:55:03.532273386Z
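Given that field layout, the validator index and timestamp can be pulled out of a Vote line with sed. This is a sketch: the regexes are mine, not part of Tendermint's output format.

```shell
# Extract ValidatorIndex and the vote timestamp from one Vote{...} line.
line='Vote{0:CB5A63B91E8F 2834022/00/SIGNED_MSG_TYPE_PREVOTE(Prevote) AD67900B1097 E919E6BD75BD @ 2022-01-17T11:55:03.532273386Z}'
idx="$(printf '%s\n' "$line" | sed -E 's/^Vote\{([0-9]+):.*/\1/')"
ts="$(printf '%s\n' "$line" | sed -E 's/.*@ ([^}]+)\}$/\1/')"
echo "$idx $ts"
# -> 0 2022-01-17T11:55:03.532273386Z
```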

We now have the ability to associate a validator's address with the timestamp of their prevote.
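With timestamps in hand, the spread between two prevotes can be computed directly. A sketch, assuming GNU date is available (for `%N` nanosecond support), using two timestamps from the snippet above:

```shell
# Seconds between the earliest prevote and validator 0's prevote above.
t1="$(date -ud '2022-01-17T11:55:03.00362945Z' +%s.%N)"
t2="$(date -ud '2022-01-17T11:55:03.532273386Z' +%s.%N)"
awk -v a="$t1" -v b="$t2" 'BEGIN { printf "%.3f\n", b - a }'
# -> 0.529
```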

Understanding the Data

We can then do some simple charting on this to view the data as a histogram. Below is a histogram of validators' time from block proposal to prevote, filtering out 5% outliers:

[Image: histogram of prevote arrival times for a typical block]

Let's compare that histogram with an epoch block:

[Image: histogram of prevote arrival times for an epoch block]

The primary thing to note in that graph is the dramatic change in X-axis range: from prevotes arriving in under a second in the first graph, to the second graph ranging from 50 seconds to 20 minutes.

The histogram helps us understand the average arrival rate. We can see that 36 validators are able to complete the epoch block in under 5 minutes. The histogram doesn't, however, let us easily visualize the prevote arrival rate against a validator's delegations.
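The bucketing behind such a histogram can be sketched in the shell with awk. The delay values below are made-up placeholders for illustration, not measured data:

```shell
# Bucket prevote delays (seconds) into 0.1s bins and count each bin.
printf '%s\n' 0.32 0.35 0.38 0.39 0.53 0.54 0.55 |
  awk '{ bin = int($1 * 10) / 10; count[bin]++ }
       END { for (b in count) printf "%.1fs: %d\n", b, count[b] }' |
  sort -n
# -> 0.3s: 4
#    0.5s: 3
```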

Let's use a bubble chart for this, keeping the X-axis as pre-vote arrival time. Since a bubble chart offers additional dimensions, let's use the validator's power for both bubble size and Y-axis position.

[Image: bubble chart of prevote arrival time vs. validator voting power]

Infrastructure Setup

This section describes the steps followed for measuring the epoch. Care was taken to ensure measurements are made on a node that is not a validator.

The validators and sentries were rapidly synchronized using bootstrap.sh, a script which sets up the tooling required and uses ./scripts/statesync.sh

After each node was synced, I modified each node's config.toml to match Tendermint's recommendations. Additionally, I disabled tx_index by setting it to "null".
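The indexer change corresponds to a config.toml fragment along these lines (section and key names per Tendermint's config format; the surrounding file is otherwise unchanged):

```toml
# config.toml excerpt — disable the transaction indexer entirely
[tx_index]
indexer = "null"
```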

I've also firewalled off the mainnet validator from the internet so that it's only accessible via SSH (and peering to the sentries).

Below is a table of the nodes, types, specifications and configurations. Normally one should not expose their validator configuration, but for the purpose of independent verification, I've included it in the table below:

Node Type Spec Location Config
osmosis-mainnet-validator validator Contabo VPS XL 800GB NVME St. Louis here
osmosis-mainnet-sentry-001 sentry Contabo VPS XL 800GB NVME St. Louis here
osmosis-mainnet-sentry-002 sentry Contabo VPS XL 800GB NVME St. Louis here
osmosis-mainnet-sentry-003 sentry Contabo VPS XL 800GB NVME St. Louis here
osmosis-mainnet-node node Hetzner AX51-NVME Helsinki here
osmosis-mainnet-node-2 node i3en.2xlarge "Ohio" (us-east-2) here

We'll run the consensus_state capture script shown earlier in a tight loop on osmosis-mainnet-node and osmosis-mainnet-node-2.

Epoch 1/17/2022

Using Contabo VPS:

Node Type Spec Location
osmosis-mainnet-validator validator Contabo VPS XL 800GB NVME St. Louis
osmosis-mainnet-sentry-001 sentry Contabo VPS XL 800GB NVME St. Louis
osmosis-mainnet-sentry-002 sentry Contabo VPS XL 800GB NVME St. Louis
osmosis-mainnet-sentry-003 sentry Contabo VPS XL 800GB NVME St. Louis
osmosis-mainnet-node node Hetzner AX51-NVME Helsinki
osmosis-mainnet-node-2 node i3en.2xlarge "Ohio" (us-east-2)

[Image: prevote timing chart for the 2022-01-17 epoch (epoch_2022_01_17)]

Epoch 1/18/2022

Using Hetzner Dedicated Servers:

Node Type Spec Location
osmosis-mainnet-validator validator Hetzner AX51-NVME Helsinki
osmosis-mainnet-sentry-001 sentry Hetzner AX51-NVME Helsinki
osmosis-mainnet-sentry-002 sentry Hetzner AX51-NVME Helsinki
osmosis-mainnet-sentry-003 sentry Hetzner AX51-NVME Helsinki
osmosis-mainnet-node-2 node i3en.2xlarge "Ohio" (us-east-2)

[Image: prevote timing chart for the 2022-01-18 epoch]

Epoch 1/19/2022

Using Hetzner Dedicated Servers:

Node Type Spec Location
osmosis-mainnet-validator validator Hetzner AX51-NVME Helsinki
osmosis-mainnet-sentry-001 sentry Hetzner AX51-NVME Helsinki
osmosis-mainnet-sentry-002 sentry Hetzner AX51-NVME Helsinki
osmosis-mainnet-sentry-003 sentry Hetzner AX51-NVME Helsinki
backup-osmosis-mainnet-sentry-001 sentry Contabo VPS XL 800GB NVME St. Louis
backup-osmosis-mainnet-sentry-002 sentry Contabo VPS XL 800GB NVME St. Louis
backup-osmosis-mainnet-sentry-003 sentry Contabo VPS XL 800GB NVME St. Louis
backup-osmosis-mainnet-validator node Contabo VPS XL 800GB NVME St. Louis

On osmosis-mainnet-sentry-003, I've configured all available peers from the #peers-list channel as persistent_peers. I've also made a few additional tweaks to the config:

  • Attempt to have many connections:
# Maximum number of outbound peers to connect to, excluding persistent peers
max_num_outbound_peers = 320
  • Attempt to avoid exponential backoff:
# Maximum pause when redialing a persistent peer (if zero, exponential backoff is used)
persistent_peers_max_dial_period = "1s"
  • Make the node very impatient:
# Peer connection configuration.
handshake_timeout = "5s"
dial_timeout = "1s"

My theory is that if the Osmosis chain has a few nodes like this, they can help rapidly rebuild the p2p network.

Neither positive nor negative:

[Image: prevote timing chart for the 2022-01-19 epoch]

Observation: Very few IPv6 peers are online.

Epoch 1/20/2022

Unchanged setup from 1/19/2022

The 900-1200 range appears a bit more sparsely populated!

[Image: prevote timing chart for the 2022-01-20 epoch]

Epoch 1/21/2022

Yesterday @valardragon asked:

do we know for the epoch, how much of the time is spent in commit vs execution?

I've set logs to ERROR per the set of recommended optimizations, so I'll reset that back to INFO so I can get that data today.

I've instrumented ApplyBlock within Tendermint in a small branch off of Osmosis to be able to better see what is going on internally. Here's a sample from block 2888662:

[Image: ApplyBlock instrumentation sample for block 2888662]

Appendix

Disabling Contabo IPv6 DNS lookups

As of this writing, Contabo does not have IPv6 support.

Lower the precedence of IPv6 address resolution in /etc/gai.conf:

#    For sites which prefer IPv4 connections change the last line to
#
precedence ::ffff:0:0/96  100
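One quick way to check that the precedence change took effect is to resolve a dual-stack name and look at which address family is listed first; a sketch, assuming glibc's getent is available:

```shell
# With the gai.conf precedence above, IPv4 results (IPv4-mapped ::ffff:
# entries) should sort ahead of native IPv6 results for dual-stack hosts.
getent ahosts localhost | head -n 3
```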