@christroutner
Last active November 20, 2021 07:11
SLPDB Review

SLP Indexing Review

Overview

This report was sponsored by an anonymous donor. I'd like to thank them for the generous contribution to the industry, the PSF, and to me.

The purpose of the report is to:

  • Assess the current state of SLPDB and SLP indexing in the industry.
  • Explore the costs of improving the indexer or creating a new one.

The SLPDB indexer gave rise to a prolific token economy on the Bitcoin Cash blockchain. eCash, a fork of Bitcoin Cash, also makes use of SLP tokens. Both ecosystems require a robust indexer in order to sustain their token economies.

History of SLPDB

SLPDB was forked from the JavaScript indexer BitDB by James Cramer. The early history of the SLP token economy is captured in this video.

As an indexer, BitDB diverges significantly from the indexers that came before it. The transaction indexer built into the Bitcoin full node and the Fulcrum address-and-UTXO indexer both use a key-value database for fast execution and a small resource footprint. In those indexers, only specific, targeted metadata goes into the indexer database.

In contrast, BitDB expands each transaction and saves every part of the expanded data into the database. The benefit of this behavior is rich, expressive queries. A priori knowledge of the data to be indexed is not needed, as any aspect of every transaction can be queried at any time. The downside is two-fold:

  • The database requires a huge footprint (in terms of hard drive space), relative to the original size of the transactions it's indexing.
  • Interacting with the database, in terms of making queries or indexing new transactions, requires large and ever-increasing resources in terms of memory and processing power.

Since its inception, the technology behind SLPDB has been spun out into several smaller modules:

  • slp-parser can read and parse the OP_RETURN data making up an SLP transaction.
  • slp-mdm can generate OP_RETURN data for an SLP transaction.
  • slp-validate can independently validate the DAG of an SLP transaction.

The above tools did not exist prior to the creation of SLPDB. However, future solutions can leverage them. Because SLP tokens are not miner validated, it is critical that any solution adhere to well-tested consensus rules. These tools provide leverage for future indexers, as well as protection against consensus-risk.
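To illustrate the kind of check slp-parser performs, here is a minimal from-scratch sketch of decoding the SLP OP_RETURN header. This is hypothetical code, not the library's actual API; a production implementation must enforce the full Token Type 1 specification, which the real slp-parser does.

```javascript
// Hypothetical sketch: decode the leading fields of an SLP OP_RETURN script.
// A real implementation (e.g. slp-parser) enforces many more consensus rules.

function parseSlpHeader(scriptHex) {
  const buf = Buffer.from(scriptHex, 'hex');
  if (buf[0] !== 0x6a) throw new Error('not an OP_RETURN output');

  let offset = 1;
  // Read one direct data push (opcodes 0x01-0x4b only, for brevity).
  function readPush() {
    const len = buf[offset];
    if (len < 0x01 || len > 0x4b) throw new Error('unsupported push opcode');
    offset += 1;
    const data = buf.slice(offset, offset + len);
    offset += len;
    return data;
  }

  // The LOKAD prefix "SLP\0" identifies an SLP transaction.
  const lokad = readPush();
  if (!lokad.equals(Buffer.from('SLP\0', 'ascii'))) {
    throw new Error('not an SLP transaction');
  }

  const tokenType = readPush(); // 0x01 = SLP Token Type 1
  const txType = readPush().toString('ascii'); // GENESIS | MINT | SEND
  return { tokenType: tokenType[0], txType };
}
```

For example, the script `6a04534c500001010453454e44` (OP_RETURN, "SLP\0", token type 1, "SEND") decodes to a Token Type 1 SEND transaction.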

Problems with SLPDB

SLPDB still functions, but its memory requirements grow by the day. Indexing from 'genesis' (block 543375) takes over a week, even on a high-end computer. It currently requires a machine with at least 32GB of RAM to operate a fully-synced SLPDB. With the continued growth of the SLP economy, it will soon require 64GB as a minimum. Many VPS cloud service providers do not offer 64GB tiers, and the cost of high-memory machines rises steeply. As a result, far fewer operators run SLPDB. This centralizing effect makes the SLP economy fragile and more susceptible to attack.

If the memory requirements of SLPDB could be brought down to 16GB or less, it would make the software much more practical for operation, and would increase adoption by decentralized operators.

The large memory requirements are inherent in the architecture of BitDB. If the large memory requirement were confined to the initial synchronization, it could be overcome. But fully-synced operation is also memory intensive: querying the balance of an address, for example, requires recursive queries to the database.

Because of this inherent unbounded memory requirement, a simple refactor of the SLPDB code base would be insufficient to solve the current problems facing SLPDB. That leaves two options:

  • Explore alternative indexer options.
  • Create a new indexer for SLP transactions that does not suffer from the limitations of SLPDB.

SLP Indexer Alternatives

A list of alternative SLP indexers is presented below, ranked from least feasible to most promising.

  • Bitcoin.com SLP Indexer is written in Java and depends on AWS infrastructure, with the Bitcoin.com wallet as its intended target. It also uses MongoDB. The fact that Bitcoin.com has stopped supporting SLP tokens (other than USDT) speaks to the efficacy of this indexer. No one in the industry uses this indexer, other than Bitcoin.com.

  • slp.Indexer is written in C# and uses Microsoft SQL Server for the indexer database. It was developed by Eligma Labs, a subsidiary of GoCrypto. I have been unable to find anyone running this software other than them.

  • bcash is a JavaScript-based full node implementation for the Bitcoin Cash and eCash blockchains. It fell out of consensus but has been restored and is maintained by Vin Armani. He recently added SLP indexing capability to the code base, though it does not index NFT tokens. This project is promising because it's fast, uses little memory, and is easy for a JavaScript-based group like the PSF to maintain.

  • Before James Cramer left the space, he integrated SLP indexing into the BCHD full node. It has the ability to validate SLP transactions and can get token information. This is currently the most popular alternative to SLPDB, but it lacks many of the rich-data features of SLPDB. The gRPC and protobuf interface is awkward and problematic from the standpoint of a web developer's workflow. BCHD historically had issues scaling to a production environment, and had a persistent bug that would cause it to corrupt its own database. From speaking with several operators, these problems appear to have been fixed.

Most of the SLPDB alternatives use software stacks other than JavaScript, which makes them a poor fit with the CashStack maintained by the Permissionless Software Foundation. Knowing that an indexer alternative exists is very different from being able to successfully operate and maintain the software.

There is a significant danger presented by these alternative indexers as well. Because SLP tokens are not miner validated, there is no 'consensus' to coordinate indexers with slightly different validation rules. This means that tokens passing between different wallets will be treated differently. For example, PSF tokens have been burned in the past, while passing between wallets that alternated between SLPDB and the Bitcoin.com indexer, due to very slight differences in the validation rules of the indexers. More information can be found in the extensive SLPDB test suite README.

Another significant risk is 'platform-risk': the risk that the platform will cease to function. Both bcash and BCHD are maintained by volunteers, with no reliable cash-flow to guarantee ongoing maintenance. If a network upgrade date passes and these nodes fall out of consensus, their ability to act as an SLP indexer goes with them. This was the fate of the Wormhole token protocol: its development team was defunded, and the protocol died when its full node was forked off the network during a network upgrade. SLPDB was built as standalone software, decoupled from a full node, in order to avoid the fate of Wormhole.

Indexing SLP Transactions

At its core, indexing an SLP transaction is not too difficult. There are three types of transactions defined by the SLP Token Type 1 specification:

  • GENESIS
  • SEND
  • MINT

An indexer simply performs the following actions:

  1. Start at SLP genesis (block 543375)
  2. Scan every transaction in every block for transactions with an OP_RETURN output that complies with the specification.
  3. If a transaction complies with the OP_RETURN rules, validate the DAG of transactions back to the GENESIS transaction.
  4. If steps 2 and 3 pass, update the key-value database.
  5. Repeat from step 2 until the tip of the chain is reached.
  6. Once fully synced, continue to scan transactions as they come in. Also handle block-reorgs.

This indexing is made much easier by using tools like slp-parser, which also avoids the consensus-risk mentioned above. bch-js has several functions, like Transaction.get(), that make processing transactions even easier.
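The steps above can be sketched as a loop over blocks. In this hypothetical sketch, plain objects and callbacks stand in for a real block source, the key-value database, and slp-parser/slp-validate style checks:

```javascript
// Hypothetical sketch of the indexing loop described above. The block
// source, database, and validation callbacks are stand-ins for real
// infrastructure (a full node RPC, LevelDB, slp-parser, slp-validate).

const SLP_GENESIS_HEIGHT = 543375; // Step 1: start at SLP genesis

function indexBlocks(blocks, db, { isSlpOpReturn, validateDag }) {
  for (const block of blocks) {
    if (block.height < SLP_GENESIS_HEIGHT) continue;
    for (const tx of block.txs) {
      // Step 2: look for a spec-compliant OP_RETURN output.
      if (!tx.outputs.some(isSlpOpReturn)) continue;
      // Step 3: validate the DAG back to the GENESIS transaction.
      if (!validateDag(tx)) continue;
      // Step 4: update the key-value database.
      db.set(tx.txid, { height: block.height });
    }
    // Steps 5-6 (repeat to the chain tip, live scanning, reorg handling)
    // are driven by the caller feeding new blocks into this function.
  }
  return db;
}
```

The same function can be reused for the fully-synced phase by feeding it one new block at a time as blocks arrive.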

As part of the research for this report, a prototype SLP indexer was started, in order to get a 'feel' for what it would take. The prototype is available here.

The cost of developing such a new indexer is estimated as follows:

  • 1 month of full time development (160 hours) by a senior developer to finish the initial version.
  • 3-6 months of part time development (240-480 hours) by a junior developer to debug and refine the code.

Recommendations for Future Work

Refactoring the SLPDB code base would be a herculean effort, requiring a significant amount of time and cost. Even if the code could be re-written, there is no reason to expect a refactor to lower the memory footprint. The BitDB approach of indexing every part of a transaction (instead of focused indexing of essential metadata) leads to bloat-by-design. When reading data from the database, the expressiveness of the jq query language makes data retrieval significantly more costly than a simple key-value lookup (as is done in other indexers, such as Fulcrum).

But there is a silver lining. The creation of SLPDB led to the birth of several smaller, high-quality JavaScript libraries like slp-parser and slp-validate. These smaller 'lego blocks' can be used to build an efficient indexer, while assuring strict adherence to the existing validation rules used by SLPDB.

BCHD and bcash may present low-hanging fruit. There is great short-term value in exploring them as an alternative to SLPDB for back-end infrastructure. However, there is long-term platform-risk due to the tight coupling of the indexer to the full node, consensus-risk, and the lack of reliable funding for maintenance.

Building any indexer is no small task, even if the principles are simple: walk the blockchain, scan each transaction, and store the metadata in a fast key-value database. The bulk of the work lies not in the initial development, but in the refinement and long-term maintenance. For this reason, building a new indexer should be considered a last resort. However, a new indexer decoupled from any specific full node implementation would present less long-term platform-risk and less maintenance burden than a tightly-coupled full node implementation.

The scope of this report was to assess the current state of SLPDB and SLP indexing in the industry, and to assess the cost of alternatives to SLPDB. The conclusion of this research is two recommendations:

  • Explore the use of bcash and BCHD as alternatives to SLPDB. (short-term)
  • Write a new indexer from scratch, leveraging existing tools like slp-parser and slp-validate. (long-term)
@SuperCipher commented Oct 14, 2021

@christroutner
Most of the operation seems related to DAG traversal.
Have you considered a graph database?

@christroutner (Author) commented Oct 14, 2021

No. The DAG traversal happens during the indexing. The recursive queries depend on the specific use-case. In the case of looking up a balance, the query retrieves all the UTXOs, then sums up the balance of the UTXOs. For addresses with a lot of tokens or a lot of UTXOs, this can get quite 'heavy'.

A more computationally efficient way to do it, would be to have a key-value lookup. The address would be the key, and the value would be its current balance.
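A hypothetical sketch of the two approaches, using plain JavaScript structures as stand-ins for the real database: summing token UTXOs on every query, versus a key-value lookup that the indexer keeps current as transactions are processed.

```javascript
// Hypothetical sketch contrasting the two balance-query approaches.

// Approach 1: query-time computation — filter and sum every UTXO for the
// address on each lookup. Cost grows with the number of UTXOs.
function balanceFromUtxos(utxoDb, address) {
  return utxoDb
    .filter((u) => u.address === address)
    .reduce((sum, u) => sum + u.tokenQty, 0);
}

// Approach 2: key-value lookup — the indexer applies each transaction's
// effect to a running balance, so a query is a single O(1) read.
function applyTx(balanceDb, { address, delta }) {
  balanceDb.set(address, (balanceDb.get(address) || 0) + delta);
}

function balanceFromKv(balanceDb, address) {
  return balanceDb.get(address) || 0;
}
```

The trade-off is classic: approach 2 does a little more work at indexing time in exchange for cheap, constant-time queries.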

@Ekliptor commented
Did you consider using RocksDB instead of LevelDB? It's built upon LevelDB but optimized for faster key-value lookups on SSD drives with large data sets. Sounds like a perfect fit for SLP tokens.

There is an NPM package for Node.js: https://www.npmjs.com/package/rocksdb

Taking this idea further: you should add an abstraction layer for the key-value store so that we can easily switch between different implementations. Redis, being very popular and widely available, would likely be another choice for many server operators.
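Such an abstraction layer might look like the following hypothetical sketch (the levelup/abstract-leveldown ecosystem already follows a similar adapter pattern):

```javascript
// Hypothetical adapter interface for swappable key-value backends.
// Concrete adapters for RocksDB, Redis, etc. would implement the same
// three async methods.

class KeyValueAdapter {
  async get(key) { throw new Error('not implemented'); }
  async put(key, value) { throw new Error('not implemented'); }
  async del(key) { throw new Error('not implemented'); }
}

// An in-memory implementation, useful for unit tests and benchmarks.
class MemoryAdapter extends KeyValueAdapter {
  constructor() {
    super();
    this.map = new Map();
  }
  async get(key) { return this.map.get(key); }
  async put(key, value) { this.map.set(key, value); }
  async del(key) { this.map.delete(key); }
}
```

Because the indexer only ever talks to the adapter interface, benchmarking RocksDB against Redis becomes a one-line configuration change.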

@christroutner (Author) commented
Those are excellent suggestions @Ekliptor. Thank you!

I'll check out that RocksDB package. My understanding was that JS support for RocksDB is fairly new. But your idea for creating an 'abstraction layer' (e.g. an adapter) is a good one. That would let anyone swap out different key-value databases, and would allow for good benchmark tests.

@Ekliptor commented
Building & linking RocksDB can be painful indeed (speaking as a Go dev). I haven't tried the linked NPM package, but they say they ship binaries for common platforms. API-wise, from the JS side it will be the same leveldown API with both Level and RocksDB. Therefore the C++ side, when "linking" it using node-gyp, is also identical for both.
