Skip to content

Instantly share code, notes, and snippets.

@supertestnet
Last active October 9, 2025 22:31
Show Gist options
  • Save supertestnet/6aa3dbeb7cb749741c18ed9335a23a81 to your computer and use it in GitHub Desktop.
Save supertestnet/6aa3dbeb7cb749741c18ed9335a23a81 to your computer and use it in GitHub Desktop.
Idea for pruning the UTXO set in bitcoin

Background

While recently considering the concept of “pruning” in bitcoin, a novel way to prune a large portion of the UTXO set occurred to me. Before I outline the idea, I want to say three things:

Assumptions

Number 1, one of bitcoin’s trust assumptions is: when your node requests block data, not all of your peers will withhold it from you. If they did, you could not sync the chain, or receive money.

Number 2, nodes with limited storage space sometimes use a technique called “pruning” to save space. While validating the blockchain, they periodically discard the oldest parts of it – all but the most recent blocks. Whenever a new block gets added to the chaintip, they discard whatever block is oldest. This means their local copy of the blockchain has a roughly constant size, usually a couple hundred megabytes, whereas “full archival” nodes (i.e. nodes that don’t prune) store the “full” blockchain – which grows constantly, and is currently hundreds of gigabytes large.

Number 3, besides the blockchain, nodes have one other significant consumer of disk space: the utxo set. Nodes need to know what utxos are available to spend in order to validate new blocks, and to do that, they store a copy of all utxos. This storage requirement consumes dozens of gigabytes of disk space, and, due to the popularity of certain types of spam (e.g. Stamps), it grows somewhat constantly. And you can’t “prune” this data without some method of recovering it because your node cannot validate a transaction that claims to spend utxo Q if it doesn’t know whether utxo Q is available to spend.

New idea

Ok so here comes the new idea: what if you could use the trust assumption outlined in Number 1 to effectively prune the utxo set just like you can prune the blockchain? Specifically, what if some nodes on the network signaled that they are willing to provide an extra bit of data to their peers upon request, which is this: if your node asks theirs to prove that a certain utxo is spent, they will provide that proof, if they have it, in the form of (1) the earliest transaction in the blockchain that spends that utxo (2) and a merkle proof proving that transaction is included in a certain block’s block header. Your node can then validate that transaction and validate the merkle proof – which even a pruned node can do.

How this helps

If your node connects to peers who signal that feature, then it does not need to store a full copy of the utxo set. The reason is this: suppose you discard some utxos, including utxo Q, and a new block comes in containing a transaction that claims to spend utxo Q. To validate it, you request from your peers a proof that utxo Q was spent. Per the trust assumption stated in Number 1, not all of your peers will withhold this data from you, so if the utxo was spent before the current block, you will learn of it, and that will prove that utxo Q was spent already, so the block is invalid; otherwise, utxo Q was not spent already, so the block is not therefore invalid.

Optimization

Nodes which use this technique could therefore prune their utxo set. To limit network requests, they could also keep a configurable amount of utxos, e.g. the most recently created 1 gigabyte of them, and prune the remainder. Then, if anyone spends an “old” utxo, use the method outlined above to check if it's really unspent, and continue validation.

@JoseSK999
Copy link

JoseSK999 commented Oct 9, 2025

Very interesting. The problem is that, if an UTXO (that a block tries to spend) wasn't even created in the past, then you cannot prove it was spent, but it is still invalid.

So you first need a merkle proof for all the UTXOs referred in inputs, such that you know they actually were created in previous blocks. Then you can proceed with your approach.

But now the merkle proofs for all the prevouts may be more data than the respective Utreexo proofs (not sure about the size comparison). You also need to ask for the UTXOs data, like in Utreexo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment