Background

While recently considering the concept of “pruning” in bitcoin, a novel way to prune a large portion of the UTXO set occurred to me. Before I outline the idea, I want to say three things:

Assumptions

Number 1, one of bitcoin’s trust assumptions is: when your node requests block data, not all of your peers will withhold it from you. If they did, you could not sync the chain, or receive money.

Number 2, nodes with limited storage space sometimes use a technique called “pruning” to save space. While validating the blockchain, they periodically discard the oldest parts of it – all but the most recent blocks. Whenever a new block gets added to the chaintip, they discard whatever block is oldest. This means their local copy of the blockchain has a roughly constant size, usually a couple hundred megabytes, whereas “full archival” nodes (i.e. nodes that don’t prune) store the “full” blockchain – which grows constantly, and is currently hundreds of gigabytes large.

Number 3, besides the blockchain, nodes have one other significant consumer of disk space: the utxo set. Nodes need to know what utxos are available to spend in order to validate new blocks, and to do that, they store a copy of all utxos. This storage requirement consumes dozens of gigabytes of disk space, and, due to the popularity of certain types of spam (e.g. Stamps), it grows somewhat constantly. And you can’t “prune” this data without some method of recovering it because your node cannot validate a transaction that claims to spend utxo Q if it doesn’t know whether utxo Q is available to spend.

New idea

Ok so here comes the new idea: what if you could use the trust assumption outlined in Number 1 to effectively prune the utxo set just like you can prune the blockchain? Specifically, what if some nodes on the network signaled that they are willing to provide an extra bit of data to their peers upon request, which is this: if your node asks theirs to prove that a certain utxo is spent, they will provide that proof, if they have it, in the form of (1) the earliest transaction in the blockchain that spends that utxo (2) and a merkle proof proving that transaction is included in a certain block’s block header. Your node can then validate that transaction and validate the merkle proof – which even a pruned node can do.

How this helps

If your node connects to peers who signal that feature, then it does not need to store a full copy of the utxo set. The reason is this: suppose you discard some utxos, including utxo Q, and a new block comes in containing a transaction that claims to spend utxo Q. To validate it, you request from your peers a proof that utxo Q was spent. Per the trust assumption stated in Number 1, not all of your peers will withhold this data from you, so if the utxo was spent before the current block, you will learn of it, and that will prove that utxo Q was spent already, so the block is invalid; otherwise, utxo Q was not spent already, so the block is not therefore invalid.

Optimization

Nodes which use this technique could therefore prune their utxo set. To limit network requests, they could also keep a configurable amount of utxos, e.g. the most recently created 1 gigabyte of them, and prune the remainder. Then, if anyone spends an “old” utxo, use the method outlined above to check if it's really unspent, and continue validation.

supertestnet/pruning-utxos.md

Background

Assumptions

New idea

How this helps

Optimization

JoseSK999 commented Oct 9, 2025 •

edited

Loading

Uh oh!

supertestnet/pruning-utxos.md

Background

Assumptions

New idea

How this helps

Optimization

JoseSK999 commented Oct 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JoseSK999 commented Oct 9, 2025 •

edited

Loading