@achow101
Last active November 7, 2025 15:23

Stopping Arbitrary Data

Disclaimer: This is not an endorsement of any of the ideas presented in this document.

These are notes from the 2025-09-15 livestream: https://www.twitch.tv/videos/2567397716

Disallow From Output Scripts

First, output scripts would have to be validated at all (consensus does not currently check them when they are created).

  • Every opcode must be defined, and must not be disabled or fail immediately (e.g. OP_CAT or OP_RETURN)
  • An output script must fail with an invalid stack operation when executed by itself
    • Bypass: start the script with OP_DROP, then push all the data
  • Or, the script must fail with an invalid stack operation, but the interpreter ignores that failure and the script must leave a clean stack
    • Bypass: push data, then OP_DROP everything, plus one extra OP_DROP
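The two output-script rules above, and why their bypasses pass, can be illustrated with a toy model. This is a minimal sketch, not Bitcoin Core's interpreter: it tracks only stack depth, knows only a handful of opcodes, treats any other opcode as forbidden, and assumes that "ignoring" an invalid stack operation means skipping the failing opcode. The names `run`, `passes_rule_1` and `passes_rule_2` are hypothetical.

```python
# Toy model of the proposed output-script rules; NOT Bitcoin Core's interpreter.
# A script is a list: bytes items are data pushes, str items are opcode names.


class StackUnderflow(Exception):
    """Raised when an opcode needs more stack items than are present."""


# (items consumed, items produced) for the few opcodes this model knows.
# Only stack depth is tracked, which is all these rules look at.
ARITY = {
    "OP_DROP": (1, 0),
    "OP_DUP": (1, 2),
    "OP_HASH160": (1, 1),
    "OP_EQUAL": (2, 1),
    "OP_EQUALVERIFY": (2, 0),
    "OP_CHECKSIG": (2, 1),
}


def run(script, ignore_underflow=False):
    """Return (final stack depth, number of invalid stack operations)."""
    depth = 0
    underflows = 0
    for op in script:
        if isinstance(op, bytes):
            depth += 1
            continue
        if op not in ARITY:
            # Stands in for "undefined, disabled or immediate-failure opcode".
            raise ValueError(f"opcode not allowed: {op}")
        consumed, produced = ARITY[op]
        if depth < consumed:
            if not ignore_underflow:
                raise StackUnderflow(op)
            underflows += 1
            continue  # assumption: the failing opcode is simply skipped
        depth += produced - consumed
    return depth, underflows


def passes_rule_1(script):
    """Rule: the script must fail with an invalid stack operation on its own."""
    try:
        run(script)
    except StackUnderflow:
        return True
    except ValueError:
        return False
    return False


def passes_rule_2(script):
    """Rule: at least one (ignored) invalid stack operation, then a clean stack."""
    try:
        depth, underflows = run(script, ignore_underflow=True)
    except ValueError:
        return False
    return underflows > 0 and depth == 0


if __name__ == "__main__":
    p2pkh = ["OP_DUP", "OP_HASH160", bytes(20), "OP_EQUALVERIFY", "OP_CHECKSIG"]
    bypass_1 = ["OP_DROP", b"arbitrary data", b"more arbitrary data"]
    bypass_2 = [b"arbitrary data", b"more data", "OP_DROP", "OP_DROP", "OP_DROP"]
    print(passes_rule_1(p2pkh), passes_rule_1(bypass_1), passes_rule_2(bypass_2))
```

Both bypass scripts pass their respective rule while carrying arbitrary pushes.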

Analysis of complex output scripts is basically intractable, since an output script is only half of the full script; the other half comes from the spending input.

  • The only allowed output scripts are P2PKH, P2SH, P2WPKH, P2WSH, and P2TR
    • P2PKH, P2SH, and P2WPKH would still allow 20 bytes of data per output
    • P2WSH and P2TR allow 32 bytes of data per output
  • Other currently standard types (i.e. P2PK, bare multisig, OP_RETURN) would no longer be valid
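A template whitelist is easy to check, but it does not stop data: the hash or key slot of each allowed template is simply filled with data. A minimal sketch using the standard byte encodings of the five templates; the function names are hypothetical. Consensus does not check that a P2TR output key is a valid public key when the output is created, so the data-carrying outputs below are just unspendable.

```python
# Sketch of the "only five output templates" rule, using standard script encodings.
OP_0, OP_1 = 0x00, 0x51
OP_DUP, OP_HASH160, OP_EQUAL, OP_EQUALVERIFY, OP_CHECKSIG = 0x76, 0xA9, 0x87, 0x88, 0xAC


def classify(spk: bytes):
    """Return (template name, bytes of freely choosable payload) or None."""
    if (len(spk) == 25 and spk[:3] == bytes([OP_DUP, OP_HASH160, 20])
            and spk[23:] == bytes([OP_EQUALVERIFY, OP_CHECKSIG])):
        return "P2PKH", 20
    if len(spk) == 23 and spk[:2] == bytes([OP_HASH160, 20]) and spk[22] == OP_EQUAL:
        return "P2SH", 20
    if len(spk) == 22 and spk[:2] == bytes([OP_0, 20]):
        return "P2WPKH", 20
    if len(spk) == 34 and spk[:2] == bytes([OP_0, 32]):
        return "P2WSH", 32
    if len(spk) == 34 and spk[:2] == bytes([OP_1, 32]):
        return "P2TR", 32
    return None  # P2PK, bare multisig, OP_RETURN, ... would be invalid


def embed_in_p2tr(data: bytes):
    """Split arbitrary data into zero-padded 32-byte chunks shaped like P2TR outputs."""
    chunks = [data[i:i + 32].ljust(32, b"\x00") for i in range(0, len(data), 32)]
    return [bytes([OP_1, 32]) + chunk for chunk in chunks]


if __name__ == "__main__":
    outputs = embed_in_p2tr(b"x" * 70)
    print([classify(spk) for spk in outputs])
```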

Disallow From Input Scripts

  • Make the inscription envelope (OP_FALSE OP_IF) invalid
    • Bypass: Move OP_FALSE to the stack
    • Bypass: OP_1 OP_1 OP_SUB OP_IF
  • Scripts with unreachable branches are invalid
  • For tapscript, OP_IF as the first opcode is disallowed
    • Bypass: OP_CHECKSIG then the OP_IF ...
  • Disallow OP_IF
    • Massive drawbacks: kills HTLCs and other useful constructions, e.g. those used in Miniscript
  • Scripts must be valid Miniscript, and analysis must show that all branches are satisfiable
    • Bypass: big multisig
  • Scripts must be valid Miniscript, except no multi() or multi_a()
    • Multisigs need to use Taproot and MuSig or FROST
    • Bypass: or_i() a bunch of hash fragments
    • Removes all upgrade paths, because OP_NOPs and OP_SUCCESS opcodes are not allowed
      • Keeping OP_NOP or OP_SUCCESS breaks the analysis
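The first envelope bypasses work because a rule matching the literal OP_FALSE OP_IF sequence is purely syntactic. A minimal sketch with a hypothetical `has_false_if_envelope` check built on a simple opcode parser (push opcodes per the standard script encoding): it flags the classic envelope, but not OP_1 OP_1 OP_SUB OP_IF, which leaves the same zero on the stack, nor a script that starts at OP_IF and takes the zero from the witness.

```python
# Sketch: a naive "no OP_FALSE OP_IF" rule, defeated by equivalent opcode sequences.
OP_0, OP_1, OP_IF, OP_ENDIF, OP_SUB = 0x00, 0x51, 0x63, 0x68, 0x94
OP_PUSHDATA1, OP_PUSHDATA2, OP_PUSHDATA4 = 0x4C, 0x4D, 0x4E


def parse(script: bytes):
    """Split a script into (opcode, push data or None) pairs."""
    ops, i = [], 0
    while i < len(script):
        op = script[i]
        i += 1
        if 0x01 <= op <= 0x4B:
            size = op
        elif op == OP_PUSHDATA1:
            size = script[i]
            i += 1
        elif op == OP_PUSHDATA2:
            size = int.from_bytes(script[i:i + 2], "little")
            i += 2
        elif op == OP_PUSHDATA4:
            size = int.from_bytes(script[i:i + 4], "little")
            i += 4
        else:
            ops.append((op, None))
            continue
        data = script[i:i + size]
        if len(data) != size:
            raise ValueError("truncated push")
        ops.append((op, data))
        i += size
    return ops


def has_false_if_envelope(script: bytes) -> bool:
    """True if OP_FALSE (OP_0) is immediately followed by OP_IF."""
    ops = [op for op, _ in parse(script)]
    return any(a == OP_0 and b == OP_IF for a, b in zip(ops, ops[1:]))


if __name__ == "__main__":
    envelope = bytes([OP_0, OP_IF, 3]) + b"ord" + bytes([OP_ENDIF])
    bypass = bytes([OP_1, OP_1, OP_SUB, OP_IF, 3]) + b"ord" + bytes([OP_ENDIF])
    print(has_false_if_envelope(envelope), has_false_if_envelope(bypass))
```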

Every Pubkey Needs To Be Valid

  • Creating an output requires revealing the redeemScript, witnessScript, or tapscripts for that output, and each pubkey must come with a signature proving that a corresponding private key exists
    • To preserve non-interactivity, the signature is over some other fixed message defined by consensus
    • New sidecar data structure for pubkey signatures
    • Maybe easier to just make output scripts tapscript as well, instead of hiding them behind hashes, since the script needs to be revealed anyway
      • Loses benefits of taproot
    • Probably no pubkey hashing
    • Scripts must be Miniscript so that pubkeys can be identified
  • Still possible to encode some data with grinding, or with the private-key encoding trick that BitMEX described
    • e.g. a thresh(1, pk_k(k1), pk_k(k2)...) script containing pubkeys from that private-key encoding, with the fixed-message signatures made using known k values (nonces)
  • All of the scripts, pubkeys, and signatures somehow need to be communicated to senders before they can even be put into outputs
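The known-k trick works because a Schnorr signature s = k + e·d (mod n) can be solved for the private key d by anyone who knows the nonce k. A "private key" that is really data therefore yields a valid pubkey and a valid proof-of-possession signature, and publishing k reveals the data. A sketch under loud assumptions: a simplified Schnorr variant (plain SHA256 challenge instead of BIP340's tagged hash, no even-y normalisation; with real BIP340 the recovered value may be d or n − d), a hypothetical fixed message and nonce, and hypothetical function names. The pure-Python curve arithmetic is for illustration only and is not constant time.

```python
import hashlib

# secp256k1 parameters
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

FIXED_MSG = b"hypothetical consensus-defined message"  # assumption
KNOWN_K = 0x1234  # hypothetical publicly known nonce


def point_add(a, b):
    """Affine point addition on secp256k1; None is the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], -1, P) % P
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % P, -1, P) % P
    x = (lam * lam - a[0] - b[0]) % P
    return x, (lam * (a[0] - x) - a[1]) % P


def point_mul(k, point=G):
    """Double-and-add scalar multiplication."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def challenge(R, pub, msg):
    # Simplified challenge: plain SHA256, NOT BIP340's tagged hash.
    data = R[0].to_bytes(32, "big") + pub[0].to_bytes(32, "big") + msg
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(d, k, msg):
    R, pub = point_mul(k), point_mul(d)
    return R, (k + challenge(R, pub, msg) * d) % N


def verify(pub, msg, sig):
    R, s = sig
    return point_mul(s) == point_add(R, point_mul(challenge(R, pub, msg), pub))


def recover_private_key(pub, msg, sig, k):
    """With a known nonce k: d = (s - k) / e mod n."""
    R, s = sig
    return (s - k) * pow(challenge(R, pub, msg), -1, N) % N


if __name__ == "__main__":
    payload = b"arbitrary data in a privkey"
    d = int.from_bytes(payload, "big")  # the "private key" is the data
    pub = point_mul(d)
    sig = sign(d, KNOWN_K, FIXED_MSG)
    assert verify(pub, FIXED_MSG, sig)  # passes the proof-of-possession check
    recovered = recover_private_key(pub, FIXED_MSG, sig, KNOWN_K)
    print(recovered.to_bytes(len(payload), "big"))  # b'arbitrary data in a privkey'
```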

Don't Forget The Control Block and Annex

  • Reduce the maximum script-path Merkle tree depth, but there will always be some room for data
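The room left in the control block is simple arithmetic. Per BIP341, a control block is 33 bytes (leaf version/parity byte plus the 32-byte internal key) followed by 32 bytes per Merkle path node, with at most 128 nodes. Each sibling hash on the path can be any 32 bytes the output creator chose, since its preimage is never revealed. A sketch with hypothetical function names:

```python
# BIP341 control block: 1 byte (leaf version + parity) + 32-byte internal key
# + 32 bytes per Merkle path node, with a path length of at most 128.
MAX_DEPTH = 128


def control_block_size(depth: int) -> int:
    if not 0 <= depth <= MAX_DEPTH:
        raise ValueError("Merkle path too long")
    return 33 + 32 * depth


def path_data_capacity(depth: int) -> int:
    # Each sibling hash is an arbitrary 32-byte value picked by the creator.
    return 32 * depth


if __name__ == "__main__":
    for depth in (128, 8):  # current maximum vs. a hypothetical reduced cap
        print(depth, control_block_size(depth), path_data_capacity(depth))
```

Even a hypothetical cap of depth 8 still leaves 256 bytes of arbitrary path data per input.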

Upgrade Paths Allow Data

  • No annex: the annex allows an arbitrary amount of data to be included in the witness
  • No unknown witness versions: they can have up to 40 bytes of data in the output script
    • Unknown witness versions can have anything in their witness stack, so they allow the maximum amount of data

All upgrades would have to be via hard fork... eww.
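Both upgrade hooks are easy to recognise mechanically, which is also why banning them is easy and costly. This sketch follows BIP141's witness program pattern (a version opcode OP_0 or OP_1..OP_16 followed by a single 2 to 40 byte push) and BIP341's annex rule for taproot spends (with at least two witness elements, a last element starting with 0x50 is the annex). The function names are hypothetical.

```python
# BIP141 witness program: version opcode (OP_0 or OP_1..OP_16) + one push of 2..40 bytes.
def parse_witness_program(spk: bytes):
    """Return (version, program) or None if spk is not a witness program."""
    if not 4 <= len(spk) <= 42 or spk[1] + 2 != len(spk):
        return None
    if spk[0] == 0x00:
        version = 0
    elif 0x51 <= spk[0] <= 0x60:  # OP_1 .. OP_16
        version = spk[0] - 0x50
    else:
        return None
    return version, spk[2:]


# BIP341 (taproot spends): with at least two witness elements, a last element
# starting with 0x50 is the annex; it is removed before script execution.
def split_annex(witness):
    if len(witness) >= 2 and witness[-1][:1] == b"\x50":
        return witness[:-1], witness[-1]
    return witness, None


if __name__ == "__main__":
    print(parse_witness_program(bytes([0x52, 40]) + b"x" * 40))  # unknown version 2
    stack, annex = split_annex([b"sig", b"\x50" + b"y" * 1000])
    print(len(stack), len(annex))
```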

No Expressivity, Keys Only

  • Every output is P2TR with no script path, and comes with a signature over a fixed message
  • Still possible to use MuSig and FROST for multisigs
  • Can't do anything else interesting, including HTLCs, or covenants, or whatever other neat script thing that a bunch of people want to do.
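A keys-only rule set would be simple to check. A minimal sketch with hypothetical function names: it relies on BIP341's key-path shape (a single 64-byte signature, or 65 bytes with an explicit sighash type), so a script-path spend or an annex is rejected. It does not model the sidecar signature over a fixed message.

```python
# Hypothetical "keys only" rule: every output is P2TR, every spend is a key-path spend.
def is_p2tr(spk: bytes) -> bool:
    return len(spk) == 34 and spk[0] == 0x51 and spk[1] == 0x20


def is_key_path_only(witness) -> bool:
    # Exactly one element of 64 or 65 bytes: no script, no control block, no annex.
    return len(witness) == 1 and len(witness[0]) in (64, 65)


def tx_allowed(output_scripts, input_witnesses) -> bool:
    # The sidecar proof that each output key signed a fixed message is not modelled.
    return (all(map(is_p2tr, output_scripts))
            and all(map(is_key_path_only, input_witnesses)))
```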

Other Random Places in Txs to Put Data

  • Always enforce locktime regardless of input sequences, to prevent data in locktime
    • Still, as long as the locktime is below the current block height or timestamp, data can be stored in it
  • Every transaction must be >= v2 to prevent data in input sequences
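Today nLockTime is ignored when every input has sequence 0xffffffff, which lets it hold any 32-bit value. Even if locktime were always enforced, every value below the current height (or below the median time past, for timestamp locktimes) stays valid. A sketch of the remaining capacity; the function names and the example height and median time past are hypothetical.

```python
import math

LOCKTIME_THRESHOLD = 500_000_000  # below: block height; at or above: UNIX timestamp


def locktime_satisfied(locktime: int, height: int, median_time_past: int) -> bool:
    """Proposed rule: locktime is enforced regardless of input sequences.

    height is the height of the block that would include the transaction.
    """
    if locktime == 0:
        return True
    limit = height if locktime < LOCKTIME_THRESHOLD else median_time_past
    return locktime < limit


def locktime_data_bits(height: int, median_time_past: int) -> float:
    """log2 of how many nLockTime values are currently satisfied."""
    valid_heights = height  # 0 .. height - 1
    valid_times = max(0, median_time_past - LOCKTIME_THRESHOLD)
    return math.log2(valid_heights + valid_times)


if __name__ == "__main__":
    print(f"{locktime_data_bits(900_000, 1_750_000_000):.2f} bits")
```

With those example values roughly 30 of the field's 32 bits remain usable, mostly thanks to the timestamp range.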

Instead of Stopping, Make Arbitrary Data More Expensive

  • Increase the weight cost of outputs from 4 weight units per byte to 20 weight units per byte
  • Unexecuted branches in inputs don't get witness discount
    • Bypass: Push then OP_DROP
  • Original stack items that are consumed but not processed by an opcode (e.g. OP_DROP) don't get witness discount
    • Bypass: OP_EQUAL OP_NOT
  • Only stack items consumed by OP_CHECK(MULTI)SIG(ADD), or by a Taproot key path spend, get the witness discount
    • Bypass: CHECKMULTISIG or CHECKSIGADD with fake pubkeys
  • Any pubkey which wasn't used in the input (no/invalid sig) doesn't get the discount
  • Altstack must be empty at the end of the script
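The weight changes above are simple arithmetic on BIP141's weight formula (4 WU per non-witness byte, 1 WU per witness byte). A sketch, with a hypothetical `weight` helper and parameter names, of how an output multiplier of 20 and undiscounted witness bytes (e.g. pushed-then-dropped data) would be counted:

```python
WITNESS_SCALE_FACTOR = 4  # BIP141: non-witness bytes cost 4 WU, witness bytes 1 WU


def weight(base_bytes, output_bytes, witness_bytes, undiscounted_witness_bytes=0,
           output_factor=WITNESS_SCALE_FACTOR):
    """Weight under the current rules (defaults) or the proposed variants.

    base_bytes: non-witness bytes other than outputs
    output_bytes: serialized outputs (amount + script length + script)
    witness_bytes: witness bytes that keep the discount
    undiscounted_witness_bytes: e.g. unexecuted branches or OP_DROP'd items
    """
    return (WITNESS_SCALE_FACTOR * base_bytes
            + output_factor * output_bytes
            + WITNESS_SCALE_FACTOR * undiscounted_witness_bytes
            + witness_bytes)


if __name__ == "__main__":
    print(weight(0, 31, 0), weight(0, 31, 0, output_factor=20))
    print(weight(0, 0, 100_000), weight(0, 0, 0, 100_000))
```

For example, one 31-byte P2WPKH output goes from 124 to 620 WU, and 100,000 bytes of pushed-then-dropped witness data from 100,000 to 400,000 WU.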
@Filiprogrammer

  • Unexecuted branches in inputs don't get witness discount
    • Bypass: Push then OP_DROP
  • Original stack items that are consumed but not processed by an opcode (e.g. OP_DROP) don't get witness discount
    • Bypass: OP_EQUAL OP_NOT
  • Any stack item consumed by OP_CHECK(MULTI)SIG(ADD) or Tapscript keypath
    • Bypass: CHECKMULTISIG or CHECKSIGADD with fake pubkeys
  • Any pubkey which wasn't used in the input (no/invalid sig) doesn't get the discount
  • Altstack must be empty at the end of the script

These sound like good ideas worth thinking about.

But I don't like this:

  • Increase the weight cost on outputs, instead of 4 weight per byte, 20 weight per byte

This would make monetary transactions considerably more expensive while barely impacting large inscriptions.
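To put rough numbers on this point, here is a sketch comparing a typical payment with a large inscription reveal under the 20-weight-per-output-byte proposal. All transaction sizes are illustrative assumptions (one input each, a 72-byte DER signature, 390,000 bytes of inscription witness), and the formula models only the output-factor change, not the other proposed discount rules.

```python
def weight(base_bytes, output_bytes, witness_bytes, output_factor=4):
    # Current rules use output_factor=4; the proposal uses 20.
    return 4 * base_bytes + output_factor * output_bytes + witness_bytes


# Assumed skeleton shared by both txs: version (4) + input count (1)
# + one input (36 outpoint + 1 empty scriptSig length + 4 sequence)
# + output count (1) + locktime (4) = 51 bytes.
SKELETON = 51

# 1-input, 2-output P2WPKH payment: two 31-byte outputs; witness = marker/flag (2)
# + item count (1) + 72-byte signature push (1 + 72) + 33-byte pubkey push (1 + 33).
payment = dict(base_bytes=SKELETON, output_bytes=62, witness_bytes=110)

# Inscription reveal: one 43-byte P2TR output and an assumed 390,000 witness bytes.
inscription = dict(base_bytes=SKELETON, output_bytes=43, witness_bytes=390_000)

for name, tx in (("payment", payment), ("inscription", inscription)):
    now, proposed = weight(**tx), weight(**tx, output_factor=20)
    print(f"{name}: {now} -> {proposed} WU (+{100 * (proposed - now) / now:.1f}%)")
```

Under these assumptions the payment's weight nearly triples, while the inscription's grows by well under one percent.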
