Skip to content

Instantly share code, notes, and snippets.

@jonasschnelli
Last active November 14, 2022 13:22
Show Gist options
  • Save jonasschnelli/c530ea8421b8d0e80c51486325587c52 to your computer and use it in GitHub Desktop.
Save jonasschnelli/c530ea8421b8d0e80c51486325587c52 to your computer and use it in GitHub Desktop.

  BIP: 324
  Layer: Peer Services
  Title: Version 2 Peer-to-Peer Message Transport Protocol
  Author: Jonas Schnelli <[email protected]>
          Dhruv Mehta <[email protected]>
  Status: Draft
  Type: Standards Track
  Created: 2019-03-08
  License: PD

Table of Contents

Abstract

This BIP describes a new Bitcoin peer to peer transport protocol with opportunistic encryption.

Objectives and Motivation

Add end-to-end encryption: The current Bitcoin p2p protocol(referred to as v1 in this document) is in plaintext. With the current unencrypted message transport, BGP hijack, block delay attacks and message tampering are inexpensive and can be executed covertly (undetectable MITM)[1].[2]

Increase observability of attacks:: Adding opportunistic encryption as described in this document, introduces a high risk for attackers of being detected. Peer operators can compare encryption session IDs or use other form of authentication schemes [3] to identify an attack.

Add encryption without computation overhead: Each v1 p2p message uses a double-SHA256 checksum truncated to 4 bytes. Roughly the same amount of computation power would be required for encrypting and authenticating a v2 p2p message with ChaCha20 & Poly1305.

Add encryption without risk of network partition: Implementing this proposal maintains compatibility between v1 peers and v2 peers. There should be no risk of network patitions.

Specification

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119[4].

A peer that supports the v2 p2p transport protocol as defined in this proposal MUST accept encryption requests from all peers.

The encryption handshake MUST happen before any other messages are exchanged between the peers. A shared secret is established at the end of the Elliptic Curve Diffie-Hellman(ECDH) handshake.

Both communication directions have different symmetric cipher keys deterministically derived from the same shared secret.

If the responding peer closes the connection after receiving the handshake request, the initiating peer MAY try to connect again with the v1 p2p transport protocol. Such reconnects allow an attacker to "downgrade" the encryption to plaintext communication and thus, accepting v1 connections MUST not be done when the Bitcoin peer-to-peer network has almost entirely embraced v2 communication.

Signaling BIP324 support

Peers supporting the transport protocol proposed here MUST signal the NODE_P2P_V2 = (1 << 11) service flag. Such peers MUST accept encrypted communication specified in this proposal.

A peer usually learns an address along with the expected service flags which MAY be used to filter possible outbound peers.

Peers MAY make outbound connections exclusively to peers supporting NODE_P2P_V2.

Handshake

 ----------------------------------------------------------------------------------------
 | Initiator                             Responder                                      |
 |                                                                                      |
 | x, X         := SECP256k1_KEYGEN()                                                   |
 | CLIENT_HDATA := X                                                                    |
 |                                                                                      |
 |               --- CLIENT_HDATA --->                                                  |
 |                                                                                      |
 |                                       y, Y           := SECP256k1_KEYGEN()           |
 |                                       ECDH_KEY       := SECP256k1_ECDH(X,y)          |
 |                                       SERVER_HDATA   := Y                            |
 |                                                                                      |
 |               <-- SERVER_HDATA ----                                                  |
 |                                                                                      |
 | ECDH_KEY     := SECP256k1_ECDH(x,Y)                                                  |
 ----------------------------------------------------------------------------------------

To request encrypted communication (only possible if yet no other messages have been sent or received), the initiating peer generates an EC secp256k1 ephemeral key and sends the corresponding 32-byte public key to the responding peer and waits for the remote 32-byte public key from the counterparty.

ODD secp256k1 public keys MUST be used (public keys starting with 0x03). If the public key from the generated ephemeral key is an EVEN public key (starting with 0x02), its private key SHOULD be negated and then recalculated. Only using ODD public keys makes it more complex to identify the handshake based by analyzing the traffic and looking for 33 bytes that start with 0x02 or 0x03. This in turn increases the cost of censorship.

The handshake request and response message are raw 32 byte payloads containing no header, length or checksum and MUST be sent before anything else.

To aid identification of the v2 handshake from a v1 p2p message, public keys starting with the 4-byte network magic are forbidden and MUST lead to local regeneration of an ephemeral key.

Pseudocode for the ephemeral-key generation

ecdh_priv_key = MakeNewKey()
for {
    ecdh_pub_key = ecdh_priv_key.GetPubKey()
    if (ecdh_pub_key[0] == 2) {
        // Public key is even, negate the private key and try again
        ecdh_priv_key.Negate();
    } else if (ecdh_pub_key[0..3] == NETWORK_MAGIC) {
        // Public key cannot start with the network magic bytes
        ecdh_priv_key = MakeNewKey();
    } else {
        break;
    }
}

Once a peer has received the public key from its counterparty, the shared secret MUST be calculated by using secp256k1 ECDH.

Private keys will never be transmitted. The shared secret can only be calculated if an attacker knows at least one private key and the counterparty's public key.[5]

After a successful v2 handshake, both peers:

  • MUST use the v2 message structure described in this document. Unencrypted v1 messages from either peer MUST lead to an immediate connection termination.
  • MUST wipe the ephemeral session key from memory and persistent storage.
A v1 peer will not perform the described handshake and will send a v1 version message to initiate a connection. v2 peers supporting this BIP MAY optionally allow unencrypted v1 communication by detecting a v1 version message by the initial 11-byte sequence of NETWORK_MAGIC || "version".

Keys and Session ID Derivation

The authenticated encryption construction proposed here requires two ChaCha20Forward4064-Poly1305 cipher suite instances per communication direction. Each cipher suite instance requires a 256-bit key. We derive the four required keys from the established ECDH secret (ECDH_KEY = SECP256k1_ECDH(X,y) = SECP256k1_ECDH(x,Y)) known to both peers after the handshake. The keys MUST be derived with HKDF [6] as shown below. Peer A is the handshake initiator, peer B is the responder.

Both peers MUST also calculate the 256-bit session id, SID.

// Convert ECDH_KEY input key material into a pseudo-random key using HKDF extract.
PRK = HKDF_EXTRACT(hash=SHA256, salt="BitcoinSharedSecret||INITIATOR_32BYTES_PUBKEY||RESPONDER_32BYTES_PUBKEY||NETWORK_MAGIC", ikm=ECDH_KEY)

// Derive 32 byte keys used to encipher data A-> B
K1A = HKDF_EXPAND(prk=PRK, hash=SHA256, info="BitcoinK_1_A", L=32)
K2A = HKDF_EXPAND(prk=PRK, hash=SHA256, info="BitcoinK_2_A", L=32)

// Derive 32 byte keys used to encipher data B -> A
K1B = HKDF_EXPAND(prk=PRK, hash=SHA256, info="BitcoinK_1_B", L=32)
K2B = HKDF_EXPAND(prk=PRK, hash=SHA256, info="BitcoinK_2_B", L=32)

// Derive session id
SID = HKDF_EXPAND(prk=PRK, hash=SHA256, info="BitcoinSessionID", L=32)

Detecting MITM attacks

The v2 protocol is based on ECDH key exchange and cannot prevent MITM attacks. However, comparing the session id SID derived by both peers on a secure channel can help authenticate the session and make any MITM attacks observable.

v2 peers supporting this BIP, MUST present the session id to the user on request.

ChaCha20Forward4064-Poly1305@Bitcoin Cipher Suite

Existing cryptographic primitives

BIP324 leverages two cryptographic primitives designed by Daniel Bernstein:

ChaCha20 PRF

The ChaCha20 PRF (pseudo-random function)[7] takes as input: (1) 128 fixed bits (2) 256 bits of key material (3) a 64 bit IV/nonce and (4) a 64 bit counter and outputs: 512 pseudo-random bits. We will represent it as ChaCha20PRF(key, iv, ctr)

The ChaCha20 PRF can be composed into the ChaCha20 DRBG (deterministic random bit generator) to produce a keystream by incrementing the counter up to 2^70.

ChaCha20DRBG(key, iv) = ChaCha20PRF(key, iv, ctr=0) || ChaCha20PRF(key, iv, ctr=1) || ...

The ChaCha20 DRBG thus constructed does not provide forward and backward security in case of a compromised key. i.e if an attacker learns the key, they can decrypt all passively collected past and future ciphertext. This proposal outlines a new DRBG construction, the ChaCha20Forward4064 DRBG described below to add forward and backward security.

Poly1305 MAC

Poly1305 [8] is a one-time Carter-Wegman MAC that computes a 128 bit integrity tag given a message and a single-use 256 bit secret key.

New cryptographic primitives

ChaCha20Forward4064 DRBG

Assuming a node sends only ping messages (28 bytes in the v2 protocol) every 20 minutes (the timeout interval for post-BIP31 connections) on a connection, the node will transmit 4064 bytes in a little over 2 days. To provide forward and backward security, we propose re-keying the ChaCha20 DRBG every 4064 bytes of keystream. Once 4064 bytes of keystream from the ChaCha20 DRBG has been consumed, the next 32 bytes are used to re-key the cipher instance before continuing to obtain up to another 4064 bytes of keystream. The IV is initialized to 0 and incremented on every re-key event.

k0 = key
iv = 0
ks0, k1 = ChaCha20DRBG(k0, iv)[0:4064], ChaCha20DRBG(k0, iv)[4064:4096]
iv = iv + 1
ks1, k2 = ChaCha20DRBG(k1, iv)[0:4064], ChaCha20DRBG(k1, iv)[4064:4096]
...
ChaCha20Forward4064DRBG(key) = ks0 || ks1 || ks2 || ...

Both peers MUST keep track of the re-keying sequence numbers (uint64) to set as the ChaCha20 IV upon re-keying.

ChaCha20Forward4064-Poly1305@Bitcoin combines the ChaCha20Forward4064 DRBG(using it as a stream cipher by XORing the DRBG keystream with the plaintext or ciphertext) and the Poly1305 MAC into an authenticated encryption mode with addition of encryption of the packet lengths. The detailed construction follows.

Detailed Construction

The ChaCha20Forward4064-Poly1305@Bitcoin cipher suite requires two 256-bit keys from the key exchange per communication direction. Each key (K1 and K2) is used by two separate instances of a ChaCha20Forward4064 DRBG. We will call the instances F(used for fixed-length purposes, using 35 bytes per message) and V(used for variable length purposes).

F = ChaCha20Forward4064DRBG(key=K1)
V = ChaCha20Forward4064DRBG(key=K2)

Packet Handling

When encrypting a message M of length len(M) bytes:

  1. Read 3 bytes from the DRBG instance, F, XOR them with the 3-byte litte endian encoding for len(M) and set the result as ciphertext C.
  2. Read len(M) bytes from V, XOR them with the len(M) bytes of M and append the result to C.
  3. Read 32 bytes from F and use it to instantiate a Poly1305 MAC, P.
  4. Compute a 16 byte MAC tag of contents in C using P. Append the tag to C. C is now the content that can be transmit to maintain confidentiality, integrity and optionally, authentication.
When receiving a packet with ciphertext and MAC, called C:

  1. The length MUST be decrypted when 3 bytes of ciphertext have been received. Read 3 bytes from the DRBG instance, F, XOR them with the first 3 bytes of the ciphertext. Interpret the resulting bytes as the little endian encoding of len(M)
  2. Read 32 bytes from F and use it to instantiate a Poly1305 MAC, P.
  3. Compute a 16 byte MAC tag of contents in C[0:3 + len(M)] and compare it to the 16 bytes at C[3 + len(M):]. The byte comparison must be done in a constant time manner. If the expected MAC tag differs from the provided MAC tag, the connection MUST be immediately terminated.
  4. Read len(M) bytes from V and XOR them with C[3:3 + len(M)] to obtain the plaintext.
The initiating peer MUST use F(K1A), V(K2A) to encrypt messages on the send channel, F(K1B), V(K2B) MUST be used to decrypt messages on the receive channel.

The responding peer MUST use F(K1A), V(K2A) to decrypt messages on the receive channel, F(K1B), V(K2B) MUST be used to encrypt messages on the send channel.

Two separate cipher instances are used here so as to keep the packet lengths confidential (best effort; for passive observing) but not create an oracle for the packet payload cipher by decrypting and using the packet length prior to checking the MAC. By using an independently-keyed cipher instance to encrypt the length, an active attacker seeking to exploit the packet input handling as a decryption oracle can learn nothing about the payload contents or its MAC (assuming key derivation, ChaCha20 and Poly1305 are secure). Active observers can still obtain the message length (ex. active ciphertext bit flipping or traffic semantics analysis)

Optimized implementations of ChaCha20Forward4064-Poly1305@bitcoin are relatively fast, therefore it is unlikely that encrypted messages will require additional CPU cycles per byte when compared to the v1 p2p message format (double SHA256).

AEAD Test Vectors

TODO: Update the test vectors after swapping k1/k2 in code

message   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
k1 (DATA) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
k2 (AAD)  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ciphertext
76 b8 e0 76 b8 e0 ad a0 f1 3d 90 40 5d 6a e5 53 86 bd 28 bd d2 19 b8 a0 8d ed 1a a8 36 ef cc 8b

MAC
df cd 91 c8 75 ee 88 71 8b 63 34 5d 75 0c 68 4a
message   01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
k1 (DATA) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
k2 (AAD)  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ciphertext
77 b8 e0 76 b8 e0 ad a0 f1 3d 90 40 5d 6a e5 53 86 bd 28 bd d2 19 b8 a0 8d ed 1a a8 36 ef cc 8b 

MAC
fb 6c f9 dc d7 e2 ee 80 7d 5f f9 81 eb 4a 13 5a
message
ff 00 00 f1 95 e6 69 82 10 5f fb 64 0b b7 75 7f 57 9d a3 16 02 fc 93 ec 01 ac 56 f8 5a c3 c1 34 a4 54 7b 73 3b 46 41 30 42 c9 44 00 49 17 69 05 d3 be 59 ea 1c 53 f1 59 16 15 5c 2b e8 24 1a 38 00 8b 9a 26 bc 35 94 1e 24 44 17 7c 8a de 66 89 de 95 26 49 86 d9 58 89 fb 60 e8 46 29 c9 bd 9a 5a cb 1c c1 18 be 56 3e b9 b3 a4 a4 72 f8 2e 09 a7 e7 78 49 2b 56 2e f7 13 0e 88 df e0 31 c7 9d b9 d4 f7 c7 a8 99 15 1b 9a 47 50 32 b6 3f c3 85 24 5f e0 54 e3 dd 5a 97 a5 f5 76 fe 06 40 25 d3 ce 04 2c 56 6a b2 c5 07 b1 38 db 85 3e 3d 69 59 66 09 96 54 6c c9 c4 a6 ea fd c7 77 c0 40 d7 0e af 46 f7 6d ad 39 79 e5 c5 36 0c 33 17 16 6a 1c 89 4c 94 a3 71 87 6a 94 df 76 28 fe 4e aa f2 cc b2 7d 5a aa e0 ad 7a d0 f9 d4 b6 ad 3b 54 09 87 46 d4 52 4d 38 40 7a 6d eb 3a b7 8f ab 78 c9

k1 (DATA) 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
k2 (AAD)  ff 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f

ciphertext
39 40 c1 c8 68 cd 14 5b d5 46 91 e9 b6 b4 02 c7 8b d7 ea 9c 37 24 fc 50 df c6 9a 4a 96 be 8d ec 4e 70 e9 58 18 8a a6 92 22 ea ef 3f 47 f8 00 3f 1b c1 3d cf 9e 66 1b e8 e1 b6 71 e9 cf 46 ba 70 5b ca 96 3e 04 77 a5 b3 c2 e2 c6 6f eb 82 07 26 9d db 01 b1 37 2a ad 68 56 3b b4 aa d1 35 af b0 6f be 40 b3 10 b6 3b ef 57 8f f9 39 f3 a0 0a 6d a9 e7 44 d2 8b a0 70 29 4e 57 46 d2 ca 7b b8 ac 2c 8e 3a 85 5a b4 c9 bc d0 d5 85 5e 11 b5 2c ac aa 2d db 34 c0 a2 6c d0 4f 4b c1 0d e6 dc 15 1d 4e e7 ce d2 c2 b0 de 8d ed 33 ff 11 f3 01 e4 02 75 59 e8 93 8b 69 bc eb 1e 5e 25 9d 41 22 05 6f 6a db d4 8a 06 28 b9 12 f9 0d 72 83 8f 2f 3a af 6b 88 34 2c f5 ba c3 cb 68 8a 9b 0f 7a fc 73 a7 e3 ca d8 e7 12 54 c7 86 ea 00 02 40 ae 7b d1 df 8b cf ca 07 f3 b8 85 72 3a 9d 7f 89 73 64 61 

MAC
7a c8 d9 35 a4 1b f9 54 64 32 36 0e 1c 54 37 08

v2 Messages Structure

Field Size Description Data type Comments
3 length 24 bits Encrypted length of ciphertext payload (not counting the 16 byte MAC tag or the 3 byte encrypted length) in number of bytes
1-13 encrypted message-type variable ASCII message-type (or one byte message-type-ID)
? encrypted payload ? The actual data
16 MAC tag ? 128bit MAC-tag

The v2 message structure does not need the 4byte network magic. The HKDF key derivation process embeds the network magic bytes into the salt for the extraction step. Using a different network identifier in that step will result in different derived keys per network.

The maximum message size is 2^24 (16’777’216) bytes. Future communication MAY exceed this limit and thus MUST be split into different messages.

The 4 byte sha256 checksum in v1 messages is no longer required because the MAC tag provides integrity and authentication.

The message-type field MUST start with a byte that defines the length of the ASCII message-type string up to 12 chars (1 to 12) or a message-type-ID (see below).

Message-Type-ID

To save valuable bandwidth, the v2 message format supports message-type-IDs. The ID/string mapping is a peer to peer arrangement and MAY be negotiated between the initiating and responding peer. A peer conforming to this proposal MUST support message-type-IDs based on the table below and SHOULD use message-type-IDs for outgoing messages.

Number Message Type
13 ADDR
14 BLOCK
15 BLOCKTXN
16 CMPCTBLOCK
17 FEEFILTER
18 FILTERADD
19 FILTERCLEAR
20 FILTERLOAD
21 GETADDR
22 GETBLOCKS
23 GETBLOCKTXN
24 GETDATA
25 GETHEADERS
26 HEADERS
27 INV
28 MEMPOOL
29 MERKLEBLOCK
30 NOTFOUND
31 PING
32 PONG
33 REJECT
34 SENDCMPCT
35 SENDHEADERS
36 TX
37 VERACK
38 VERSION
39 GETCFILTERS
40 CFILTER
41 GETCFHEADERS
42 CFHEADERS
43 GETCFCHECKPT
44 CFCHECKPT
45 WTXIDRELAY
46 ADDRV2
47 SENDADDRV2

Length comparisons between v1 and v2 messages

v1 inv: 4(Magic)+12(Message-Type)+4(MessageSize)+4(Checksum)+37(Payload) == 61
v2 inv: 3(MessageSize)+1(Message-Type)+37(Payload)+16(MAC) == 57
(93.44%)

v1 ping: 4(Magic)+12(Message-Type)+4(MessageSize)+4(Checksum)+8(Payload) == 32
v2 ping: 3(MessageSize)+1(Message-Type)+8(Payload)+16(MAC) == 28
(87.5%)

v1 block: 4(Magic)+12(Message-Type)+4(MessageSize)+4(Checksum)+1’048’576(Payload) = 1’048’600
v2 block: 3(MessageSize)+1(Message-Type)+1’048’576(Payload)+16(MAC) = 1’048’596
(99.9996%)

Test Vectors

TODO: Update the test vectors after swapping k1/k2 in code

message   verack
k1 (DATA) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
k2 (AAD)  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
sequence: 0

Message before encryption
01 00 00 25

Ciphertext (3byte AD, 1byte message-type, 16 bytes MAC)
77 b8 e0 53 14 05 09 d3 48 60 7a 07 58 00 77 44 be 48 21 ef
message   PING (nonce=123456)
k1 (DATA) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
k2 (AAD)  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
sequence: 0

Message before encryption (3 bytes packet length, 1byte message-type-ID, 4 byte message payload [ping nonce])
05 00 00 1f 40 e2 01 00

Ciphertext (3byte AD, 1byte message-type-id, 4 byte message payload [ping nonce], 16 bytes MAC)
73 b8 e0 69 f8 02 ac a0 19 62 2f 1a 1d d3 d3 be de 22 2e d9 ff 17 51 40

Risks

The encryption does not include an authentication scheme. This BIP does not cover a proposal to avoid MITM attacks during the encryption initialization. However, peers MUST show the session id to the user on request which allows to identify a MITM by a manual verification on a secure channel.

Optional authentication schemes may be covered by other proposals [3].

An attacker could delay or halt v2 protocol enforcement by providing a reasonable amount of peers not supporting the v2 protocol.

Compatibility

This proposal is backward compatible (as long as not enforced). Non-supporting peers can still use unencrypted communications.

Reference implementation

References

  1. ^ Hijacking Bitcoin: Routing Attacks on Cryptocurrencies - M. Apostolaki, A. Zohar, L.Vanbever
  2. ^ Encrypting traffic between peers is already possible with VPN, tor, I2P, stunnel, curveCP or any other encryption mechanism on a networking layer below the application, however, most of those solutions require significant technical experience in setting up a secure channel and are therefore not widely deployed.
  3. a b BIP150
  4. ^ RFC 2119
  5. ^ This key-exchange is based on the discrete log problem and thus not sufficiently strong against known forms of possible quantum computer algorithms. Adding an additional quantum resistant key exchange like NewHope is possible but out of scope for this proposal.
  6. ^ HKDF (RFC 5869)
  7. ^ ChaCha20
  8. ^ Poly1305

Acknowledgments

  • Pieter Wuille and Gregory Maxwell for most of the ideas in this BIP.
  • Tim Ruffing for the review and the hint for the enhancement of the symmetric key derivation.
  • Jonathan Cross for re-wording and grammar corrections

Copyright

This work is placed in the public domain.

@gmaxwell
Copy link

The situation for 6979 isn't the same-- a big part of the point of it is so you can compare/audit implementations. So there is value in using the 'standard' construct even though it has horrific performance, and the horrific performance of RFC6979 is because they wanted to define something that was already a FIPS certified random bit generator. :)

Sorry, I don't understand that question. What kind of parallelism are you referring to?

like your AVX512 letting you do 4 chacha20 permutations concurrently. Generally the sales pitch for stream ciphers is that they're embarrassingly parallel, so you can get arbitrary speedups via hardware or SIMD by just doing a bunch of them in parallel.

there's no point in rekeying the MAC except for after huge amounts of data as in the current draft. There's nothing like forward integrity.

Well if the same cipher stream is used to originate both the lengths and the mac keys you'd want to rekey it for length confidentiality.
I agree that there isn't much reason for rekeying the macs... in fact, some protocols have wanted the opposite of forward integrity: things like OTR leak the mac keys after messages are accepted to make it easier to forge transcripts. (not a concern for this protocol, but just mentioning it)

What you mention (repeating keystreams) is an issue if nonces are repeated with the same key, because then the keystream repeats.. AFAIU all of this is relevant only close to the birthday bound. If you have a permutation such as the ChaCha20 block function and use it in counter mode, then yes, it will repeat only after 2^noncebits.

I agree, but I just wouldn't presume that there isn't some security proof that cares about the difference between the expected time until repeat vs the worst case.

Wait, what is leakage resistance? I thought this is about forward secrecy. Are are you talking about different attacks? Just checking if we have the same thing in mind.

Sorry, :) I think I've seen the same property described as leakage resistance or backtracking resistance in CSPRNGs. I'm not sure that it's best to call this forward secrecy (which is what it is) because that term will be reliably confused with stuff like connection level PFS ciphersuites (I think it's kind of absurd that all protocols aren't PFS normally at least at a "session" level...) :)

@sipa
Copy link

sipa commented Oct 22, 2020

So specifically for ChaCha20 the idea of using key stream output as new key is pretty appealing, as ChaCha20 has zero key setup cost. This would be very different for AES based constructions, which have a relatively high preprocessing step to expand the key first. So rekeying I think it just literally generating 32 bytes extra ChaCha output from some previous stream (or a small multiple of that, because of the separate length/data streams).

If we're worried about the risk of ending up in a state that loops, I think there is an easy solution: don't reset the message counter when switching to a new key?

The big question is how frequently to do it. If there are no cryptographic arguments for doing based on the amount of data sent, and we're only changing keys to provide forward secrecy, I feel that the best criterion is probably something time-based. I don't think we care about forward secrecy better than a 1-second resolution, and rekeying once per second, in both directions, for 2 key streams, for 1000 concurrent connections, would still only translate to an additional 0.02% of CPU load (on my i7-7820HQ).

I'm not actually advocating for rekeying every second, just saying that even at that short interval, it's likely negligible from a performance standpoint.

@gmaxwell
Copy link

The nice thing about doing it byte based (really, permutation invocation count based) is it could all be pipelined and you don't need any signalling overhead. It also avoids questions like what do you do with a non-conforming counterparty that never rekeys. I totally agree that the timeframes you're discussing are fine ... I also though an hours timeframe was fine for that matter. :)

With byte-based you could define the function as generating 64 blocks of output (4096 bytes), you steal the first or last 32 bytes to key the next group, and output 4064 bytes of random key-stream. Being able to do 64 blocks in parallel even without pipelining should be plenty for any future simd or hardware support... and would achieve 1 second rekeying for 32kbps streams. (I'm not too fixed on 4096-- it just seemed like a reasonable approximation of 'big' for this purpose).

The slowdown is just 4064 vs 4096, 0.78% of time.

If the same construct were used on the stream that generated length encryption it wouldn't have the same forward secrecy speed (and would take a lot longer if messages are large) but I don't think that is an issue.

The idea of just letting the counter continue also did occur to me, I just didn't want to go too far down the rabbit hole. +1 on that.

@real-or-random
Copy link

real-or-random commented Oct 23, 2020

I just rediscovered https://blog.cr.yp.to/20170723-random.html , which is a long blog post covering exactly this topic (with respect to PRGs, not to encryption itself, where this seems not common for whatever reason.)

The nice thing about doing it byte based (really, permutation invocation count based) is it could all be pipelined and you don't need any signalling overhead. It also avoids questions like what do you do with a non-conforming counterparty that never rekeys. I totally agree that the timeframes you're discussing are fine ... I also though an hours timeframe was fine for that matter. :)

I really don't know enough about the P2P protocol to have an idea about this: Is it likely that peers have connections where they don't send enough data for hours or days so that rekeying would not occur? I don't claim that this is an issue. The attacker can at most look ~ 4096 bytes "in the past". But I still wonder.

@sipa
Copy link

sipa commented Oct 23, 2020

I really don't know enough about the P2P protocol to have an idea about this: Is it likely that peers have connections where they don't send enough data for hours or days so that rekeying would not occur? I don't claim that this is an issue. The attacker can at most look ~ 4096 bytes "in the past". But I still wonder.

Assuming a ping/pong message every 20 minutes (the timeout interval for post-BIP31 connections), you'd reach 4096 bytes in just over 24 hours.

@gmaxwell
Copy link

gmaxwell commented Oct 23, 2020

Of course, if your traffic is only ping/pong messages ... that isn't very interesting. If you want faster forgetting, you can ping/pong (or otherwise send padding traffic) at a higher rate. For 4064 bytes a forgetting time of 1 minute requires 541 bits per second.

A smaller interval would probably be fine too-- the widest implementation I can find discussed (for AVX512) is 256 bytes wide... going further for future SIMD hardware is probably a reasonable idea, 64-way is probably a bit overkill it just seemed roughly reasonable to me.

It's also worth mentioning that an implementation can have more forward security than implied by the rekey interval: Just generate the key stream one re-key interval ahead and throw the key stream out as you use it, giving you effectively bit-at-a-time forward security and then the only requirement on the rekey interval is that it's short enough that the memory required for a key-ahead buffer per-peer isn't burdensome.

Edit: Ah, I see the DJB link also suggests the compute ahead and erase approach and actually a pretty similar construction to what we were discussing here.

@jonasschnelli
Copy link
Author

Did a minor update of the BIP. Changes "command" to "message-type" as well as "command short id" to "message-type-ID".
Added the BIP157 message-type-IDs.

@jonasschnelli
Copy link
Author

If we would change the AEAD to byte based (really, permutation invocation count based [read more in the last comments]), ... would not hashing the new key (coming from the key stream of the current chacha20 instance) with the session-ID (a HKDF-hashed output of the ECDH-secret) result in being more prone to chacha20 key extraction attacks?

@sipa
Copy link

sipa commented Nov 14, 2020

So I think there are a number of different reasons why rekeying may be desirable and/or is done in real protocols:

  • The underlying stream cipher may have limitations on how much data can be securely encrypted with the same key. I believe ChaCha20 has no such limitation, except that the counter can't wrap around (within the same key/IV). With a 64-bit counter it won't ever wrap around in practice (needs a zebibyte).
  • Providing forward security (if an attacker gets access to the cipher key, he can't decrypt historical encrypted data he may have intercepted). Any kind of rekeying is sufficient for this, including just using the ChaCha20 output.
  • Providing "backward" security (not sure if that's the right term, but I mean guaranteeing that an attacker who gets access to the stream key at one point in time, can't use that to decrypt all future data in the stream). I'd say there are two variants here:
    • Situations where the attacker has access to the node's internal secret state and gets to read the session key (and other keys), but then loses this access. Dealing with that requires redoing DH, which is annoying, but it's also a weird security model: how did the attacker lose their access. Furthermore, they may have gotten access to your private (wallet) keys as well, or even have overwritten your binary. If we only want this on a sufficiently long timescale, there is a simpler solution: just disconnect and reconnect. OpenSSH can't do this, but for us on a scale of days or more probably isn't a problem.
    • Situations where the attacker manages to extract the key from just observing the ciphertext. I'm not sure that's worth dealing with, as my understanding is that this is effectively a break of ChaCha20. It is indeed resolved by rekeying using a non-exposed session key, but if an attack on ChaCha20 is known, we should probably just switch to a different stream cipher, rather than patching it up with an infrequent session hash.

So concretely, here is what I suggest:

  • We define 4 stream ciphers from the session, 2 in each direction:
    • One for per-message metadata keys (poly1305 authentication keys, length encryption), called M.
    • One for variable-length data keys (payload encryption), called P.
  • Each stream cipher has its own 256-bit key (derived from the session key), a 64-bit IV (which is always 0), and a 64-bit counter (which starts at 0). Each time 4064 bytes have been produced, the next 32 bytes become the stream's next key (while the IV stays 0 and the counter keeps incrementing; it does not reset). This is transparent, and the output of the stream is just a byte stream, but the counter goes up by 64 every 4064 bytes instead of every 4096 bytes.
    • Maybe the IV can be derived from the session key too, effectively turning it into a 320-bit key. Won't add much, won't hurt.
  • Every packet draws 35 bytes from M (32 bytes for poly1305, 3 bytes for length encryption), and N bytes from the P stream (for payload encryption). Everything is byte-aligned; there is no 21 lengths per counter increment or so; just use all output bytes.
  • Add a recommendation to not keep connections open more than a week perhaps.
  • Byte-level forward security is possible by precomputing 4096 bytes of stream output, caching it, resetting the key to the final 32 bytes of the output, and then wiping the remaining 4064 bytes of cached data as it gets used.

@LLFourn
Copy link

LLFourn commented Nov 15, 2020

The nice thing about doing it byte based (really, permutation invocation count based) is it could all be pipelined and you don't need any signalling overhead. It also avoids questions like what do you do with a non-conforming counterparty that never rekeys. I totally agree that the timeframes you're discussing are fine ... I also though an hours timeframe was fine for that matter. :)

I can't see how you could do time frame rekeying without magically synchronized clocks. You either have to do byte based or message based re-keying (or explicit rekeying as originally suggested in this proposal).

We define 4 stream ciphers from the session, 2 in each direction:

  • One for per-message metadata keys (poly1305 authentication keys, length encryption), called M.
  • One for variable-length data keys (payload encryption), called P.

I like this design. I was originally skeptical of it since the motivation seemed to be that it added some kind of security. Having two black box streams and just pulling data out for each purpose is a nice idea.

Each stream cipher has its own 256-bit key (derived from the session key), a 64-bit IV (which is always 0), and a 64-bit counter (which starts at 0). Each time 4064 bytes have been produced, the next 32 bytes become the stream's next key (while the IV stays 0 and the counter keeps incrementing; it does not reset). This is transparent, and the output of the stream is just a byte stream, but the counter goes up by 64 every 4064 bytes instead of every 4096 bytes.

Alternatively you could just increment the IV and reset the block counter after each 4096 byte chunk. This is aesthetically pleasing to me because then you could just make the next key always be the first 32 bytes of block=0 rather than the last 32 of a 4096 byte boundary. This is also what noise does:

Note that rekey only updates the cipherstate's k value, it doesn't reset the cipherstate's n value, so applications performing rekey must still perform a new handshake if sending 2^64 or more transport messages.

Where n referes to the nonce/IV. Though noise is talking about message based re-keying.

@real-or-random
Copy link

I like this design. I was originally skeptical of it since the motivation seemed to be that it added some kind of security. Having two black box streams and just pulling data out for each purpose is a nice idea.

Indeed, it appears simple to reason about.

@sipa's suggestion sounds fine. We should make sure that our use of ChaCha20 can be done with a "standard API" which allows setting the key, the IV and the counter. But I think this is the case, as long as we reset the first 128 bits of the state to the fixed init string.

@LLFourn
Copy link

LLFourn commented Nov 16, 2020

We should make sure that our use of ChaCha20 can be done with a "standard API" which allows setting the key, the IV and the counter

ChaCha APIs do not always let you set the counter I think because stream ciphers per se don't have "blocks" e.g. the rust-crypto API looks like: fn new(key, nonce). It starts at block 0 and and increments the counter internally on demand. There is another API that lets you seek to a certain byte position in the stream which would make it at least possible to do @sipa's suggestion.
I think just incrementing the nonce for each new chunk and pulling out the first 4096 bytes from the more generic StreamCipher API would be a bit simpler though.

@real-or-random
Copy link

I think just incrementing the nonce for each new chunk and pulling out the first 4096 bytes from the more generic StreamCipher API would be a bit simpler though.

Since everything else is equal, this is indeed better if you have an API that doesn't let you control the counter. And it still works with both the original and the IETF variant.

@sipa
Copy link

sipa commented Nov 19, 2020

Alternatively you could just increment the IV and reset the block counter after each 4096 byte chunk. This is aesthetically pleasing to me because then you could just make the next key always be the first 32 bytes of block=0 rather than the last 32 of a 4096 byte boundary. This is also what noise does:

I like the idea of resetting the counter but incrementing IV every 4064(4096) bytes. That makes the output of the rekeying stream cipher just a concatenation of 4064 bytes out of 4096-byte consecutive messages for the underlying stream cipher. (technically I don't think there is a reason to even increment the IV, as updating the key also has a progress effect - but by incrementing the IV, we guarantee that even if the key ends up in a cycle, the stream output doesn't).

It's orthogonal to the question of using the first or last 32 bytes, though. I used the last 32 bytes because it means a very-low-memory or very-low-latency implementation is possible that doesn't precompute, and doesn't need to remember the future key or compute a block twice.

@jonasschnelli
Copy link
Author

@sipa:

Every packet draws 35 bytes from M (32 bytes for poly1305, 3 bytes for length encryption), and N bytes from the P stream (for payload encryption). Everything is byte-aligned; there is no 21 lengths per counter increment or so; just use all output bytes.

Is there a reason why you switched the stream cipher instance for the poly1305 key?
ChaCha20-Poly1305@openssh as well as this current proposal take the poly1305 key from the payload cipher instance (and not from the length encryption cipher instance). I don't know it if makes a difference but was wondering if you had a reason for that change.

@gmaxwell
Copy link

Poly and length use a fixed number of bits per messages sent while payload is variable, so changing lengths can't cause either set of bits to get misinterpreted as the wrong kind. Keeping them separate I think is much more obvious in its protection against decryption failure from acting as an oracle for the other key-stream. E.g. corrupting the length of the prior packet won't cause the receiver to use a different but highly related poly1305 key for this packet.

I think it also has a side benefit of making the usage of the two streams more equal which would make the secrecy duration more uniform in the case where the implementation isn't computing the streams ahead and forgetting the initial values.

@jonasschnelli
Copy link
Author

jonasschnelli commented Jan 19, 2021

I have changed this document (including the test vectors) to adapt the internal 4096byte rekeying rule. Thanks for an additional review.
Implementation of that new AEAD is here: bitcoin/bitcoin#20962

@sipa
Copy link

sipa commented Jan 20, 2021

@jonasschnelli Cool, happy to see progress here.

As for explaining the protocol, perhaps it's clearer if it's described as a number of layers? The current flat description makes it a bit hard to see the big picture.

Layers I can think of:

  • Existing cryptographic primitives used:
    • The ChaCha20 PRF: a function from (in our usage) 256-bit key, 64-bit IV to a byte sequence (up to 2^70 bytes)
    • The Poly1305 MAC: a function from a single-use 256-bit key and a variable-length message to a 128-bit authenticator
  • The ChaCha20/forward4064 PRF: a function taking a 256-bit key and producing a byte sequence:
    • This is constructed by starting with k0=key and concatenating:
    • c0=ChaCha20(key=k0, IV=0)[0:4096]; call k1=c0[4064:4096]; output c0[0:4064]
    • c1=ChaCha20(key=k1, IV=1)[0:4096]; call k2=c1[4064:4096]; output c1[0:4064]
    • ...
  • The bitcoin cipher stream, taking as input two 256-bit keys k1 and k2 up front, and then a sequence of variable-length messages that are converted into a byte stream.
    • At initialization time, instantiate two ChaCha20/forward4064 PRFs called F (fixed, with key k1) and V (variable, with key k2).
    • To encode, for every message M with length L:
      • Read 3 bytes from F, which are XOR'ed with the 24-bit LE encoding of L, and output.
      • Read 32 bytes from F, which are used to instantiate a Poly1305 MAC P with that key
      • Feed the 3 unencrypted (or encrypted?) length bytes into P (this is the additional data part of the AEAD). Anything else? Perhaps the network magic? This part seems unspecified in the current text. We may want this padded to a multiple of 16 bytes, so that the ChaCha20 32-byte blocks align with the Poly1305 blocks.
      • For every byte B in M:
        • Read 1 byte from V, XOR it with B, and output it.
        • Feed the resulting byte (B xor V) into P.
      • Compute the 16-byte authenticator from P and output it.
    • To decode, ...
    • (this isn't actually a AEAD anymore, as there is no discernible per-message encryption/authentication, and things only make sense when described as operating on a sequence of messages. I don't think this is a problem, but it may be worth explaining. It probably also explains some of the oddities in the openssh protocol, which still can be described more or less on a per-message basis)
  • The overall protocol which consists of:
    • ECDH negotiation
    • Instantiating two cipher streams (one in each direction), using the two derived HKDF-derived keys from the ECDH output
    • Switching over to the application protocol, with a particular packet layout (each packet is command, either 1 byte or spelled out, plus payload), turning every packet into one

I think using this kind of structure makes it easier to explain the various design decisions and rationale for every step.

@LLFourn
Copy link

LLFourn commented Feb 1, 2021

It's exciting to see this new design moving along. Thanks @jonasschnelli.

I was wondering what people thought of the idea of using the ECDH + transport encryption described here for applications other than Bitcoin p2p e.g. Lightning or other Bitcoin L2 protocols. We can easily anticipate many layer 2 protocols will need an encrypted transport. The DLC group is already in need of a proposal. Although it would be possible to re-use some parts of BOLT-8 the 2-byte max frame length that noise enforces means they cannot map authenticated frames to messages as they do in lightning (DLCs require quite large messages). I've also heard this is also a bit of a nuisance for LN developers in sending large routing messages (but I don't know the details and would happy to be corrected).

The only difficulty is that these protocols will often need authentication against static public keys e.g. Lightning. Also I guess Bitcoin p2p may want authentication at some point in the future? I am not suggesting we include how authentication should work here but I was wondering what others thought it could look like. It would be nice if the spec here could be modular so the line between Bitcoin p2p and the key exchange + transport message encryption bit were very clear.

@gmaxwell
Copy link

gmaxwell commented Feb 1, 2021

The way auth is accomplished has already been covered in other old drafts, you authenticate a hash of the session key inside the encrypted session.

@sipa and I had intended to propose a fairly novel ZKP that, in the context of bitcoin p2p like usage, would prevent an attacker from knowing if authentication was in use at all-- greatly increasing the risk of detection for large scale active observers. This avoids the problem in more traditional protocols where an evesdropper just drop the connection and let the connection re-establish without interception when they detect authentication is in use. ( https://gist.github.com/sipa/d7dcaae0419f10e5be0270fada84c20b .. I think thats the latest update on that? )

The nice thing about authenticating an encrypted session is that it fits nicely with doing whatever adaptive application supported and potentially user interactive stuff you might want to do. The down side is that it may require an additional protocol round trip-- totally irrelevant for long lived connections like in Bitcoin, but I presume that is a reason that separation of concerns isn't more common in established standards.

@gmaxwell
Copy link

gmaxwell commented Feb 2, 2021

@LLFourn out of curiosity, are you aware of any protocol that works like this: An authentication scheme like poly1305 produces a 256-bit authentication tag and 96 bits of it are emitted in the message, but then the remaining 160 bits are fed in as the initial state of the authenticator for the next message. As a result even though the probability of accepting a false single message is the kind-of-high 1:2^96 the probability of not disconnecting at a subsiquent message becomes extremely low extremely fast.

@sipa
Copy link

sipa commented Feb 2, 2021

@gmaxwell Poly1305 only has 128-bit auth tags, so it isn't that much of a difference.

@LLFourn Yeah, this proposal grew out of an earlier one that added both encryption and (optional, obviously) authentication. The encryption part became this document, and the authentication part turned into the optional authentication scheme that @gmaxwell came up with originally and linked to. Have a look, it's pretty cool. Once we're a bit further here I'll try to get working on that again.

@jonasschnelli That brings me to this: I remember having discussions with @gmaxwell and @real-or-random about an ability to request protocol upgrades before the application-level stuff starts. This could be used for e.g. augmenting the protocol (say, a PQC key negotiation) but may also allow doing authentication early instead of inside the application layer. I don't remember the details, but for now it may suffice to just say: the side that is creating the connection (outbound) must start by sending an (properly encrypted, ...) empty packet (or some specific thing) to indicate "we're done negotiating". Anything else can then be reserved for future updates. Thoughts?

@LLFourn
Copy link

LLFourn commented Feb 9, 2021

@gmaxwell

The way auth is accomplished has already been covered in other old drafts, you authenticate a hash of the session key inside the encrypted session.

Seems reasonable. Although, this does leak some information over what noise does. Receiving the signature allows you to convince a third party that you (or someone) did indeed establish a connection with the party.

There is a cool trick I found reading Damgard's notes on witness indistinguishable sigma protocols [1]. Instead of proving I know secret key for X, I can prove that I know the secret key for X OR the DH shared secret. Since the verifier knows I don't know the secret key for the shared secret the verifier is convinced of the same thing (i.e. I know the secret key for X). However they cannot re-use the proof to convince anyone else because no one else could be convinced that the DH key is unknown to them.

I am not sure that this is a practical issue though.

@sipa and I had intended to propose a fairly novel ZKP that, in the context of bitcoin p2p like usage, would prevent an attacker from knowing if authentication was in use at all-- greatly increasing the risk of detection for large scale active observers. This avoids the problem in more traditional protocols where an evesdropper just drop the connection and let the connection re-establish without interception when they detect authentication is in use. ( https://gist.github.com/sipa/d7dcaae0419f10e5be0270fada84c20b .. I think thats the latest update on that? )

I think I had seen this briefly -- it looks cool. Having authenticated connections covertly probe for interception is a nice idea esp in the case of Bitcoin p2p. I wonder if it achieves the property I mentioned above against a malicious verifier just because of its structure.

The nice thing about authenticating an encrypted session is that it fits nicely with doing whatever adaptive application supported and potentially user interactive stuff you might want to do. The down side is that it may require an additional protocol round trip-- totally irrelevant for long lived connections like in Bitcoin, but I presume that is a reason that separation of concerns isn't more common in established standards.

This is what I was thinking too. Looking at what is done in the noise XK [2] that lightning uses I don't think there actually would be an extra round of communication doing in session auth with a simple signature (or WI proof).

@LLFourn out of curiosity, are you aware of any protocol that works like this: An authentication scheme like poly1305 produces a 256-bit authentication tag and 96 bits of it are emitted in the message, but then the remaining 160 bits are fed in as the initial state of the authenticator for the next message.

I haven't seen anything like this. I haven't studied how Poly1305 works at all so I have no idea what's gonna happen when you use it in a non-black-box way :)

[1] https://www.cs.au.dk/~ivan/Sigma.pdf (end of Section 7).
[2] http://www.noiseprotocol.org/noise.html#handshake-patterns

@jonasschnelli
Copy link
Author

@dhruv is taking over this BIP (with my help [and others] in the background).
I just updated this gist to @dhruv latest version (https://gist.github.com/dhruv/5b1275751bc98f3b64bcafce7876b489).

@LLFourn
Copy link

LLFourn commented Aug 18, 2021

I made the decision to post my comments over on the @dhruv's gist since I guess that's where further updates will go for now?

@michaelfolkson
Copy link

@dhruv's gist is currently a fork of this one. Agreed, it would be nice to know which one is the intended latest draft of the BIP from this point onwards.

@dhruv
Copy link

dhruv commented Aug 23, 2021

Yes, let's communicate on my fork for now and treat that as the latest draft.

There's another feature in development. Once we have that in the draft, I'll move the conversation over to the mailing list and create a PR into the bips repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment