Using the LEB128 encoding scheme for a custom ABI/interface.

ERC-138128: Custom function selector and calldata encoding scheme declaration standard

A contract SHOULD use the ERC-165 standard to declare its support for ERC-138128.
A contract supporting ERC-138128 MUST implement these two view functions: customSelectors() and customEncodings().

`customSelectors()`

This function is used to declare alternative function selectors for any number of standard function signatures. This function MUST have the following interface.

function customSelectors() external view returns (bytes4[] memory standardSelectors, bytes[] memory alternativeSelectors);

alternativeSelectors.length MUST equal standardSelectors.length.
For all n, alternativeSelectors[n] MUST be a selector to the same function as standardSelectors[n].
An alternative selector MUST NOT clash with any other alternative selector or with any selector from the standard interface. Two selectors clash when one is a prefix of the other (e.g. ab clashes with abc).
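The clash rule can be checked off-chain before deployment. The following is a minimal Python sketch, assuming that "clash" means one selector is a byte-wise prefix of the other; the helper names `clashes` and `find_clashes` are hypothetical and not part of the standard.

```python
def clashes(a: bytes, b: bytes) -> bool:
    """Return True if one selector is a prefix of the other (or they are equal)."""
    shorter, longer = sorted((a, b), key=len)
    return longer.startswith(shorter)


def find_clashes(standard: list[bytes], alternative: list[bytes]) -> list[tuple[bytes, bytes]]:
    """List every clashing pair: alternative vs. standard, and alternative vs. alternative."""
    found = []
    for i, alt in enumerate(alternative):
        # Compare against every standard 4-byte selector the contract exposes.
        for std in standard:
            if clashes(alt, std):
                found.append((alt, std))
        # Compare against every later alternative selector, so each pair is checked once.
        for other in alternative[i + 1:]:
            if clashes(alt, other):
                found.append((alt, other))
    return found


standard = [bytes.fromhex("095ea7b3"), bytes.fromhex("a9059cbb")]
print(find_clashes(standard, [b"\x00", b"\x01"]))              # []
print(find_clashes(standard, [b"\x09", b"\x00\x01", b"\x00"]))  # two clashing pairs
```

In the second call, `0x09` is a prefix of the standard selector `0x095ea7b3`, and `0x00` is a prefix of `0x0001`, so both pairs are reported.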

`customEncodings()`

This function declares which encoding scheme each function selector uses for the arguments in its calldata. Selectors whose functions use standard ABI encoding are excluded. This function MUST have the following interface.

function customEncodings() external view returns (bytes[] memory selectors, string[] memory schemeIds);

Currently, leb128-nooffset is the only encoding scheme designed for this EIP. A draft version of leb128-nooffset can be accessed here: TBA. There is no standardized method for registering new encoding schemes yet; such a system can be developed later.

`leb128-nooffset`: LEB128 encoding scheme for ERC-138128

This encoding scheme shall have the identifier leb128-nooffset, which can be used in ERC-138128's customEncodings() function.

The first argument in calldata MUST start at the byte immediately following the function selector. For example, if the custom function selector defined through ERC-138128 is 0x0001, the first argument starts at the 3rd byte of calldata, since the first 2 bytes are used by the function selector.
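As an illustration of where arguments begin, here is a small Python sketch that splits calldata into a variable-length selector and the argument bytes. The function name `split_calldata` is hypothetical. It relies on the alternative selectors being prefix-free, so at most one of them can match.

```python
def split_calldata(calldata: bytes, alternative_selectors: list[bytes]) -> tuple[bytes, bytes]:
    """Return (selector, argument_bytes) for the alternative selector that prefixes calldata."""
    for selector in alternative_selectors:
        if calldata.startswith(selector):
            # Arguments begin at the byte right after the selector, with no padding.
            return selector, calldata[len(selector):]
    raise ValueError("calldata does not start with a known alternative selector")


selector, args = split_calldata(bytes.fromhex("0001e58e26"), [b"\x00\x01", b"\x02"])
print(selector.hex(), args.hex())  # 0001 e58e26
```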

All arguments are back to back. There is no padding. There is no offset.

Arguments are encoded as

uint* & address: Unsigned LEB128 encoded.
int*: Signed LEB128 encoded
bytes*: bytes themselves
bytes & string: (Length as unsigned LEB128 || bytes themselves)
bool & enum: single byte
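The per-type rules above can be sketched as a handful of Python encoders. This is an illustrative sketch, not a reference implementation: the function names are hypothetical, and it assumes standard LEB128 byte order (least-significant 7-bit group first).

```python
def encode_uleb128(value: int) -> bytes:
    """Unsigned LEB128, used for uint* and address values."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more groups follow: set the continuation bit
        else:
            out.append(byte)
            return bytes(out)


def encode_sleb128(value: int) -> bytes:
    """Signed LEB128, used for int* values."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # arithmetic shift: negative values converge to -1
        sign_bit_set = bool(byte & 0x40)
        done = (value == 0 and not sign_bit_set) or (value == -1 and sign_bit_set)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


def encode_dynamic_bytes(data: bytes) -> bytes:
    """bytes and string: unsigned LEB128 length followed by the raw bytes."""
    return encode_uleb128(len(data)) + data


print(encode_uleb128(624485).hex())                       # e58e26
print(encode_sleb128(-123456).hex())                      # c0bb78
print(encode_dynamic_bytes("one".encode("utf-8")).hex())  # 036f6e65
print(bytes([True]).hex())                                # 01 (bool as a single byte)
```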

A LEB128 encoded value always carries a multiple of 7 bits, so the decoded value will often be wider than the target type. When the decoded value's size is greater than the type's size, truncation is performed. When the decoded value's size is less than the type's size, zero left padding is added.
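The truncation and padding rule might look like the following Python sketch. The text does not say which bits survive truncation; this sketch assumes the low-order bits are kept, and the names `decode_uleb128` and `decode_uint` are hypothetical.

```python
def decode_uleb128(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode one unsigned LEB128 value; return (value, offset of the next argument)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]  # raises IndexError if the input ends mid-value
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:  # continuation bit clear: this was the last group
            return result, offset


def decode_uint(data: bytes, bits: int, offset: int = 0) -> tuple[int, int]:
    """Decode a uint<bits>, truncating to the low `bits` bits (an assumption of this sketch)."""
    value, offset = decode_uleb128(data, offset)
    return value & ((1 << bits) - 1), offset


# 0xac02 decodes to 300, which does not fit in a uint8, so it is truncated to 44.
print(decode_uint(bytes.fromhex("ac02"), 8))  # (44, 2)

# 0x01 carries only 7 bits; as a uint256 it is left-padded with zeros to 32 bytes.
value, _ = decode_uint(b"\x01", 256)
print(value.to_bytes(32, "big").hex())
```

Returning the next offset lets a decoder walk back-to-back arguments without any offset table.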

Because argument offsets are not defined, values in this encoding are mostly unusable directly within contract code. The idea is that the calldata should be decoded and re-encoded to the standard ABI as the first step in the function. This might be costly for long and complex arguments.

TODO: Much more to add here to properly standardize this.

Examples

ERC20 supporting ERC-138128

An ERC20 token supporting ERC-138128 to optimize its transfer and approve calls might look like this.

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import {ERC20} from "@openzeppelin/contracts/token/ERC20/ERC20.sol";
import {LibLEB128} from "TBA";

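contract MyToken is ERC20("MyToken", "MTK") {
    // Assumes LibLEB128 exposes its decoders as library functions on `bytes calldata`.
    using LibLEB128 for bytes;
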
    function customSelectors() external pure returns (bytes4[] memory, bytes[] memory) {
        bytes4[] memory standardSelectors = new bytes4[](2);
        standardSelectors[0] = 0x095ea7b3; // `approve(address,uint256)`
        standardSelectors[1] = 0xa9059cbb; // `transfer(address,uint256)`
        
        bytes[] memory alternativeSelectors = new bytes[](2);
        alternativeSelectors[0] = hex"00"; // `approve`: `0x00`
        alternativeSelectors[1] = hex"01"; // `transfer`: `0x01`

        return (standardSelectors, alternativeSelectors);
    }
    
    function customEncodings() external pure returns (bytes[] memory, string[] memory) {
        bytes[] memory selectors = new bytes[](2);
        selectors[0] = hex"00"; // `approve`: `0x00`
        selectors[1] = hex"01"; // `transfer`: `0x01`

        string[] memory schemeIds = new string[](2);
        schemeIds[0] = "leb128-nooffset";
        schemeIds[1] = "leb128-nooffset";

        return (selectors, schemeIds);
    }

    fallback(bytes calldata data) external returns (bytes memory) {
        // The values to decode, both for `approve` and `transfer`. And calldata ptr.
        address to; uint256 amount; uint256 ptr = 1;

        // Decode using LibLEB128.
        (to, ptr) = data[ptr:].decodeAddress();
        (amount,) = data[ptr:].decodeUint();

        // In our testing suite we verify that no standard function selector starts
        // with `0x00` or `0x01`. Therefore, we can use these identifiers as alternative selectors
        // for `approve` and `transfer`, respectively.
        uint256 firstByte; assembly "memory-safe" { firstByte := shr(248, calldataload(0)) }
        if (firstByte == 0) approve(to, amount);            
        else if (firstByte == 1) transfer(to, amount);
        else revert();
    }
}

The above example shows that

0x00 and 0x01 can be used as alternative method ids for approve and transfer, respectively, and
0x00 and 0x01 use the custom leb128-nooffset encoding scheme, and not the standard ABI encoding scheme.

Now, we have two ways of both approving and transferring. For example, if we wanted to transfer 420*10**18 tokens to 0xdead, we could use either

standard calldata: 0xa9059cbb000000000000000000000000000000000000000000000000000000000000dead000000000000000000000000000000000000000000000016c4abbebea0100000, or
leb128-nooffset encoded calldata: 0x01adbd038080c080ead7efd5c42d
- Selector: 0x01
- to address: 0xadbd03
- value uint: 0x8080c080ead7efd5c42d

Each LEB128 value is written least-significant 7-bit group first, with the continuation bit set on every byte except the last.
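The size difference for this transfer can be reproduced with a short Python sketch. It assumes standard unsigned LEB128 for both arguments and uses `0xa9059cbb`, the standard selector of `transfer(address,uint256)`; the helper name is hypothetical.

```python
def encode_uleb128(value: int) -> bytes:
    """Unsigned LEB128: 7 bits per byte, least-significant group first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


to = 0xDEAD
amount = 420 * 10**18

# Standard ABI: 4-byte selector plus two 32-byte words.
standard = bytes.fromhex("a9059cbb") + to.to_bytes(32, "big") + amount.to_bytes(32, "big")

# leb128-nooffset: 1-byte alternative selector plus two LEB128 values, back to back.
compact = b"\x01" + encode_uleb128(to) + encode_uleb128(amount)

print(len(standard), len(compact))  # 68 14
```

The amount needs 69 bits, which LEB128 carries in ten 7-bit groups; the address value 0xdead fits in three.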

Complex `leb128-nooffset` decoding example

We will encode g(uint256[][],string[]) with values ([[1, 2], [3]], ["one", "two", "three"]). In standard ABI encoding this results in a 644-byte behemoth: a 4-byte selector followed by twenty 32-byte words. The Solidity documentation walks through how to encode this using the standard ABI.

Assuming this function is accessible through the custom selector 0x00, leb128-nooffset can encode the calldata in just 22 bytes: 0x0002020102010303036f6e650374776f057468726565

0x
00         - custom function selector
02         - LEB128 encoded arg0.length
02         - LEB128 encoded arg0[0].length
01         - LEB128 encoded arg0[0][0]
02         - LEB128 encoded arg0[0][1]
01         - LEB128 encoded arg0[1].length
03         - LEB128 encoded arg0[1][0]
03         - LEB128 encoded arg1.length
03         - LEB128 encoded arg1[0].length
6f6e65     - arg1[0]
03         - LEB128 encoded arg1[1].length
74776f     - arg1[1]
05         - LEB128 encoded arg1[2].length
7468726565 - arg1[2]

That is ~3.4% of the size of the standard encoded calldata.
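The byte-by-byte breakdown above can be reproduced with a short Python sketch. It follows the layout shown in the breakdown, where every array is prefixed with its LEB128 encoded length; the helper names are hypothetical.

```python
def encode_uleb128(value: int) -> bytes:
    """Unsigned LEB128: 7 bits per byte, least-significant group first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_uint_array(values: list[int]) -> bytes:
    """uint256[]: length, then each element, all as unsigned LEB128."""
    return encode_uleb128(len(values)) + b"".join(encode_uleb128(v) for v in values)


def encode_nested_uint_array(arrays: list[list[int]]) -> bytes:
    """uint256[][]: outer length, then each inner array."""
    return encode_uleb128(len(arrays)) + b"".join(encode_uint_array(a) for a in arrays)


def encode_string_array(strings: list[str]) -> bytes:
    """string[]: array length, then (byte length || UTF-8 bytes) for each string."""
    out = encode_uleb128(len(strings))
    for s in strings:
        raw = s.encode("utf-8")
        out += encode_uleb128(len(raw)) + raw
    return out


calldata = (
    b"\x00"  # custom function selector
    + encode_nested_uint_array([[1, 2], [3]])
    + encode_string_array(["one", "two", "three"])
)
print(calldata.hex())  # 0002020102010303036f6e650374776f057468726565
print(len(calldata))   # 22
```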

UX

This needs user interfaces that support it. Given the gas savings on L2s, it might pick up. A UI would just need a simple plugin, so UI developers would not have to think about it. When intending to do an ERC20.approve, the UI would automatically check for ERC-138128 support through ERC-165 or other standard methods. If it finds an alternative selector, it checks that selector's encoding, and if it knows the custom encoding scheme, it automatically converts the standard encoding to it and prompts the user to sign the result. Until wallets add support, users would see un-decoded calldata and would have a hard time knowing what they are signing.
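The selection step a UI plugin would perform can be sketched in Python. This is only an illustration of the flow described above: the function name `pick_call_format` is hypothetical, the two tuples stand in for the values a contract would return from customSelectors() and customEncodings(), and no RPC or wallet library is involved.

```python
from typing import Optional


def pick_call_format(
    standard_selector: bytes,
    custom_selectors: tuple[list[bytes], list[bytes]],
    custom_encodings: tuple[list[bytes], list[str]],
    known_schemes: set[str],
) -> Optional[tuple[bytes, Optional[str]]]:
    """Return (alternative_selector, scheme_id), or None to fall back to the standard call.

    A scheme_id of None means the alternative selector uses standard ABI encoding,
    because such selectors are not listed by customEncodings().
    """
    standard, alternative = custom_selectors
    scheme_by_selector = dict(zip(*custom_encodings))
    for std, alt in zip(standard, alternative):
        if std != standard_selector:
            continue
        scheme = scheme_by_selector.get(alt)
        if scheme is None or scheme in known_schemes:
            return alt, scheme
    return None


# Values mirroring the MyToken example.
custom_selectors = ([bytes.fromhex("095ea7b3"), bytes.fromhex("a9059cbb")], [b"\x00", b"\x01"])
custom_encodings = ([b"\x00", b"\x01"], ["leb128-nooffset", "leb128-nooffset"])
transfer = bytes.fromhex("a9059cbb")

print(pick_call_format(transfer, custom_selectors, custom_encodings, {"leb128-nooffset"}))
# (b'\x01', 'leb128-nooffset')
print(pick_call_format(transfer, custom_selectors, custom_encodings, set()))
# None
```

Falling back to `None` when the scheme is unknown keeps the UI on the standard ABI path, so the user still signs calldata the wallet can decode.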

stonegao/EIP138128.md
