EIP-7928 BAL Size Reduction/Optimization Suggestions

User

Ok, it seems this dramatic change might not be worth the savings, but thanks a lot for trying. Let's go back to the initial code. After all your experiments with re-implementing, size comparisons and the like, do you have very simple/local/minimal-change suggestions for how we can save on BAL size here? These can be very local things: small optimizations, small data format adjustments, whatever you can imagine.


Cursor

Good question. Let me look at the actual old-format BAL structure and find concrete, local optimizations with real measured impact.

Now let me do a deep forensic breakdown of exactly what's eating space in the old RLP format:

Good data. Let me now run the most promising ones across all 10 blocks:

Now let me also check a few more micro-optimizations to be thorough:

Excellent data. Now I have a clear picture. Here's the analysis:


Local/Minimal Optimizations for BAL Size

Where the bytes actually go (block 24476000, 65 KiB)

| Component | Raw payload (bytes) | RLP overhead (bytes) | Total est. (bytes) | % of BAL |
|---|---|---|---|---|
| Addresses (670 * 20B) | 13,400 | ~1,700 | ~15,100 | 23% |
| Storage change slots + values | 19,124 | ~6,500 | ~23,400 | 36% |
| Storage reads (slot keys) | 14,422 | ~1,200 | ~15,600 | 24% |
| Balance changes (741 entries) | 5,264 + 1,181 idx | ~2,200 | ~8,700 | 13% |
| Nonce changes (274 entries) | 365 + 421 idx | ~1,000 | ~1,800 | 3% |
| Access-only overhead (5 empty lists * 90) | 450 | - | 450 | 1% |

The three local wins (ranked by impact)

1. Final-only storage values: flatten [[idx, val], ...] to just val (~7% savings)

Currently each storage slot change stores:

[slot_key, [[blockAccessIdx_1, postValue_1], [blockAccessIdx_2, postValue_2], ...]]

70 out of 481 changed slots have multi-write entries (up to 12 writes per slot). Only the final value matters for state reconstruction. Flattened:

[slot_key, finalValue]

This eliminates the intermediate values and one level of RLP list nesting. The nesting overhead alone adds up, since it is paid for each of the 481 changed slots.
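A minimal Python sketch of this flattening, assuming each slot change is a pair of a slot key and a list of [blockAccessIdx, postValue] entries as shown above. The function name `flatten_storage_changes` and the sample values are hypothetical, and the write with the highest index is assumed to be the final one.

```python
def flatten_storage_changes(storage_changes):
    """Reduce [slot_key, [[idx, val], ...]] entries to [slot_key, final_val].

    Assumption: the write with the highest blockAccessIdx is the final one.
    """
    flattened = []
    for slot_key, writes in storage_changes:
        if not writes:
            continue  # a slot without writes is not a change; skip it
        _, final_value = max(writes, key=lambda w: w[0])
        flattened.append([slot_key, final_value])
    return flattened


# Hypothetical sample: slot 0x01 is written by three transactions.
sample = [
    [b"\x01", [[1, b"\x0a"], [4, b"\x0b"], [9, b"\x0c"]]],
    [b"\x02", [[3, b"\xff"]]],
]
flat = flatten_storage_changes(sample)
print(flat)
```

Only the last write per slot survives, so the two intermediate values for slot 0x01 and all index fields disappear from the output.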

2. Final-only balance/nonce: drop per-tx BlockAccessIndex (~8% savings)

The killer data point: the coinbase has 273 balance entries (one per transaction). That's a single address contributing ~2.5 KiB of redundant intermediate balance values. 11 accounts in total have >1 balance entry. Currently:

balanceChanges: [[idx_1, postBalance_1], [idx_2, postBalance_2], ...]

Simplified to just the final value:

balance: [finalBalance]    // optional list: present = changed, empty = unchanged
nonce:   [finalNonce]      // same
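As a rough illustration of the "optional list" convention, the following Python sketch collapses a per-tx change list into either an empty list (unchanged) or a one-element list holding the final value. The helper name `final_only` and the sample numbers are hypothetical; only the 273-entry count mirrors the coinbase case described above.

```python
def final_only(changes):
    """Collapse [[idx, post_value], ...] into [] (unchanged) or [final_value]."""
    if not changes:
        return []
    # Assumption: the entry with the highest BlockAccessIndex is the final one.
    _, final_value = max(changes, key=lambda c: c[0])
    return [final_value]


# Hypothetical coinbase-like account: one balance entry per transaction.
balance_changes = [[i, 1_000 + i] for i in range(1, 274)]  # 273 entries
nonce_changes = []  # nonce never touched in this block

account = {
    "balance": final_only(balance_changes),
    "nonce": final_only(nonce_changes),
}
print(account)
```

The 273 intermediate balances reduce to a single value, while the untouched nonce stays an empty list.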

3. Compact access-only addresses (~1% savings, but a nice cleanup)

90 addresses are access-only but currently encoded as [addr, [], [], [], [], []] (27 bytes each). Splitting into a separate flat address list saves 5 bytes per address (~450B total). Not huge, but it also cleans up the structure conceptually.
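The 27-byte figure can be checked with a small, self-contained RLP encoder. This is a sketch for size estimation only (it handles byte strings and nested lists, nothing else), and the address value is a made-up placeholder.

```python
def _length_prefix(length, offset):
    """RLP length prefix: short form below 56 bytes, long form otherwise."""
    if length < 56:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item):
    """Minimal RLP encoder for bytes and (nested) lists of bytes."""
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item  # a single low byte encodes as itself
        return _length_prefix(len(item), 0x80) + item
    payload = b"".join(rlp_encode(x) for x in item)
    return _length_prefix(len(payload), 0xC0) + payload


address = bytes.fromhex("11" * 20)  # hypothetical 20-byte address
access_only_entry = [address, [], [], [], [], []]

current_size = len(rlp_encode(access_only_entry))  # full account entry
bare_size = len(rlp_encode(address))               # address alone
print(current_size, bare_size)
```

Under these assumptions the five empty-list markers account for the 5 bytes per address cited above. The per-account list header is one further byte, while a separate address list would need a header of its own.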

Combined impact across all 10 blocks

| Block | Baseline | Opt 2 (bal/nonce) | Opt 3 (all final) | Savings |
|---|---|---|---|---|
| 24476000 | 65,205 | 59,814 (-8.3%) | 55,692 | -14.6% |
| 24476001 | 104,042 | 95,892 (-7.8%) | 88,036 | -15.4% |
| 24476002 | 60,081 | 54,683 (-9.0%) | 51,425 | -14.4% |
| 24476003 | 63,650 | 58,636 (-7.9%) | 53,068 | -16.6% |
| 24476004 | 86,089 | 80,154 (-6.9%) | 74,068 | -14.0% |
| 24476005 | 27,805 | 25,799 (-7.2%) | 23,537 | -15.4% |
| 24476006 | 173,672 | 166,019 (-4.4%) | 152,337 | -12.3% |
| 24476007 | 8,198 | 7,541 (-8.0%) | 7,365 | -10.2% |
| 24476008 | 110,080 | 98,952 (-10.1%) | 92,404 | -16.1% |
| 24476009 | 167,880 | 160,643 (-4.3%) | 144,470 | -13.9% |

A consistent 10-17% reduction (about 14% on average across these 10 blocks), with no change to the overall BAL layout -- just flattening the per-tx nesting to final values.
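The savings column can be recomputed directly from the table. The Python sketch below uses the baseline and "all final" sizes copied from the rows above; the variable and function names are illustrative only.

```python
# (block, baseline_size, all_final_size), copied from the table above.
BLOCKS = [
    (24476000, 65_205, 55_692),
    (24476001, 104_042, 88_036),
    (24476002, 60_081, 51_425),
    (24476003, 63_650, 53_068),
    (24476004, 86_089, 74_068),
    (24476005, 27_805, 23_537),
    (24476006, 173_672, 152_337),
    (24476007, 8_198, 7_365),
    (24476008, 110_080, 92_404),
    (24476009, 167_880, 144_470),
]


def savings_pct(baseline, optimized):
    """Relative size reduction in percent."""
    return 100.0 * (baseline - optimized) / baseline


per_block = [savings_pct(b, o) for _, b, o in BLOCKS]
total_baseline = sum(b for _, b, _ in BLOCKS)
total_optimized = sum(o for _, _, o in BLOCKS)
aggregate = savings_pct(total_baseline, total_optimized)

print(f"min       {min(per_block):.1f}%")
print(f"max       {max(per_block):.1f}%")
print(f"aggregate {aggregate:.1f}%")
```

The aggregate figure weights large blocks more heavily than a plain average of the per-block percentages, which is why the two can differ slightly.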

What didn't help (and why)

  • Merging storageReads into storageChanges: Actually increases size by ~1% because storageReads are currently bare slot keys (no value), and merging them into the change structure adds value-field overhead (even if empty).
  • Indexed/deduplicated address encoding: Addresses only appear once per account anyway, so there's nothing to deduplicate in the old format.
  • Dropping precompiles: ~7 addresses * 27B = ~189B. Noise level.

The trade-off

The first two optimizations trade away per-tx change attribution (knowing which transaction caused each intermediate state change); the third only changes how access-only addresses are laid out. The BlockAccessIndex and per-tx storage values would be gone. You keep:

  • Which addresses/slots were accessed (unchanged)
  • The final post-block value for each changed field (sufficient for state reconstruction)

You lose: the ability to replay intermediate states from the BAL alone. If a consumer needs that, they'd need to re-execute transactions. For parallel execution hinting and state reconstruction (the two primary BAL use cases), the final values suffice.
