EIP-7928 BAL Size Reduction/Optimization Suggestions

User

Ok, seems this dramatic change might not be worth the savings, but thanks a lot for the try. Let's go back to the initial code: after all your experiments with re-implementing, size comparisons and the like, do you have very simple/local/minimal-change suggestions for how we can save on BAL size here? These can be very local things, small optimizations, small data format adjustments, ... Whatever you can imagine.

Cursor

Good question. Let me look at the actual old-format BAL structure and find concrete, local optimizations with real measured impact.

Now let me do a deep forensic breakdown of exactly what's eating space in the old RLP format:

Good data. Let me now run the most promising ones across all 10 blocks:

Now let me also check a few more micro-optimizations to be thorough:

Excellent data. Now I have a clear picture. Here's the analysis:

Local/Minimal Optimizations for BAL Size

Where the bytes actually go (block 24476000, 65,205 bytes ≈ 63.7 KiB)

Component	Raw payload (bytes)	RLP overhead (bytes)	Total est. (bytes)	% of BAL
Addresses (670 × 20 B)	13,400	~1,700	~15,100	23%
Storage change slots + values	19,124	~6,500	~23,400	36%
Storage reads (slot keys)	14,422	~1,200	~15,600	24%
Balance changes (741 entries)	5,264 + 1,181 idx	~2,200	~8,700	13%
Nonce changes (274 entries)	365 + 421 idx	~1,000	~1,800	3%
Access-only overhead (90 accounts × 5 empty lists)	450	-	450	1%
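The RLP overhead column is made up of length prefixes. As a sketch of the standard RLP prefix rule (not the script that produced the numbers above), the prefix size depends only on the payload length:

```python
def rlp_prefix_len(payload_len: int) -> int:
    """Prefix bytes RLP puts in front of a string or list payload of `payload_len` bytes.

    Exception not modelled here: a single byte below 0x80 is its own encoding (no prefix).
    """
    if payload_len < 56:
        return 1  # short form: one byte, 0x80 + len (strings) or 0xC0 + len (lists)
    # long form: one marker byte plus the big-endian encoding of the length itself
    return 1 + (payload_len.bit_length() + 7) // 8


print(rlp_prefix_len(20))    # 1: a 20-byte address becomes a 21-byte string
print(rlp_prefix_len(26))    # 1: payload of an access-only [addr, [], [], [], [], []] entry
print(rlp_prefix_len(3167))  # 3: a ~3 KB list needs a long-form prefix

# lower bound for the address overhead in block 24476000: one string prefix per address
print(670 * rlp_prefix_len(20))  # 670
```

So the 670 address strings alone account for at least 670 B of the ~1,700 B address overhead estimate.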

The three local wins (ranked by impact)

1. Final-only storage values: flatten [[idx, val], ...] to just val (~7% savings)

Currently each storage slot change stores:

[slot_key, [[blockAccessIdx_1, postValue_1], [blockAccessIdx_2, postValue_2], ...]]

70 out of 481 changed slots have multi-write entries (up to 12 writes per slot). Only the final value matters for state reconstruction. Flattened:

[slot_key, finalValue]

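This eliminates the intermediate values as well as the per-write [idx, val] pair lists and the list wrapping them. Because every changed slot carries at least one such pair, even the 411 single-write slots (481 - 70) shrink by a few bytes each.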
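A minimal sketch to check the per-slot effect. It assumes a hand-rolled RLP encoder (standard RLP output, but not the library used for the measurements), 32-byte slot keys and values, and made-up indices and values; `flatten_storage_change` is a hypothetical helper name:

```python
def _encode_length(length: int, offset: int) -> bytes:
    """RLP length prefix: short form below 56 bytes, long form otherwise."""
    if length < 56:
        return bytes([offset + length])
    len_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(len_bytes)]) + len_bytes


def rlp_encode(item) -> bytes:
    """Minimal RLP encoder for bytes, non-negative ints and (nested) lists."""
    if isinstance(item, int):
        # big-endian, no leading zeros; 0 encodes like the empty string
        item = item.to_bytes((item.bit_length() + 7) // 8, "big")
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item  # single low byte is its own encoding
        return _encode_length(len(item), 0x80) + item
    if isinstance(item, list):
        payload = b"".join(rlp_encode(x) for x in item)
        return _encode_length(len(payload), 0xC0) + payload
    raise TypeError(f"cannot RLP-encode {type(item).__name__}")


def flatten_storage_change(slot_change: list) -> list:
    """Hypothetical helper: [slot, [[idx, val], ...]] -> [slot, final_val]."""
    slot, writes = slot_change
    # the write with the highest block access index holds the post-block value
    _, final_val = max(writes, key=lambda w: w[0])
    return [slot, final_val]


# assumption: 32-byte key/values and small made-up block access indices
slot = b"\x11" * 32
multi = [slot, [[5, b"\x21" * 32], [17, b"\x22" * 32], [42, b"\x23" * 32]]]
single = [slot, [[5, b"\x21" * 32]]]

for name, change in [("3 writes", multi), ("1 write", single)]:
    before = len(rlp_encode(change))
    after = len(rlp_encode(flatten_storage_change(change)))
    print(f"{name}: {before} -> {after} bytes")
# 3 writes: 142 -> 68 bytes
# 1 write: 71 -> 68 bytes
```

Even the single-write case saves 3 bytes here (the pair's list header, the 1-byte index and the header of the list wrapping the pairs); indices of 128 and above take more bytes, so the saving grows with them.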

2. Final-only balance/nonce: drop per-tx BlockAccessIndex (~7% savings on average, 4-10% per block)

The killer data point: the coinbase has 273 balance entries (one per transaction). That's a single address contributing ~2.5 KiB of redundant intermediate balance values. 11 accounts in total have >1 balance entry. Currently:

balanceChanges: [[idx_1, postBalance_1], [idx_2, postBalance_2], ...]

Simplified to just the final value:

balance: [finalBalance]    // optional list: present = changed, empty = unchanged
nonce:   [finalNonce]      // same
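A sketch of the collapse. It uses a minimal hand-rolled RLP encoder and a synthetic coinbase history: 273 entries with made-up, steadily growing balances. Real balances and indices encode to different sizes. `collapse_to_final` is a hypothetical name:

```python
def _encode_length(length: int, offset: int) -> bytes:
    """RLP length prefix: short form below 56 bytes, long form otherwise."""
    if length < 56:
        return bytes([offset + length])
    len_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(len_bytes)]) + len_bytes


def rlp_encode(item) -> bytes:
    """Minimal RLP encoder for bytes, non-negative ints and (nested) lists."""
    if isinstance(item, int):
        item = item.to_bytes((item.bit_length() + 7) // 8, "big")
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item
        return _encode_length(len(item), 0x80) + item
    if isinstance(item, list):
        payload = b"".join(rlp_encode(x) for x in item)
        return _encode_length(len(payload), 0xC0) + payload
    raise TypeError(f"cannot RLP-encode {type(item).__name__}")


def collapse_to_final(changes: list) -> list:
    """Hypothetical helper: [[idx, value], ...] -> [final_value], or [] if unchanged."""
    if not changes:
        return []
    _, final_value = max(changes, key=lambda c: c[0])
    return [final_value]


# assumption: synthetic coinbase, one fee credit per tx at block access indices 1..273
per_tx = [[i, 10**18 + i * 10**15] for i in range(1, 274)]
collapsed = collapse_to_final(per_tx)

print(len(rlp_encode(per_tx)), "->", len(rlp_encode(collapsed)), "bytes")  # 3170 -> 10 bytes
print(collapse_to_final([]))  # [] (balance unchanged)
```

The same helper applies unchanged to nonce entries. The optional one-element list keeps "changed to value X" distinguishable from "not changed" without reintroducing an index.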

3. Compact access-only addresses (~1% savings, but a nice cleanup)

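90 addresses are access-only but are currently encoded as [addr, [], [], [], [], []] (27 bytes each). Moving them into a separate flat address list reduces each to its 21-byte RLP string, saving the five empty lists plus the account's list header, i.e. 6 bytes per address (~540 B total, minus a few bytes for the new list's own header). Not huge, but it also cleans up the structure conceptually.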
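The 27-byte figure and the per-address saving can be checked with a minimal hand-rolled RLP encoder, as a sketch. The address is a placeholder:

```python
def _encode_length(length: int, offset: int) -> bytes:
    """RLP length prefix: short form below 56 bytes, long form otherwise."""
    if length < 56:
        return bytes([offset + length])
    len_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(len_bytes)]) + len_bytes


def rlp_encode(item) -> bytes:
    """Minimal RLP encoder for bytes, non-negative ints and (nested) lists."""
    if isinstance(item, int):
        item = item.to_bytes((item.bit_length() + 7) // 8, "big")
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item
        return _encode_length(len(item), 0x80) + item
    if isinstance(item, list):
        payload = b"".join(rlp_encode(x) for x in item)
        return _encode_length(len(payload), 0xC0) + payload
    raise TypeError(f"cannot RLP-encode {type(item).__name__}")


addr = bytes.fromhex("ab" * 20)  # placeholder 20-byte address

nested = rlp_encode([addr, [], [], [], [], []])  # current access-only entry
flat = rlp_encode(addr)                            # entry in a separate address list
print(len(nested), len(flat))  # 27 21

# 90 access-only accounts in block 24476000
old_bytes = 90 * len(nested)
new_bytes = len(rlp_encode([addr] * 90))  # includes the new list's own 3-byte header
print(old_bytes, new_bytes, old_bytes - new_bytes)  # 2430 1893 537
```

This ignores any change in the prefix size of the enclosing account list, which shrinks by at most a byte or two.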

Combined impact across all 10 blocks

Block	Baseline (bytes)	Opt 2 (bal/nonce)	Opt 3 (all final)	Total savings
24476000	65,205	59,814 (-8.3%)	55,692	-14.6%
24476001	104,042	95,892 (-7.8%)	88,036	-15.4%
24476002	60,081	54,683 (-9.0%)	51,425	-14.4%
24476003	63,650	58,636 (-7.9%)	53,068	-16.6%
24476004	86,089	80,154 (-6.9%)	74,068	-14.0%
24476005	27,805	25,799 (-7.2%)	23,537	-15.4%
24476006	173,672	166,019 (-4.4%)	152,337	-12.3%
24476007	8,198	7,541 (-8.0%)	7,365	-10.2%
24476008	110,080	98,952 (-10.1%)	92,404	-16.1%
24476009	167,880	160,643 (-4.3%)	144,470	-13.9%

A consistent 10-17% reduction (about 14% on average) without changing the overall layout of the BAL: the per-tx nesting is simply flattened to final values.
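The Savings column is (baseline - final) / baseline. A short sketch reproducing it from the table values, plus the average and range across the 10 blocks:

```python
# (baseline, opt 2, opt 3) sizes in bytes, copied from the table
sizes = {
    24476000: (65_205, 59_814, 55_692),
    24476001: (104_042, 95_892, 88_036),
    24476002: (60_081, 54_683, 51_425),
    24476003: (63_650, 58_636, 53_068),
    24476004: (86_089, 80_154, 74_068),
    24476005: (27_805, 25_799, 23_537),
    24476006: (173_672, 166_019, 152_337),
    24476007: (8_198, 7_541, 7_365),
    24476008: (110_080, 98_952, 92_404),
    24476009: (167_880, 160_643, 144_470),
}

# percentage saved by the final column relative to the baseline
savings = {block: 100 * (base - final) / base for block, (base, _, final) in sizes.items()}
for block, pct in savings.items():
    print(f"{block}: -{pct:.1f}%")

mean = sum(savings.values()) / len(savings)
print(f"mean: -{mean:.1f}%, range: -{min(savings.values()):.1f}% .. -{max(savings.values()):.1f}%")
# mean: -14.3%, range: -10.2% .. -16.6%
```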

What didn't help (and why)

Merging storageReads into storageChanges: actually increases size by ~1%, because storageReads are currently bare slot keys, and moving them into the change structure wraps each key in a list next to an (empty) changes field.
Indexed/deduplicated address encoding: each address appears only once in the BAL (one entry per account), so there's nothing to deduplicate in the old format.
Dropping precompiles: ~7 addresses * 27B = ~189B. Noise level.
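Why the storageReads merge costs bytes can be shown as a sketch with a minimal hand-rolled RLP encoder. It assumes a merged read would be stored as [slot, []], a slot key plus an empty change list, next to the real changes:

```python
def _encode_length(length: int, offset: int) -> bytes:
    """RLP length prefix: short form below 56 bytes, long form otherwise."""
    if length < 56:
        return bytes([offset + length])
    len_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(len_bytes)]) + len_bytes


def rlp_encode(item) -> bytes:
    """Minimal RLP encoder for bytes, non-negative ints and (nested) lists."""
    if isinstance(item, int):
        item = item.to_bytes((item.bit_length() + 7) // 8, "big")
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item
        return _encode_length(len(item), 0x80) + item
    if isinstance(item, list):
        payload = b"".join(rlp_encode(x) for x in item)
        return _encode_length(len(payload), 0xC0) + payload
    raise TypeError(f"cannot RLP-encode {type(item).__name__}")


# slot keys of different (minimal-encoding) lengths
for key in (b"\x05", b"\x01\x02\x03", b"\x33" * 32):
    bare = len(rlp_encode(key))          # today: bare key in storageReads
    merged = len(rlp_encode([key, []]))  # assumption: read-only entry in storageChanges
    print(len(key), bare, merged, merged - bare)
# 1 1 3 2
# 3 4 6 2
# 32 33 35 2
```

For any key up to 32 bytes, every read slot pays 2 extra bytes: one list header and one empty list.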

The trade-off

Optimizations 1 and 2 trade away per-tx change attribution (knowing which transaction caused each intermediate state change): the BlockAccessIndex and the intermediate per-tx storage, balance and nonce values would be gone. Optimization 3 is lossless. You keep:

Which addresses/slots were accessed (unchanged)
The final post-block value for each changed field (sufficient for state reconstruction)

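You lose: the ability to replay intermediate states from the BAL alone. If a consumer needs that, they'd need to re-execute transactions. For I/O prefetching (which only needs the accessed addresses and slots) and executionless state reconstruction, the final values suffice. Parallel transaction execution is different: it relies on the per-tx post values to supply each transaction's pre-state without executing its predecessors, so it would no longer be possible from the BAL alone.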
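A small sketch of what the trade-off means in practice. It uses plain dicts instead of a real state trie, and the keys and values are made up. Replaying all per-tx writes and applying only the final values give the same post-block state. Only the per-tx form can answer what the state looked like after transaction k. `state_after` and `apply_final` are hypothetical names:

```python
def state_after(pre_state: dict, per_tx: dict, k: int) -> dict:
    """State after block access index k, rebuilt from per-tx (idx, value) writes."""
    state = dict(pre_state)
    for key, writes in per_tx.items():
        applied = [w for w in writes if w[0] <= k]
        if applied:
            # latest write at or before k wins
            state[key] = max(applied, key=lambda w: w[0])[1]
    return state


def apply_final(pre_state: dict, final_values: dict) -> dict:
    """Post-block state from final-only values (no index information needed)."""
    return {**pre_state, **final_values}


# assumption: toy state keyed by (account, field), made-up values
pre = {("coinbase", "balance"): 100, ("0xabc", "slot1"): 0}
per_tx = {
    ("coinbase", "balance"): [(1, 110), (2, 125), (3, 140)],
    ("0xabc", "slot1"): [(2, 7), (3, 9)],
}
final_only = {key: max(w, key=lambda x: x[0])[1] for key, w in per_tx.items()}

print(state_after(pre, per_tx, 3) == apply_final(pre, final_only))  # True
print(state_after(pre, per_tx, 2))  # {('coinbase', 'balance'): 125, ('0xabc', 'slot1'): 7}
```

The second call has no counterpart on the final-only side: once the indices are dropped, the state after transaction 2 can only be recovered by re-executing the block.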

holgerd77/EIP-7928_BAL_Size_Reduction_Optimization_suggestions.md
