Pallene benchmark results.

The table below shows the performance gain from upvalue box merging as a function of the number of upvalues.

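| Upvalue count | Performance increase with upvalue box merging |
|---------------|-----------------------------------------------|
| 2             | 1.06 ± 0.54                                   |
| 4             | 1.06 ± 0.51                                   |
| 8             | 1.18 ± 0.58                                   |
| 16            | 1.10 ± 0.45                                   |
| 32            | 1.27 ± 0.43                                   |
| 64            | 1.56 ± 0.28                                   |
| 128           | 1.58 ± 0.32                                   |
| 200           | 1.76 ± 0.24                                   |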
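To make the table easier to read at a glance, here is a minimal sketch that checks which rows stay above "no change" once the ± value is subtracted. It assumes the performance increase is a ratio where 1.0 means no difference, and that the ± value can be treated as a symmetric spread around it; neither is stated explicitly above, so take the result as a rough reading rather than a significance test.

```python
# Rows copied from the table above: (upvalue count, performance increase, +/- value).
ROWS = [
    (2, 1.06, 0.54),
    (4, 1.06, 0.51),
    (8, 1.18, 0.58),
    (16, 1.10, 0.45),
    (32, 1.27, 0.43),
    (64, 1.56, 0.28),
    (128, 1.58, 0.32),
    (200, 1.76, 0.24),
]


def clearly_faster(rows):
    """Return the upvalue counts whose lower bound (value - spread) exceeds 1.0.

    Assumption: 1.0 means "no change" and the +/- value is a symmetric spread.
    """
    return [count for count, speedup, spread in rows if speedup - spread > 1.0]


if __name__ == "__main__":
    for count, speedup, spread in ROWS:
        low, high = speedup - spread, speedup + spread
        verdict = "above 1.0" if low > 1.0 else "overlaps 1.0"
        print(f"{count:>4} upvalues: [{low:.2f}, {high:.2f}] {verdict}")
    print("Clearly faster at:", clearly_faster(ROWS))
```

Under these assumptions, only the 64, 128 and 200 rows stay above 1.0; for 32 upvalues and below, the spread is wide enough to overlap with no improvement at all.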

NOTE: These numbers may be somewhat inaccurate. I ran the benchmarks on a weak laptop, and hyperfine kept printing the following warning past the 64-upvalue mark:

Warning: Statistical outliers were detected. 
Consider re-running this benchmark on a quiet PC without any interferences from other programs. 
It might help to use the '--warmup' or '--prepare' options.
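One way to follow the warning's suggestion would be to re-run the benchmarks with warmup runs enabled. The sketch below only assembles and prints such a hyperfine command line. The flags `--warmup`, `--min-runs` and `--export-json` are real hyperfine options, but the benchmark commands (`./bench_unmerged`, `./bench_merged`), the run counts and the output file name are placeholders, not what was actually used for the table.

```python
import shlex


def hyperfine_command(commands, warmup=5, min_runs=30, export_json=None):
    """Build an argv list for hyperfine with warmup runs enabled.

    `commands` are the shell commands to compare; the defaults for `warmup`
    and `min_runs` are arbitrary illustrative values.
    """
    argv = ["hyperfine", "--warmup", str(warmup), "--min-runs", str(min_runs)]
    if export_json is not None:
        # Keep the raw timings so they can be inspected later.
        argv += ["--export-json", export_json]
    return argv + list(commands)


if __name__ == "__main__":
    # Hypothetical benchmark binaries; substitute the real commands.
    cmd = hyperfine_command(
        ["./bench_unmerged 64", "./bench_merged 64"],
        export_json="upvalues_64.json",
    )
    # Print a copy-pasteable shell command instead of running it here.
    print(shlex.join(cmd))
```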

Assuming the way I ran the benchmarks was fair and didn't have any big mistakes, and that the above table is mostly accurate, my thoughts would be:

When the number of upvalues gets large, the speed difference starts to matter.
However, is it really realistic to expect 50+ upvalues in a single closure?
How often does this happen?
How often are such closures called? How likely is it for such a closure to be called in performance-sensitive regions like a game loop?
This was an artificially crafted example that makes for an extremely ideal scenario for merging boxes. In a real program, however, the boxes may not be mergeable because of escaping, among other reasons.

I think gathering some statistics from existing LuaRocks packages would help weigh the tradeoffs of this optimization. As things stand, from my personal POV it might make sense to move forward with the optimization, but without real data it is still unclear how much it would actually benefit us.
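As a rough idea of how such statistics could be collected, the sketch below counts upvalues per function from a `luac -l` style bytecode listing, which prints a summary line for each function that includes its upvalue count. The sample listing is hand-written for illustration and is not output from a real package; the exact listing format can differ between Lua versions, and the threshold of 50 simply mirrors the "50+ upvalues" question above.

```python
import re
from collections import Counter

# Illustrative listing in the style of `luac -l`; not taken from a real package.
SAMPLE_LISTING = """\
main <example.lua:0,0> (6 instructions at 0x0)
0+ params, 2 slots, 1 upvalue, 1 local, 0 constants, 1 function
function <example.lua:2,4> (4 instructions at 0x0)
0 params, 2 slots, 2 upvalues, 0 locals, 0 constants, 0 functions
"""

UPVALUE_RE = re.compile(r"\b(\d+) upvalues?\b")


def upvalue_counts(listing):
    """Extract one upvalue count per function summary line in the listing."""
    counts = []
    for line in listing.splitlines():
        # Only the per-function summary lines mention "slots".
        if "slots" not in line:
            continue
        match = UPVALUE_RE.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


def summarize(counts, threshold=50):
    """Return a histogram of counts and how many functions reach the threshold."""
    histogram = Counter(counts)
    large = sum(1 for c in counts if c >= threshold)
    return histogram, large


if __name__ == "__main__":
    counts = upvalue_counts(SAMPLE_LISTING)
    histogram, large = summarize(counts)
    print("upvalue histogram:", dict(histogram))
    print("functions with 50+ upvalues:", large)
```

Two caveats on this approach: in Lua 5.2 and later, `_ENV` itself counts as an upvalue for code that touches globals, so small counts are inflated by one in many functions. The listing also says nothing about how often a closure is called, so it would answer "how often does this happen?" but not "how hot are these closures?".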

srijan-paul/bench.md