BENCHMARK PERFORMANCE (NEW!)

The prompts and metrics are included in the abstract, so you can run the benchmarks yourself!

[Figures: Coding Output Benchmark; Research Output Benchmark; Memory Continuation Benchmark]