I recently asked Opus 4.6, Gemini 3 Pro, and GPT 5.2 Thinking to compute some radionuclide data. I'm on $20/month plans for GPT and Gemini and a $100/month plan for Claude, so it's not a fair comparison, but Opus did perform best.
From the Gemini UI, it wasn't clear whether I should choose Thinking or Pro. From a web search, it seems Thinking is the small model with CoT, whereas Pro is the big model. So I chose Pro.
Claude's split-view UI for artifacts is a big win. I'm amazed the other two haven't copied it. According to Claude, though, it doesn't know which artifact you're viewing, so you have to keep telling it. How hard could it be to make Claude aware of this?
None of the models could produce a complete list of radionuclides with half-lives of 4 to 40 years, so I made the list myself with NuDat 3 and gave it to them. Fun fact: there are only 24.
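For what it's worth, the filtering step itself is trivial once the half-lives are in a common unit. Here's a rough Python sketch of that step. The sample values are approximate half-lives typed in by hand for illustration, not my NuDat 3 list, and the names `SAMPLE_HALF_LIVES_Y` and `filter_by_half_life` are made up for this example. I'm also assuming the 4 and 40 year bounds are inclusive.

```python
# Illustrative sample only: approximate half-lives in years, entered by hand.
# This is NOT the NuDat 3 list from the post; all names here are hypothetical.
SAMPLE_HALF_LIVES_Y = {
    "H-3": 12.3,
    "C-14": 5700.0,
    "Na-22": 2.6,
    "Co-60": 5.27,
    "Kr-85": 10.7,
    "Sr-90": 28.8,
    "I-131": 8.02 / 365.25,  # about 8 days, converted to years
    "Cs-137": 30.1,
}


def filter_by_half_life(half_lives_y, lo_y, hi_y):
    """Return (nuclide, half_life) pairs with lo_y <= half_life <= hi_y,
    sorted from shortest to longest half-life."""
    hits = [(n, t) for n, t in half_lives_y.items() if lo_y <= t <= hi_y]
    return sorted(hits, key=lambda pair: pair[1])


# Assumption: the 4-40 year window is inclusive at both ends.
selected = filter_by_half_life(SAMPLE_HALF_LIVES_Y, 4.0, 40.0)
for name, t_half in selected:
    print(f"{name}: {t_half} y")
```

The hard part is getting a complete, trustworthy input table, which is what the models couldn't do.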