The most important claim of this talk is simple: reinforcement learning (RL) is no longer best understood as a narrow subfield about agents playing games. That picture was historically useful, but in 2026 it is too small. RL increasingly functions as a general mechanism for turning prediction into decision-making, static models into agents, and raw capability into behavior that can be optimized against goals, feedback, and constraints. The question "what's next for RL?" should therefore not be framed as "what is the next Atari or AlphaGo moment?" It should be framed as "where must AI systems act, search, plan, improve from feedback, or optimize behavior under delayed consequences?", because those are the places where RL is trying to re-enter the stack.
That is part of why the field feels different now. In March 2025, ACM announced that Andrew Barto and Richard Sutton had received the 2024 A.M. Turing Award for "developing the conceptual and algorithmic fou