This document contains my simplified understanding of how this works.
It is by no means an exhaustive explanation; it is meant to convey the "vibe" of what's happening while glossing over implementation details.
- Ask DeepSeek R1 to think about something.
- It thinks about it for a long time, spending $$$ on compute.
- It eventually comes to a conclusion.
- Take the question + the conclusion and generate a "thought trace", which is the "correct" or "shortest" version of the "thought process".
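
The steps above can be sketched roughly as follows. This is only an illustration of the data flow, not how any real system implements it: `distill`, `DistilledExample`, `fake_slow_reasoner`, and `fake_trace_writer` are hypothetical names made up for this note, and the two "models" are stubs rather than real API calls.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class DistilledExample:
    """One (question, short trace, conclusion) record. Hypothetical structure."""
    question: str
    thought_trace: str
    conclusion: str


def distill(
    question: str,
    slow_reasoner: Callable[[str], Tuple[str, str]],
    trace_writer: Callable[[str, str], str],
) -> DistilledExample:
    # Steps 1-3: the expensive model thinks for a long time and reaches a conclusion.
    long_reasoning, conclusion = slow_reasoner(question)
    # Step 4: only the question and the conclusion are used to write the short trace;
    # the long, wandering reasoning is discarded in this sketch.
    del long_reasoning
    thought_trace = trace_writer(question, conclusion)
    return DistilledExample(question, thought_trace, conclusion)


# Stub standing in for the long-thinking model (assumption: returns reasoning + answer).
def fake_slow_reasoner(question: str) -> Tuple[str, str]:
    wandering = "Hmm, maybe 5? No wait. Let me recount. " * 50
    return wandering, "4"


# Stub standing in for whatever writes the short, "correct" trace.
def fake_trace_writer(question: str, conclusion: str) -> str:
    return f"To answer '{question}': add the two numbers, giving {conclusion}."


example = distill("What is 2 + 2?", fake_slow_reasoner, fake_trace_writer)
print(example.thought_trace)
```

The point of the sketch is that the short trace is derived from the question and the conclusion, so it can be much shorter than the original reasoning that produced the conclusion.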