A practical guide based on migrating 18 production AI operations (~175 test cases) from GPT-4.1-mini to Mercury 2, a diffusion-based LLM. Every rule below was learned from a real failure and validated with automated tests.
Autoregressive models (GPT, Claude, Gemini) generate one token at a time, left to right. Each new token can attend to everything generated before it. They follow instructions well because the instructions stay in context and are reprocessed at every generation step.
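The loop below is a minimal sketch of that left-to-right process. The `toy_next_token` function is a hypothetical stand-in for a real model's forward pass, not any actual API; the point is only the structure: each step conditions on the full prefix, and a token, once emitted, is never revised.

```python
# Minimal sketch of autoregressive (left-to-right) decoding.
# `toy_next_token` is a toy stand-in for a model forward pass,
# not a real LLM call.

def toy_next_token(context):
    """The next token may depend on every token already in
    `context`, but on nothing after it."""
    vocab = ["Hello", ",", " world", "!", "<eos>"]
    return vocab[min(len(context), len(vocab) - 1)]

def generate(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        nxt = toy_next_token(tokens)  # conditioned on all prior tokens
        if nxt == "<eos>":
            break
        tokens.append(nxt)            # committed: never revised later
    return tokens

print(generate([]))  # → ['Hello', ',', ' world', '!']
```

Diffusion LLMs break exactly this assumption: instead of committing tokens one at a time, they refine many positions in parallel, which is what makes the migration rules in this guide necessary.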