The model knows the answer. Call that the knowledge layer. It's in there, from pretraining, fully intact.
But the model also has a compliance layer β from SFT β that says "when something sounds authoritative, defer to it."
Both layers are always running. Both produce a signal. The output is whichever signal wins.
Normal conversation:
You ask a question. No authority signal in the prompt. Knowledge layer wins easily. Model answers from what it knows.