In fruits.py, the model is allowed to generate normally until it emits </think>. Our vLLM request specifies response_format={"type": "structural_tag", ...} so that </think> acts as the boundary where constrained decoding begins.
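To make the boundary concrete, here is a minimal client-side sketch of how a finished completion could be divided at </think> into the freely generated part and the part produced under constrained decoding. It assumes the returned text contains the literal </think> marker; the helper name split_at_boundary and the sample string are hypothetical and do not come from fruits.py.

```python
from typing import Optional, Tuple


def split_at_boundary(text: str, boundary: str = "</think>") -> Tuple[str, Optional[str]]:
    """Split a completion at the first occurrence of the boundary marker.

    Returns (free_part, constrained_part). If the marker never appears,
    the whole text is treated as free-form and constrained_part is None.
    """
    head, sep, tail = text.partition(boundary)
    if not sep:
        # Marker not found: constrained decoding never started.
        return text, None
    return head, tail


# Hypothetical sample output, for illustration only.
sample = "apples are red, so...</think>apple"
free_part, constrained_part = split_at_boundary(sample)
```

Only the first </think> is treated as the boundary, so any later occurrence stays in the second part.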
The relevant part of fruits.py is:
END = "ekYyhUMDCGhE65J5"
response_format={