Anyone getting the issue of the <think> tag not showing?
Here's an example of what the output looks like (the tail end of the model's thinking): "Finally, make sure all the cited figures are accurate, e.g., keep 'January 2024' and 'July' as-is without adding a year. The risk section should emphasize the uncertainty of the approval outcome, ......"
I've noticed an issue with my model's inference output. The responses contain only a closing </think> tag, which appears after the thinking process and before the final answer; there is no opening <think> tag at the beginning of the thinking section.
Has anyone else encountered this problem? I'm trying to properly implement a thinking/reflection format where the model's internal reasoning is clearly separated from its final response.
I'm specifically wondering how to ensure the opening tag appears in the output. I'm using vLLM to serve the model and making API calls to generate responses.
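For reference, my setup is roughly this (simplified; the model name and prompt are placeholders):

```python
# Roughly how I'm generating responses. vLLM is started with
# `vllm serve <model>` and I hit its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="my-model",  # placeholder for the served model name
    messages=[{"role": "user", "content": "Summarize the filing."}],
)
# What I get back looks like "<reasoning text></think><final answer>":
# the closing tag is present, but the opening <think> tag is missing.
print(resp.choices[0].message.content)
```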
Could it be related to your sampling parameters? What are your inference settings? DeepSeek distills are known to be sensitive to temperature: set it too low or too high and the thinking may not work properly. I believe the suggested temperature range for DeepSeek was about 0.4-0.6. When using the QwQ-32B GGUF, I usually reuse the preset I originally created for the DeepSeek distill, which has temperature 0.6. Then again, your model's format is different, so that may also affect output quality and, consequently, the model's ability to generate output in the expected format with proper thinking tags. In any case, adjusting the parameters seems like the easiest place to start troubleshooting.
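If you're hitting vLLM's OpenAI-compatible endpoint, the sampling parameters go directly into the request. A minimal sketch, assuming a local server (model name, port, and prompt are placeholders):

```python
# Sketch: querying a vLLM OpenAI-compatible server with a temperature in
# the 0.4-0.6 range suggested for DeepSeek-style reasoning models.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="my-model",  # placeholder for the served model name
    messages=[{"role": "user", "content": "Explain the approval risks."}],
    temperature=0.6,   # start here; try values between 0.4 and 0.6
    top_p=0.95,        # commonly paired with these temperatures
    max_tokens=2048,
)
print(response.choices[0].message.content)
```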
Edit:
Some people suggest forcing the model to start with the <think> tag by prepending it to the response; maybe you could try that if everything else fails. If you're processing the response programmatically, you can split the response string on the closing </think> tag to get the CoT and the final answer separately, depending on what you're trying to do with the response. A sketch of both ideas follows.
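A minimal sketch of both ideas, assuming the vLLM completions endpoint and a Hugging Face tokenizer to build the chat prompt (model name and prompt are placeholders):

```python
# Sketch: force the reasoning to open with <think> by appending the tag to
# the prompt yourself, then recover the CoT and final answer by splitting
# on the closing </think> tag.
from openai import OpenAI
from transformers import AutoTokenizer

MODEL = "my-model"  # placeholder for the served model name
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
tokenizer = AutoTokenizer.from_pretrained(MODEL)

# Build the chat prompt manually and append the opening tag, so the model
# continues from inside the thinking section instead of skipping the tag.
messages = [{"role": "user", "content": "Summarize the filing."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
) + "<think>\n"

completion = client.completions.create(
    model=MODEL,
    prompt=prompt,
    temperature=0.6,
    max_tokens=2048,
)
text = completion.choices[0].text

# Split on the closing tag; re-add the opening tag we injected ourselves.
if "</think>" in text:
    cot, final_answer = text.split("</think>", 1)
    cot = "<think>" + cot + "</think>"
else:
    cot, final_answer = "", text

print(final_answer.strip())
```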