Reasoning Work
Collection
Models I've trained to think like DeepSeek R1 using online learning - Group Relative Policy Optimization (GRPO) introduced by DeepSeekMath
•
6 items
•
Updated
Qwen2.5 7B trained to think and reason like Deepseek R1, specifically on Diagnostic Medicine.
Use this to aid your differential diagnosis or ask questions or even just test it's reasoning.
Use the system prompt below for better results
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>