DeepSeek R1 Medical Reasoning
- Fine-tuned from model: unsloth/DeepSeek-R1-Distill-Llama-8B
This model was fine-tuned for medical reasoning using Unsloth and Hugging Face's TRL library, with Unsloth's optimizations making training roughly 2x faster than a standard Hugging Face setup.
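The card does not publish the prompt format used in training. As a hedged illustration only: reasoning SFT data is often rendered into a single text field before being passed to TRL, and DeepSeek-R1 distilled models conventionally wrap their step-by-step reasoning in `<think>...</think>` tags before the final answer. The sketch below assumes that convention. `PROMPT_TEMPLATE`, `format_example`, and `split_reasoning` are hypothetical names, and the template wording is invented for illustration.

```python
# Hypothetical template: the model card does not publish the prompt format used in training.
PROMPT_TEMPLATE = (
    "Below is a medical question. Reason through it step by step, "
    "then give a final answer.\n\n"
    "### Question:\n{question}\n\n"
    "### Response:\n<think>\n{reasoning}\n</think>\n{answer}"
)


def format_example(question: str, reasoning: str, answer: str, eos_token: str) -> str:
    """Render one (question, reasoning, answer) triple as a single training string.

    The EOS token is appended so the model learns to stop after the answer;
    with a Hugging Face tokenizer this would be `tokenizer.eos_token`.
    """
    body = PROMPT_TEMPLATE.format(
        question=question.strip(),
        reasoning=reasoning.strip(),
        answer=answer.strip(),
    )
    return body + eos_token


def split_reasoning(generated: str) -> tuple[str, str]:
    """Split R1-style output into (reasoning, answer) at the first </think> tag.

    Works whether or not the opening <think> tag appears in the generated text
    (some chat templates put it in the prompt instead). Without a closing tag,
    the whole text is treated as the answer.
    """
    head, sep, tail = generated.partition("</think>")
    if not sep:
        return "", generated.strip()
    reasoning = head.split("<think>", 1)[-1]
    return reasoning.strip(), tail.strip()


if __name__ == "__main__":
    print(format_example(
        "What causes scurvy?",
        "Scurvy results from impaired collagen synthesis, which requires vitamin C.",
        "Vitamin C deficiency.",
        eos_token="<EOS>",
    ))
    print(split_reasoning("Vitamin C is needed for collagen.\n</think>\nVitamin C deficiency."))
```

Keeping formatting and splitting side by side makes the round trip explicit: whatever tag convention is used when building training text is the one the parser must expect at inference time.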
Model Details
- Fine-tuning task: Medical reasoning with step-by-step chain-of-thought explanations
- Training dataset: Medical reasoning dataset (500 examples)
- Training metrics:
  - Final loss: 1.3269
  - Training runtime: 2191.2041 seconds (≈36.5 minutes)
  - Total FLOPs: 4.01e+16
  - Epochs completed: 1.896
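The reported metrics can be turned into a few more intuitive numbers. This is a minimal arithmetic sketch: `summarize_metrics` is a hypothetical helper, and it assumes the FLOP count is a trainer-side estimate (as with the Hugging Face Transformers Trainer's `total_flos`), so the throughput it yields is an approximate average rather than a measured hardware figure.

```python
# Figures copied from the training metrics above.
TRAIN_RUNTIME_S = 2191.2041
TOTAL_FLOPS = 4.01e16
EPOCHS = 1.896
DATASET_SIZE = 500


def summarize_metrics(
    runtime_s: float, total_flops: float, epochs: float, dataset_size: int
) -> dict[str, float]:
    """Derive back-of-the-envelope quantities from the reported training metrics."""
    return {
        # Wall-clock training time in minutes.
        "runtime_min": runtime_s / 60,
        # Average throughput over the whole run, in TFLOP/s (1 TFLOP = 1e12 FLOPs).
        "tflops_per_s": total_flops / runtime_s / 1e12,
        # Number of example passes, assuming epochs are counted over all 500 examples.
        "examples_seen": epochs * dataset_size,
    }


if __name__ == "__main__":
    metrics = summarize_metrics(TRAIN_RUNTIME_S, TOTAL_FLOPS, EPOCHS, DATASET_SIZE)
    for name, value in metrics.items():
        print(f"{name}: {value:.2f}")
```

Run as a script, it prints `runtime_min: 36.52`, `tflops_per_s: 18.30`, and `examples_seen: 948.00`: about 36.5 minutes of training at roughly 18.3 TFLOP/s on average, covering about 948 example passes over the 500-example dataset.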