DeepSeek-R1-Distill-Llama-8B - Fine-Tuned for Medical Chain-of-Thought Reasoning
Model Overview
The DeepSeek-R1-Distill-Llama-8B model has been fine-tuned for medical chain-of-thought (CoT) reasoning. This fine-tuning process enhances the model's ability to generate structured, concise, and accurate medical reasoning outputs. The model was trained using a 500-sample subset of the medical-o1-reasoning-SFT dataset, with optimizations including 4-bit quantization and LoRA adapters to improve efficiency and reduce memory usage.
Key Features
- Base Model: unsloth/DeepSeek-R1-Distill-Llama-8B
- Fine-Tuning Objective: Adaptation for structured, step-by-step medical reasoning tasks.
- Training Dataset: 500 samples from medical-o1-reasoning-SFT dataset.
- Tools Used:
- Unsloth: Accelerates training by 2x.
- 4-bit Quantization: Reduces model memory usage.
- LoRA Adapters: Enables parameter-efficient fine-tuning.
- Training Time: 44 minutes.
Performance Improvements
- Response Length: Reduced from an average of 450 words to 150 words, improving conciseness.
- Reasoning Style: Shifted from verbose explanations to more focused, structured reasoning.
- Answer Format: Transitioned from bulleted lists to paragraph-style answers for clarity.
Intended Use
This model is designed for use by:
- Medical professionals requiring structured diagnostic reasoning.
- Researchers seeking assistance in medical knowledge extraction.
- Developers integrating the model for medical CoT tasks in clinical settings, treatment planning, and education.
Typical use cases include:
- Clinical diagnostics
- Treatment planning
- Medical education and training
- Research assistance
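As a usage illustration, the sketch below loads the published 4-bit GGUF export with llama-cpp-python and asks a sample medical question. The GGUF filename pattern, prompt wording, and generation settings are assumptions for illustration; adapt them to the actual files in the repository and your preferred prompt template.

```python
# Minimal inference sketch (assumed setup): run the 4-bit GGUF export of the
# fine-tuned model with llama-cpp-python and ask a sample clinical question.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT",
    filename="*.gguf",   # glob for the 4-bit GGUF file; the exact name is assumed
    n_ctx=2048,          # matches the sequence length used during fine-tuning
)

prompt = (
    "Below is a medical question. Reason step by step, then give a concise answer.\n\n"
    "### Question:\n"
    "A 58-year-old patient presents with crushing chest pain radiating to the left arm "
    "and diaphoresis. What is the most likely diagnosis?\n\n"
    "### Response:\n"
)

output = llm(prompt, max_tokens=512, temperature=0.7)
print(output["choices"][0]["text"])
```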
Training Details
Key Components:
- Model: unsloth/DeepSeek-R1-Distill-Llama-8B
- Dataset: medical-o1-reasoning-SFT (500 samples)
- Training Tools:
- Unsloth: Optimized training for faster results (2x speedup).
- 4-bit Quantization: Optimized memory usage for efficient training.
- LoRA Adapters: Enables lightweight fine-tuning with reduced computational costs (a setup sketch follows this list).
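A minimal sketch of this setup with the Unsloth API is shown below. The LoRA rank, alpha, dropout, and target modules are illustrative assumptions; only the 4-bit loading and the 2048-token sequence length come from the card.

```python
# Sketch: load the base model in 4-bit and attach LoRA adapters with Unsloth.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=2048,   # sequence length used during fine-tuning
    load_in_4bit=True,     # 4-bit quantization to reduce memory usage
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                  # LoRA rank (assumed value)
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing=True,
)
```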
Fine-Tuning Process:
1. Install Required Packages: Installed the necessary libraries, including unsloth and kaggle.
2. Authentication: Authenticated with the Hugging Face Hub and Weights & Biases for experiment tracking and versioning.
3. Model Initialization: Initialized the base model with 4-bit quantization and a sequence length of up to 2048 tokens.
4. Pre-Fine-Tuning Inference: Ran an initial inference on a medical question to establish the model's baseline performance.
5. Dataset Preparation: Structured and formatted the training data with a custom template tailored to medical CoT reasoning.
6. Application of LoRA Adapters: Attached LoRA adapters for parameter-efficient fine-tuning.
7. Supervised Fine-Tuning: Used SFTTrainer to fine-tune the model with optimized hyperparameters; training completed in 44 minutes.
8. Post-Fine-Tuning Inference: Re-ran the same medical question to evaluate the model's improved performance.
9. Saving and Loading: Stored the fine-tuned model, including LoRA adapters, for future use and deployment.
10. Model Deployment: Pushed the fine-tuned model to the Hugging Face Hub in GGUF format with 4-bit quantization for efficient inference.

Sketches of the dataset preparation, training, and deployment steps follow this list.
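The sketch below illustrates steps 5 to 7, continuing from the initialization sketch above (it reuses `model` and `tokenizer`). The dataset repository id, field names, prompt template wording, and all hyperparameters are assumptions for illustration, and some SFTTrainer arguments move to SFTConfig in newer trl versions.

```python
# Sketch: format the 500-sample subset with a CoT prompt template and run
# supervised fine-tuning with trl's SFTTrainer. Continues from the Unsloth
# initialization sketch above (`model`, `tokenizer`).
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

prompt_template = """Below is a medical question. Reason step by step, then answer.

### Question:
{question}

### Response:
{cot}
{answer}"""

def format_examples(batch):
    # Field names (Question, Complex_CoT, Response) are assumed from the dataset schema.
    texts = [
        prompt_template.format(question=q, cot=c, answer=a) + tokenizer.eos_token
        for q, c, a in zip(batch["Question"], batch["Complex_CoT"], batch["Response"])
    ]
    return {"text": texts}

dataset = load_dataset(
    "FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train[:500]"
)
dataset = dataset.map(format_examples, batched=True)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # assumed hyperparameters
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()
```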
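For steps 9 and 10, a minimal saving and deployment sketch is shown below. The local directory, Hub repository name, and GGUF quantization method are assumptions; Unsloth's push_to_hub_gguf helper exports a GGUF build and uploads it to the Hub.

```python
# Sketch: save the LoRA adapters locally, then publish a 4-bit GGUF export to the Hub.
model.save_pretrained("medical-reasoning-lora")       # LoRA adapters only
tokenizer.save_pretrained("medical-reasoning-lora")

# "q4_k_m" is a common 4-bit GGUF quantization choice and is assumed here.
model.push_to_hub_gguf(
    "SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT",
    tokenizer,
    quantization_method="q4_k_m",
)
```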
Notebook
Access the implementation notebook for this model here. The notebook provides detailed steps for fine-tuning and deploying the model.