DeepSeek-R1-Distill-Llama-8B - Fine-Tuned for Medical Chain-of-Thought Reasoning

Model Overview

The DeepSeek-R1-Distill-Llama-8B model has been fine-tuned for medical chain-of-thought (CoT) reasoning. This fine-tuning process enhances the model's ability to generate structured, concise, and accurate medical reasoning outputs. The model was trained using a 500-sample subset of the medical-o1-reasoning-SFT dataset, with optimizations including 4-bit quantization and LoRA adapters to improve efficiency and reduce memory usage.

Key Features

  • Base Model: unsloth/DeepSeek-R1-Distill-Llama-8B
  • Fine-Tuning Objective: Adaptation for structured, step-by-step medical reasoning tasks.
  • Training Dataset: 500 samples from medical-o1-reasoning-SFT dataset.
  • Tools Used:
    • Unsloth: Accelerates training by 2x.
    • 4-bit Quantization: Reduces model memory usage.
    • LoRA Adapters: Enables parameter-efficient fine-tuning.
  • Training Time: 44 minutes.

Performance Improvements

  • Response Length: Reduced from an average of 450 words to 150 words, improving conciseness.
  • Reasoning Style: Shifted from verbose explanations to more focused, structured reasoning.
  • Answer Format: Transitioned from bulleted lists to paragraph-style answers for clarity.
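A response-length comparison like the one above can be computed with a simple word-count helper. The function and sample strings below are illustrative only, not taken from the actual evaluation code:

```python
def average_word_count(responses):
    """Mean word count across a list of model responses."""
    if not responses:
        return 0.0
    return sum(len(r.split()) for r in responses) / len(responses)

# Illustrative before/after outputs (placeholder text, not real transcripts)
before = ["step one " * 225]   # ~450 words per response
after = ["step one " * 75]     # ~150 words per response

print(average_word_count(before))  # → 450.0
print(average_word_count(after))   # → 150.0
```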

Intended Use

This model is designed for use by:

  • Medical professionals requiring structured diagnostic reasoning.
  • Researchers seeking assistance in medical knowledge extraction.
  • Developers integrating medical CoT reasoning capabilities into their applications.

Typical use cases include:

  • Clinical diagnostics
  • Treatment planning
  • Medical education and training
  • Research assistance

Training Details

Key Components:

  • Model: unsloth/DeepSeek-R1-Distill-Llama-8B
  • Dataset: medical-o1-reasoning-SFT (500 samples)
  • Training Tools:
    • Unsloth: Optimized training for faster results (2x speedup).
    • 4-bit Quantization: Optimized memory usage for efficient training.
    • LoRA Adapters: Enables lightweight fine-tuning with reduced computational costs.
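The training setup above can be sketched as a configuration block. The hyperparameter values below (LoRA rank, alpha, dropout, target modules) are typical Unsloth defaults and are assumptions, not the exact values used in this run; the commented lines show where they would be passed to Unsloth's FastLanguageModel:

```python
# Sketch of the fine-tuning configuration. Hyperparameter values are
# illustrative assumptions; the actual notebook may differ.
MODEL_NAME = "unsloth/DeepSeek-R1-Distill-Llama-8B"
MAX_SEQ_LENGTH = 2048  # from the model-initialization step

load_config = {
    "model_name": MODEL_NAME,
    "max_seq_length": MAX_SEQ_LENGTH,
    "load_in_4bit": True,  # 4-bit quantization to reduce memory usage
}

lora_config = {
    "r": 16,               # LoRA rank (assumed)
    "lora_alpha": 16,      # LoRA scaling factor (assumed)
    "lora_dropout": 0.0,
    "target_modules": [    # attention + MLP projections (a common choice)
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}

# In the actual notebook these would be applied roughly as:
#   from unsloth import FastLanguageModel
#   model, tokenizer = FastLanguageModel.from_pretrained(**load_config)
#   model = FastLanguageModel.get_peft_model(model, **lora_config)
```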

Fine-Tuning Process:

  1. Install Required Packages: Installed necessary libraries, including unsloth and kaggle.

  2. Authentication: Authenticated with Hugging Face Hub and Weights & Biases for tracking experiments and versioning.

  3. Model Initialization: Initialized the base model with 4-bit quantization and a sequence length of up to 2048 tokens.

  4. Pre-Fine-Tuning Inference: Conducted an initial inference to establish the model’s baseline performance on a medical question.

  5. Dataset Preparation: Structured and formatted the training data using a custom template tailored to medical CoT reasoning tasks.

  6. Application of LoRA Adapters: Incorporated LoRA adapters for efficient parameter tuning during fine-tuning.

  7. Supervised Fine-Tuning: Utilized SFTTrainer to fine-tune the model with optimized hyperparameters for 44 minutes.

  8. Post-Fine-Tuning Inference: Evaluated the model’s improved performance by testing it on the same medical question after fine-tuning.

  9. Saving and Loading: Stored the fine-tuned model, including LoRA adapters, for easy future use and deployment.

  10. Model Deployment: Pushed the fine-tuned model to Hugging Face Hub in GGUF format with 4-bit quantization enabled for efficient use.
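Step 5's dataset preparation can be sketched as a formatting function that renders each dataset row into a single training prompt. The template text below is a hypothetical stand-in for the custom template used in the notebook, and the field names (`Question`, `Complex_CoT`, `Response`) are assumed to match the dataset's columns:

```python
# Hypothetical CoT prompt template; the notebook uses its own custom template.
PROMPT_TEMPLATE = """Below is a medical question. Reason step by step, then give a final answer.

### Question:
{question}

### Chain of Thought:
{cot}

### Answer:
{answer}"""

def format_example(example, eos_token="</s>"):
    """Render one dataset row into a training prompt string."""
    return PROMPT_TEMPLATE.format(
        question=example["Question"],
        cot=example["Complex_CoT"],
        answer=example["Response"],
    ) + eos_token  # append EOS so the model learns to stop

row = {
    "Question": "What causes iron-deficiency anemia?",
    "Complex_CoT": "Iron is required for hemoglobin synthesis...",
    "Response": "Chronic blood loss or inadequate iron intake.",
}
print(format_example(row))
```

Appending the tokenizer's EOS token to each formatted sample is what lets the fine-tuned model terminate its answers instead of rambling on.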

Notebook

Access the implementation notebook for this model here. It provides detailed steps for fine-tuning and deploying the model.

Model Details

  • Format: GGUF
  • Model size: 8.03B params
  • Architecture: llama
  • Available quantizations: 4-bit, 5-bit, 8-bit


Model Repository

The fine-tuned model is published on the Hugging Face Hub as SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT, trained on the medical-o1-reasoning-SFT dataset.