DeepSeek-R1-Distill-Llama-8B - Fine-Tuned for Medical Chain-of-Thought Reasoning

Model Overview

The DeepSeek-R1-Distill-Llama-8B model has been fine-tuned for medical chain-of-thought (CoT) reasoning. This fine-tuning process enhances the model's ability to generate structured, concise, and accurate medical reasoning outputs. The model was trained using a 500-sample subset of the medical-o1-reasoning-SFT dataset, with optimizations including 4-bit quantization and LoRA adapters to improve efficiency and reduce memory usage.

Key Features

  • Base Model: unsloth/DeepSeek-R1-Distill-Llama-8B
  • Fine-Tuning Objective: Adaptation for structured, step-by-step medical reasoning tasks.
  • Training Dataset: 500 samples from medical-o1-reasoning-SFT dataset.
  • Tools Used:
    • Unsloth: Accelerates training by 2x.
    • 4-bit Quantization: Reduces model memory usage.
    • LoRA Adapters: Enables parameter-efficient fine-tuning.
  • Training Time: 44 minutes.

Performance Improvements

  • Response Length: Reduced from an average of 450 words to 150 words, improving conciseness.
  • Reasoning Style: Shifted from verbose explanations to more focused, structured reasoning.
  • Answer Format: Transitioned from bulleted lists to paragraph-style answers for clarity.
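A response-length comparison like the one above can be computed with a simple word-count helper. The function and sample strings below are illustrative only, not taken from the actual evaluation code:

```python
def average_word_count(responses):
    """Mean word count across a list of model responses."""
    if not responses:
        return 0.0
    return sum(len(r.split()) for r in responses) / len(responses)

# Illustrative before/after outputs (placeholder text, not real transcripts)
before = ["step one " * 225]   # ~450 words per response
after = ["step one " * 75]     # ~150 words per response

print(average_word_count(before))  # → 450.0
print(average_word_count(after))   # → 150.0
```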

Intended Use

This model is designed for use by:

  • Medical professionals requiring structured diagnostic reasoning.
  • Researchers seeking assistance in medical knowledge extraction.
  • Developers integrating medical CoT reasoning capabilities into their applications.

Typical use cases include:

  • Clinical diagnostics
  • Treatment planning
  • Medical education and training
  • Research assistance

Training Details

Key Components:

  • Model: unsloth/DeepSeek-R1-Distill-Llama-8B
  • Dataset: medical-o1-reasoning-SFT (500 samples)
  • Training Tools:
    • Unsloth: Optimized training for faster results (2x speedup).
    • 4-bit Quantization: Optimized memory usage for efficient training.
    • LoRA Adapters: Enables lightweight fine-tuning with reduced computational costs.
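The training setup above can be sketched as a configuration block. The hyperparameter values below (LoRA rank, alpha, dropout, target modules) are typical Unsloth defaults and are assumptions, not the exact values used in this run; the commented lines show where they would be passed to Unsloth's FastLanguageModel:

```python
# Sketch of the fine-tuning configuration. Hyperparameter values are
# illustrative assumptions; the actual notebook may differ.
MODEL_NAME = "unsloth/DeepSeek-R1-Distill-Llama-8B"
MAX_SEQ_LENGTH = 2048  # from the model-initialization step

load_config = {
    "model_name": MODEL_NAME,
    "max_seq_length": MAX_SEQ_LENGTH,
    "load_in_4bit": True,  # 4-bit quantization to reduce memory usage
}

lora_config = {
    "r": 16,               # LoRA rank (assumed)
    "lora_alpha": 16,      # LoRA scaling factor (assumed)
    "lora_dropout": 0.0,
    "target_modules": [    # attention + MLP projections (a common choice)
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}

# In the actual notebook these would be applied roughly as:
#   from unsloth import FastLanguageModel
#   model, tokenizer = FastLanguageModel.from_pretrained(**load_config)
#   model = FastLanguageModel.get_peft_model(model, **lora_config)
```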

Fine-Tuning Process:

  1. Install Required Packages: Installed necessary libraries, including unsloth and kaggle.

  2. Authentication: Authenticated with Hugging Face Hub and Weights & Biases for tracking experiments and versioning.

  3. Model Initialization: Initialized the base model with 4-bit quantization and a sequence length of up to 2048 tokens.

  4. Pre-Fine-Tuning Inference: Conducted an initial inference to establish the model’s baseline performance on a medical question.

  5. Dataset Preparation: Structured and formatted the training data using a custom template tailored to medical CoT reasoning tasks.

  6. Application of LoRA Adapters: Incorporated LoRA adapters for efficient parameter tuning during fine-tuning.

  7. Supervised Fine-Tuning: Utilized SFTTrainer to fine-tune the model with optimized hyperparameters for 44 minutes.

  8. Post-Fine-Tuning Inference: Evaluated the model’s improved performance by testing it on the same medical question after fine-tuning.

  9. Saving and Loading: Stored the fine-tuned model, including LoRA adapters, for easy future use and deployment.

  10. Model Deployment: Pushed the fine-tuned model to Hugging Face Hub in GGUF format with 4-bit quantization enabled for efficient use.
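Step 5's dataset preparation can be sketched as a formatting function that renders each dataset row into a single training prompt. The template text below is a hypothetical stand-in for the custom template used in the notebook, and the field names (`Question`, `Complex_CoT`, `Response`) are assumed to match the dataset's columns:

```python
# Hypothetical CoT prompt template; the notebook uses its own custom template.
PROMPT_TEMPLATE = """Below is a medical question. Reason step by step, then give a final answer.

### Question:
{question}

### Chain of Thought:
{cot}

### Answer:
{answer}"""

def format_example(example, eos_token="</s>"):
    """Render one dataset row into a training prompt string."""
    return PROMPT_TEMPLATE.format(
        question=example["Question"],
        cot=example["Complex_CoT"],
        answer=example["Response"],
    ) + eos_token  # append EOS so the model learns to stop

row = {
    "Question": "What causes iron-deficiency anemia?",
    "Complex_CoT": "Iron is required for hemoglobin synthesis...",
    "Response": "Chronic blood loss or inadequate iron intake.",
}
print(format_example(row))
```

Appending the tokenizer's EOS token to each formatted sample is what lets the fine-tuned model terminate its answers instead of rambling on.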

Notebook

Access the implementation notebook for this model here. It provides detailed steps for fine-tuning and deploying the model.

Model Details

  • Format: GGUF
  • Model size: 8.03B params
  • Architecture: llama
  • Available quantizations: 4-bit, 5-bit, 8-bit


Model Repository

The fine-tuned model is published on the Hugging Face Hub as SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT, trained on the medical-o1-reasoning-SFT dataset.