Model Card: Gemma 2B Medical ORPO RLHF Fine-Tuning

Model Overview

Model Description

This model is a fine-tuned version of the Gemma 2B model, aligned with ORPO (Odds Ratio Preference Optimization), a reference-model-free alternative to classical RLHF, to enhance its medical reasoning capabilities. The fine-tuning process leverages a medical-reasoning preference dataset to improve decision-making and contextual understanding in healthcare-related queries.

Intended Use

This model is designed for:

  • Assisting in medical reasoning and diagnosis
  • Enhancing clinical decision support
  • Providing explanations for medical queries
  • Research and educational purposes in the medical field

Limitations:

  • Not a substitute for professional medical advice.
  • May contain biases based on the dataset.
  • Performance is dependent on prompt formulation.

Training Details

  • Dataset Used: SURESHBEEKHANI/medical-reasoning-orpo
  • Number of Training Steps: 30 (demo setting; increase for full training)
  • Batch Size: 1 per device
  • Gradient Accumulation Steps: 4
  • Optimizer: AdamW (8-bit)
  • Learning Rate Scheduler: Linear
  • Precision: Mixed (Bfloat16 or Float16 depending on hardware)
  • Quantization: GGUF exports in q4_k_m (4-bit), q5_k_m (5-bit), and q8_0 (8-bit)
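
The hyperparameters above can be collected into a training sketch. This is a minimal, hypothetical reconstruction assuming the `trl` library's `ORPOConfig`/`ORPOTrainer` API and Unsloth model loading; it is not the exact script used for this release, and the base-model checkpoint name is an assumption.

```python
# Hypothetical reconstruction of the training setup described above,
# assuming trl's ORPOTrainer API -- not the author's exact script.
from datasets import load_dataset
from trl import ORPOConfig, ORPOTrainer
from unsloth import FastLanguageModel

# Base checkpoint name is an assumption; substitute your Gemma 2B variant.
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/gemma-2b", load_in_4bit=True
)

# ORPO expects preference data (prompt / chosen / rejected columns).
dataset = load_dataset("SURESHBEEKHANI/medical-reasoning-orpo", split="train")

args = ORPOConfig(
    per_device_train_batch_size=1,   # batch size 1 per device
    gradient_accumulation_steps=4,   # effective batch size of 4
    max_steps=30,                    # demo setting; increase for full training
    optim="adamw_8bit",              # 8-bit AdamW
    lr_scheduler_type="linear",
    bf16=True,                       # or fp16=True, depending on hardware
    output_dir="outputs",
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```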

Model Performance

The model was evaluated based on:

  • Accuracy in medical reasoning tasks
  • Fluency in response generation
  • Coherence and factual correctness
  • Comparison with baseline medical AI models

Ethical Considerations

  • The model should not be used for making actual medical decisions without professional oversight.
  • Potential biases in medical datasets may lead to inaccurate or misleading outputs.
  • Always verify responses with medical professionals before acting on them.

How to Use

from unsloth import FastLanguageModel

# Load the 4-bit quantized model and its tokenizer in one call
model, tokenizer = FastLanguageModel.from_pretrained(
    "SURESHBEEKHANI/Gemma_2B_Medical_ORPO_RLHF_Fine_Tuning",
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference mode

prompt = (
    "### Instruction: Diagnose the following symptoms...\n"
    "### Input: Fever, headache, and rash\n"
    "### Response:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
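
For repeated queries, the Alpaca-style template shown above can be assembled with a small helper. `build_prompt` is a hypothetical convenience function, not part of the released code; it only reproduces the Instruction/Input/Response layout used in the snippet:

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Assemble the ### Instruction / ### Input / ### Response template."""
    prompt = f"### Instruction: {instruction}\n"
    if input_text:  # the Input section is optional
        prompt += f"### Input: {input_text}\n"
    return prompt + "### Response:"

prompt = build_prompt(
    "Diagnose the following symptoms...",
    "Fever, headache, and rash",
)
print(prompt)
```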

Citation

If you use this model, please cite:

@misc{gemma2b_orpo_medical,
  author = {Suresh Beekhani},
  title = {Fine-Tuning Gemma 2B for Medical Reasoning using ORPO RLHF},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/SURESHBEEKHANI/Gemma_2B_Medical_ORPO_RLHF_Fine_Tuning}
}

Contact

For any issues or questions, please contact Suresh Beekhani or open an issue in the Hugging Face repository.
