---
base_model: unsloth/deepseek-r1-distill-llama-8b-unsloth-bnb-4bit
library_name: peft
license: mit
datasets:
- FreedomIntelligence/medical-o1-reasoning-SFT
language:
- en
tags:
- medical
---

# Model Card for DeepSeek-R1-Medical-COT

## Model Details

### Model Description

DeepSeek-R1-Medical-COT is a parameter-efficient (PEFT) fine-tune of the DeepSeek-R1-Distill-Llama-8B model, optimized for medical chain-of-thought (CoT) reasoning. It is designed to assist with medical tasks such as question answering, reasoning, and decision support, and is particularly useful for applications that require structured reasoning in the medical domain.

- **Developed by:** Mohamed Mahmoud
- **Funded by:** Independent project
- **Shared by:** Mohamed Mahmoud
- **Model type:** Transformer-based large language model (LLM), PEFT adapter
- **Language(s) (NLP):** English (en)
- **License:** MIT
- **Finetuned from model:** unsloth/deepseek-r1-distill-llama-8b-unsloth-bnb-4bit

### Model Sources

- **Repository:** [Hugging Face Model Repo](https://huggingface.co/thesnak/DeepSeek-R1-Medical-COT)
- **LinkedIn:** [Mohamed Mahmoud](https://www.linkedin.com/in/mohamed-thesnak)

## Uses

### Direct Use

The model can be used directly for medical reasoning tasks, including:

- Answering medical questions
- Assisting in medical decision-making
- Supporting clinical research and literature review

### Downstream Use

- Fine-tuning for specialized medical applications
- Integration into chatbots and virtual assistants for medical advice
- Educational tools for medical students

### Out-of-Scope Use

- This model is not a replacement for professional medical advice.
- It should not be used for clinical decision-making without expert validation.
- It may not perform well in languages other than English.

## Bias, Risks, and Limitations

Although fine-tuned for medical reasoning, the model may still carry biases inherited from its training data. Users should exercise caution and validate critical outputs with medical professionals.

### Recommendations

Users should verify outputs, particularly in sensitive medical contexts. The model is best used as an assistive tool rather than a primary decision-making system.

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "thesnak/DeepSeek-R1-Medical-COT"

# This repo is a PEFT adapter: with `peft` installed, transformers resolves the
# 4-bit (bitsandbytes) base model and attaches the adapter automatically.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

input_text = "What are the symptoms of pneumonia?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

### Training Data

The model was fine-tuned on the **FreedomIntelligence/medical-o1-reasoning-SFT** dataset, which contains medical question-answer pairs designed to improve reasoning capabilities.
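The snippet below is a minimal, illustrative sketch of how this dataset might be loaded and formatted into chain-of-thought prompts for supervised fine-tuning. The `Question`, `Complex_CoT`, and `Response` field names come from the public dataset card, while the `"en"` configuration and the prompt template are assumptions and not necessarily what was used in the training notebook linked under Environmental Impact.

```python
from datasets import load_dataset

# Assumptions from the public dataset card: "en" config with
# Question / Complex_CoT / Response fields.
dataset = load_dataset(
    "FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train"
)

# Illustrative prompt template: the chain of thought is wrapped in <think> tags,
# mirroring the output style of DeepSeek-R1 models.
PROMPT_TEMPLATE = (
    "Below is a medical question. Reason step by step before answering.\n\n"
    "### Question:\n{question}\n\n"
    "### Response:\n<think>\n{cot}\n</think>\n{answer}"
)

def format_example(example):
    return {
        "text": PROMPT_TEMPLATE.format(
            question=example["Question"],
            cot=example["Complex_CoT"],
            answer=example["Response"],
        )
    }

train_data = dataset.map(format_example)
print(train_data[0]["text"][:500])
```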
### Training Procedure

#### Preprocessing

- Tokenization using the LLaMA tokenizer
- Text cleaning and normalization

#### Training Hyperparameters

- **Precision:** bf16 mixed precision
- **Optimizer:** AdamW
- **Batch size:** 16
- **Learning rate:** 2e-5
- **Epochs:** 3

#### Speeds, Sizes, Times

- **Training time:** Approximately 12 hours on a P100 GPU (Kaggle)
- **Model size:** 8B parameters (bnb 4-bit quantized base with a PEFT adapter)

#### Training Loss

| Step | Training Loss |
| ---- | ------------- |
| 10   | 1.919000      |
| 20   | 1.461800      |
| 30   | 1.402500      |
| 40   | 1.309000      |
| 50   | 1.344400      |
| 60   | 1.314100      |

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- The model was evaluated on held-out samples from **FreedomIntelligence/medical-o1-reasoning-SFT**.

#### Factors

- Performance was assessed on medical reasoning tasks.

#### Metrics

- **Perplexity:** measured for general coherence.
- **Accuracy:** evaluated against expert-verified responses.
- **BLEU score:** used to assess response relevance.

A minimal sketch of how perplexity and BLEU could be computed on held-out samples is included at the end of this card.

### Results

Quantitative results (perplexity, accuracy, BLEU) have not yet been reported.

## Model Examination

Further interpretability analyses can be conducted with tools such as Captum and SHAP to examine how the model derives its medical reasoning responses.

## Environmental Impact

- **Hardware Type:** P100 GPU (Kaggle)
- **Hours used:** 2 hours
- **Cloud Provider:** Kaggle
- **Compute Region:** N/A
- **Carbon Emitted:** Estimated at 9.5 kg CO2eq
- **Training notebook:** [Kaggle Notebook](https://www.kaggle.com/code/thesnak/fine-tune-deepseek)

## Technical Specifications

### Compute Infrastructure

#### Hardware

- P100 GPU (16 GB VRAM) on Kaggle

## Citation

**BibTeX:**

```bibtex
@misc{mahmoud2025deepseekmedcot,
  title={DeepSeek-R1-Medical-COT},
  author={Mohamed Mahmoud},
  year={2025},
  url={https://huggingface.co/thesnak/DeepSeek-R1-Medical-COT}
}
```

## Model Card Authors

- Mohamed Mahmoud

## Model Card Contact

- [LinkedIn](https://www.linkedin.com/in/mohamed-thesnak)
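## Appendix: Evaluation Sketch

As referenced in the Metrics section, the following is a minimal, illustrative sketch of how perplexity and BLEU could be computed against held-out samples. It is not the evaluation script used for this card: the held-out slice (`train[-100:]`), the `"en"` configuration, the `Question`/`Response` field names, and the prompt format are all assumptions.

```python
import math

import evaluate
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "thesnak/DeepSeek-R1-Medical-COT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Assumed held-out slice and field names from the public dataset card.
eval_set = load_dataset(
    "FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train[-100:]"
)

bleu = evaluate.load("sacrebleu")
nlls = []
predictions, references = [], []

for example in eval_set:
    prompt = f"### Question:\n{example['Question']}\n\n### Response:\n"
    reference = example["Response"]

    # Perplexity over the full prompt + reference sequence (a simplification).
    enc = tokenizer(prompt + reference, return_tensors="pt").to(model.device)
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    nlls.append(loss.item())

    # Greedy generation, scored with BLEU against the reference answer.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    generated = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    predictions.append(generated)
    references.append([reference])

print("Perplexity:", math.exp(sum(nlls) / len(nlls)))
print("BLEU:", bleu.compute(predictions=predictions, references=references)["score"])
```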