---
language:
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- healthcare
- diabetes
model-index:
- name: HAH 2024 v0.11
  results:
  - task:
      name: Text Generation
      type: text-generation
    dataset:
      name: Custom Dataset (3000 review articles on diabetes)
      type: diabetes
    metrics:
    - name: Placeholder Metric for Development
      type: Placeholder Type
      value: 0 # Temporary placeholder value
model-description:
  short-description: "HAH 2024 v0.1 is a language model fine-tuned specifically for generating text based on diabetes-related content. Leveraging a dataset constructed from 3000 open-source review articles, this model provides informative and contextually relevant answers to various queries about diabetes care, research, and therapies."
intended-use:
  primary-use: "HAH 2024 v0.1 is intended for research purposes only."
  secondary-potential-uses:
  - "A prototype for researchers to assess (not for formal use in real-life cases) the generation of educational content about diabetes care and management for patients and the general public."
  - "Assessing the use of adapters to assist researchers in summarizing large volumes of diabetes-related literature."
limitations:
- "While HAH 2024 v0.1 excels at generating contextually appropriate responses, it may occasionally produce outputs that require further verification."
- "The training dataset, being limited to published articles, might not capture all contemporary research or emerging trends in diabetes care."
training-data:
  description: "The training data for HAH 2024 v0.1 consists of 3000 open-source review articles about diabetes, carefully curated to cover a wide range of topics within the field. The dataset was enriched with questions generated by prompting OpenAI GPT-4 to ensure diversity in content and perspectives."
training-procedure:
  description: "HAH 2024 v0.1 was fine-tuned on an A100 GPU using Google Colab. The fine-tuning process was carefully monitored to maintain the model's relevance to diabetes-related content while minimizing biases that might arise from the dataset's specific nature."
---

# Model Card for HAH 2024 v0.1

## Model Details

### Model Description

HAH 2024 v0.11 aims to assess how an advanced language model fine-tuned for generating insights from diabetes-related healthcare data performs. HAH 2024 v0.1 is intended for research purposes only.

- **Developed by:** Dr M As'ad
- **Funded by:** Self-funded
- **Model type:** Transformer-based language model
- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** Mistral 7B Instruct v0.2

## Uses

### Direct Use

HAH 2024 v0.11 is designed to assess performance for direct use in a chat interface within the diabetes domain.

### Downstream Use

The model can also be fine-tuned for specialized tasks, such as subtypes or subgroups within the diabetes field.

### Out-of-Scope Use

This model is not recommended for non-English text or for contexts outside of healthcare. It is a research project and is not intended for deployment in any real chat interface.

## Bias, Risks, and Limitations

The model may inherently carry biases from the training data related to diabetes literature, potentially reflecting the geographic and demographic focus of the sources.
### Recommendations

Users should verify model-generated information against current medical guidelines and consider manual review for sensitive applications.

## How to Get Started with the Model

Use the code below to get started with the model:

```python
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("drmasad/HAH_2024_v0.11")
tokenizer = AutoTokenizer.from_pretrained("drmasad/HAH_2024_v0.11")

# Set up the instruction and the user prompt
instructions = "you are an expert endocrinologist. Answer the query in accurate informative language any patient will understand."
user_prompt = "what is diabetic retinopathy?"

# Build a text-generation pipeline
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)

# Format the input with the special [INST] and [/INST] instruction tokens
result = pipe(f"[INST] {instructions} [/INST] {user_prompt}")

# Extract the generated text and post-process it
generated_text = result[0]["generated_text"]

# Keep only the text after the last occurrence of [/INST]
answer = generated_text.split("[/INST]")[-1].strip()

# Print the answer
print(answer)
```
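The base model (Mistral 7B Instruct v0.2) may not fit comfortably on smaller GPUs such as those in free Colab tiers. The snippet below is a minimal sketch, not taken from the original card, showing one way to load the fine-tuned weights with 4-bit quantization; it assumes the `bitsandbytes` and `accelerate` packages are installed and a CUDA GPU is available.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Quantization settings: 4-bit NF4 weights with bfloat16 compute
# (assumes bitsandbytes and accelerate are installed and a CUDA GPU is available)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the fine-tuned weights with the quantization config applied
model = AutoModelForCausalLM.from_pretrained(
    "drmasad/HAH_2024_v0.11",
    quantization_config=bnb_config,
    device_map="auto",  # place layers automatically on the available device(s)
)
tokenizer = AutoTokenizer.from_pretrained("drmasad/HAH_2024_v0.11")
```

The resulting `model` and `tokenizer` can then be passed to the same `pipeline` call shown above.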