---
language:
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- healthcare
- diabetes
model-index:
- name: HAH 2024 v0.11
  results:
  - task:
      name: Text Generation
      type: text-generation
    dataset:
      name: Custom Dataset (3000 review articles on diabetes)
      type: diabetes
    metrics:
    - name: Placeholder Metric for Development
      type: Placeholder Type
      value: 0 # Temporary placeholder value
model-description:
  short-description: "HAH 2024 v0.1 is a language model fine-tuned specifically for generating text based on diabetes-related content. Leveraging a dataset constructed from 3000 open-source review articles, this model provides informative and contextually relevant answers to various queries about diabetes care, research, and therapies."
intended-use:
  primary-use: "HAH 2024 v0.1 is intended for research purposes only."
  secondary-potential-uses:
  - "A prototype for researchers to assess (not for formal use in real-life cases) the generation of educational content about diabetes care and management for patients and the general public."
  - "Assessing the use of adapters to assist researchers in summarizing large volumes of diabetes-related literature."
limitations:
- "While HAH 2024 v0.1 excels at generating contextually appropriate responses, it may occasionally produce outputs that require further verification."
- "The training dataset, being limited to published articles, might not capture all contemporary research or emerging trends in diabetes care."
training-data:
  description: "The training data for HAH 2024 v0.1 consists of 3000 open-source review articles about diabetes, carefully curated to cover a wide range of topics within the field. The dataset was enriched with questions generated by prompting OpenAI GPT-4 to ensure diversity in content and perspectives."
training-procedure:
  description: "HAH 2024 v0.1 was fine-tuned on an A100 GPU using Google Colab. The fine-tuning process was carefully monitored to maintain the model's relevance to diabetes-related content while minimizing biases that might arise from the dataset's specific nature."
---

# Model Card for HAH 2024 v0.1

## Model Details

### Model Description

HAH 2024 v0.11 aims to assess how an advanced language model fine-tuned for generating insights from diabetes-related healthcare data performs. HAH 2024 v0.1 is intended for research purposes only.

- **Developed by:** Dr M As'ad
- **Funded by:** Self-funded
- **Model type:** Transformer-based language model
- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** Mistral 7B Instruct v0.2

## Uses

### Direct Use

HAH 2024 v0.11 is designed to assess performance for direct use in a chat interface within the diabetes domain.

### Downstream Use

The model can also be fine-tuned for specialized tasks, such as subtypes or subgroups within the diabetes field.

### Out-of-Scope Use

This model is not recommended for non-English text or for contexts outside of healthcare. It is a research project and is not intended for deployment in any real chat interface.

## Bias, Risks, and Limitations

The model may inherently carry biases from the training data related to diabetes literature, potentially reflecting the geographic and demographic focus of the sources.
### Recommendations

Users should verify model-generated information against current medical guidelines and consider manual review for sensitive applications.

## How to Get Started with the Model

Use the code below to get started with the model:

```python
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("drmasad/HAH_2024_v0.11")
tokenizer = AutoTokenizer.from_pretrained("drmasad/HAH_2024_v0.11")

# Set up the instruction and the user prompt
instructions = "you are an expert endocrinologist. Answer the query in accurate informative language any patient will understand."
user_prompt = "what is diabetic retinopathy?"

# Build a text-generation pipeline
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)

# Format the input with the special [INST] and [/INST] instruction tokens
result = pipe(f"[INST] {instructions} [/INST] {user_prompt}")

# Extract the generated text and post-process it
generated_text = result[0]["generated_text"]

# Keep only the text after the last occurrence of [/INST]
answer = generated_text.split("[/INST]")[-1].strip()

# Print the answer
print(answer)
```
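The base model (Mistral 7B Instruct v0.2) may not fit comfortably on smaller GPUs such as those in free Colab tiers. The snippet below is a minimal sketch, not taken from the original card, showing one way to load the fine-tuned weights with 4-bit quantization; it assumes the `bitsandbytes` and `accelerate` packages are installed and a CUDA GPU is available.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Quantization settings: 4-bit NF4 weights with bfloat16 compute
# (assumes bitsandbytes and accelerate are installed and a CUDA GPU is available)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the fine-tuned weights with the quantization config applied
model = AutoModelForCausalLM.from_pretrained(
    "drmasad/HAH_2024_v0.11",
    quantization_config=bnb_config,
    device_map="auto",  # place layers automatically on the available device(s)
)
tokenizer = AutoTokenizer.from_pretrained("drmasad/HAH_2024_v0.11")
```

The resulting `model` and `tokenizer` can then be passed to the same `pipeline` call shown above.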