#### Overview

BioMed-VITAL is a multimodal foundation model specifically tuned for biomedical applications. It leverages visual and textual data to improve understanding and reasoning within the biomedical domain.

#### Model Training

The training of BioMed-VITAL involved two key stages, both incorporating clinician preferences to ensure the relevance and quality of the training data:

1. **Data Generation:** During this stage, the GPT-4V generator was prompted with a diverse set of clinician-selected demonstrations. This approach facilitated the generation of domain-specific, preference-aligned data candidates, tailored to reflect real-world clinical scenarios and preferences.
2. **Data Selection:** A separate selection model was trained to explicitly incorporate clinician and policy-guided preferences. This model employed a sophisticated rating function to evaluate and select the highest-quality data for further tuning of BioMed-VITAL. This selection process was critical in refining the dataset to ensure that only the most relevant and accurate instructional data was used.

#### Performance and Evaluation

The effectiveness of BioMed-VITAL was demonstrated through significant improvements in two key areas:

- **Open Visual Chat:** The model showed a relative improvement of 18.5%, indicating enhanced capabilities in engaging in visual dialogues pertinent to biomedical contexts.
- **Medical Visual Question Answering (VQA):** BioMed-VITAL achieved a win rate of up to 81.73% in this domain, showcasing its superior performance in interpreting and responding to complex medical imagery and queries.
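
For readers unfamiliar with the two figures reported under Performance and Evaluation, the sketch below shows how a relative improvement and a win rate are conventionally computed. This card does not state the exact formulas or baselines behind the 18.5% and 81.73% figures, so the definitions here are assumptions based on common usage. The function names `relative_improvement` and `win_rate` are hypothetical, and the input values are illustrative only, not BioMed-VITAL results.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Percentage gain of `improved` over `baseline` (assumed standard definition)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) * 100.0 / baseline


def win_rate(outcomes: list[str]) -> float:
    """Percentage of pairwise comparisons labeled 'win'.

    Assumption: ties and losses both count as non-wins.
    """
    if not outcomes:
        raise ValueError("no comparisons given")
    wins = sum(1 for outcome in outcomes if outcome == "win")
    return wins * 100.0 / len(outcomes)


# Illustrative inputs only; these are not results from BioMed-VITAL.
example_gain = relative_improvement(baseline=50.0, improved=60.0)
example_win_rate = win_rate(["win", "win", "loss", "win"])
print(f"relative improvement: {example_gain:.1f}%")
print(f"win rate: {example_win_rate:.2f}%")
```

Under these definitions, a relative improvement depends on the baseline score it is measured against, and a win rate depends on how ties are handled, so both choices should be checked against the original evaluation protocol before comparing numbers.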