Update README.md

We investigate domain adaptation of MLLMs through post-training, focusing on data synthesis, training pipeline, and task evaluation.

**(1) Data Synthesis**: Using open-source models, we develop a visual instruction synthesizer that effectively generates diverse visual instruction tasks from domain-specific image-caption pairs. **Our synthetic tasks surpass those generated by manual rules, GPT-4, and GPT-4V in enhancing the domain-specific performance of MLLMs.**
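
As a rough illustration of the synthesis idea, the sketch below prompts a generic open-source instruction-tuned LLM to turn a domain-specific caption into an instruction-response pair and keeps only well-formed outputs. The model choice, prompt, and filtering rule here are assumptions for illustration, not our released synthesizer.

```python
# Hypothetical sketch of caption-to-task synthesis; not the released synthesizer.
from transformers import pipeline

# Any open-source instruction-tuned LLM works here; the choice is an assumption.
generator = pipeline("text-generation", model="Qwen/Qwen2-7B-Instruct")

PROMPT = (
    "Below is the caption of a domain-specific image.\n"
    "Caption: {caption}\n"
    "Write one question a user could ask about the image, then answer it.\n"
    "Format:\nQ: ...\nA: ..."
)

def synthesize_task(caption: str):
    out = generator(PROMPT.format(caption=caption), max_new_tokens=256,
                    return_full_text=False)[0]["generated_text"]
    # Keep only well-formed Q/A pairs; discard everything else.
    if "Q:" in out and "A:" in out:
        q, a = out.split("A:", 1)
        return {"instruction": q.split("Q:", 1)[1].strip(), "response": a.strip()}
    return None
```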

**(2) Training Pipeline**: While two-stage training (first on image-caption pairs, then on visual instruction tasks) is commonly adopted for developing general MLLMs, we apply a single-stage training pipeline to enhance task diversity for domain-specific post-training.
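
A minimal sketch of what single-stage training means for the data, assuming (for illustration only) that captions are recast as instruction-style examples: instead of two sequential stages, caption data and synthetic instruction tasks are shuffled into one mixed training set.

```python
# Illustrative single-stage data mix; the recasting template is an assumption.
import random

def caption_to_task(pair: dict) -> dict:
    # Recast an image-caption pair as a describe-the-image instruction task.
    return {"image": pair["image"],
            "instruction": "Describe this image.",
            "response": pair["caption"]}

def build_single_stage_mix(caption_pairs, synthetic_tasks, seed=0):
    # One shuffled pool -> one training stage, rather than captions first
    # and instruction tasks second.
    mix = [caption_to_task(p) for p in caption_pairs] + list(synthetic_tasks)
    random.Random(seed).shuffle(mix)
    return mix
```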

**(3) Task Evaluation**: We conduct experiments in two domains, biomedicine and food, by post-training MLLMs of different sources and scales (e.g., Qwen2-VL-2B, LLaVA-v1.6-8B, Llama-3.2-11B) and then evaluating their performance on various domain-specific tasks.
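
As a sketch of the evaluation setup, the loop below scores a post-trained MLLM on a domain benchmark by exact-match accuracy; `model.answer` and the example fields are placeholders, and real domain benchmarks each define their own metrics.

```python
# Hypothetical evaluation loop; `model.answer` and field names are placeholders.
def evaluate(model, benchmark):
    correct = 0
    for ex in benchmark:  # each ex: {"image": ..., "question": ..., "answer": ...}
        pred = model.answer(ex["image"], ex["question"])
        correct += pred.strip().lower() == ex["answer"].strip().lower()
    return correct / len(benchmark)
```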

<p align='left'>
  <img src="https://cdn-uploads.huggingface.co/production/uploads/650801ced5578ef7e20b33d4/-Jp7pAsCR2Tj4WwfwsbCo.png" width="600">
</p>

<p align='left'>
  <img src="https://cdn-uploads.huggingface.co/production/uploads/650801ced5578ef7e20b33d4/BzpZU5u7DrS6p0d58PQIs.png" width="900">
</p>