Update README.md

We investigate domain adaptation of MLLMs through post-training, focusing on data synthesis, training pipeline, and task evaluation.

**(1) Data Synthesis**: Using open-source models, we develop a visual instruction synthesizer that effectively generates diverse visual instruction tasks from domain-specific image-caption pairs. **Our synthetic tasks surpass those generated by manual rules, GPT-4, and GPT-4V in enhancing the domain-specific performance of MLLMs.**
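
As a rough illustration of the synthesis idea, the sketch below prompts a generic open-source instruction-tuned LLM to turn a domain-specific caption into an instruction-response pair and keeps only well-formed outputs. The model choice, prompt, and filtering rule here are assumptions for illustration, not our released synthesizer.

```python
# Hypothetical sketch of caption-to-task synthesis; not the released synthesizer.
from transformers import pipeline

# Any open-source instruction-tuned LLM works here; the choice is an assumption.
generator = pipeline("text-generation", model="Qwen/Qwen2-7B-Instruct")

PROMPT = (
    "Below is the caption of a domain-specific image.\n"
    "Caption: {caption}\n"
    "Write one question a user could ask about the image, then answer it.\n"
    "Format:\nQ: ...\nA: ..."
)

def synthesize_task(caption: str):
    out = generator(PROMPT.format(caption=caption), max_new_tokens=256,
                    return_full_text=False)[0]["generated_text"]
    # Keep only well-formed Q/A pairs; discard everything else.
    if "Q:" in out and "A:" in out:
        q, a = out.split("A:", 1)
        return {"instruction": q.split("Q:", 1)[1].strip(), "response": a.strip()}
    return None
```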

**(2) Training Pipeline**: While two-stage training (first on image-caption pairs, then on visual instruction tasks) is commonly adopted for developing general MLLMs, we apply a single-stage training pipeline to enhance task diversity for domain-specific post-training.
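
A minimal sketch of what single-stage training means for the data, assuming (for illustration only) that captions are recast as instruction-style examples: instead of two sequential stages, caption data and synthetic instruction tasks are shuffled into one mixed training set.

```python
# Illustrative single-stage data mix; the recasting template is an assumption.
import random

def caption_to_task(pair: dict) -> dict:
    # Recast an image-caption pair as a describe-the-image instruction task.
    return {"image": pair["image"],
            "instruction": "Describe this image.",
            "response": pair["caption"]}

def build_single_stage_mix(caption_pairs, synthetic_tasks, seed=0):
    # One shuffled pool -> one training stage, rather than captions first
    # and instruction tasks second.
    mix = [caption_to_task(p) for p in caption_pairs] + list(synthetic_tasks)
    random.Random(seed).shuffle(mix)
    return mix
```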

**(3) Task Evaluation**: We conduct experiments in two domains, biomedicine and food, by post-training MLLMs of different sources and scales (e.g., Qwen2-VL-2B, LLaVA-v1.6-8B, Llama-3.2-11B) and then evaluating their performance on various domain-specific tasks.
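
As a sketch of the evaluation setup, the loop below scores a post-trained MLLM on a domain benchmark by exact-match accuracy; `model.answer` and the example fields are placeholders, and real domain benchmarks each define their own metrics.

```python
# Hypothetical evaluation loop; `model.answer` and field names are placeholders.
def evaluate(model, benchmark):
    correct = 0
    for ex in benchmark:  # each ex: {"image": ..., "question": ..., "answer": ...}
        pred = model.answer(ex["image"], ex["question"])
        correct += pred.strip().lower() == ex["answer"].strip().lower()
    return correct / len(benchmark)
```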

<p align='left'>
  <img src="https://cdn-uploads.huggingface.co/production/uploads/650801ced5578ef7e20b33d4/-Jp7pAsCR2Tj4WwfwsbCo.png" width="600">
</p>

<p align='left'>
  <img src="https://cdn-uploads.huggingface.co/production/uploads/650801ced5578ef7e20b33d4/BzpZU5u7DrS6p0d58PQIs.png" width="900">
</p>