AdaptLLM committed · verified
Commit 12aa561 · 1 Parent(s): e92526c

Update README.md

Files changed (1): README.md (+2 -1)
README.md CHANGED
@@ -10,13 +10,14 @@ We investigate domain adaptation of MLLMs through post-training, focusing on dat
 **(1) Data Synthesis**: Using open-source models, we develop a visual instruction synthesizer that effectively generates diverse visual instruction tasks from domain-specific image-caption pairs. **Our synthetic tasks surpass those generated by manual rules, GPT-4, and GPT-4V in enhancing the domain-specific performance of MLLMs.**
 **(2) Training Pipeline**: While the two-stage training--initially on image-caption pairs followed by visual instruction tasks--is commonly adopted for developing general MLLMs, we apply a single-stage training pipeline to enhance task diversity for domain-specific post-training.
 **(3) Task Evaluation**: We conduct experiments in two domains, biomedicine and food, by post-training MLLMs of different sources and scales (e.g., Qwen2-VL-2B, LLaVA-v1.6-8B, Llama-3.2-11B), and then evaluating MLLM performance on various domain-specific tasks.
+
 <p align='left'>
 <img src="https://cdn-uploads.huggingface.co/production/uploads/650801ced5578ef7e20b33d4/-Jp7pAsCR2Tj4WwfwsbCo.png" width="600">
 </p>
 
 
 <p align='left'>
-<img src="https://cdn-uploads.huggingface.co/production/uploads/650801ced5578ef7e20b33d4/5ZaevyEheMTjTIoaDcLn-.png" width="900">
+<img src="https://cdn-uploads.huggingface.co/production/uploads/650801ced5578ef7e20b33d4/BzpZU5u7DrS6p0d58PQIs.png" width="900">
 </p>
 
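Point (2) of the README excerpt above describes replacing the usual two-stage recipe (captions first, then instruction tasks) with one training stage over mixed data. The snippet below is a minimal sketch of that mixing step only, assuming the Hugging Face `datasets` library; the example records, field names, and file names are hypothetical placeholders, not the released training data or the authors' exact pipeline.

```python
# Sketch: single-stage data mixing, as opposed to training on
# image-caption pairs first and visual instruction tasks second.
# All dataset contents below are hypothetical placeholders.
from datasets import Dataset, concatenate_datasets

# Hypothetical domain-specific image-caption pairs.
caption_data = Dataset.from_list([
    {"image": "biomed_0001.png", "conversations": [
        {"role": "user", "content": "Describe the image."},
        {"role": "assistant", "content": "A chest X-ray showing ..."},
    ]},
])

# Hypothetical visual instruction tasks produced by the synthesizer.
instruction_data = Dataset.from_list([
    {"image": "biomed_0001.png", "conversations": [
        {"role": "user", "content": "Which lung field shows the opacity?"},
        {"role": "assistant", "content": "The lower left lung field."},
    ]},
])

# Single-stage pipeline: concatenate and shuffle once, then run one
# supervised fine-tuning pass over the mixed pool with any MLLM trainer
# (e.g., for Qwen2-VL-2B or LLaVA-v1.6-8B).
single_stage_train_set = concatenate_datasets(
    [caption_data, instruction_data]
).shuffle(seed=42)

print(single_stage_train_set)
```

In this setup a single fine-tuning pass sees both captioning and synthesized instruction tasks together, rather than learning them in separate stages.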