Update README.md
README.md (changed)
```diff
@@ -1,8 +1,9 @@
-# Adapting Multimodal Large Language Models to Domains
+# Adapting Multimodal Large Language Models to Domains via Post-Training
 
-This repository
+This repository provides an implementation preview of our paper, **On Domain-Specific Post-Training for Multimodal Large Language Models**.
+
+Building on our previous work, [AdaptLLM](https://huggingface.co/papers/2309.09530), which develops domain-specific LLMs through continued training on domain-specific corpora, we introduce **AdaMLLM**. This framework explores domain adaptation of MLLMs through post-training, focusing on data synthesis, training pipelines, and task evaluation.
 
-We investigate domain adaptation of MLLMs through post-training, focusing on data synthesis, training pipelines, and task evaluation.
 (1) **Data Synthesis**: Using open-source models, we develop a visual instruction synthesizer that effectively generates diverse visual instruction tasks from domain-specific image-caption pairs. Our synthetic tasks surpass those generated by manual rules, GPT-4, and GPT-4V in enhancing the domain-specific performance of MLLMs.
 (2) **Training Pipeline**: While the two-stage training—initially on image-caption pairs followed by visual instruction tasks—is commonly adopted for developing general MLLMs, we apply a single-stage training pipeline to enhance task diversity for domain-specific post-training.
 (3) **Task Evaluation**: We conduct experiments in two domains, biomedicine and food, by post-training MLLMs of different sources and scales (Qwen2-VL-2B, LLaVA-v1.6-8B, Llama-3.2-11B), and then evaluating MLLM performance on various domain-specific tasks.
```
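Item (1) of the README describes a synthesizer that turns domain-specific image-caption pairs into visual instruction tasks. The following is a minimal sketch of that data flow only, assuming the synthesizer model is wrapped as a plain Python function. `Task`, `synthesize_tasks`, `toy_synthesizer`, and the duplicate filter are hypothetical illustrations, not the repository's API; the toy function stands in for the open-source model so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Task:
    image: str        # image path or ID shared by every task derived from one pair
    instruction: str  # synthesized question or instruction
    response: str     # synthesized answer

# Hypothetical interface: the synthesizer model is wrapped as a function that
# maps one (image, caption) pair to a list of (instruction, response) pairs.
Synthesizer = Callable[[str, str], List[Tuple[str, str]]]

def synthesize_tasks(pairs: List[Tuple[str, str]], synthesizer: Synthesizer) -> List[Task]:
    """Turn domain image-caption pairs into visual instruction tasks."""
    tasks: List[Task] = []
    seen = set()
    for image, caption in pairs:
        for instruction, response in synthesizer(image, caption):
            instruction, response = instruction.strip(), response.strip()
            key = (image, instruction.lower())
            # Illustrative cleanup (an assumption, not from the source): drop
            # empty or case-insensitively repeated instructions for one image.
            if not instruction or not response or key in seen:
                continue
            seen.add(key)
            tasks.append(Task(image, instruction, response))
    return tasks

def toy_synthesizer(image: str, caption: str) -> List[Tuple[str, str]]:
    # Stand-in for a real model: returns fixed tasks built from the caption.
    return [
        ("What dish is shown in the image?", caption),
        ("what dish is shown in the image?", caption),  # duplicate, filtered out
        ("", "An answer with no instruction."),          # empty, filtered out
    ]

pairs = [("img_001.jpg", "A bowl of ramen topped with a soft-boiled egg.")]
tasks = synthesize_tasks(pairs, toy_synthesizer)
print(len(tasks))  # 1
```

Keeping the synthesizer behind a function type means a real model call could replace `toy_synthesizer` without changing the surrounding loop.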
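Item (2) contrasts the common two-stage recipe (image-caption pairs first, then visual instruction tasks) with a single-stage pipeline. The sketch below illustrates only the data-scheduling difference, under our own assumptions: the `Example` record, the `"Describe the image."` prompt used to recast a caption as a task, and shuffle-based mixing are hypothetical choices, not the repository's code, and model training itself is omitted.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Example:
    image: str
    prompt: str
    response: str
    source: str  # "caption" or "instruction"

def caption_to_example(image: str, caption: str) -> Example:
    # Hypothetical prompt wording: recast the pair as an image-description task.
    return Example(image, "Describe the image.", caption, "caption")

def two_stage_schedule(captions, instructions) -> List[List[Example]]:
    # Common general-MLLM recipe: stage 1 uses all caption data,
    # stage 2 uses all instruction data.
    stage1 = [caption_to_example(img, cap) for img, cap in captions]
    stage2 = list(instructions)
    return [stage1, stage2]

def single_stage_schedule(captions, instructions, seed: int = 0) -> List[List[Example]]:
    # Single stage: caption tasks and instruction tasks share one shuffled pool,
    # so the two kinds of examples are interleaved during training.
    mixed = [caption_to_example(img, cap) for img, cap in captions] + list(instructions)
    random.Random(seed).shuffle(mixed)
    return [mixed]

captions: List[Tuple[str, str]] = [
    ("img_001.jpg", "A bowl of ramen topped with a soft-boiled egg."),
    ("img_002.jpg", "Grilled salmon served with a lemon wedge."),
]
instructions = [
    Example("img_001.jpg", "What is the topping on the noodles?", "A soft-boiled egg.", "instruction"),
    Example("img_002.jpg", "What fish is on the plate?", "Salmon.", "instruction"),
    Example("img_002.jpg", "What citrus fruit is served?", "Lemon.", "instruction"),
]

print([len(stage) for stage in two_stage_schedule(captions, instructions)])     # [2, 3]
print([len(stage) for stage in single_stage_schedule(captions, instructions)])  # [5]
```

A fixed `seed` keeps the mixed order reproducible across runs; how the actual pipeline samples or weights the two data sources is not specified in the README.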