# Adapting Multimodal Large Language Models to Domains via Post-Training
This repository provides an implementation preview of our paper, **On Domain-Specific Post-Training for Multimodal Large Language Models**.
We investigate domain adaptation of MLLMs through post-training, focusing on data synthesis, training pipelines, and task evaluation. Our resulting model, **AdaMLLM**, consistently outperforms general MLLMs across various tasks in two domains: biomedicine and food.
<p align='left'>
<img src="https://cdn-uploads.huggingface.co/production/uploads/650801ced5578ef7e20b33d4/iklQIKW_6TyCT13BMq5-d.png" width="600">
</p>
### **Updates**
- **[2024/11/28]** Released our paper.
## About
**AdaMLLM** is our third effort to enhance **task generalization** by scaling synthetic supervised tasks from unsupervised contexts.
<p align='left'>
<img src="https://cdn-uploads.huggingface.co/production/uploads/650801ced5578ef7e20b33d4/GOoo9WxxFsJgTvbgrX2y8.png" width="900">
</p>
- **1st Work: [AdaptLLM](https://huggingface.co/papers/2309.09530)**
We employ rule-based methods to extract tasks from domain-specific corpora, reformatting them into reading comprehension tasks for continued pre-training. Our 7B finance model outperforms much larger domain-specific models, such as BloombergGPT-50B.
- **2nd Work: [Instruction Pretraining](https://huggingface.co/instruction-pretrain)**
We develop a general-purpose instruction synthesizer that significantly increases task diversity for LM pre-training, outperforming vanilla pre-training both in general pre-training from scratch and in domain-adaptive continual pre-training.
- **3rd Work: AdaMLLM**
We extend supervised task synthesis to multimodality, introducing a unified **visual instruction synthesizer** to extract task pairs from image-caption data. Our synthetic tasks outperform those generated by manual rules, GPT-4, and GPT-4V in improving domain-specific performance for MLLMs.
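To give a feel for what "rule-based task extraction" in the 1st work means, here is a toy sketch. It is **not** the AdaptLLM rule set: the single regex rule (a "Therefore," connective turned into a question) and the helper names `mine_tasks` and `to_reading_comprehension` are hypothetical, chosen only to show how a raw passage can be turned into a passage followed by question/answer pairs.

```python
import re

# Hypothetical mining rule (illustration only, not the AdaptLLM patterns):
# "<premise>. Therefore, <conclusion>." becomes a question/answer pair.
CONNECTIVE = re.compile(r"([^.]+)\.\s*Therefore,\s*([^.]+)\.")


def mine_tasks(text: str) -> list[tuple[str, str]]:
    """Return (question, answer) pairs found by the toy connective rule."""
    tasks = []
    for premise, conclusion in CONNECTIVE.findall(text):
        question = f'What conclusion follows from "{premise.strip()}."?'
        tasks.append((question, conclusion.strip() + "."))
    return tasks


def to_reading_comprehension(text: str) -> str:
    """Append the mined tasks after the raw text, reading-comprehension style."""
    parts = [text.strip()]
    for question, answer in mine_tasks(text):
        parts.append(f"Question: {question}\nAnswer: {answer}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    passage = ("Interest rates rose sharply. Therefore, bond prices fell. "
               "Analysts expect volatility.")
    print(to_reading_comprehension(passage))
```

Keeping the original passage in front of the mined tasks preserves the raw domain text for continued pre-training while adding supervised signal; a real system would use many rules covering different task types.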
Looking ahead, we aim to further broaden the scope of supervised task synthesis, efficiently enhancing the general capabilities of trained models.
## Citation
```bibtex
@article{instructPT,
  title={Instruction Pre-Training: Language Models are Supervised Multitask Learners},
  author={Cheng, Daixuan and Gu, Yuxian and Huang, Shaohan and Bi, Junyu and Huang, Minlie and Wei, Furu},
  journal={arXiv preprint arXiv:2406.14491},
  year={2024}
}

@inproceedings{adaptllm,
  title={Adapting Large Language Models via Reading Comprehension},
  author={Cheng, Daixuan and Huang, Shaohan and Wei, Furu},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=y886UXPEZ0}
}
```