metadata
datasets:
  - liuhaotian/LLaVA-Pretrain
  - liuhaotian/LLaVA-Instruct-150K
pipeline_tag: image-text-to-text
library_name: xtuner


Model

llava-v1.5-13b-xtuner is a LLaVA model fine-tuned from Vicuna-13B-v1.5 and CLIP-ViT-Large-patch14-336 with LLaVA-Pretrain and LLaVA-Instruct by XTuner.

Quickstart

Installation

pip install -U 'xtuner[deepspeed]'

Chat

xtuner chat lmsys/vicuna-13b-v1.5 \
  --visual-encoder openai/clip-vit-large-patch14-336 \
  --llava xtuner/llava-v1.5-13b-xtuner \
  --prompt-template vicuna \
  --image $IMAGE_PATH
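
The chat command above expects `$IMAGE_PATH` to point at an existing local image. A minimal sketch of a wrapper that checks this before launching follows; the `llava_chat` function and the `DRY_RUN` variable are illustrative names and are not part of XTuner.

```shell
# Hypothetical helper (not part of XTuner): validate the image path, then
# run the chat command shown above. With DRY_RUN=1 it only prints the command.
llava_chat() {
  image_path="$1"
  if [ ! -f "$image_path" ]; then
    echo "image not found: $image_path" >&2
    return 1
  fi
  # Build the same argument list as the documented command.
  set -- xtuner chat lmsys/vicuna-13b-v1.5 \
    --visual-encoder openai/clip-vit-large-patch14-336 \
    --llava xtuner/llava-v1.5-13b-xtuner \
    --prompt-template vicuna \
    --image "$image_path"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Setting `DRY_RUN=1` before calling `llava_chat ./example.jpg` prints the full command without loading the model, which can help confirm the arguments first.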

Training

  1. Alignment module pretraining (saved by default in ./work_dirs/)
NPROC_PER_NODE=8 xtuner train llava_vicuna_13b_v15_clip_vit_large_p14_336_e1_gpu8_pretrain --deepspeed deepspeed_zero2
  2. Instruction following fine-tuning (saved by default in ./work_dirs/)
NPROC_PER_NODE=8 xtuner train llava_vicuna_13b_v15_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune --deepspeed deepspeed_zero2
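
The two stages above are meant to run in order. A minimal sketch of a wrapper that starts fine-tuning only if pretraining exits successfully follows; `run_stage`, `train_llava`, and `DRY_RUN` are hypothetical names, not XTuner features. The config names suggest the settings were chosen for 8 GPUs, so the sketch keeps `NPROC_PER_NODE=8` as its default.

```shell
# Hypothetical wrapper (not part of XTuner): run one training stage with the
# same flags as the commands above. With DRY_RUN=1 it only prints the command.
run_stage() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "NPROC_PER_NODE=${NPROC_PER_NODE:-8} xtuner train $1 --deepspeed deepspeed_zero2"
  else
    NPROC_PER_NODE="${NPROC_PER_NODE:-8}" xtuner train "$1" --deepspeed deepspeed_zero2
  fi
}

# Run pretraining first; fine-tuning starts only if pretraining succeeds.
train_llava() {
  run_stage llava_vicuna_13b_v15_clip_vit_large_p14_336_e1_gpu8_pretrain &&
  run_stage llava_vicuna_13b_v15_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune
}
```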

MMBench Evaluation

XTuner integrates MMBench evaluation, which you can run with the following command:

xtuner mmbench lmsys/vicuna-13b-v1.5 \
  --visual-encoder openai/clip-vit-large-patch14-336 \
  --llava xtuner/llava-v1.5-13b-xtuner \
  --prompt-template vicuna \
  --data-path $MMBENCH_DATA_PATH \
  --work-dir $RESULT_PATH

After the evaluation completes, results for a development set are printed directly. For a test set, submit mmbench_result.xlsx to the official MMBench for final evaluation to obtain the accuracy results.
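
For the test-set case, a minimal sketch of a helper that locates the result file before submission follows. It assumes mmbench_result.xlsx is written somewhere under the directory passed to `--work-dir`; the exact subdirectory is not stated here, so the sketch searches recursively. `find_mmbench_result` is a hypothetical name.

```shell
# Hypothetical helper (not part of XTuner): print the path of the first
# mmbench_result.xlsx found under the given work dir, or fail if none exists.
find_mmbench_result() {
  work_dir="$1"
  result="$(find "$work_dir" -type f -name 'mmbench_result.xlsx' | head -n 1)"
  if [ -z "$result" ]; then
    echo "no mmbench_result.xlsx under $work_dir" >&2
    return 1
  fi
  echo "$result"
}
```

Calling `find_mmbench_result "$RESULT_PATH"` after the evaluation prints the file to upload, or returns a non-zero status if no result file was produced.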

Citation

@misc{2023xtuner,
    title={XTuner: A Toolkit for Efficiently Fine-tuning LLM},
    author={XTuner Contributors},
    howpublished = {\url{https://github.com/InternLM/xtuner}},
    year={2023}
}