---
datasets:
- MMInstruction/VLFeedback
---

# Model Card for Silkie

<!-- Provide a quick summary of what the model is/does. -->

Silkie is a visual language model trained by preference distillation on GPT-4V-annotated AI feedback. It is a fine-tuned version of [Qwen/Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat), trained on our [MMInstruction/VLFeedback](https://huggingface.co/datasets/MMInstruction/VLFeedback) dataset with direct preference optimization (DPO). Compared with the original model, Silkie achieves 6.9% and 9.5% relative improvements on the MME benchmark in perception and cognition capabilities, respectively, and sets a new state-of-the-art score of 3.02 on MMHal-Bench for hallucination evaluation. Please refer to our [project page](https://vlf-silkie.github.io/) for more details.
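
For readers unfamiliar with DPO, the snippet below sketches the core preference objective that such training optimizes over chosen/rejected response pairs. It is an illustrative sketch only, not Silkie's actual training code; the function name, tensor values, and `beta` setting are placeholders.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Illustrative DPO loss for a batch of preference pairs (not Silkie's training code).

    Each argument is a tensor of summed log-probabilities of the preferred
    ("chosen") or dispreferred ("rejected") response under the model being
    trained (policy) or the frozen reference model.
    """
    # Implicit rewards are scaled log-probability ratios against the reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the policy to rank the chosen response above the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy example with made-up log-probabilities for two preference pairs.
loss = dpo_loss(
    torch.tensor([-12.0, -8.0]), torch.tensor([-15.0, -9.0]),
    torch.tensor([-13.0, -8.5]), torch.tensor([-14.0, -9.5]),
)
print(loss)
```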
## Model Sources

<!-- Provide the basic links for the model. -->

- **Project page:** https://vlf-silkie.github.io/
- **Dataset:** https://huggingface.co/datasets/MMInstruction/VLFeedback
- **Paper:** Coming soon.
- **Repository:** Coming soon.

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

Silkie is intended for research purposes, particularly for alignment research in multimodal models.

## How to Get Started

Below is a simple Python code snippet to get started with the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the Silkie tokenizer and model (built on Qwen-VL-Chat, so remote code is required).
tokenizer = AutoTokenizer.from_pretrained(
    "MMInstruction/Silkie", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "MMInstruction/Silkie", device_map="cuda", trust_remote_code=True
).eval()

# Build a multimodal query from an image URL and a text question.
query = tokenizer.from_list_format(
    [
        {"image": "https://farm8.staticflickr.com/137/383965780_db4815011c_o.jpg"},
        {"text": "Which wooden stool has a vase with red flower on it?"},
    ]
)
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```
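
The `chat` call returns both the generated answer and the running conversation history, so a follow-up turn can reuse `history`. A minimal illustration (the follow-up question below is just an example):

```python
# Follow-up turn reusing the conversation history from the first call.
response, history = model.chat(
    tokenizer, query="Describe the image in more detail.", history=history
)
print(response)
```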
## Citation

```
Coming soon.
```