---
datasets:
- MMInstruction/VLFeedback
---
# Model Card for Silkie
<!-- Provide a quick summary of what the model is/does. -->
Silkie is a visual language model trained by preference distillation on GPT-4V-annotated AI feedback. It is a fine-tuned version of [Qwen/Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat), trained on our [MMInstruction/VLFeedback](https://huggingface.co/datasets/MMInstruction/VLFeedback) dataset with direct preference optimization (DPO). Compared with the original model, Silkie achieves 6.9% and 9.5% relative improvements on the MME benchmark in perception and cognition capabilities, respectively. Silkie also sets a new state-of-the-art score of 3.02 on MMHal-Bench for hallucination evaluation. Please refer to our [project page](https://vlf-silkie.github.io/) for more details.
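For readers unfamiliar with DPO, the per-example objective can be sketched in a few lines of plain Python. This is a minimal illustration of the standard DPO loss, not the training code used for Silkie; the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for this example.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    `beta` is a hypothetical default, not a value taken from this card.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# When the policy matches the reference, the loss equals log(2).
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)
# Raising the chosen response and lowering the rejected one reduces the loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
print(baseline, improved)
```

The loss falls below its `log(2)` starting point only when the policy shifts probability toward the preferred response relative to the reference model, which is how the preference annotations steer the fine-tuned model.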
## Model Sources
<!-- Provide the basic links for the model. -->
- **Project page:** https://vlf-silkie.github.io/
- **Dataset:** https://huggingface.co/datasets/MMInstruction/VLFeedback
- **Paper:** Coming soon.
- **Repository:** Coming soon.
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
Silkie is intended for research purposes, particularly for alignment research in multimodal models.
## How to Get Started
Below is a simple Python code snippet to get started with the model.
```python
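from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True is required because the tokenizer and model
# classes are defined by custom code in the model repository.
tokenizer = AutoTokenizer.from_pretrained(
    "MMInstruction/Silkie", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "MMInstruction/Silkie", device_map="cuda", trust_remote_code=True
).eval()

# Build a multimodal query from an image URL and a text question.
query = tokenizer.from_list_format(
    [
        {"image": "https://farm8.staticflickr.com/137/383965780_db4815011c_o.jpg"},
        {"text": "Which wooden stool has a vase with red flower on it?"},
    ]
)

# history=None starts a new conversation; the returned history can be
# passed back in for follow-up turns.
response, history = model.chat(tokenizer, query=query, history=None)
print(response)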
```
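The list passed to `tokenizer.from_list_format` above is plain Python data, so it can be assembled programmatically. The helper below is a hypothetical convenience function (`build_query_items` is not part of the model's API) and assumes that several image entries may precede the text entry.

```python
def build_query_items(image_refs, question):
    """Return a list of {"image": ...} entries followed by one {"text": ...} entry.

    Hypothetical helper for illustration only; `image_refs` may hold URLs
    or local file paths.
    """
    items = [{"image": ref} for ref in image_refs]
    items.append({"text": question})
    return items


items = build_query_items(
    ["https://farm8.staticflickr.com/137/383965780_db4815011c_o.jpg"],
    "Which wooden stool has a vase with red flower on it?",
)
print(items)
```

The resulting list can then be passed to `tokenizer.from_list_format(items)` in place of the literal list in the snippet above.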
## Citation
```
Coming soon.
```