datasets:
  - MMInstruction/VLFeedback

Model Card for Silkie

Silkie is a visual language model trained by preference distillation on GPT-4V-annotated AI feedback. It is a fine-tuned version of Qwen/Qwen-VL-Chat, trained on our MMInstruction/VLFeedback dataset with direct preference optimization (DPO). Compared with the original model, Silkie achieves 6.9% and 9.5% relative improvements on the MME benchmark in perception and cognition, respectively. Silkie also sets a new state-of-the-art score of 3.02 on MMHal-Bench for hallucination evaluation. Please refer to our project page for more details.
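
For readers unfamiliar with DPO, the per-pair objective can be sketched as follows. This is the standard DPO loss written in plain Python, not the training code used for Silkie; the function name `dpo_loss`, the scalar log-probability inputs, and the default `beta=0.1` are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are summed log-probabilities of each response under the policy
    being trained and under the frozen reference model. The beta default
    is an assumed value for illustration only.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # Loss is -log(sigmoid(logits)), computed in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

The loss shrinks as the policy raises the likelihood of the preferred response relative to the rejected one, measured against the reference model.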

Model Sources

Uses

Silkie is intended for research purposes, particularly for alignment research in multimodal models.

How to Get Started

Below is a simple Python code snippet to get started with the model.

from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True is required because the checkpoint ships custom Qwen-VL code.
tokenizer = AutoTokenizer.from_pretrained(
    "MMInstruction/Silkie", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "MMInstruction/Silkie", device_map="cuda", trust_remote_code=True
).eval()

# Build a single query that combines an image with a text prompt.
query = tokenizer.from_list_format(
    [
        {"image": "https://farm8.staticflickr.com/137/383965780_db4815011c_o.jpg"},
        {"text": "Which wooden stool has a vase with red flower on it?"},
    ]
)
# history=None starts a new conversation; the returned history holds this turn.
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
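
For follow-up questions, the `history` returned by `model.chat` can be passed back into the next call. The sketch below assumes the `chat(tokenizer, query=..., history=...)` interface shown above; `ask_in_turns` is a hypothetical helper, not part of Silkie or Qwen-VL, and it takes the chat function as a parameter so it can be tried without loading the model.

```python
from typing import Callable, List, Optional, Tuple

# Signature of a function that answers one query given the prior history.
ChatFn = Callable[[str, Optional[list]], Tuple[str, list]]


def ask_in_turns(chat_fn: ChatFn, queries: List[str]) -> Tuple[List[str], Optional[list]]:
    """Send queries in order, feeding each returned history into the next call.

    Hypothetical helper for illustration only.
    """
    history = None  # None starts a fresh conversation
    responses = []
    for query in queries:
        response, history = chat_fn(query, history)
        responses.append(response)
    return responses, history
```

With the model and tokenizer loaded as above, a suitable `chat_fn` would be `lambda q, h: model.chat(tokenizer, query=q, history=h)`, with the first entry of `queries` being the string built by `tokenizer.from_list_format`.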

Citation

Coming soon.