update readme
README.md CHANGED
@@ -12,7 +12,7 @@ datasets:
 > ——*George R.R. Martin, A Clash of Kings*
 
 
-\[
 
 ## Introduction
 
@@ -28,7 +28,7 @@ We release our model weights and provide an example below to run our model. Det
 
 **Install dependencies**
 ```bash
-pip install transformers # latest version is ok, but we recommend v4.
 pip install -q pillow accelerate einops
 ```
 
@@ -50,7 +50,7 @@ model = AutoModelForCausalLM.from_pretrained(
 tokenizer = AutoTokenizer.from_pretrained("MILVLG/Imp-v1.5-2B-Qwen1.5", trust_remote_code=True)
 
 # Set inputs
-text = "
 image = Image.open("images/bus.jpg")
 
 input_ids = tokenizer(text, return_tensors='pt').input_ids
@@ -71,7 +71,7 @@ We conduct evaluation on 9 commonly-used benchmarks, including 5 academic VQA be
 | Models | Size | VQAv2 | GQA | SQA(IMG) | TextVQA | POPE | MME(P) | MMB | MMB-CN | MM-Vet |
 |:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
 | [LLaVA-v1.5-lora](https://huggingface.co/liuhaotian/llava-v1.5-7b) | 7B | 79.10 | 63.00 | 68.40 | 58.20 | 86.40 | 1476.9 | 66.10 | - | 30.2 |
-| **Imp-v1.5-2B-Qwen1.5** | 3B |
 
 ## License
 This project is licensed under the Apache License 2.0 - see the [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) file for details.
 > ——*George R.R. Martin, A Clash of Kings*
 
 
+\[[Paper](https://arxiv.org/abs/2405.12107)\] [[Demo](https://xmbot.net/imp/)\] [[Github](https://github.com/MILVLG/imp)\]
 
 ## Introduction
 
 
 **Install dependencies**
 ```bash
+pip install transformers # latest version is ok, but we recommend v4.36.0
 pip install -q pillow accelerate einops
 ```
 
 tokenizer = AutoTokenizer.from_pretrained("MILVLG/Imp-v1.5-2B-Qwen1.5", trust_remote_code=True)
 
 # Set inputs
+text = "<|im_start|>system\nA chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.<|im_end|>\n<|im_start|>user\n<image>\nWhat are the colors of the bus in the image?<|im_end|>\n<|im_start|>assistant"
 image = Image.open("images/bus.jpg")
 
 input_ids = tokenizer(text, return_tensors='pt').input_ids
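The prompt string added in this hunk follows a ChatML-style layout: `<|im_start|>role … <|im_end|>` blocks with an `<image>` placeholder in the user turn. As a minimal sketch, the same string can be assembled programmatically instead of hard-coded; `build_prompt` is a hypothetical helper, not part of the repository:

```python
def build_prompt(system: str, question: str) -> str:
    """Assemble a ChatML-style prompt with an <image> placeholder.

    Hypothetical helper; the README hard-codes the equivalent string.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n<image>\n{question}<|im_end|>\n"
        f"<|im_start|>assistant"
    )

system_msg = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)
text = build_prompt(system_msg, "What are the colors of the bus in the image?")
```

Keeping the system message and question as separate arguments makes it easy to swap in new questions without touching the special-token scaffolding.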
 | Models | Size | VQAv2 | GQA | SQA(IMG) | TextVQA | POPE | MME(P) | MMB | MMB-CN | MM-Vet |
 |:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
 | [LLaVA-v1.5-lora](https://huggingface.co/liuhaotian/llava-v1.5-7b) | 7B | 79.10 | 63.00 | 68.40 | 58.20 | 86.40 | 1476.9 | 66.10 | - | 30.2 |
+| **Imp-v1.5-2B-Qwen1.5** | 3B | 79.2 | 61.93 | 66.14 | 54.52 | 86.74 | 1304.8 | 63.83 | 61.34 | 33.5 |
 
 ## License
 This project is licensed under the Apache License 2.0 - see the [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) file for details.
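For context on why an `<image>` tag appears in the prompt above: LLaVA-style pipelines typically split the prompt at that placeholder and splice the projected image features between the two tokenized halves. A rough, hypothetical sketch of just the splitting step (the real logic ships in the model's `trust_remote_code` module; `split_prompt_at_image` is an assumed name, not the repository's API):

```python
def split_prompt_at_image(text: str, placeholder: str = "<image>") -> tuple[str, str]:
    # Hypothetical illustration: each half would be tokenized separately,
    # with the image embeddings inserted between the two halves.
    chunks = text.split(placeholder)
    if len(chunks) != 2:
        raise ValueError("expected exactly one <image> placeholder")
    return chunks[0], chunks[1]

before, after = split_prompt_at_image(
    "<|im_start|>user\n<image>\nWhat are the colors of the bus?<|im_end|>"
)
```

This is only a sketch of the general mechanism; the model's own remote code handles tokenization and feature projection.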