lixuejing committed · Commit 01ddf5c · Parent(s): 79840d6 · update

src/about.py CHANGED (+114 −15)
@@ -106,28 +106,127 @@ EVALUATION_QUEUE_TEXT = """
## Evaluation Queue for the FlagEval VLM Leaderboard

Models added here will be automatically evaluated on the FlagEval cluster.

Currently, we offer two methods for model evaluation: API calls and private deployment.
1. If you evaluate via API call, you need to provide the model interface, name, and corresponding API key.
2. If you evaluate an open-source model directly through Hugging Face, you do not need to fill in the Model online api url and Model online api key fields.

## Open API Model Integration Documentation

For models accessed via API calls (such as the OpenAI API, Anthropic API, etc.), the integration process is straightforward and only requires providing the necessary configuration:
1. `model_name`: name of the model to use
2. `api_key`: API access key
3. `api_base`: base URL for the API service
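
As a rough illustration, here is a minimal sketch of how these three fields fit together, assuming an OpenAI-compatible chat endpoint; the URL, model name, and message below are placeholders, not values prescribed by FlagEval.

```python
# Minimal sketch of using the three configuration fields with an
# OpenAI-compatible endpoint. All concrete values are placeholders.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",                 # api_key
    base_url="https://api.example.com/v1",  # api_base
)

response = client.chat.completions.create(
    model="your-model-name",                # model_name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```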

## Adding a Custom Model to the Platform

This guide explains how to integrate your custom model into the platform by implementing a model adapter and a run.sh script. We'll use the Qwen-VL implementation as a reference example.

### Overview

To add your custom model, you need to:
1. Create a custom dataset class
2. Implement a model adapter class
3. Set up the initialization and inference pipeline

### Step-by-Step Implementation

Here is an example: [model_adapter.py](https://github.com/flageval-baai/FlagEvalMM/blob/main/model_zoo/vlm/qwen_vl/model_adapter.py)

#### 1. Create a Custom Preprocessing Dataset Class

Inherit from `ServerDataset` to handle data loading:
```python
# model_adapter.py
class CustomDataset(ServerDataset):
    def __getitem__(self, index):
        data = self.get_data(index)
        question_id = data["question_id"]
        img_path = data["img_path"]
        qs = data["question"]
        # Strip the <imageN> placeholders from the question and collect
        # the indices of the images they reference.
        qs, idx = process_images_symbol(qs)
        idx = set(idx)
        img_path_idx = []
        for i in idx:
            if i < len(img_path):
                img_path_idx.append(img_path[i])
            else:
                print("[warning] image index out of range")
        return question_id, img_path_idx, qs
```
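
`process_images_symbol` comes from the reference implementation; the sketch below is only an assumed approximation of its contract (extract the indices referenced by `<imageN>` markers), not the actual FlagEvalMM code.

```python
import re
from typing import List, Tuple

def process_images_symbol(question: str) -> Tuple[str, List[int]]:
    # Hypothetical sketch: collect 0-based indices from <imageN> markers
    # and strip the markers from the question text.
    indices = [int(n) - 1 for n in re.findall(r"<image(\d+)>", question)]
    cleaned = re.sub(r"<image\d+>", "", question).strip()
    return cleaned, indices
```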

The function `get_data` returns a structure like this:
```python
{
    "img_path": "A list where each element is an absolute path to an image that can be read directly using PIL, cv2, etc.",
    "question": "A string containing the question, where image positions are marked with <image1> <image2>",
    "question_id": "question_id",
    "type": "A string indicating the type of question"
}
```
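
Concretely, a single record might look like this (all values hypothetical):

```python
# Hypothetical example of one record returned by get_data(index):
{
    "img_path": ["/data/val/0001_a.jpg", "/data/val/0001_b.jpg"],
    "question": "What differs between <image1> and <image2>?",
    "question_id": "val_0001",
    "type": "open_ended",
}
```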

#### 2. Implement Model Adapter

Inherit from `BaseModelAdapter` and implement the required methods:
1. `model_init`: responsible for model initialization; serves as the entry point for model loading and setup.
2. `run_one_task`: implements the inference pipeline, handling data processing and result generation for a single evaluation task.
```python
# model_adapter.py
class ModelAdapter(BaseModelAdapter):
    def model_init(self, task_info: Dict):
        ckpt_path = task_info["model_path"]
        # Initialize the model and processor here: load your pre-trained
        # model and any required processing tools from ckpt_path.

    def run_one_task(self, task_name: str, meta_info: Dict[str, Any]):
        results = []

        data_loader = self.create_data_loader(
            CustomDataset, task_name, batch_size=1, num_workers=0
        )

        for question_id, img_path, qs in data_loader:
            # Perform model inference here: use the model to produce the
            # `answer` variable for the given inputs (image paths, question).
            results.append(
                {"question_id": question_id, "answer": answer}
            )

        # Save the inference results; use the provided meta_info and the
        # process rank to manage result storage as needed.
        self.save_result(results, meta_info, rank=rank)
```

Note:
- `results` is a list of dictionaries.
- Each dictionary must contain two keys:
  - `question_id`: identifies the specific question
  - `answer`: contains the model's prediction/output
- After collecting all results, they are saved using `save_result()`.
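
For example, a well-formed `results` list might look like this (entries are hypothetical):

```python
# Hypothetical example of a valid `results` list:
results = [
    {"question_id": "val_0001", "answer": "The second image shows a red bicycle."},
    {"question_id": "val_0002", "answer": "B"},
]
```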

#### 3. Launch Script (run.sh)

run.sh is the entry script for launching model evaluation; it sets environment variables and starts the evaluation program.

```bash
#!/bin/bash
current_file="$0"
current_dir="$(dirname "$current_file")"
SERVER_IP=$1
SERVER_PORT=$2
PYTHONPATH=$current_dir:$PYTHONPATH python $current_dir/model_adapter.py \
    --server_ip $SERVER_IP \
    --server_port $SERVER_PORT \
    "${@:3}"
```
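
Assuming a local evaluation server, the platform would invoke this script roughly as `bash run.sh 127.0.0.1 5000 <extra args>`; the IP and port here are placeholders, and any arguments after the first two are forwarded to model_adapter.py via `"${@:3}"`.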

## In case of model failure

If your model is displayed in the `FAILED` category, its execution stopped.