lixuejing committed
Commit 01ddf5c · 1 Parent(s): 79840d6
Files changed (1)
  1. src/about.py +114 -15
src/about.py CHANGED
@@ -106,28 +106,127 @@ EVALUATION_QUEUE_TEXT = """
  ## Evaluation Queue for the FlagEval VLM Leaderboard
  Models added here will be automatically evaluated on the FlagEval cluster.

- Currently, we provide two methods for model evaluation, including API calling and private deployment. If you choose to evaluate via API calling, you need to provide the model's interface, name, and corresponding API KEY.

- ### 1) Make sure you can load your model and tokenizer using AutoClasses:
  ```python
- from transformers import AutoConfig, AutoModel, AutoTokenizer
- config = AutoConfig.from_pretrained("your model name", revision=revision)
- model = AutoModel.from_pretrained("your model name", revision=revision)
- tokenizer = AutoTokenizer.from_pretrained("your model name", revision=revision)
  ```
- If this step fails, follow the error messages to debug your model before submitting it. It's likely your model has been improperly uploaded.

- Note: make sure your model is public!
- Note: if your model needs `use_remote_code=True`, we do not support this option yet but we are working on adding it, stay posted!

- ### 2) Convert your model weights to [safetensors](https://huggingface.co/docs/safetensors/index)
- It's a new format for storing weights which is safer and faster to load and use. It will also allow us to add the number of parameters of your model to the `Extended Viewer`!

- ### 3) Make sure your model has an open license!
- This is a leaderboard for Open LLMs, and we'd love for as many people as possible to know they can use your model 🤗

- ### 4) Fill up your model card
- When we add extra information about models to the leaderboard, it will be automatically taken from the model card
+ Currently, we offer two methods for model evaluation: API calls and private deployment.
+ 1. If you choose to evaluate via API call, you need to provide the model's API endpoint (interface), name, and corresponding API key.
+ 2. If you choose to evaluate an open-source model directly through Hugging Face, you do not need to fill in the model's online API URL or API key.
+
+ ## Open API Model Integration Documentation
+
+ For models accessed via API calls (such as the OpenAI API, Anthropic API, etc.), the integration process is straightforward and only requires providing the necessary configuration information (see the sketch after this list):
+ 1. `model_name`: the name of the model to use
+ 2. `api_key`: the API access key
+ 3. `api_base`: the base URL of the API service
+
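+ As a rough illustration only (not the platform's actual calling code), these three fields could be used to smoke-test an OpenAI-compatible endpoint before submission; all values below are placeholders:
+ ```python
+ # Hypothetical pre-submission check against an OpenAI-compatible endpoint.
+ from openai import OpenAI
+
+ model_name = "your-model-name"           # model_name: name of the model to use
+ api_key = "sk-..."                       # api_key: API access key
+ api_base = "https://api.example.com/v1"  # api_base: base URL for the API service
+
+ client = OpenAI(api_key=api_key, base_url=api_base)
+ response = client.chat.completions.create(
+     model=model_name,
+     messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
+ )
+ print(response.choices[0].message.content)
+ ```
+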
+ ## Adding a Custom Model to the Platform
+
+ This guide explains how to integrate your custom model into the platform by implementing a model adapter and a run.sh script. We use the Qwen-VL implementation as a reference example.
+
+ ### Overview
+
+ To add your custom model, you need to:
+ 1. Create a custom dataset class
+ 2. Implement a model adapter class
+ 3. Set up the initialization and inference pipeline
+
+ ### Step-by-Step Implementation
+
+ Here is an example: [model_adapter.py](https://github.com/flageval-baai/FlagEvalMM/blob/main/model_zoo/vlm/qwen_vl/model_adapter.py)
+
+ #### 1. Create a Custom Preprocessing Dataset Class
+
+ Inherit from `ServerDataset` to handle data loading:
  ```python
+ # model_adapter.py
+ class CustomDataset(ServerDataset):
+     def __getitem__(self, index):
+         data = self.get_data(index)
+         question_id = data["question_id"]
+         img_path = data["img_path"]
+         qs = data["question"]
+         qs, idx = process_images_symbol(qs)
+         idx = set(idx)
+         img_path_idx = []
+         for i in idx:
+             if i < len(img_path):
+                 img_path_idx.append(img_path[i])
+             else:
+                 print("[warning] image index out of range")
+         return question_id, img_path_idx, qs
  ```
 
+ The function `get_data` returns a structure like this:
+ ```python
+ {
+     "img_path": "A list where each element is an absolute path to an image that can be read directly using PIL, cv2, etc.",
+     "question": "A string containing the question, where image positions are marked with <image1> <image2>",
+     "question_id": "question_id",
+     "type": "A string indicating the type of question"
+ }
+ ```
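+ The helper `process_images_symbol` used in `CustomDataset` above is not defined in this snippet; it is assumed to extract the `<imageN>` markers from the question and map them to image-list indices. A minimal sketch under that assumption:
+ ```python
+ import re
+
+ def process_images_symbol(question: str):
+     # Collect the 1-based <imageN> markers and convert them to 0-based indices
+     indices = [int(n) - 1 for n in re.findall(r"<image(\d+)>", question)]
+     # Strip the markers so only the question text remains
+     cleaned = re.sub(r"<image\d+>", "", question)
+     return cleaned, indices
+ ```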
+
+ #### 2. Implement the Model Adapter
+
+ Inherit from `BaseModelAdapter` and implement the required methods:
+ 1. `model_init`: responsible for model initialization; it serves as the entry point for model loading and setup.
+ 2. `run_one_task`: implements the inference pipeline, handling data processing and result generation for a single evaluation task.
+ ```python
+ # model_adapter.py
+ class ModelAdapter(BaseModelAdapter):
+     def model_init(self, task_info: Dict):
+         ckpt_path = task_info["model_path"]
+         '''
+         Initialize the model and processor here.
+         Load your pre-trained model and any required processing tools using the provided checkpoint path.
+         '''
+
+     def run_one_task(self, task_name: str, meta_info: Dict[str, Any]):
+         results = []
+         cnt = 0
+
+         data_loader = self.create_data_loader(
+             CustomDataset, task_name, batch_size=1, num_workers=0
+         )
+
+         for question_id, img_path, qs in data_loader:
+             '''
+             Perform model inference here.
+             Use the model to generate the 'answer' variable for the given inputs (e.g., question_id, image path, question).
+             '''
+             results.append(
+                 {"question_id": question_id, "answer": answer}
+             )
+
+         self.save_result(results, meta_info, rank=rank)
+         '''
+         Save the inference results.
+         Use the provided meta_info and rank parameters to manage result storage as needed.
+         '''
+ ```
+ Note:
+ - `results` is a list of dictionaries.
+ - Each dictionary must contain two keys:
+   - `question_id`: identifies the specific question
+   - `answer`: contains the model's prediction/output
+ - After collecting all results, they are saved using `save_result()`.
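+ For illustration, a valid `results` list might look like this (the values are placeholders):
+ ```python
+ results = [
+     {"question_id": "0001", "answer": "A red bicycle leaning against a wall."},
+     {"question_id": "0002", "answer": "B"},
+ ]
+ ```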
 
+ #### 3. Launch Script (run.sh)
+
+ `run.sh` is the entry script for launching model evaluation; it sets environment variables and starts the evaluation program.
+
+ ```bash
+ #!/bin/bash
+ # Resolve the directory containing this script so model_adapter.py can be located
+ current_file="$0"
+ current_dir="$(dirname "$current_file")"
+ # The evaluation server address is passed in as the first two arguments
+ SERVER_IP=$1
+ SERVER_PORT=$2
+ # Launch the adapter, forwarding any remaining arguments
+ PYTHONPATH=$current_dir:$PYTHONPATH python $current_dir/model_adapter.py \
+     --server_ip $SERVER_IP \
+     --server_port $SERVER_PORT \
+     "${@:3}"
+ ```
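+ The script takes the evaluation server's IP and port as its first two arguments and forwards everything after them unchanged to `model_adapter.py` via `"${@:3}"`, so a local test run might look like `bash run.sh 127.0.0.1 5000` (placeholder address and port).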

  ## In case of model failure
  If your model is displayed in the `FAILED` category, its execution stopped.