lixuejing committed
Commit 01ddf5c · 1 Parent(s): 79840d6
Files changed (1)
  1. src/about.py +114 -15
src/about.py CHANGED
@@ -106,28 +106,127 @@ EVALUATION_QUEUE_TEXT = """
  ## Evaluation Queue for the FlagEval VLM Leaderboard
  Models added here will be automatically evaluated on the FlagEval cluster.

- Currently, we provide two methods for model evaluation, including API calling and private deployment. If you choose to evaluate via API calling, you need to provide the model's interface, name, and corresponding API KEY.

- ### 1) Make sure you can load your model and tokenizer using AutoClasses:
  ```python
- from transformers import AutoConfig, AutoModel, AutoTokenizer
- config = AutoConfig.from_pretrained("your model name", revision=revision)
- model = AutoModel.from_pretrained("your model name", revision=revision)
- tokenizer = AutoTokenizer.from_pretrained("your model name", revision=revision)
  ```
- If this step fails, follow the error messages to debug your model before submitting it. It's likely your model has been improperly uploaded.

- Note: make sure your model is public!
- Note: if your model needs `use_remote_code=True`, we do not support this option yet but we are working on adding it, stay posted!

- ### 2) Convert your model weights to [safetensors](https://huggingface.co/docs/safetensors/index)
- It's a new format for storing weights which is safer and faster to load and use. It will also allow us to add the number of parameters of your model to the `Extended Viewer`!

- ### 3) Make sure your model has an open license!
- This is a leaderboard for Open LLMs, and we'd love for as many people as possible to know they can use your model 🤗

- ### 4) Fill up your model card
- When we add extra information about models to the leaderboard, it will be automatically taken from the model card
+ Currently, we offer two methods for model evaluation: API calls and private deployment.
+ 1. If you choose to evaluate via API call, you need to provide the model's API endpoint (interface), name, and corresponding API key.
+ 2. If you choose to evaluate an open-source model directly through Hugging Face, you do not need to fill in the model's online API URL or API key.
+
+ ## Open API Model Integration Documentation
+
+ For models accessed via API calls (such as the OpenAI API, Anthropic API, etc.), the integration process is straightforward and only requires providing the necessary configuration information (see the sketch after this list):
+ 1. `model_name`: the name of the model to use
+ 2. `api_key`: the API access key
+ 3. `api_base`: the base URL of the API service
+
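+ As a rough illustration only (not the platform's actual calling code), these three fields could be used to smoke-test an OpenAI-compatible endpoint before submission; all values below are placeholders:
+ ```python
+ # Hypothetical pre-submission check against an OpenAI-compatible endpoint.
+ from openai import OpenAI
+
+ model_name = "your-model-name"           # model_name: name of the model to use
+ api_key = "sk-..."                       # api_key: API access key
+ api_base = "https://api.example.com/v1"  # api_base: base URL for the API service
+
+ client = OpenAI(api_key=api_key, base_url=api_base)
+ response = client.chat.completions.create(
+     model=model_name,
+     messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
+ )
+ print(response.choices[0].message.content)
+ ```
+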
+ ## Adding a Custom Model to the Platform
+
+ This guide explains how to integrate your custom model into the platform by implementing a model adapter and a run.sh script. We use the Qwen-VL implementation as a reference example.
+
+ ### Overview
+
+ To add your custom model, you need to:
+ 1. Create a custom dataset class
+ 2. Implement a model adapter class
+ 3. Set up the initialization and inference pipeline
+
+ ### Step-by-Step Implementation
+
+ Here is an example: [model_adapter.py](https://github.com/flageval-baai/FlagEvalMM/blob/main/model_zoo/vlm/qwen_vl/model_adapter.py)
+
+ #### 1. Create a Custom Preprocessing Dataset Class
+
+ Inherit from `ServerDataset` to handle data loading:
  ```python
+ # model_adapter.py
+ class CustomDataset(ServerDataset):
+     def __getitem__(self, index):
+         data = self.get_data(index)
+         question_id = data["question_id"]
+         img_path = data["img_path"]
+         qs = data["question"]
+         qs, idx = process_images_symbol(qs)
+         idx = set(idx)
+         img_path_idx = []
+         for i in idx:
+             if i < len(img_path):
+                 img_path_idx.append(img_path[i])
+             else:
+                 print("[warning] image index out of range")
+         return question_id, img_path_idx, qs
  ```
 
+ The function `get_data` returns a structure like this:
+ ```python
+ {
+     "img_path": "A list where each element is an absolute path to an image that can be read directly using PIL, cv2, etc.",
+     "question": "A string containing the question, where image positions are marked with <image1> <image2>",
+     "question_id": "question_id",
+     "type": "A string indicating the type of question"
+ }
+ ```
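+ The helper `process_images_symbol` used in `CustomDataset` above is not defined in this snippet; it is assumed to extract the `<imageN>` markers from the question and map them to image-list indices. A minimal sketch under that assumption:
+ ```python
+ import re
+
+ def process_images_symbol(question: str):
+     # Collect the 1-based <imageN> markers and convert them to 0-based indices
+     indices = [int(n) - 1 for n in re.findall(r"<image(\d+)>", question)]
+     # Strip the markers so only the question text remains
+     cleaned = re.sub(r"<image\d+>", "", question)
+     return cleaned, indices
+ ```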
+
+ #### 2. Implement the Model Adapter
+
+ Inherit from `BaseModelAdapter` and implement the required methods:
+ 1. `model_init`: responsible for model initialization; it serves as the entry point for model loading and setup.
+ 2. `run_one_task`: implements the inference pipeline, handling data processing and result generation for a single evaluation task.
+ ```python
+ # model_adapter.py
+ class ModelAdapter(BaseModelAdapter):
+     def model_init(self, task_info: Dict):
+         ckpt_path = task_info["model_path"]
+         '''
+         Initialize the model and processor here.
+         Load your pre-trained model and any required processing tools using the provided checkpoint path.
+         '''
+
+     def run_one_task(self, task_name: str, meta_info: Dict[str, Any]):
+         results = []
+         cnt = 0
+
+         data_loader = self.create_data_loader(
+             CustomDataset, task_name, batch_size=1, num_workers=0
+         )
+
+         for question_id, img_path, qs in data_loader:
+             '''
+             Perform model inference here.
+             Use the model to generate the 'answer' variable for the given inputs (e.g., question_id, image path, question).
+             '''
+             results.append(
+                 {"question_id": question_id, "answer": answer}
+             )
+
+         self.save_result(results, meta_info, rank=rank)
+         '''
+         Save the inference results.
+         Use the provided meta_info and rank parameters to manage result storage as needed.
+         '''
+ ```
+ Note:
+ - `results` is a list of dictionaries.
+ - Each dictionary must contain two keys:
+   - `question_id`: identifies the specific question
+   - `answer`: contains the model's prediction/output
+ - After collecting all results, they are saved using `save_result()`.
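+ For illustration, a valid `results` list might look like this (the values are placeholders):
+ ```python
+ results = [
+     {"question_id": "0001", "answer": "A red bicycle leaning against a wall."},
+     {"question_id": "0002", "answer": "B"},
+ ]
+ ```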
 
+ #### 3. Launch Script (run.sh)
+
+ `run.sh` is the entry script for launching model evaluation; it sets environment variables and starts the evaluation program.
+
+ ```bash
+ #!/bin/bash
+ # Resolve the directory containing this script so model_adapter.py can be located
+ current_file="$0"
+ current_dir="$(dirname "$current_file")"
+ # The evaluation server address is passed in as the first two arguments
+ SERVER_IP=$1
+ SERVER_PORT=$2
+ # Launch the adapter, forwarding any remaining arguments
+ PYTHONPATH=$current_dir:$PYTHONPATH python $current_dir/model_adapter.py \
+     --server_ip $SERVER_IP \
+     --server_port $SERVER_PORT \
+     "${@:3}"
+ ```
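+ The script takes the evaluation server's IP and port as its first two arguments and forwards everything after them unchanged to `model_adapter.py` via `"${@:3}"`, so a local test run might look like `bash run.sh 127.0.0.1 5000` (placeholder address and port).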

  ## In case of model failure
  If your model is displayed in the `FAILED` category, its execution stopped.