Update content.py
content.py  CHANGED  (+17 -6)
@@ -13,9 +13,7 @@ Bottom_logo = f'''<img src="data:image/jpeg;base64,{bottom_logo}" style="width:2
 intro_md = f'''
 # {benchname} Leaderboard
 
-
-* [🧪 Evaluation Code](https://github.com/maum-ai/KOFFVQA)
-* [📄 Report](https://arxiv.org/abs/2503.23730)
+[**🏆 Leaderboard**](https://huggingface.co/spaces/maum-ai/KOFFVQA-Leaderboard) | [**📄 KOFFVQA arXiv**](https://arxiv.org/abs/2503.23730) | [**🤗 KOFFVQA Dataset**](https://huggingface.co/datasets/maum-ai/KOFFVQA_Data)
 
 {benchname}🇰🇷 is a Free-Form VQA benchmark dataset designed to evaluate Vision-Language Models (VLMs) in Korean language environments. Unlike traditional multiple-choice or predefined answer formats, KOFFVQA challenges models to generate open-ended, natural-language answers to visually grounded questions. This allows for a more comprehensive assessment of a model's ability to understand and generate nuanced Korean responses.
 
@@ -34,22 +32,35 @@ The {benchname} benchmark is designed to evaluate and compare the performance of
 This benchmark includes a total of 275 Korean questions across 10 tasks. The questions are open-ended, free-form VQA (Visual Question Answering) with objective answers, allowing responses without strict format constraints.
 
 ## News
+* **2025-04-25**: Our [leaderboard](https://huggingface.co/spaces/maum-ai/KOFFVQA-Leaderboard) has now finished evaluating a total of **81** popular VLMs, covering both open- and closed-source models. We are also refactoring the evaluation code to make it easier to use and to support a much more diverse range of models.
 
-* **2025-04-01** : Our paper [KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language](https://arxiv.org/abs/2503.23730) has released and accepted to CVPRW 2025, Workshop on Benchmarking and Expanding AI Multimodal Approaches(BEAM 2025) 🎉
+* **2025-04-01**: Our paper [KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language](https://arxiv.org/abs/2503.23730) has been released and accepted to CVPRW 2025, the Workshop on Benchmarking and Expanding AI Multimodal Approaches (BEAM 2025) 🎉
 
 * **2025-01-21**: [Evaluation code](https://github.com/maum-ai/KOFFVQA) and [dataset](https://huggingface.co/datasets/maum-ai/KOFFVQA_Data) release
 
 * **2024-12-06**: Leaderboard Release!
 
+## Citation
+
+**BibTeX:**
+```bibtex
+@article{kim2025koffvqa,
+  title={KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language},
+  author={Kim, Yoonshik and Jung, Jaeyoon},
+  journal={arXiv preprint arXiv:2503.23730},
+  year={2025}
+}
+```
+
 '''.strip()
 
 submit_md = f'''
 
-# Submit
+# Submit
 
 We are not accepting model addition requests at the moment. Once the request system is established, we will start accepting requests.
 
-🇰🇷
+🇰🇷 Wondering how your VLM stacks up in Korean? Just run it with our evaluation code and get your score, no API key needed!
 
 🧙‍♂️ We currently use google/gemma-2-9b-it as the judge model, so there's no need to worry about API keys or usage fees.
 
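The first hunk's header shows the data-URI pattern `content.py` uses for the footer logo: the image bytes are base64-encoded and inlined into an HTML `<img>` tag, so the rendered markdown needs no separately hosted file. A minimal sketch of that pattern, with a hypothetical asset path and width (the `style` value is truncated to `width:2` in the hunk header):

```python
import base64

# Hypothetical path: wherever the Space keeps its logo asset.
with open("assets/bottom_logo.jpg", "rb") as f:
    bottom_logo = base64.b64encode(f.read()).decode("ascii")

# Same pattern as the hunk header: inline the JPEG as a data URI.
# The width is an assumed placeholder; the original value is truncated.
Bottom_logo = f'''<img src="data:image/jpeg;base64,{bottom_logo}" style="width:20%;">'''
```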
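The dataset link in `intro_md` points at the Hub; to inspect the 275 questions locally, a minimal sketch with the `datasets` library (the split name is an assumption; check the dataset card for the actual layout):

```python
from datasets import load_dataset

# "test" is an assumed split name; see
# https://huggingface.co/datasets/maum-ai/KOFFVQA_Data for the real layout.
ds = load_dataset("maum-ai/KOFFVQA_Data", split="test")
print(len(ds), ds.column_names)
```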
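`submit_md`'s point about API keys follows from the judge being an open-weights model: google/gemma-2-9b-it grades answers on your own hardware rather than through a paid API. A minimal sketch of such a judge call with `transformers`; the rubric prompt and example fields are hypothetical, and the official grading logic lives in the KOFFVQA evaluation code:

```python
from transformers import pipeline

# Gemma is gated: accept the license on the Hub and `huggingface-cli login`
# first. device_map="auto" additionally requires the `accelerate` package.
judge = pipeline("text-generation", model="google/gemma-2-9b-it", device_map="auto")

# Hypothetical rubric; the real criteria ship with the evaluation code at
# https://github.com/maum-ai/KOFFVQA.
question = "표지판에 무엇이라고 쓰여 있나요?"    # "What does the sign say?"
reference = "어린이 보호구역"                    # "School zone"
answer = "어린이 보호구역이라고 쓰여 있습니다."  # model response being graded
prompt = (
    "Grade the answer against the reference from 0 to 10. Reply with a number only.\n"
    f"Question: {question}\nReference: {reference}\nAnswer: {answer}\nScore:"
)
print(judge(prompt, max_new_tokens=8, return_full_text=False)[0]["generated_text"])
```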