COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values • Paper • 2504.05535 • Published 4 days ago • 38
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines • Paper • 2502.14739 • Published Feb 20 • 102
LongEval: A Comprehensive Analysis of Long-Text Generation Through a Plan-based Paradigm • Paper • 2502.19103 • Published Feb 26 • 2
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval • Paper • 2401.13478 • Published Jan 24, 2024 • 2
MMRA: A Benchmark for Multi-granularity Multi-image Relational Association • Paper • 2407.17379 • Published Jul 24, 2024 • 3
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model • Paper • 2410.13639 • Published Oct 17, 2024 • 18
OmniBench: Towards The Future of Universal Omni-Language Models • Paper • 2409.15272 • Published Sep 23, 2024 • 31