CIBench: Evaluating Your LLMs with a Code Interpreter Plugin Paper • 2407.10499 • Published Jul 15, 2024
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher Paper • 2407.20183 • Published Jul 29, 2024 • 40
MultiModal-GPT: A Vision and Language Model for Dialogue with Humans Paper • 2305.04790 • Published May 8, 2023 • 1
T-Eval: Evaluating the Tool Utilization Capability Step by Step Paper • 2312.14033 • Published Dec 21, 2023 • 2
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning Paper • 2402.06332 • Published Feb 9, 2024 • 18
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models Paper • 2403.12881 • Published Mar 19, 2024 • 16
AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data Paper • 2405.19265 • Published May 29, 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published Jun 6, 2024 • 72