WritingBench: A Comprehensive Benchmark for Generative Writing Paper • 2503.05244 • Published 5 days ago • 14
Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration Paper • 2502.17110 • Published 16 days ago • 11
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC Paper • 2502.14282 • Published 21 days ago • 18
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks Paper • 2501.11733 • Published Jan 20 • 28
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality Paper • 2304.14178 • Published Apr 27, 2023 • 3
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding Paper • 1908.04577 • Published Aug 13, 2019
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model Paper • 2310.05126 • Published Oct 8, 2023 • 1
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding Paper • 2307.02499 • Published Jul 4, 2023 • 15
BUS: Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization Paper • 2307.08504 • Published Jul 17, 2023
CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility Paper • 2307.09705 • Published Jul 19, 2023 • 1
Enabling Weak LLMs to Judge Response Reliability via Meta Ranking Paper • 2402.12146 • Published Feb 19, 2024
Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion Paper • 2402.12195 • Published Feb 19, 2024
Evaluation and Analysis of Hallucination in Large Vision-Language Models Paper • 2308.15126 • Published Aug 29, 2023 • 1
ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models Paper • 2309.00986 • Published Sep 2, 2023 • 21
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding Paper • 2403.12895 • Published Mar 19, 2024 • 32
TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning Paper • 2404.16635 • Published Apr 25, 2024 • 2
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training Paper • 2212.14546 • Published Dec 30, 2022
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video Paper • 2302.00402 • Published Feb 1, 2023