Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments Paper • 2501.10893 • Published 8 days ago • 22
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks Paper • 2501.11733 • Published 6 days ago • 25
UI-TARS: Pioneering Automated GUI Interaction with Native Agents Paper • 2501.12326 • Published 5 days ago • 45
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published 25 days ago • 98
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published 19 days ago • 48
Table Transformer Collection The Table Transformer (TATR) is a series of object detection models useful for table extraction from PDF images. • 5 items • Updated 18 days ago • 21