Diversifying Joint Vision-Language Tokenization Learning Paper • 2306.03421 • Published Jun 6, 2023 • 2
A Systematic Investigation of KB-Text Embedding Alignment at Scale Paper • 2106.01586 • Published Jun 3, 2021 • 1
Learning Sparse Mixture of Experts for Visual Question Answering Paper • 1909.09192 • Published Sep 19, 2019 • 1
Bringing Back the Context: Camera Trap Species Identification as Link Prediction on Multimodal Knowledge Graphs Paper • 2401.00608 • Published Dec 31, 2023 • 2
Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents Paper • 2502.11357 • Published Feb 17, 2025 • 10 • 2
A Retrieve-and-Read Framework for Knowledge Graph Link Prediction Paper • 2212.09724 • Published Dec 19, 2022 • 1
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery Paper • 2410.05080 • Published Oct 7, 2024 • 21
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents Paper • 2410.05243 • Published Oct 7, 2024 • 19
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs Paper • 2404.05719 • Published Apr 8, 2024 • 82