Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems • Paper • 2504.01990 • Published 9 days ago • 218
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling • Paper • 2412.05271 • Published Dec 6, 2024 • 152
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding • Paper • 2502.01341 • Published Feb 3 • 39
Wizard Models • Collection • Replica of the official repository for research purposes • 6 items • Updated Jun 20, 2024 • 1
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference • Paper • 2407.14057 • Published Jul 19, 2024 • 46
VCR: Visual Caption Restoration • Collection • All configurations for VCR: Visual Caption Restoration (arXiv:2406.06462). • 8 items • Updated Jul 31, 2024 • 2
VCR: Visual Caption Restoration (Smaller Test Subsets) • Collection • This space contains smaller test subsets (first 100 / first 500) of all VCR-Wiki configurations. • 8 items • Updated Jun 11, 2024 • 2
Meta Llama 3 • Collection • This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Dec 6, 2024 • 739