VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published 2 days ago • 61
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 2 days ago • 161
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos Paper • 2501.09781 • Published 8 days ago • 20
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published 10 days ago • 268
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Paper • 2412.18319 • Published Dec 24, 2024 • 37
Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives Paper • 2404.11317 • Published Apr 17, 2024 • 1
Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning Paper • 2406.18254 • Published Jun 26, 2024 • 1
EasyRAG: Efficient Retrieval-Augmented Generation Framework for Automated Network Operations Paper • 2410.10315 • Published Oct 14, 2024 • 2
When Text Embedding Meets Large Language Model: A Comprehensive Survey Paper • 2412.09165 • Published Dec 12, 2024 • 1
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models Paper • 2410.09732 • Published Oct 13, 2024 • 54
GTE models Collection General Text Embedding Models Released by Tongyi Lab of Alibaba Group • 21 items • Updated 4 days ago • 20
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 16 items • Updated Dec 13, 2024 • 144
Article GaLore: Advancing Large Model Training on Consumer-grade Hardware Mar 20, 2024 • 26
Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6, 2024 • 91