Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 11 days ago • 130
EXAONE-3.0 Collection EXAONE 3.0 7.8B instruction-tuned language model • 2 items • Updated 16 days ago • 2
EXAONE-3.5 Collection EXAONE 3.5 language model series including instruction-tuned models of 2.4B, 7.8B, and 32B. • 10 items • Updated 15 days ago • 79
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training Paper • 2412.02030 • Published 22 days ago • 18
One Shot, One Talk: Whole-body Talking Avatar from a Single Image Paper • 2412.01106 • Published 23 days ago • 18
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding Paper • 2411.18363 • Published 27 days ago • 9
Material Anything: Generating Materials for Any 3D Object via Diffusion Paper • 2411.15138 • Published Nov 22 • 42
Flux.1 Tools Collection FLUX.1 Tools, a suite of models designed to add control and steerability to the base text-to-image model FLUX.1 • 6 items • Updated Nov 22 • 13
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models Paper • 2411.09595 • Published Nov 14 • 71
BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions Paper • 2411.07461 • Published Nov 12 • 21
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents Paper • 2410.23218 • Published Oct 30 • 46
MobileLLM Collection Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 9 items • Updated 28 days ago • 99
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale Paper • 2410.20280 • Published Oct 26 • 23
CogVLM2 Collection This collection hosts the repos of THUDM's CogVLM2 releases • 8 items • Updated 27 days ago • 19