Article From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate Jun 13, 2024 • 53
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation Paper • 2503.19622 • Published 19 days ago • 29
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video Paper • 2310.01324 • Published Oct 2, 2023 • 1
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding Paper • 2403.15377 • Published Mar 22, 2024 • 26
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling Paper • 2501.00574 • Published Dec 31, 2024 • 6