Describe Anything collection: Multimodal Large Language Models for Detailed Localized Image and Video Captioning (6 items)