Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published 20 days ago • 55 • 4