Qwen/Qwen2-VL-7B-Instruct

JustinLin610 committed on Aug 29, 2024

Commit b6241d7 · verified · 1 Parent(s): 1399c6f

Update README.md

Files changed (1)

README.md +5 -4

@@ -18,13 +18,14 @@ We're excited to unveil **Qwen2-VL**, the latest iteration of our Qwen-VL model,
 #### Key Enhancements:
-* **Enhanced Image Comprehension**: We've significantly improved the model's ability to understand and interpret visual information, setting new benchmarks across key performance metrics.
-* **Advanced Video Understanding**: Qwen2-VL now features superior online streaming capabilities, enabling real-time analysis of dynamic video content with remarkable accuracy.
-* **Integrated Visual Agent Functionality**: Our model now seamlessly incorporates sophisticated system integration, transforming Qwen2-VL into a powerful visual agent capable of complex reasoning and decision-making.
-* **Expanded Multilingual Support**: We've broadened our language capabilities to better serve a diverse global user base, making Qwen2-VL more accessible and effective across different linguistic contexts.
+* **SoTA understanding of images of various resolution & ratio**: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealworldQA, MMBench, etc.
+* **Understanding videos of 20min+**: with the online streaming capabilities, Qwen2-VL can understand long videos by high-quality video-based question answering, dialog, content creation, etc.
+* **Agent that can operate your mobiles, robots, ...**: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions.
+* **Multilingual Support**: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.
 #### Model Architecture Updates: