InternLM-XComposer

InternLM-XComposer is a vision-language large model (VLLM) based on InternLM for advanced text-image comprehension and composition. InternLM-XComposer has serveal appealing properties:

  • Interleaved Text-Image Composition: InternLM-XComposer can effortlessly generate coherent and contextual articles that seamlessly integrate images, providing a more engaging and immersive reading experience. The interleaved text-image composition is implemented in following steps:

    1. Text Generation: It crafts long-form text based on human-provided instructions.
    2. Image Spoting and Captioning: It pinpoints optimal locations for image placement and furnishes image descriptions.
    3. Image Retrieval and Selection: It select image candidates and identify the image that optimally complements the content.
  • Comprehension with Rich Multilingual Knowledge: The text-image comprehension is empowered by training on extensive multi-modal multilingual concepts with carefully crafted strategies, resulting in a deep understanding of visual content.

  • Strong performance: It consistently achieves state-of-the-art results across various benchmarks for vision-language large models, including MME Benchmark (English), MMBench (English), Seed-Bench (English), CCBench(Chinese), and MMBench-CN (Chineese).

We release InternLM-XComposer series in two versions:

  • InternLM-XComposer-VL: The pretrained VLLM model with InternLM as the initialization of the LLM, achieving strong performance on various multimodal benchmarks, e.g., MME Benchmark, MMBench Seed-Bench, CCBench, and MMBench-CN.
  • InternLM-XComposer: The finetuned VLLM for Interleaved Text-Image Composition and LLM-based AI assistant.
Downloads last month
70
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API: The HF Inference API does not support model that require custom code execution.