Multimodal (text + image + video + audio) embedding models aligned with jina-embeddings-v5-text-*. Two sizes, four task variants each.
jina-embeddings-v5-omni: Text-Geometry-Preserving Multimodal Embeddings via Frozen-Tower Composition
Paper • 2605.08384 • Published • 10
jinaai/jina-embeddings-v5-omni-small
Feature Extraction • 2B • Updated • 31k • 55
jinaai/jina-embeddings-v5-omni-nano
Feature Extraction • 1.0B • Updated • 18.9k • 21
jinaai/jina-embeddings-v5-omni-nano-text-matching
Feature Extraction • 0.9B • Updated • 11.1k • 3
