CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages Paper • 2309.09400 • Published Sep 17, 2023 • 84
Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models Paper • 2307.11224 • Published Jul 20, 2023 • 6
Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings Paper • 2402.17016 • Published Feb 26, 2024 • 5
jina-embeddings-v3: Multilingual Embeddings With Task LoRA Paper • 2409.10173 • Published Sep 16, 2024 • 28
Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever Paper • 2408.16672 • Published Aug 29, 2024 • 7
Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents Paper • 2310.19923 • Published Oct 30, 2023 • 13