Umitcan Sahin PRO

ucsahin

AI & ML interests

Visual Language Models, Large Language Models, Vision Transformers

Recent Activity

reacted to singhsidhukuldeep's post with 🔥 1 day ago
Exciting News in AI: JinaAI Releases JINA-CLIP-v2!

The team at Jina AI has just released a groundbreaking multilingual multimodal embedding model that's pushing the boundaries of text-image understanding. Here's why this is a big deal:

🚀 Technical Highlights:
- Dual encoder architecture combining a 561M parameter Jina XLM-RoBERTa text encoder and a 304M parameter EVA02-L14 vision encoder
- Supports 89 languages with 8,192 token context length
- Processes images up to 512×512 pixels with 14×14 patch size
- Implements FlashAttention2 for text and xFormers for vision processing
- Uses Matryoshka Representation Learning for efficient vector storage

⚡️ Under The Hood:
- Multi-stage training process with progressive resolution scaling (224→384→512)
- Contrastive learning using InfoNCE loss in both directions
- Trained on massive multilingual dataset including 400M English and 400M multilingual image-caption pairs
- Incorporates specialized datasets for document understanding, scientific graphs, and infographics
- Uses hard negative mining with 7 negatives per positive sample

📊 Performance:
- Outperforms previous models on visual document retrieval (52.65% nDCG@5)
- Achieves 89.73% image-to-text and 79.09% text-to-image retrieval on CLIP benchmark
- Strong multilingual performance across 30 languages
- Maintains performance even with 75% dimension reduction (256D vs 1024D)

🎯 Key Innovation:
The model solves the long-standing challenge of unifying text-only and multi-modal retrieval systems while adding robust multilingual support. Perfect for building cross-lingual visual search systems!

Kudos to the research team at Jina AI for this impressive advancement in multimodal AI!
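
The 75% dimension reduction mentioned in the post (256D vs 1024D) relies on Matryoshka-style embeddings, where a prefix of the vector is itself usable as an embedding. A minimal sketch of that truncation step follows, assuming the usual recipe of keeping the leading components and rescaling to unit length. The helper name `truncate_embedding` and the synthetic input vector are hypothetical placeholders, not part of any Jina AI API or real model output.

```python
import math


def truncate_embedding(vec, dim=256):
    """Keep the first `dim` components and rescale them to unit length.

    Hypothetical helper for illustration; not a Jina AI function.
    """
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale for an all-zero prefix
    return [x / norm for x in head]


# Synthetic stand-in for a 1024-D embedding (not real model output).
full = [math.sin(i + 1) for i in range(1024)]

small = truncate_embedding(full, dim=256)
small_norm = math.sqrt(sum(x * x for x in small))
```

After truncation the vector has 256 components and unit length, so a plain dot product between two such vectors can be used as a cosine similarity.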
upvoted a collection 16 days ago
DataGemma Release

Organizations

None yet

ucsahin's activity

upvoted 2 articles 5 months ago

Google releases Gemma 2 2B, ShieldGemma and Gemma Scope

• 59
upvoted 2 articles 5 months ago

TGI Multi-LoRA: Deploy Once, Serve 30 Models

• 53

Docmatix - a huge dataset for Document Visual Question Answering

• 71