Will Brooks
TornButter
2 followers · 2 following
AI & ML interests
None yet
Recent Activity
reacted to singhsidhukuldeep's post with 🔥 2 days ago
Exciting News in AI: JinaAI Releases JINA-CLIP-v2!
The team at Jina AI has just released a groundbreaking multilingual multimodal embedding model that's pushing the boundaries of text-image understanding. Here's why this is a big deal:
Technical Highlights:
- Dual encoder architecture combining a 561M parameter Jina XLM-RoBERTa text encoder and a 304M parameter EVA02-L14 vision encoder
- Supports 89 languages with 8,192 token context length
- Processes images up to 512×512 pixels with 14×14 patch size
- Implements FlashAttention2 for text and xFormers for vision processing
- Uses Matryoshka Representation Learning for efficient vector storage
⚡️ Under The Hood:
- Multi-stage training process with progressive resolution scaling (224→384→512)
- Contrastive learning using InfoNCE loss in both directions
- Trained on massive multilingual dataset including 400M English and 400M multilingual image-caption pairs
- Incorporates specialized datasets for document understanding, scientific graphs, and infographics
- Uses hard negative mining with 7 negatives per positive sample
Performance:
- Outperforms previous models on visual document retrieval (52.65% nDCG@5)
- Achieves 89.73% image-to-text and 79.09% text-to-image retrieval on CLIP benchmark
- Strong multilingual performance across 30 languages
- Maintains performance even with 75% dimension reduction (256D vs 1024D)
🎯 Key Innovation: The model solves the long-standing challenge of unifying text-only and multi-modal retrieval systems while adding robust multilingual support. Perfect for building cross-lingual visual search systems!
Kudos to the research team at Jina AI for this impressive advancement in multimodal AI!
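The post's "256D vs 1024D" figure refers to shortening Matryoshka-style embeddings. One common way to do this, assumed here and not confirmed by the post, is to keep the leading components of the vector and re-normalize. The sketch below uses toy data and a hypothetical helper name (`truncate_embedding`); it does not call any Jina AI API.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components of `vec` and L2-normalize them.

    Hypothetical helper for illustration; assumes the embedding was trained
    so that its leading components remain useful on their own.
    """
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to normalize for an all-zero vector
    return [x / norm for x in head]

# Toy 1024-dimensional vector standing in for a real model output.
full = [math.sin(i + 1) for i in range(1024)]

# Keep 256 of 1024 dimensions, i.e. a 75% reduction.
small = truncate_embedding(full, 256)
print(len(small))
```

Re-normalizing after truncation keeps cosine similarity and dot product interchangeable for the shortened vectors.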
liked a Space 6 days ago
HuggingFaceH4/blogpost-scaling-test-time-compute
liked a model 12 days ago
meta-llama/Llama-3.3-70B-Instruct
Organizations
None yet
TornButter's activity
liked a Space 6 days ago
Running • 384
Scaling test-time compute
liked a model 12 days ago
meta-llama/Llama-3.3-70B-Instruct
Text Generation • Updated 5 days ago • 315k • 1.3k
liked a model 15 days ago
city96/t5-v1_1-xxl-encoder-gguf
Updated Aug 20 • 38.1k • 237
liked a Space 17 days ago
Running on Zero • 2.09k
TRELLIS
Scalable and Versatile 3D Generation from images
liked a model 21 days ago
tencent/HunyuanVideo
Text-to-Video • Updated 8 days ago • 7.35k • 1.28k
liked a model 22 days ago
TheDrummer/Cydonia-22B-v1.3
Updated Nov 21 • 1.61k • 21
liked a Space 26 days ago
Running on CPU Upgrade • 951
Anychat
liked a model 26 days ago
Djrango/Qwen2vl-Flux
Text-to-Image • Updated 20 days ago • 454
liked 2 models 29 days ago
dakkidaze/Cydonia-22B-v1.3-4.5bpw-h6-exl2
Updated Nov 24 • 51 • 1
black-forest-labs/FLUX.1-Fill-dev
Updated about 1 month ago • 62.6k • 406
liked a Space 29 days ago
Running • 640
PR Puppet Sora
liked 2 models about 1 month ago
Lightricks/LTX-Video
Image-to-Video • Updated 7 days ago • 71.6k • 774
NexaAIDev/OmniVLM-968M
Updated 10 days ago • 3.79k • 485
liked a Space about 1 month ago
Running on Zero • 510
BRIA RMBG 2.0
Remove background from any image
liked 3 models about 1 month ago
briaai/RMBG-2.0
Image Segmentation • Updated 3 days ago • 239k • 538
Qwen/Qwen2.5-Coder-32B-Instruct
Text Generation • Updated Nov 18 • 370k • 1.37k
OuteAI/OuteTTS-0.1-350M
Text-to-Speech • Updated 29 days ago • 5.62k • 296
liked 3 models about 2 months ago
Laxhar/noobai-XL-Vpred-0.5
Text-to-Image • Updated Nov 15 • 218 • 21
scepter-studio/ACE-0.6B-512px
Updated Nov 21 • 32 • 26
Etched/oasis-500m
Updated Nov 4 • 366 • 431