Varun Sakunia

Varun-08

AI & ML interests

Python, Machine Learning, Deep Learning, Computer Vision

Organizations

None yet

Varun-08's activity

upvoted a paper about 16 hours ago

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Paper • 2501.03218 • Published 3 days ago • 27

upvoted a paper 11 days ago

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Paper • 2412.19326 • Published 14 days ago • 18

upvoted a paper 14 days ago

Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis

Paper • 2412.01819 • Published Dec 2, 2024 • 34

upvoted a collection 15 days ago

ModernBERT

Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated 22 days ago • 119

upvoted a collection about 1 month ago

PaliGemma 2 Release

Vision-Language Models available in multiple 3B, 10B and 28B variants. • 23 items • Updated 27 days ago • 125

upvoted a collection about 2 months ago

Models for dataset curation

9 items • Updated Dec 5, 2024 • 17

upvoted a paper 2 months ago

Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination

Paper • 2411.03823 • Published Nov 6, 2024 • 43

upvoted 2 collections 3 months ago

Llama 3.2

This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Dec 6, 2024 • 554

DocLayout-YOLO

Dataset and model for DocLayout-YOLO • 9 items • Updated Oct 22, 2024 • 12

upvoted 2 articles 3 months ago

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Article • Sep 18, 2024 • 215

How to build a custom text classifier without days of human labeling

Article • Oct 17, 2024 • 55

upvoted 4 papers 3 months ago

Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction

Paper • 2409.18124 • Published Sep 26, 2024 • 32

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

Paper • 2410.02073 • Published Oct 2, 2024 • 41

Contextual Document Embeddings

Paper • 2410.02525 • Published Oct 3, 2024 • 18

A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation

Paper • 2410.01912 • Published Oct 2, 2024 • 14

upvoted a collection 3 months ago

Molmo

Artifacts for open multimodal language models. • 5 items • Updated 3 days ago • 292

upvoted a collection 4 months ago

Qwen2.5

Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Nov 28, 2024 • 458