Efficient Process Reward Model Training via Active Learning • Paper 2504.10559 • Published 4 days ago • 11 upvotes
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models • Paper 2504.10479 • Published 4 days ago • 222 upvotes
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems • Paper 2504.01990 • Published 18 days ago • 242 upvotes
One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation • Paper 2503.13358 • Published Mar 17 • 95 upvotes
Cohere Labs Aya Vision • Collection of 5 items • Updated 3 days ago • 68 upvotes — Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages.
Article: A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality • Published Mar 4 • 73 upvotes
Unified Reward Model for Multimodal Understanding and Generation • Paper 2503.05236 • Published Mar 7 • 118 upvotes
Token-Efficient Long Video Understanding for Multimodal LLMs • Paper 2503.04130 • Published Mar 6 • 93 upvotes
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution • Paper 2502.18449 • Published Feb 25 • 73 upvotes
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks • Paper 2502.17157 • Published Feb 24 • 53 upvotes
SurveyX: Academic Survey Automation via Large Language Models • Paper 2502.14776 • Published Feb 20 • 97 upvotes
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning • Paper 2502.14768 • Published Feb 20 • 48 upvotes
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features • Paper 2502.14786 • Published Feb 20 • 142 upvotes
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation • Paper 2502.13143 • Published Feb 18 • 29 upvotes