Tony Zhao

tianchez

https://www.tianchez.com

AI & ML interests

Multimodal Agent, Generative AI

tianchez's activity

commented on a paper 3 days ago

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

Paper • 2504.07615 • Published 8 days ago • 22

updated 3 models 4 days ago

omlab/VLM-R1-Qwen2.5VL-3B-Math-0305

Visual Question Answering • Updated 4 days ago • 339

omlab/Qwen2.5VL-3B-VLM-R1-REC-500steps

Zero-Shot Object Detection • Updated 4 days ago • 855 • 22

omlab/VLM-R1-Qwen2.5VL-3B-OVD-0321

Zero-Shot Object Detection • Updated 4 days ago • 817 • 6

updated a collection 4 days ago

Multimodal Research

10 items • Updated 4 days ago • 1

updated a Space 4 days ago

VLM R1 Referral Expression

Find and highlight objects in images based on text descriptions

upvoted a paper 4 days ago

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

Paper • 2504.07615 • Published 8 days ago • 22

replied to AdinaY's post 24 days ago

https://huggingface.co/blog/omlab/vlm-r1-for-ovd
https://huggingface.co/blog/omlab/vlm-ovd-findings

replied to AdinaY's post 25 days ago

We now share our latest insights in the blog here.
https://om-ai-lab.github.io/index.html

liked a Space 26 days ago

OmAgent

Process and answer questions about webpage videos

liked a Space 27 days ago

VLM R1 OVD

VLM-R1 model for Open-Vocabulary Object Detection

published a Space 27 days ago

VLM R1 OVD

VLM-R1 model for Open-Vocabulary Object Detection

upvoted a collection about 1 month ago

VLM-R1-models

A collection of VLM-R1 Models • 7 items • Updated 27 days ago • 4

New activity in omlab/VLM-R1-Referral-Expression about 2 months ago

Apply for community grant: Personal project (gpu)

#3 opened about 2 months ago by

replied to their post about 2 months ago

looks very cool!

reacted to their post with 👍 about 2 months ago

Post

4302

Introducing VLM-R1!

GRPO helped DeepSeek R1 learn to reason. Can it also help VLMs perform better on general computer vision tasks?

The answer is yes, and it generalizes better than SFT. We trained Qwen 2.5 VL 3B on RefCOCO (a visual grounding task) and evaluated it on RefCOCO Val and RefGTA (an out-of-distribution task).

https://github.com/om-ai-lab/VLM-R1

New activity in omlab/VLM-R1-Referral-Expression about 2 months ago

Fixes 500 error for some users

#1 opened about 2 months ago by

reacted to their post with ❤️ about 2 months ago
