Huanjin Yao (HuanjinYao)

AI & ML interests

None yet

Recent Activity

updated a model about 1 month ago
HuanjinYao/Mulberry_llava_8b
updated a collection about 1 month ago
Mulberry
updated a model about 1 month ago
HuanjinYao/Mulberry_qwen2vl_7b

Organizations

Tsinghua University, ZeroGPU Explorers

HuanjinYao's activity

New activity in HuanjinYao/Mulberry_llava_8b about 2 months ago: "Add metadata" (#1), opened by nielsr
New activity in ShareGPT4Video/ShareGPT4Video 9 months ago
reacted to merve's post with 👀 10 months ago
Do we fully leverage ViT encoders in vision language models?

A new paper (by @HuanjinYao et al.) builds a dense connector that does it better! HuanjinYao/DenseConnector-v1.5-8B
Collection: HuanjinYao/denseconnector-66500e173fc8c9f05dc98dea

VLMs consist of an image encoder block, a projection layer that maps image embeddings into the text embedding space, and a text decoder, connected sequentially 📖
This paper explores using the intermediate states of the image encoder instead of only its final output 🤩
The authors explore three ways of instantiating the dense connector: sparse token integration, sparse channel integration, and dense channel integration (see the paper, Dense Connector for MLLMs (2405.13800), for how they do it).
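A minimal NumPy sketch of the general idea, assuming made-up shapes and layer picks (the real implementation works on CLIP-ViT features inside LLaVA 1.5; none of the names or dimensions below come from the paper):

```python
import numpy as np

# Illustrative shapes only: layer count, token count, and dims are not the paper's.
num_layers, num_tokens, dim_vis, dim_txt = 24, 576, 1024, 4096
rng = np.random.default_rng(0)
# Stand-in for the ViT's per-layer hidden states for one image
hidden_states = rng.standard_normal((num_layers, num_tokens, dim_vis))

def sparse_token_integration(states, layer_ids):
    """Stack selected intermediate layers along the token (sequence) axis."""
    return np.concatenate([states[i] for i in layer_ids], axis=0)

def dense_channel_integration(states, layer_ids):
    """Stack selected intermediate layers along the channel (feature) axis."""
    return np.concatenate([states[i] for i in layer_ids], axis=-1)

fused = dense_channel_integration(hidden_states, [7, 15, 23])  # layer picks are arbitrary
print(fused.shape)  # (576, 3072): wider features, same token count

# As in any VLM, the fused features are then projected into the text embedding
# space before reaching the decoder (here a bare matrix; in practice an MLP).
proj = rng.standard_normal((fused.shape[-1], dim_txt)) * 0.01
image_tokens = fused @ proj
print(image_tokens.shape)  # (576, 4096)
```

Channel integration keeps the image token count fixed (so decoder cost is unchanged), while token integration keeps feature width fixed but lengthens the sequence.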

They integrate all three into LLaVA 1.5 and find that each of the new models is superior to the original LLaVA 1.5 🥹 I tried the model and it seems to work very well. As part of the release, the authors published various checkpoints based on different decoders (Vicuna 7B/13B and Llama 3-8B) that you can find in the collection 🤗