Yongming Rao

raoyongming

AI & ML interests

None yet

Recent Activity

liked a Space about 2 months ago

THUdyh/Ola

Organizations

None yet

raoyongming's activity

liked a Space about 2 months ago

Ola

Generate text and audio responses from images and videos

authored a paper 2 months ago

Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment

Paper • 2502.04328 • Published Feb 6 • 30

upvoted a paper 2 months ago

Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment

Paper • 2502.04328 • Published Feb 6 • 30

authored a paper 5 months ago

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published Nov 21, 2024 • 26

upvoted a collection 5 months ago

Insight-V

Collection

Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models • 5 items • Updated Nov 22, 2024 • 11

upvoted a paper 5 months ago

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published Nov 21, 2024 • 26

liked a Space 7 months ago

103

Oryx

Generate detailed descriptions from images and videos

upvoted a paper 7 months ago

MaskBit: Embedding-free Image Generation via Bit Tokens

Paper • 2409.16211 • Published Sep 24, 2024 • 17

authored a paper 7 months ago

Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

Paper • 2409.12961 • Published Sep 19, 2024 • 26

upvoted a paper 7 months ago

Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

Paper • 2409.12961 • Published Sep 19, 2024 • 26

authored a paper 9 months ago

Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model

Paper • 2408.00754 • Published Aug 1, 2024 • 25

upvoted 2 papers 9 months ago

Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model

Paper • 2408.00754 • Published Aug 1, 2024 • 25

Efficient Inference of Vision Instruction-Following Models with Elastic Cache

Paper • 2407.18121 • Published Jul 25, 2024 • 17

authored a paper 9 months ago

Efficient Inference of Vision Instruction-Following Models with Elastic Cache

Paper • 2407.18121 • Published Jul 25, 2024 • 17

authored 2 papers over 1 year ago

Generative Multimodal Models are In-Context Learners

Paper • 2312.13286 • Published Dec 20, 2023 • 37

Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

Paper • 2312.06655 • Published Dec 11, 2023 • 24

liked a Space about 2 years ago

Unipc Sdm

👁