Kamesh R (Kameshr) PRO
AI & ML interests

None yet

Recent Activity

updated a model 28 days ago
Kameshr/reasoning-small-1B
published a model 28 days ago
Kameshr/reasoning-small-1B
liked a dataset 28 days ago
Kameshr/tamil-sangam-text-excerpt

Organizations

Stanford AI, Gradio-Blocks-Party, ICML2023, CodeWiz, Sathyabama Institute of Science and Technology, AI Starter Pack, CoBuild Tech

Kameshr's activity

reacted to KaiChen1998's post with šŸ”„ about 1 month ago
šŸ“¢ Our EMOVA paper has been accepted by CVPR 2025, and we are glad to release all resources, including code (training & inference), datasets (training & evaluation), and checkpoints (EMOVA-3B/7B/72B)!

šŸ¤— EMOVA is a novel end-to-end omni-modal LLM that can see, hear, and speak. Given omni-modal (i.e., textual, visual, and speech) inputs, EMOVA can generate both textual and speech responses with vivid emotional control via its speech decoder and style controller.

✨ EMOVA Highlights
āœ… State-of-the-art omni-modality: EMOVA achieves results comparable to the state of the art on both vision-language and speech benchmarks simultaneously.
āœ… Device adaptation: our codebase supports training/inference on both NVIDIA GPUs (e.g., A800 & H20) and Ascend NPUs (e.g., 910B3)!
āœ… Modular design: we integrate multiple implementations of the vision encoder, vision projector, and language model, including the most recent DeepSeekMoE-tiny!

šŸ”„ You are all welcome to try and star!
- Project page: https://emova-ollm.github.io/
- Github: https://github.com/emova-ollm/EMOVA
- Demo: Emova-ollm/EMOVA-demo