Junlin Zhou

jlzhou

AI & ML interests

None yet

Organizations

TableGPT

jlzhou's activity

upvoted an article 11 days ago

You could have designed state of the art positional encoding

reacted to KaiChen1998's post with 👍 13 days ago
📢 Our EMOVA paper has been accepted by CVPR 2025, and we are glad to release all resources, including code (training & inference), datasets (training & evaluation), and checkpoints (EMOVA-3B/7B/72B)!

🤗 EMOVA is a novel end-to-end omni-modal LLM that can see, hear and speak. Given omni-modal (i.e., textual, visual and speech) inputs, EMOVA can generate both textual and speech responses with vivid emotion control, using a speech decoder and a style controller.

✨ EMOVA Highlights
✅ State-of-the-art omni-modality: EMOVA achieves results comparable to the state of the art on both vision-language and speech benchmarks simultaneously.
✅ Device adaptation: our codebase supports training/inference on both NVIDIA GPUs (e.g., A800 & H20) and Ascend NPUs (e.g., 910B3)!
✅ Modular design: we integrate multiple implementations of the vision encoder, vision projector, and language model, even including the most recent DeepSeekMoE-tiny!

🔥 You are all welcome to try and star!
- Project page: https://emova-ollm.github.io/
- GitHub: https://github.com/emova-ollm/EMOVA
- Demo: Emova-ollm/EMOVA-demo
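
A hypothetical loading sketch, not taken from the EMOVA repo: it assumes the released checkpoints follow the usual Hugging Face trust_remote_code pattern, and the model id below is a guess, so check the GitHub repo above for the actual inference API.

```python
# Hypothetical sketch only: the model id and the remote-code interface are
# assumptions, not confirmed against the EMOVA repo; see the GitHub link.
from transformers import AutoModel, AutoProcessor

model_id = "Emova-ollm/EMOVA-3B"  # assumed id; verify on the Hugging Face Hub
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
```
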
upvoted an article 16 days ago

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

upvoted an article 20 days ago

From Files to Chunks: Improving Hugging Face Storage Efficiency

reacted to Kseniase's post with 👍 27 days ago
9 types of "Chain-of-..." approaches:

Chain-of-Thought (CoT) prompting enhances reasoning in AI models by breaking down complex problems into step-by-step logical sequences. It continues to prove effective, especially in top-performing reasoning models. However, there are other similar methods that expand CoT and can be used for different purposes. Here are 9 of them:
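
To make the base idea concrete, here is a minimal zero-shot CoT sketch following the common two-stage "Let's think step by step" recipe; `generate` is a hypothetical stand-in for whatever LLM client you use:

```python
# Minimal zero-shot chain-of-thought sketch. `generate` is a hypothetical
# placeholder for any LLM call; swap in your own client.

def generate(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def cot_answer(question: str) -> str:
    # Stage 1: elicit intermediate reasoning with a CoT trigger phrase.
    reasoning = generate(f"{question}\nLet's think step by step.")
    # Stage 2: condition the final answer on the generated reasoning.
    return generate(
        f"{question}\nLet's think step by step.\n{reasoning}\n"
        "Therefore, the final answer is:"
    )
```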

1. Chain-of-Action-Thought (COAT) -> Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search (2502.02508)
Helps the model decide whether to keep thinking, double-check its work, or try a different approach, using special guiding tokens.

2. Chain of Draft (CoD) -> Chain of Draft: Thinking Faster by Writing Less (2502.18600)
Helps the model generate short but meaningful reasoning steps, cutting costs and speeding up processing.

3. Chain-of-Agents -> Chain of Agents: Large Language Models Collaborating on Long-Context Tasks (2406.02818)
Uses multi-agent collaboration: worker agents process chunks of text in a structured chain, and a manager agent summarizes the results (see the sketch after this list).

4. Chain-of-RAG -> https://huggingface.co/papers/2501.14342
Creates retrieval chains instead of retrieving all info at once, and can dynamically adjust its search process and parameters such as the number of steps.

5. Chain-of-Shot Prompting (CoS) -> CoS: Chain-of-Shot Prompting for Long Video Understanding (2502.06428)
Helps models pick the frames crucial for understanding a video, using a binary video summary and a video co-reasoning module.

6. Chain of Hindsight (CoH) -> Chain of Hindsight Aligns Language Models with Feedback (2302.02676)
Converts all feedback into sequences to fine-tune the model and refine outputs

7. Chain-of-Note (CoN) -> Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models (2311.09210)
Generates sequential reading notes for each retrieved document to assess relevance before integrating info into the final answer

8. Chain of Diagnosis (CoD) -> CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis (2407.13301)
Transforms the diagnostic process into a diagnostic chain

9. Chain(s)-of-Knowledge -> https://www.turingpost.com/p/cok
Enhance LLMs by dynamically pulling in external knowledge to improve accuracy and reduce errors
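
As referenced in item 3 above, here is a minimal sketch of the Chain-of-Agents worker/manager pattern. The prompts and the `generate` stub are hypothetical stand-ins, not code from the paper:

```python
# Sketch of the Chain-of-Agents pattern. `generate` and the prompts are
# hypothetical stand-ins, not an API from the paper's codebase.

def generate(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def chain_of_agents(chunks: list[str], question: str) -> str:
    notes = ""  # the running summary passed along the worker chain
    for chunk in chunks:
        # Each worker reads one chunk plus the previous worker's notes.
        notes = generate(
            f"Previous notes: {notes}\nText chunk: {chunk}\n"
            f"Update the notes with evidence relevant to: {question}"
        )
    # The manager agent answers from the accumulated notes alone.
    return generate(f"Notes: {notes}\nAnswer the question: {question}")
```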

I'm glad you found it helpful!

Yes, this is planned. I was originally planning to write an article about training with the training operator, but now I'm wondering if I should skip that and focus on training with the new trainer instead.

PS: Kubeflow is migrating their training component from v1 (Kubeflow Training Operator) to v2 (Kubeflow Trainer).

updated a collection about 2 months ago