MoRezaGH

Moreza009

https://github.com/mohammad-gh009

AI & ML interests

None yet

Organizations

None yet

Moreza009's activity

upvoted an article 3 days ago

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Article • Dec 9, 2022 • 219

upvoted an article 4 days ago

Open-R1: a fully open reproduction of DeepSeek-R1

Article • Jan 28 • 834

upvoted an article 12 days ago

SmolLM - blazingly fast and remarkably powerful

Article • Jul 16, 2024 • 350

upvoted an article 20 days ago

Hyperparameter Search with Transformers and Ray Tune

Article • Nov 2, 2020 • 4

upvoted 12 papers 20 days ago

Implicit Reasoning in Transformers is Reasoning through Shortcuts

Paper • 2503.07604 • Published 25 days ago • 21

LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization

Paper • 2503.08619 • Published 24 days ago • 20

Gemini Embedding: Generalizable Embeddings from Gemini

Paper • 2503.07891 • Published 25 days ago • 34

UniF^2ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models

Paper • 2503.08120 • Published 25 days ago • 30

MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice

Paper • 2503.05978 • Published 28 days ago • 34

LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published 25 days ago • 83

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Paper • 2503.07920 • Published 25 days ago • 95

Multi Agent based Medical Assistant for Edge Devices

Paper • 2503.05397 • Published 28 days ago • 7

Self-Taught Self-Correction for Small Language Models

Paper • 2503.08681 • Published 24 days ago • 13

GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training

Paper • 2503.08525 • Published 24 days ago • 15

Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published 23 days ago • 27

Motion Anything: Any to Motion Generation

Paper • 2503.06955 • Published 26 days ago • 29