Wei Liu

PeterV09

https://vpeterv.github.io

AI & ML interests

Machine Learning, Natural Language Processing

Recent Activity

updated a model about 22 hours ago

RL4Reasoning/verl-grpo-lr-deepscaler-bsz128-2048-clip-cliphigh-hf-1.5B-4_deepscaler_-590

published a model about 22 hours ago

RL4Reasoning/verl-grpo-lr-deepscaler-bsz128-2048-clip-cliphigh-hf-1.5B-4_deepscaler_-590

updated a model about 22 hours ago

RL4Reasoning/verl-grpo-lr-deepscaler-bsz128-4096-clip-cliphigh-hf-1.5B-4_deepscaler_-280

PeterV09's activity

upvoted 2 papers 5 days ago

MegaMath: Pushing the Limits of Open Math Corpora

Paper • 2504.02807 • Published 9 days ago • 27

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

Paper • 2503.21614 • Published 16 days ago • 39

upvoted a paper 10 days ago

Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models

Paper • 2503.24377 • Published 12 days ago • 17

upvoted 3 collections 18 days ago

M-STAR

Resources of M-STAR (Multimodal Self-Evolving Training for Reasoning) https://mstar-lmm.github.io/ • 2 items • Updated Dec 25, 2024 • 4

SimpleRL

The collection for the Project "Simple Reinforcement Learning for Reasoning" • 2 items • Updated Feb 19 • 6

SimpleRL-Zoo

The collection for the Paper "SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild" • 12 items • Updated 10 days ago • 6

upvoted a paper 18 days ago

SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild

Paper • 2503.18892 • Published 19 days ago • 29

upvoted 3 papers about 1 month ago

START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 105

Language Models can Self-Improve at State-Value Estimation for Better Search

Paper • 2503.02878 • Published Mar 4 • 9

Predictive Data Selection: The Data That Predicts Is the Data That Teaches

Paper • 2503.00808 • Published Mar 2 • 57

upvoted 3 papers about 2 months ago

MoM: Linear Sequence Modeling with Mixture-of-Memories

Paper • 2502.13685 • Published Feb 19 • 34

LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid

Paper • 2502.07563 • Published Feb 11 • 24

CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction

Paper • 2502.07316 • Published Feb 11 • 48

upvoted 6 papers 3 months ago

Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback

Paper • 2501.12895 • Published Jan 22 • 60

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Paper • 2501.04001 • Published Jan 7 • 47

Cosmos World Foundation Model Platform for Physical AI

Paper • 2501.03575 • Published Jan 7 • 78

REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4 • 99

PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models

Paper • 2501.03124 • Published Jan 6 • 14

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Paper • 2412.19723 • Published Dec 27, 2024 • 89

upvoted a paper 4 months ago

Diving into Self-Evolving Training for Multimodal Reasoning

Paper • 2412.17451 • Published Dec 23, 2024 • 44