Kexin Huang

737443h

https://kexinhuang02.github.io

AI & ML interests

None yet

Recent Activity

upvoted a paper 18 days ago

Rubric-based On-policy Distillation

submitted a paper about 2 months ago

FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization

authored a paper 2 months ago

FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization

Organizations

None yet

upvoted a paper 18 days ago

Rubric-based On-policy Distillation

Paper • 2605.07396 • Published 23 days ago • 41

submitted a paper to Daily Papers about 2 months ago

FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization

Paper • 2603.19835 • Published Mar 20 • 352

authored 3 papers 2 months ago

FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization

Paper • 2603.19835 • Published Mar 20 • 352

On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation

Paper • 2603.22117 • Published Mar 23 • 29

Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs

Paper • 2603.22446 • Published Mar 23 • 10

upvoted 2 papers 2 months ago

Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs

Paper • 2603.22446 • Published Mar 23 • 10

On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation

Paper • 2603.22117 • Published Mar 23 • 29

liked a Space 2 months ago

The Synthetic Data Playbook: Generating Trillions of the Finest Tokens

📝

243

Explore synthetic data benchmarks with an interactive bookshelf

liked 3 datasets 4 months ago

authored 3 papers 8 months ago

RePO: ReLU-based Preference Optimization

Paper • 2503.07426 • Published Mar 10, 2025 • 2

SPRec: Self-Play to Debias LLM-based Recommendation

Paper • 2412.09243 • Published Dec 12, 2024

Quantile Advantage Estimation for Entropy-Safe Reasoning

Paper • 2509.22611 • Published Sep 26, 2025 • 119

upvoted a paper 8 months ago

Quantile Advantage Estimation for Entropy-Safe Reasoning

Paper • 2509.22611 • Published Sep 26, 2025 • 119

upvoted a paper 12 months ago

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2, 2025 • 190

upvoted an article about 1 year ago

Visualize and understand GPU memory in PyTorch

Article • qgallouedec • Dec 24, 2024 • 271

upvoted a paper about 1 year ago

START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6, 2025 • 113

liked a dataset over 1 year ago

open-r1/OpenR1-Math-220k

Viewer • Updated Feb 18, 2025 • 450k • 45.3k • 752

liked a Space over 1 year ago

Scaling test-time compute

📈

600

Boost LLM answers with flexible test‑time search strategies
