Minjae Oh
Riasok
AI & ML interests
None yet
Recent Activity
authored a paper 2 days ago
ThinkBrake: Efficient Reasoning via Log-Probability Margin Guided Decoding
authored a paper 2 days ago
KL for a KL: On-Policy Distillation with Control Variate Baseline
authored a paper 2 days ago
Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States