raft_study

raftrsf's activity

Chenlu123

authored a paper about 2 months ago

Self-rewarding correction for mathematical reasoning

Paper • 2502.19613 • Published Feb 26 • 84

hendrydong

authored 2 papers 3 months ago

BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation

Paper • 2502.03860 • Published Feb 6 • 24

Reward-Guided Speculative Decoding for Efficient LLM Reasoning

Paper • 2501.19324 • Published Jan 31 • 39

hendrydong

authored a paper 4 months ago

Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published Dec 20, 2024 • 39

hendrydong

authored a paper 7 months ago

MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs

Paper • 2410.04698 • Published Oct 7, 2024 • 13

hendrydong

authored a paper 9 months ago

ThinK: Thinner Key Cache by Query-Driven Pruning

Paper • 2407.21018 • Published Jul 30, 2024 • 33

weqweasdas

updated a model 10 months ago

raftrsf/sfr_raft_iter5_2epoch

Text Generation • Updated Jun 17, 2024

weqweasdas

updated 2 datasets 10 months ago

raftrsf/sfr_concise_iter5_top1

Viewer • Updated Jun 14, 2024 • 20k • 24

raftrsf/sfr_concise_iter5_k32_with_rewards

Viewer • Updated Jun 14, 2024 • 20k • 23

weqweasdas

updated 2 models 10 months ago

raftrsf/sfr_raft_iter4_2epoch

Text Generation • Updated Jun 13, 2024 • 2

raftrsf/sfr_raft_iter4

Text Generation • Updated Jun 13, 2024 • 4

weqweasdas

updated 2 datasets 11 months ago

raftrsf/sfr_concise_iter4_top1

Viewer • Updated Jun 12, 2024 • 20k • 27

raftrsf/sfr_concise_iter4_k32_with_rewards

Viewer • Updated Jun 12, 2024 • 20k • 27

weqweasdas

updated a model 11 months ago

raftrsf/pair_pref

Text Generation • Updated May 18, 2024 • 3

weqweasdas

updated a dataset 11 months ago

raftrsf/ipo_eval_data_baseline.json

Viewer • Updated May 18, 2024 • 7.62k • 19

weqweasdas

authored a paper 11 months ago

RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published May 13, 2024 • 71

hendrydong

authored 3 papers 11 months ago

Reverse Diffusion Monte Carlo

Paper • 2307.02037 • Published Jul 5, 2023 • 1

Spurious Feature Diversification Improves Out-of-distribution Generalization

Paper • 2309.17230 • Published Sep 29, 2023

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint

Paper • 2312.11456 • Published Dec 18, 2023 • 1

weqweasdas

authored a paper 11 months ago

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint

Paper • 2312.11456 • Published Dec 18, 2023 • 1