Rl - an anujga Collection

Rl

updated 2 days ago

RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment
Paper • 2307.12950 • Published Jul 24, 2023 • 10

HumanLLMs/Human-Like-DPO-Dataset
Viewer • Updated 23 days ago • 10.9k • 2.92k • 187

sam-paech/gutenberg3-generalfiction-scifi-fantasy-romance-adventure-dpo
Viewer • Updated Oct 23, 2024 • 5.65k • 153 • 21

RLHFlow/Deepseek-PRM-Data
Viewer • Updated Nov 9, 2024 • 253k • 170 • 12

RLHFlow/DS-and-Mistral-PRM-Data
Viewer • Updated Nov 10, 2024 • 526k • 52

TIGER-Lab/WebInstruct-CFT
Viewer • Updated 3 days ago • 654k • 102 • 30