andrewbai/distilabel-intel-orca-dpo-pairs_filtered_pref-skywork-8B Viewer • Updated Mar 11 • 6.42k • 28
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model Paper • 2503.05132 • Published Mar 7 • 56
Defending LLMs against Jailbreaking Attacks via Backtranslation Paper • 2402.16459 • Published Feb 26, 2024 • 4