Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling • Paper • 2410.11325 • Published Oct 15, 2024 • 1
BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM • Paper • 2406.12168 • Published Jun 18, 2024 • 7
OpenAssistant/reward-model-deberta-v3-large-v2 • Text Classification • Updated Feb 1, 2023 • 14.1k • 214