Safer-Instruct: Aligning Language Models with Automated Preference Data • Paper • arXiv:2311.08685 • Published Nov 15, 2023
CLIMB: A Benchmark of Clinical Bias in Large Language Models • Paper • arXiv:2407.05250 • Published Jul 7, 2024
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective • Paper • arXiv:2502.14296 • Published Feb 2025
WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback • Paper • arXiv:2408.15549 • Published Aug 28, 2024
Detecting and Filtering Unsafe Training Data via Data Attribution • Paper • arXiv:2502.11411 • Published Feb 2025