Black-Box Prompt Optimization: Aligning Large Language Models without Model Training Paper • 2311.04155 • Published Nov 7, 2023
CritiqueLLM: Scaling LLM-as-Critic for Effective and Explainable Evaluation of Large Language Model Generation Paper • 2311.18702 • Published Nov 30, 2023
AlignBench: Benchmarking Chinese Alignment of Large Language Models Paper • 2311.18743 • Published Nov 30, 2023
On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark Paper • 2110.08466 • Published Oct 16, 2021
PAL: Persona-Augmented Emotional Support Conversation Generation Paper • 2212.09235 • Published Dec 19, 2022
Recent Advances towards Safe, Responsible, and Moral Dialogue Systems: A Survey Paper • 2302.09270 • Published Feb 18, 2023
LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models Paper • 2408.15778 • Published Aug 28, 2024
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models Paper • 2412.11605 • Published Dec 2024
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents Paper • 2410.24024 • Published Oct 31, 2024
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models Paper • 2406.16714 • Published Jun 24, 2024