From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge • Paper • 2411.16594 • Published about 1 month ago • 36
ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction? • Paper • 2411.06469 • Published Nov 10 • 17
Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges • Paper • 2408.08946 • Published Aug 16 • 11
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases • Paper • 2407.12784 • Published Jul 17 • 48
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? • Paper • 2407.04842 • Published Jul 5 • 52
Introducing v0.5 of the AI Safety Benchmark from MLCommons • Paper • 2404.12241 • Published Apr 18 • 10
Meta Llama 3 • Collection • This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated 20 days ago • 697