ProcessBench: Identifying Process Errors in Mathematical Reasoning • Paper • 2412.06559 • Published 16 days ago
U-MATH and μ-MATH - University-level math evaluation • Collection • Paper: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs • 3 items • Updated 13 days ago
U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs • Paper • 2412.03205 • Published 21 days ago