CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation Paper • 2504.00043 • Published 21 days ago • 8