CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation Paper • 2504.00043 • Published Apr 2025 • 9
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning Paper • 2502.01100 • Published Feb 3, 2025 • 17
On Memorization of Large Language Models in Logical Reasoning Paper • 2410.23123 • Published Oct 30, 2024 • 18
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published Oct 14, 2024 • 39
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs Paper • 2406.18495 • Published Jun 26, 2024 • 13
MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation Paper • 2406.15252 • Published Jun 21, 2024 • 16
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences Paper • 2406.11069 • Published Jun 16, 2024 • 14
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12, 2024 • 70
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild Paper • 2406.04770 • Published Jun 7, 2024 • 31
GenAI Arena: An Open Evaluation Platform for Generative Models Paper • 2406.04485 • Published Jun 6, 2024 • 23