brrr-gang (Brrr Gang)

valpy

authored 4 papers 4 months ago

2 OLMo 2 Furious

Paper • 2501.00656 • Published Dec 31, 2024 • 22

IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance

Paper • 2502.08395 • Published Feb 12

RewardBench 2: Advancing Reward Model Evaluation

Paper • 2506.01937 • Published Jun 2 • 7

Generalizing Verifiable Instruction Following

Paper • 2507.02833 • Published Jul 3 • 1

natolambert

authored a paper 7 months ago

Reinforcement Learning from Human Feedback

Paper • 2504.12501 • Published Apr 16 • 4

soldni

authored a paper 7 months ago

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

Paper • 2504.07096 • Published Apr 9 • 76

hamishivi

authored 3 papers 9 months ago

Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning

Paper • 2408.10075 • Published Aug 19, 2024

2 OLMo 2 Furious

Paper • 2501.00656 • Published Dec 31, 2024 • 22

Large-Scale Data Selection for Instruction Tuning

Paper • 2503.01807 • Published Mar 3 • 13

LouisCastricato

authored a paper 10 months ago

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though

Paper • 2501.04682 • Published Jan 8 • 99

natolambert

authored 9 papers 11 months ago

Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback

Paper • 2406.09279 • Published Jun 13, 2024 • 3

WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs

Paper • 2406.18495 • Published Jun 26, 2024 • 13

Towards a Framework for Openness in Foundation Models: Proceedings from the Columbia Convening on Openness in Artificial Intelligence

Paper • 2405.15802 • Published May 17, 2024

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 121

2 OLMo 2 Furious

Paper • 2501.00656 • Published Dec 31, 2024 • 22

vwxyzjn

authored a paper 11 months ago

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

Paper • 2403.17031 • Published Mar 24, 2024 • 6

AI & ML interests

Team members 9

brrr-gang's activity