Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks • Paper • 2503.04378 • Published Mar 6
Minitron Collection • A family of compressed models obtained via pruning and knowledge distillation • 12 items
SSMs Collection • A collection of Mamba-2-based research models with 8B parameters trained on 3.5T tokens for comparison with Transformers • 5 items
RLHF Collection • A collection of models trained with Reinforcement Learning from Human Feedback (RLHF) • 4 items