Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published 6 days ago • 33
RL Zero: Zero-Shot Language to Behaviors without any Supervision Paper • 2412.05718 • Published 19 days ago • 4
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated 21 days ago • 548