Forgetting Transformer: Softmax Attention with a Forget Gate Paper • 2503.02130 • Published Mar 3 • 29
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution Paper • 2502.18449 • Published Feb 25 • 73
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 82
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22, 2024 • 93
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Dec 6, 2024 • 586
nabla^2DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials Paper • 2406.14347 • Published Jun 20, 2024 • 101