Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published 16 days ago • 62
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published 13 days ago • 74