nvidia/Llama-3.1-Nemotron-8B-UltraLong-1M-Instruct Text Generation • Updated 4 days ago • 2.21k • 38
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning Paper • 2502.03275 • Published Feb 5 • 17