RetMask Collection Trained checkpoints for the paper "From Interpretability to Performance: Optimizing Retrieval Heads for Long-Context Language Models" • 4 items • Updated Apr 21
tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.5 Text Generation • 8B • Updated Jun 25, 2025 • 2.09k • 19
tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4 Text Generation • 71B • Updated Jul 1, 2025 • 170 • 13
tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3 Text Generation • 71B • Updated Apr 2, 2025 • 845 • 13
tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3 Text Generation • 8B • Updated Apr 2, 2025 • 7.78k • 24
tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.2 Text Generation • 8B • Updated Apr 2, 2025 • 36 • 16