Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity Paper • 2412.02252 • Published Dec 3, 2024 • 2
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published Feb 11 • 49
Kimi k1.5: Scaling Reinforcement Learning with LLMs Paper • 2501.12599 • Published Jan 22 • 113
Hibiki fr-en Collection Hibiki is a model for streaming speech translation, which can run on device! See https://github.com/kyutai-labs/hibiki. • 5 items • Updated Feb 6 • 52