Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads Paper • 2401.10774 • Published Jan 19, 2024 • 55
Running on CPU Upgrade 12.9k 12.9k Open LLM Leaderboard 🏆 Track, rank and evaluate open LLMs and chatbots