view article Article Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference 18 days ago • 62
DataGemma Release Collection A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items • Updated Dec 13, 2024 • 85
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 259