Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs Paper • 2310.01801 • Published Oct 3, 2023 • 3 upvotes
DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies Paper • 2310.04610 • Published Oct 6, 2023 • 1 upvote
Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native Paper • 2401.12230 • Published Jan 17, 2024 • 1 upvote
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales Paper • 2308.01320 • Published Aug 2, 2023 • 45 upvotes
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale Paper • 2201.05596 • Published Jan 14, 2022 • 2 upvotes
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022 • 29 upvotes
Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers Paper • 2211.11586 • Published Nov 17, 2022 • 1 upvote
Universal Checkpointing: Efficient and Flexible Checkpointing for Large Scale Distributed Training Paper • 2406.18820 • Published Jun 27, 2024
Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks Paper • 2407.08454 • Published Jul 11, 2024
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale Paper • 2207.00032 • Published Jun 30, 2022
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey Paper • 2412.18619 • Published Dec 16, 2024 • 55 upvotes
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation Paper • 2409.18964 • Published Sep 27, 2024 • 26 upvotes
SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing Paper • 2310.08094 • Published Oct 12, 2023 • 1 upvote
CCPL: Contrastive Coherence Preserving Loss for Versatile Style Transfer Paper • 2207.04808 • Published Jul 11, 2022
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale Paper • 2407.05282 • Published Jul 7, 2024 • 13 upvotes
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning Paper • 2407.00617 • Published Jun 30, 2024 • 7 upvotes
Offline Learning in Markov Games with General Function Approximation Paper • 2302.02571 • Published Feb 6, 2023
CIDAR: Culturally Relevant Instruction Dataset For Arabic Paper • 2402.03177 • Published Feb 5, 2024 • 6 upvotes