SampleMix: A Sample-wise Pre-training Data Mixing Strategey by Coordinating Data Quality and Diversity Paper • 2503.01506 • Published 6 days ago • 8
The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
Large Language Models Meet Open-World Intent Discovery and Recognition: An Evaluation of ChatGPT Paper • 2310.10176 • Published Oct 16, 2023 • 1
Lifting the Curse of Capacity Gap in Distilling Language Models Paper • 2305.12129 • Published May 20, 2023
Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression Paper • 2310.15594 • Published Oct 24, 2023 • 1
Improving Document Representations by Generating Pseudo Query Embeddings for Dense Retrieval Paper • 2105.03599 • Published May 8, 2021
FutureTOD: Teaching Future Knowledge to Pre-trained Language Model for Task-Oriented Dialogue Paper • 2306.10315 • Published Jun 17, 2023 • 1
DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning Paper • 2402.09136 • Published Feb 14, 2024 • 1
XPrompt: Exploring the Extreme of Prompt Tuning Paper • 2210.04457 • Published Oct 10, 2022 • 1
Semi-Supervised Knowledge-Grounded Pre-training for Task-Oriented Dialog Systems Paper • 2210.08873 • Published Oct 17, 2022 • 1
Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration Paper • 2404.12022 • Published Apr 18, 2024
Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism Paper • 2406.03853 • Published Jun 6, 2024
What's Wrong with Your Code Generated by Large Language Models? An Extensive Study Paper • 2407.06153 • Published Jul 8, 2024
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data Paper • 2409.03810 • Published Sep 5, 2024 • 35
ReMamba: Equip Mamba with Effective Long-Sequence Modeling Paper • 2408.15496 • Published Aug 28, 2024 • 12