LLM-Drop Collection • Model weights for the paper "What Matters in Transformers? Not All Attention is Needed" (https://arxiv.org/abs/2406.15786) • 14 items • Updated Oct 23, 2024 • 4
Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers Paper • 2410.13184 • Published Oct 17, 2024 • 2
Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning Paper • 2310.11716 • Published Oct 18, 2023 • 5
Merging Experts into One: Improving Computational Efficiency of Mixture of Experts Paper • 2310.09832 • Published Oct 15, 2023 • 1
Vega-MT: The JD Explore Academy Translation System for WMT22 Paper • 2209.09444 • Published Sep 20, 2022
Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning Paper • 2402.00530 • Published Feb 1, 2024 • 1
SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters Paper • 2210.04284 • Published Oct 9, 2022