Frac-Connections: Fractional Extension of Hyper-Connections
Abstract
Residual connections are central to modern deep learning architectures, enabling the training of very deep networks by mitigating gradient vanishing. Hyper-Connections recently generalized residual connections by introducing multiple connection strengths at different depths, thereby addressing the seesaw effect between gradient vanishing and representation collapse. However, Hyper-Connections increase memory access costs by expanding the width of hidden states. In this paper, we propose Frac-Connections, a novel approach that divides hidden states into multiple parts rather than expanding their width. Frac-Connections retain some of the benefits of Hyper-Connections while reducing memory consumption. To validate their effectiveness, we conduct large-scale experiments on language tasks, the largest being a 7B MoE model trained on up to 3T tokens, and demonstrate that Frac-Connections significantly outperform residual connections.
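The abstract gives only a high-level description, so the following NumPy sketch is one plausible reading rather than the paper's actual formulation: it treats the n parts of a width-d hidden state as n residual streams of width d/n, mixes them into a single layer input with weights beta, and writes the layer output back into each fraction with weights alpha, analogous to how Hyper-Connections mix multiple full-width streams. The function name, the weight vectors, and the mixing scheme are all assumptions.

```python
import numpy as np

def frac_connection(h, layer_fn, n=2, beta=None, alpha=None):
    """Hypothetical Frac-Connection sketch: split a width-d hidden state
    into n fractions of width d/n instead of widening it n-fold as
    Hyper-Connections do. beta mixes fractions into the layer input;
    alpha routes the layer output back into each fraction. Parameter
    names and this exact mixing scheme are illustrative assumptions."""
    d = h.shape[-1]
    assert d % n == 0, "hidden width must divide evenly into n fractions"
    parts = h.reshape(-1, n, d // n)                    # (batch, n, d/n)
    beta = np.ones(n) / n if beta is None else beta     # input-mixing weights
    alpha = np.ones(n) if alpha is None else alpha      # output-routing weights
    layer_in = np.einsum("n,bnf->bf", beta, parts)      # weighted sum of fractions
    out = layer_fn(layer_in)                            # layer runs at width d/n
    parts = parts + np.einsum("n,bf->bnf", alpha, out)  # residual write-back
    return parts.reshape(-1, d)
```

With n = 1 and unit weights the update collapses to the standard residual h + f(h), consistent with the claim that Frac-Connections generalize residual connections; memory is reduced relative to Hyper-Connections because the hidden state is never widened beyond d.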
Community
Comment: Congratulations, I really like this series of work. I would like to ask if the authors have plans to open-source the weights :)

Authors' reply: Thank you very much for your kind words! We're glad you like our work. Yes, we do have plans to open-source the weights, and we will announce it once everything is ready. Stay tuned!
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Hierarchical Residuals Exploit Brain-Inspired Compositionality (2025)
- DeepCrossAttention: Supercharging Transformer Residual Connections (2025)
- You Do Not Fully Utilize Transformer's Representation Capacity (2025)
- Accelerated Training through Iterative Gradient Propagation Along the Residual Path (2025)
- Autoregressive Generation of Static and Growing Trees (2025)
- HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization (2025)
- MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections (2025)