Magnushammer: A Transformer-based Approach to Premise Selection Paper • 2303.04488 • Published Mar 8, 2023
Structured Packing in LLM Training Improves Long Context Utilization Paper • 2312.17296 • Published Dec 28, 2023 • 2
Hierarchical Transformers Are More Efficient Language Models Paper • 2110.13711 • Published Oct 26, 2021
Analysing The Impact of Sequence Composition on Language Model Pre-Training Paper • 2402.13991 • Published Feb 21, 2024 • 1