Gecko: Versatile Text Embeddings Distilled from Large Language Models Paper • 2403.20327 • Published Mar 29, 2024 • 47
Round and Round We Go! What makes Rotary Positional Encodings useful? Paper • 2410.06205 • Published Oct 8, 2024 • 1
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 86
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models Paper • 2410.20771 • Published Oct 28, 2024 • 3