Papers - Training - Scaling - Compute Optimal
Collection by matlok, 18 days ago
Byte Latent Transformer: Patches Scale Better Than Tokens • Paper 2412.09871 • Published about 1 month ago • 85
Papers - Attention - Flex Attention (https://pytorch.org/blog/flexattention/)
Collection by matlok, 18 days ago; contains the same paper (2412.09871).
Papers - Embeddings - Bytes - BPB - Tokenizer-Free Perplexity
Collection by matlok, 18 days ago; contains the same paper (2412.09871).
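The BPB metric named above is how the BLT paper compares models that use different (or no) tokenizers: instead of token-level perplexity, it normalizes the total negative log-likelihood by the number of raw bytes. A minimal sketch of that conversion (the function name and example numbers are illustrative, not from the paper):

```python
import math

def bits_per_byte(total_nll_nats: float, n_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) over a text
    into bits-per-byte: BPB = NLL / (ln(2) * n_bytes). Because n_bytes
    is fixed by the text, BPB is comparable across tokenizations."""
    return total_nll_nats / (math.log(2) * n_bytes)

# Illustrative: a model assigns ~693.1 nats of total NLL to a
# 1000-byte document, i.e. about 1 bit per byte.
print(round(bits_per_byte(693.1, 1000), 3))
```

The same formula applies whether the NLL was summed over tokens, patches, or individual bytes, which is what makes it "tokenizer-free".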
Papers - Embeddings - Bytes - Flops - Input Layer Lookup
Collection by matlok, 18 days ago; contains the same paper (2412.09871).
Papers - Training - Embeddings Model - Bytes - Entropy Model
Collection by matlok, 18 days ago; contains the same paper (2412.09871).
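The entropy-model collection refers to BLT's patching scheme: a small byte-level LM scores each position, and a new patch starts where the next-byte entropy exceeds a global threshold. A toy sketch of that idea, assuming a bigram count model as a stand-in for the paper's small transformer entropy model (the helper names, the threshold value, and the 8-bit fallback for unseen contexts are all assumptions):

```python
import math
from collections import Counter, defaultdict

def next_byte_entropy(model, prev: int) -> float:
    """Shannon entropy (bits) of the next byte given the previous one,
    under a bigram count model. Unseen context -> max entropy (assumed)."""
    counts = model.get(prev)
    if not counts:
        return 8.0
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def entropy_patches(data: bytes, model, threshold: float):
    """BLT-style entropy patching sketch: start a new patch whenever the
    entropy of the upcoming byte exceeds the threshold."""
    patches, start = [], 0
    for i in range(1, len(data)):
        if next_byte_entropy(model, data[i - 1]) > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches

# Train the toy bigram model on a small byte corpus.
corpus = b"abcabcabcxyz"
model = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    model[a][b] += 1

print(entropy_patches(b"abcabcab", model, threshold=0.5))
# → [b'abc', b'abc', b'ab']: boundaries fall after 'c', where the
#   next byte is hard to predict.
```

Note the decision at position i only looks at bytes up to i, which is what makes this scheme incremental (see the incremental-patching collection below).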
Papers - Attention - Bytes - Patch Cross Attention
Collection by matlok, 18 days ago; contains the same paper (2412.09871).
Papers - Attention - Bytes - MHA Cross Attention - Perceiver
Collection by matlok, 18 days ago; contains the same paper (2412.09871).
Papers - Embeddings - Text - Byte - Hash n-grams
Collection by matlok, 18 days ago; contains the same paper (2412.09871).
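The hash n-gram collection refers to BLT augmenting each byte's embedding with embeddings of the byte n-grams ending at that position, looked up via hashing into fixed-size tables. A sketch of the bucket-id computation only, assuming a simple polynomial hash (the paper's actual hash function, n-gram sizes, and table sizes may differ):

```python
def ngram_hash(ngram: bytes, n_buckets: int) -> int:
    """Map a byte n-gram to a bucket in a fixed-size embedding table.
    Polynomial rolling hash; an illustrative stand-in, not the exact
    function used in the paper."""
    h = 0
    for byte in ngram:
        h = (h * 257 + byte) % n_buckets
    return h

def hash_ngram_ids(data: bytes, ns=(3, 4), n_buckets=1 << 10):
    """For each byte position, collect bucket ids for every n-gram
    ending at that position. In the model, the embedding rows at these
    ids would be summed with the byte's own embedding."""
    ids = []
    for i in range(len(data)):
        ids.append([ngram_hash(data[max(0, i - n + 1): i + 1], n_buckets)
                    for n in ns if i - n + 1 >= 0])
    return ids
```

Because the tables are fixed-size, collisions are possible by design; the hash trades exact n-gram identity for a bounded parameter count.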
Papers - Attention - Block Causal
Collection by matlok, 18 days ago; contains the same paper (2412.09871).
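"Block causal" here means attention that is unrestricted within a block (e.g. a patch) but causal across blocks: a query may attend to any key in its own or any earlier block. A plain-Python sketch of the mask predicate, written in the `mask_mod`-style of the FlexAttention blog post linked above (the block assignment is illustrative):

```python
def block_causal_mask(q_idx: int, kv_idx: int, block_id) -> bool:
    """Block-causal predicate: allow attention iff the key's block is
    not later than the query's block. Full attention inside a block,
    causal ordering across blocks."""
    return block_id[kv_idx] <= block_id[q_idx]

# Positions 0-5 grouped into three blocks/patches of two bytes each.
blocks = [0, 0, 1, 1, 2, 2]
mask = [[block_causal_mask(q, k, blocks) for k in range(6)]
        for q in range(6)]
```

With FlexAttention, a predicate of this shape would be passed to `create_block_mask` so the sparsity is exploited rather than materialized; the list-of-lists above is only for inspection.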
Papers - Tokenizers - Bytes - Incremental Patching
Note: unlike BLT, BPE does not support incremental patching (appending bytes can change earlier tokens).
Collection by matlok, 18 days ago; contains the same paper (2412.09871).
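The note above can be made concrete: with BPE-style longest-match segmentation, appending bytes can retroactively change earlier tokens, so a stream cannot be patched incrementally. A toy demonstration, assuming a greedy longest-match tokenizer as a stand-in for real BPE merge rules (the vocabulary is made up):

```python
def greedy_tokenize(text: str, vocab: set) -> list:
    """Greedy longest-match segmentation standing in for BPE
    (illustrative; real BPE applies learned merge rules instead)."""
    toks, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab or j == i + 1:  # fall back to 1 char
                toks.append(text[i:j])
                i = j
                break
    return toks

vocab = {"ab", "abc", "cd"}
print(greedy_tokenize("ab", vocab))    # ['ab']
print(greedy_tokenize("abcd", vocab))  # ['abc', 'd'] — the first token changed
```

The tokenization of the prefix "ab" is not a prefix of the tokenization of "abcd", so earlier output must be revised as input grows. BLT's entropy patching avoids this: each boundary decision depends only on the bytes already seen, so appending bytes never moves an existing patch boundary.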