# Increasing Speed

* Integrate FlashAttention 2 CUDA kernels, significant speed-up (see the attention sketch after this list).
* Utilize the 8-bit optimizer from bitsandbytes (BNB), big speed-up; weakness: BNB isn't compatible with all GPUs (see the optimizer sketch below).
* Use a better tokenizer, e.g. TokenMonster?
* Parallelize the transformer blocks, similar to [PaLM](https://github.com/conceptofmind/PaLM) (see the parallel-block sketch below).
* Look into MPT's config for Lion for pretraining: did they use a high batch size? (A hedged Lion setup is sketched below.)
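
A minimal sketch of what the FlashAttention 2 integration could look like, assuming the `flash-attn` package is installed and the model runs in fp16/bf16 on a supported GPU; the tensor names and shapes here are illustrative, not the repo's actual attention module.

```python
# Sketch: swapping FlashAttention 2 into a causal attention step.
import torch
from flash_attn import flash_attn_func


def causal_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q, k, v: (batch, seq_len, n_heads, head_dim), fp16/bf16, on CUDA."""
    # flash_attn_func fuses softmax(QK^T)V into a memory-efficient CUDA kernel;
    # causal=True applies decoder-style masking without materializing the
    # full attention matrix.
    return flash_attn_func(q, k, v, causal=True)
```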
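
For the 8-bit optimizer, a rough sketch assuming `bitsandbytes` is installed and the GPU is supported; `model` and the hyperparameters below are placeholders, not the project's actual training config.

```python
# Sketch: replacing the optimizer with bitsandbytes 8-bit Adam.
import bitsandbytes as bnb
import torch.nn as nn

model = nn.Linear(512, 512).cuda()  # stand-in for the real transformer

# Adam8bit keeps optimizer state in 8-bit, cutting optimizer memory by roughly
# 4x versus fp32 state, which frees memory for larger batches.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=3e-4, betas=(0.9, 0.95))
```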
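
A rough sketch of a PaLM-style parallel transformer block: attention and the feed-forward network both read the same pre-norm input and their outputs are summed, instead of running sequentially. The modules and dimensions are illustrative (standard `nn.MultiheadAttention` stands in for the repo's attention).

```python
# Sketch: parallel attention + feed-forward formulation, x + attn(norm(x)) + ff(norm(x)).
import torch
import torch.nn as nn


class ParallelBlock(nn.Module):
    def __init__(self, dim: int, n_heads: int, ff_mult: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(dim, dim * ff_mult),
            nn.GELU(),
            nn.Linear(dim * ff_mult, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # One shared LayerNorm feeds both branches, so the two matmul-heavy
        # paths are independent and can overlap.
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        return x + attn_out + self.ff(h)
```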
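
For Lion, a hedged setup sketch using the `lion-pytorch` package; the hyperparameters are illustrative defaults, and the actual MPT settings (including batch size) still need to be checked against their published config.

```python
# Sketch: Lion optimizer for pretraining, with placeholder hyperparameters.
from lion_pytorch import Lion  # https://github.com/lucidrains/lion-pytorch
import torch.nn as nn

model = nn.Linear(512, 512)  # stand-in for the real transformer

# Common guidance (not MPT's confirmed config): Lion is usually run with a
# learning rate several times lower than AdamW and a larger weight decay,
# and the original paper evaluates it mostly at large batch sizes.
optimizer = Lion(model.parameters(), lr=1e-4, betas=(0.9, 0.99), weight_decay=1e-2)
```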