A new idea to improve training and inference performance
#82 opened by lijip26313
Hello Google Team, I have an idea that could significantly improve LLM training and inference performance: https://www.kaggle.com/code/vasilypodorov/fast-language-modelling-with-un-formers
There I have trained a 0.7B-parameter LLM with a new architecture, reaching a throughput of approximately 0.7B tokens per hour on a TPU v3-8. The details are in the notebook linked above.
Could you read it and tell me whether it makes sense? I would like you to try training a small LLM based on this technique to decide whether it is useful. This would take only a few TPU-days, as the rough estimate below suggests.
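To make "a few TPU-days" concrete, here is a back-of-envelope sketch. Only the 0.7B parameter count and the ~0.7B tokens/hour throughput come from my notebook; the token budget (a Chinchilla-style ~20 tokens per parameter) is an assumption for the sake of the estimate:

```python
# Rough sanity check of the "a few TPU-days" claim.
# Parameter count and throughput are from the notebook;
# the training-token budget is an assumption.

params = 0.7e9            # model size from the notebook
tokens_per_hour = 0.7e9   # measured throughput on TPU v3-8
tokens_per_param = 20     # assumed Chinchilla-style budget

token_budget = params * tokens_per_param   # ~14B tokens
hours = token_budget / tokens_per_hour     # ~20 hours
print(f"~{token_budget / 1e9:.0f}B tokens -> ~{hours:.0f} h "
      f"(~{hours / 24:.1f} TPU v3-8 days)")
```

Even with a token budget several times larger than this assumption, the run stays within a few TPU-days, since the estimate scales linearly with the budget.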