What is the model architecture?
#2 opened by ewre324
Hello, nice to see such a small model! I was wondering whether the model is based on the Llama/TinyLlama architecture?
Also, could the authors please describe the steps taken to train the model?
Hi! Thanks for the kind words. You can find the model architecture on the base model's page: https://huggingface.co/BEE-spoke-data/smol_llama-220M-GQA
It's a "smol" version of the Llama 2 architecture with a hidden size of 1024. We finetuned the base pretrained model with axolotl: https://openaccess-ai-collective.github.io/axolotl/
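To make the "smol llama2 with GQA" description concrete, here is a minimal sketch using `transformers` that builds a Llama-style config of roughly this shape locally. Only `hidden_size=1024` comes from the reply above; the layer count, head counts, intermediate size, and vocab size are illustrative assumptions, so check the base model's `config.json` on the Hub for the real values.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Llama-2-style "smol" config sketch. hidden_size=1024 is from the authors'
# reply; every other value below is an assumption for illustration only.
config = LlamaConfig(
    hidden_size=1024,
    num_hidden_layers=10,    # assumption
    num_attention_heads=16,  # assumption; head_dim = 1024 / 16 = 64
    num_key_value_heads=4,   # GQA: fewer KV heads than query heads (assumption)
    intermediate_size=2816,  # assumption
    vocab_size=32000,        # assumption
)

# Instantiate with random weights just to inspect the architecture and size.
model = LlamaForCausalLM(config)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")
```

For the actual checkpoint you would instead call `LlamaForCausalLM.from_pretrained("BEE-spoke-data/smol_llama-220M-GQA")`, which downloads the real config and weights from the Hub.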