---
license: apache-2.0
---
# Llama-124M-experimental-pretrain

This is an experimental pretraining run done solely on a home PC.

### Model Description

- **Training code:** Adapted from https://github.com/Lightning-AI/litgpt.
- **Cost:** Around 20 RMB ($3).
- **Model architecture:** Transformer decoder with gated SiLU MLP, RMS Norm, RoPE positional embedding, and grouped query attention.
- **Language(s) (NLP):** Mainly English.
- **License:** apache-2.0
- **Parameter count:** 124M (0.124B)

## Uses

After downloading this repository, run

```
litgpt generate ./Llama-124M-experimental-pretrain --prompt "What is GPT-4? GPT-4 is"
```

The output will look something like:

```
What is GPT-4? GPT-4 is an extremely powerful, highly immersive, and powerful, in the sense that it is able to be used to help you deal with various technical issues, while still providing an easy to use experience that will help you get better and faster results. It

Time for inference 1: 0.42 sec total, 119.97 tokens/sec
Memory used: 0.27 GB
```

## Bias, Risks, and Limitations

This model is too small to avoid hallucinations, and there is no code in the training dataset. Don't expect this model to provide any sort of assistance. It is just for fun.

## Training Details

### Training Data

This model was trained on https://huggingface.co/datasets/EleutherAI/rpj-v2-sample for two epochs, for a total of 19 billion tokens. The trained context length is 2048.

#### Training Hyperparameters

- **Training regime:** bf16-mixed.
- **Learning rate:** Cosine schedule from 5e-4 to 5e-5.

#### Speeds

The training run lasted for approximately 43 hours on one PC with 1x RTX 4090.
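As a rough illustration of the numbers above, the following Python sketch computes a cosine learning-rate decay from 5e-4 to 5e-5 and the average throughput implied by 19 billion tokens in about 43 hours. It is not taken from the training code: the function names, the absence of a warmup phase, and the step count of 1000 are assumptions made only for illustration.

```python
import math


def cosine_lr(step, total_steps, max_lr=5e-4, min_lr=5e-5):
    """Cosine decay from max_lr at step 0 to min_lr at total_steps.

    Assumes no warmup phase (the model card does not mention one).
    """
    # Clamp progress to [0, 1] so out-of-range steps stay within the schedule.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def tokens_per_second(total_tokens, hours):
    """Average training throughput over the whole run."""
    return total_tokens / (hours * 3600.0)


if __name__ == "__main__":
    total_steps = 1000  # hypothetical step count, not the real run length
    for step in (0, total_steps // 2, total_steps):
        print(f"step {step:5d}: lr = {cosine_lr(step, total_steps):.2e}")

    # 19 billion tokens in approximately 43 hours, as stated above.
    print(f"throughput: {tokens_per_second(19e9, 43):,.0f} tokens/sec")
```

Under these figures the average throughput comes out to roughly 123,000 tokens per second on the single RTX 4090.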

## Evaluation

|    Tasks     |Version|Filter|n-shot|  Metric  |   | Value |   |Stderr|
|--------------|------:|------|-----:|----------|---|------:|---|-----:|
|arc_easy      |      1|none  |     0|acc       |↑  | 0.3969|±  |0.0100|
|              |       |none  |     0|acc_norm  |↑  | 0.3628|±  |0.0099|
|lambada_openai|      1|none  |     0|acc       |↑  | 0.2626|±  |0.0061|
|              |       |none  |     0|perplexity|↓  |71.1943|±  |2.8730|
|piqa          |      1|none  |     0|acc       |↑  | 0.5871|±  |0.0115|
|              |       |none  |     0|acc_norm  |↑  | 0.5843|±  |0.0115|
|sciq          |      1|none  |     0|acc       |↑  | 0.6940|±  |0.0146|
|              |       |none  |     0|acc_norm  |↑  | 0.5970|±  |0.0155|

## Environmental Impact

- **Hardware Type:** RTX 4090 x 1
- **Hours used:** 44
- **Carbon Emitted:** 6.6 kg of CO2.