ZhangRC committed
Commit 4bd265d
1 Parent(s): dc12f85

Update README.md

Files changed (1): README.md (+79, -3)
README.md CHANGED
---
license: apache-2.0
---

# Llama-124M-experimental-pretrain

<!-- Provide a quick summary of what the model is/does. -->

This is an experimental pretraining run done solely on a home PC.

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Training code:** adapted from https://github.com/Lightning-AI/litgpt.
- **Cost:** around 20 RMB (roughly $3).
- **Model architecture:** Transformer decoder with a gated SiLU (SwiGLU) MLP, RMSNorm, RoPE positional embeddings, and grouped-query attention; a rough sketch of these blocks follows this list.
- **Language(s) (NLP):** mainly English.
- **License:** apache-2.0
- **Parameter count:** 124M (0.124B)

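This card does not state the exact layer dimensions, so the following is only a minimal PyTorch sketch of two of the components named above (RMSNorm and the gated SiLU MLP); the sizes `n_embd=768` and `intermediate=2048` are placeholder assumptions, not the model's actual configuration.

```python
# Minimal sketch of two building blocks named above: RMSNorm and a gated SiLU MLP.
# The dimensions (n_embd=768, intermediate=2048) are placeholders, not values
# documented in this card.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square of the features (no mean subtraction).
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight


class GatedSiluMLP(nn.Module):
    """SwiGLU-style MLP: silu(W_gate x) * (W_up x), projected back down."""

    def __init__(self, n_embd: int = 768, intermediate: int = 2048):
        super().__init__()
        self.gate = nn.Linear(n_embd, intermediate, bias=False)
        self.up = nn.Linear(n_embd, intermediate, bias=False)
        self.down = nn.Linear(intermediate, n_embd, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))
```
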
## Uses

After downloading this repository, run

```
litgpt generate ./Llama-124M-experimental-pretrain --prompt "What is GPT-4? GPT-4 is"
```

The output will look something like:

```
What is GPT-4? GPT-4 is an extremely powerful, highly immersive, and powerful, in the sense that it is able to be used to help you deal with various technical issues, while still providing an easy to use experience that will help you get better and faster results. It
Time for inference 1: 0.42 sec total, 119.97 tokens/sec
Memory used: 0.27 GB
```

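For use from Python rather than the CLI, newer litgpt releases expose an `LLM` convenience API; a minimal sketch, assuming such a version is installed and using the same checkpoint directory as above:

```python
# Sketch of the same generation through litgpt's Python API (assumes a litgpt
# release that provides `litgpt.LLM`; older versions only ship the CLI).
from litgpt import LLM

llm = LLM.load("./Llama-124M-experimental-pretrain")
text = llm.generate("What is GPT-4? GPT-4 is", max_new_tokens=50)
print(text)
```
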
## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

This model is too small to avoid hallucinations, and there is no code in the training dataset. Don't expect it to provide any real assistance; it is just for fun.

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

This model was trained on https://huggingface.co/datasets/EleutherAI/rpj-v2-sample for two epochs, for a total of 19 billion tokens. The training context length is 2048 tokens.

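litgpt ships its own data-preparation pipeline, so the snippet below is not the preprocessing actually used for this run; it is only an illustration of how raw documents are typically packed into fixed 2048-token training blocks. The GPT-2 tokenizer here is a placeholder assumption.

```python
# Illustration only: packing tokenized documents into fixed 2048-token blocks.
# The tokenizer is a placeholder; the actual run used litgpt's own pipeline.
from typing import Iterable, Iterator, List

from transformers import AutoTokenizer

BLOCK_SIZE = 2048  # matches the training context length stated above
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder choice


def pack(documents: Iterable[str]) -> Iterator[List[int]]:
    """Concatenate tokenized documents and cut them into BLOCK_SIZE chunks."""
    buffer: List[int] = []
    for doc in documents:
        buffer.extend(tokenizer(doc)["input_ids"])
        buffer.append(tokenizer.eos_token_id)  # document separator
        while len(buffer) >= BLOCK_SIZE:
            yield buffer[:BLOCK_SIZE]
            del buffer[:BLOCK_SIZE]
```
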
#### Training Hyperparameters

- **Training regime:** bf16-mixed <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
- **Learning rate:** cosine schedule decaying from 5e-4 to 5e-5; a sketch of the schedule follows this list.

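A minimal sketch of a cosine decay between those two values; the warmup policy (if any) and the total step count are not stated in this card, so warmup is omitted and the step count is left as a parameter.

```python
# Cosine learning-rate decay from 5e-4 to 5e-5 (endpoints from the card above).
# Warmup, if used in the actual run, is not documented and is omitted here.
import math

MAX_LR, MIN_LR = 5e-4, 5e-5


def cosine_lr(step: int, total_steps: int) -> float:
    progress = min(step / total_steps, 1.0)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


assert abs(cosine_lr(0, 1000) - MAX_LR) < 1e-12     # starts at 5e-4
assert abs(cosine_lr(1000, 1000) - MIN_LR) < 1e-12  # ends at 5e-5
```
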
#### Speeds

The training run took approximately 43 hours on a single home PC with one RTX 4090.

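Combined with the 19 billion training tokens above, that wall-clock time implies a throughput of roughly 120k tokens per second; a quick back-of-the-envelope check:

```python
# Rough throughput implied by the numbers above: 19B tokens in about 43 hours.
tokens = 19e9
hours = 43
print(f"{tokens / (hours * 3600):,.0f} tokens/s")  # ≈ 123,000 tokens/s
```
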
## Evaluation

| Tasks          | Version | Filter | n-shot | Metric     |   | Value   |   | Stderr |
|----------------|--------:|--------|-------:|------------|---|--------:|---|-------:|
| arc_easy       |       1 | none   |      0 | acc        | ↑ |  0.3969 | ± | 0.0100 |
|                |         | none   |      0 | acc_norm   | ↑ |  0.3628 | ± | 0.0099 |
| lambada_openai |       1 | none   |      0 | acc        | ↑ |  0.2626 | ± | 0.0061 |
|                |         | none   |      0 | perplexity | ↓ | 71.1943 | ± | 2.8730 |
| piqa           |       1 | none   |      0 | acc        | ↑ |  0.5871 | ± | 0.0115 |
|                |         | none   |      0 | acc_norm   | ↑ |  0.5843 | ± | 0.0115 |
| sciq           |       1 | none   |      0 | acc        | ↑ |  0.6940 | ± | 0.0146 |
|                |         | none   |      0 | acc_norm   | ↑ |  0.5970 | ± | 0.0155 |

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

- **Hardware Type:** 1x RTX 4090
- **Hours used:** 44
- **Carbon Emitted:** about 6.6 kg of CO2 (estimate; see the check after this list)

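For context, that figure is consistent with a simple power-draw estimate; the average board power (~450 W) and grid carbon intensity (~0.33 kg CO2/kWh) used below are assumptions, not values reported by the author.

```python
# Back-of-the-envelope check of the carbon estimate above.
# Assumptions (not from the card): ~450 W average draw for one RTX 4090 under
# training load, and ~0.33 kg CO2 per kWh of grid electricity.
hours = 44
avg_power_kw = 0.45
grid_kg_co2_per_kwh = 0.33
energy_kwh = hours * avg_power_kw
print(f"{energy_kwh:.1f} kWh, ~{energy_kwh * grid_kg_co2_per_kwh:.1f} kg CO2")
# ≈ 19.8 kWh and ≈ 6.5 kg CO2, close to the 6.6 kg reported above.
```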