---
license: apache-2.0
---
# Llama-124M-experimental-pretrain
<!-- Provide a quick summary of what the model is/does. -->
This is an experimental pretraining run done entirely on a single home PC.
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Training code** adapted from https://github.com/Lightning-AI/litgpt.
- **Cost:** Around 20 RMB (about $3).
- **Model architecture:** Transformer decoder with gated SiLU MLP, RMS Norm, RoPE positional embedding, and grouped query attention.
- **Language(s) (NLP):** Mainly English.
- **License:** apache-2.0
- **Parameter count:** 124M (0.124B)
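Two of the components listed above, RMS Norm and the gated SiLU (SwiGLU-style) MLP, are compact enough to sketch directly. The sketch below is illustrative only; the hidden sizes (768 / 2048) are assumed values typical of a ~124M-parameter model, not confirmed hyperparameters of this checkpoint:

```python
import numpy as np

def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def rms_norm(x, gain, eps=1e-5):
    # RMS Norm: scale by root-mean-square only, no mean-centering or bias.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * gain

def gated_silu_mlp(x, w_gate, w_up, w_down):
    # Gated MLP: silu(x W_gate) elementwise-times (x W_up), projected back down.
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
d_model, d_ff = 768, 2048  # hypothetical dims for a ~124M model
x = rng.standard_normal((4, d_model))
y = gated_silu_mlp(
    rms_norm(x, np.ones(d_model)),
    rng.standard_normal((d_model, d_ff)) * 0.02,
    rng.standard_normal((d_model, d_ff)) * 0.02,
    rng.standard_normal((d_ff, d_model)) * 0.02,
)
print(y.shape)  # (4, 768): the block preserves the model dimension
```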
## Uses
After downloading this repository, run
```
litgpt generate ./Llama-124M-experimental-pretrain --prompt "What is GPT-4? GPT-4 is"
```
The output will look something like:
```
What is GPT-4? GPT-4 is an extremely powerful, highly immersive, and powerful, in the sense that it is able to be used to help you deal with various technical issues, while still providing an easy to use experience that will help you get better and faster results. It
Time for inference 1: 0.42 sec total, 119.97 tokens/sec
Memory used: 0.27 GB
```
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
This model is far too small to avoid hallucinations, and its training data contains no code. Don't expect it to provide any real assistance; it is just for fun.
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
This model was trained on https://huggingface.co/datasets/EleutherAI/rpj-v2-sample for two epochs, about 19 billion tokens in total, with a context length of 2048 tokens.
#### Training Hyperparameters
- **Training regime:** bf16-mixed. <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
- **Learning rate:** Cosine schedule from 5e-4 to 5e-5.
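The schedule above can be sketched as a standard cosine decay between the two endpoints (any warmup phase the run may have used is not shown here):

```python
import math

def cosine_lr(step, total_steps, max_lr=5e-4, min_lr=5e-5):
    # Cosine decay: starts at max_lr (step 0) and ends at min_lr (final step).
    progress = step / total_steps
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(cosine_lr(0, 1000))     # 5e-4 at the start
print(cosine_lr(1000, 1000))  # 5e-5 at the end
```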
#### Speeds
The training run lasted approximately 43 hours on a single PC with one RTX 4090.
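Given the 19 billion tokens and 43 hours reported above, the implied average throughput works out to roughly 120k tokens/sec:

```python
tokens = 19e9            # total tokens from the training-data section
seconds = 43 * 3600      # ~43 hours of wall-clock time
throughput = tokens / seconds
print(f"{throughput:,.0f} tokens/sec")  # roughly 122,700 tokens/sec
```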
## Evaluation
| Tasks |Version|Filter|n-shot| Metric | | Value | |Stderr|
|--------------|------:|------|-----:|----------|---|------:|---|-----:|
|arc_easy | 1|none | 0|acc |↑ | 0.3969|± |0.0100|
| | |none | 0|acc_norm |↑ | 0.3628|± |0.0099|
|lambada_openai| 1|none | 0|acc |↑ | 0.2626|± |0.0061|
| | |none | 0|perplexity|↓ |71.1943|± |2.8730|
|piqa | 1|none | 0|acc |↑ | 0.5871|± |0.0115|
| | |none | 0|acc_norm |↑ | 0.5843|± |0.0115|
|sciq | 1|none | 0|acc |↑ | 0.6940|± |0.0146|
| | |none | 0|acc_norm |↑ | 0.5970|± |0.0155|
## Environmental Impact
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
- **Hardware Type:** RTX 4090 x 1
- **Hours used:** 44
- **Carbon Emitted:** Approximately 6.6 kg of CO2eq.
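The emission figure above is consistent with a simple back-of-the-envelope estimate of the form hours x average power x grid carbon intensity. The power draw and carbon intensity below are illustrative assumptions, not measured values for this run:

```python
hours = 44           # from the card
power_kw = 0.35      # assumed average wall-power draw of the PC (not measured)
intensity = 0.43     # assumed grid carbon intensity, kg CO2eq per kWh (not measured)

energy_kwh = hours * power_kw           # 15.4 kWh
emissions_kg = energy_kwh * intensity   # ~6.6 kg CO2eq
print(round(emissions_kg, 1))
```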