---
license: bsd-3-clause
datasets:
- EleutherAI/pile
language:
- en
library_name: transformers
---
|
# Lovelace Medium Alpha1
|
|
|
A 551M parameter Transformer-XL style model trained on 100B tokens of The Pile!
|
|
|
This model was originally trained for the "Direct Preference Heads" paper, but it will also serve as the basis for much of my future research.

All code used to train and run these models is available here: https://github.com/Avelina9X/direct-preference-heads and our paper is available here: https://arxiv.org/abs/2405.20053
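
Since the card lists `transformers` as the library, a minimal loading sketch along the following lines should apply. The repo id comes from the Model Collection table below; `trust_remote_code=True` and the standard `generate()` interface are assumptions on my part since the architecture is custom, so treat the linked repository as the authoritative reference for inference code.

```python
# Minimal sketch: load the pre-trained checkpoint and greedily continue a prompt.
# Assumes the custom Transformer-XL style model ships its modelling code on the Hub
# and exposes the usual causal-LM interface; see the linked GitHub repo for the
# exact training / inference code.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo = 'Avelina/lovelace-medium-alpha1'

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True, torch_dtype=torch.float16)
model.eval()

inputs = tokenizer('The Pile is a large, diverse', return_tensors='pt')
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```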
|
|
|
## Model Architecture
|
| Name | Value |
| --- | --- |
| Total Parameters | 551M |
| Non-Embedding Parameters | 512M |
| Vocab Size | 50272 |
| \\(d_\text{vocab}\\) | 768 |
| \\(d_\text{model}\\) | 1536 |
| \\(n_\text{layers}\\) | 18 |
| FFN Activation | SwiGLU |
| \\(d_\text{ffn}\\) | 4096 |
| Attention Type | Full |
| Position Embedding | Reversed RoPE with ABF |
| \\(n_\text{heads}\\) | 24 |
| \\(d_\text{key}\\) | 64 |
| Trained Context | 2048 tokens |
| Trained Memory | 2048 tokens |
| Max Inference Context | 4096 tokens |
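
As a rough sanity check, the parameter counts can be reconstructed from the dimensions in the table. The sketch below assumes untied Q/K/V/output projections and a three-matrix SwiGLU FFN, and ignores biases, norms, and any vocab-to-model projection (all assumptions on my part); it lands within a few percent of the quoted figures.

```python
# Back-of-the-envelope parameter count from the architecture table above.
d_model, d_ffn, n_layers = 1536, 4096, 18
n_heads, d_key = 24, 64
vocab_size, d_vocab = 50272, 768

attn_params = 4 * d_model * (n_heads * d_key)          # Q, K, V and output projections
ffn_params = 3 * d_model * d_ffn                       # gate, up and down projections (SwiGLU)
non_embedding = n_layers * (attn_params + ffn_params)  # ~510M vs. the quoted 512M

embedding = vocab_size * d_vocab                       # factored embedding table, ~38.6M
print(f'non-embedding ~{non_embedding / 1e6:.0f}M, total ~{(non_embedding + embedding) / 1e6:.0f}M')
```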
|
|
|
## Model Collection
|
| Model | Link |
| --- | --- |
| Pre-Trained Model | [lovelace-medium-alpha1](https://huggingface.co/Avelina/lovelace-medium-alpha1) |
| Fine-Tuned Model | [lovelace-medium-alpha1-sft](https://huggingface.co/Avelina/lovelace-medium-alpha1-sft) |
| DPH Aligned Model | [lovelace-medium-alpha1-dph](https://huggingface.co/Avelina/lovelace-medium-alpha1-dph) |