---
license: bsd-3-clause
datasets:
- EleutherAI/pile
language:
- en
library_name: transformers
---
|
# Lovelace Medium Alpha1
|
|
|
A 551M parameter Transformer-XL style model trained on 100B tokens of The Pile!
|
|
|
This model was originally trained for the "Direct Preference Heads" paper, but it will also serve as the basis for much of my future research.

All code used to train and run these models is available here: https://github.com/Avelina9X/direct-preference-heads and our paper is available here: https://arxiv.org/abs/2405.20053
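
Since the card lists `transformers` as the library, a minimal loading sketch along the following lines should apply. The repo id comes from the Model Collection table below; `trust_remote_code=True` and the standard `generate()` interface are assumptions on my part since the architecture is custom, so treat the linked repository as the authoritative reference for inference code.

```python
# Minimal sketch: load the pre-trained checkpoint and greedily continue a prompt.
# Assumes the custom Transformer-XL style model ships its modelling code on the Hub
# and exposes the usual causal-LM interface; see the linked GitHub repo for the
# exact training / inference code.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo = 'Avelina/lovelace-medium-alpha1'

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True, torch_dtype=torch.float16)
model.eval()

inputs = tokenizer('The Pile is a large, diverse', return_tensors='pt')
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```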
|
|
|
## Model Architecture
|
| Name | Value |
| --- | --- |
| Total Parameters | 551M |
| Non-Embedding Parameters | 512M |
| Vocab Size | 50272 |
| \\(d_\text{vocab}\\) | 768 |
| \\(d_\text{model}\\) | 1536 |
| \\(n_\text{layers}\\) | 18 |
| FFN Activation | SwiGLU |
| \\(d_\text{ffn}\\) | 4096 |
| Attention Type | Full |
| Position Embedding | Reversed RoPE with ABF |
| \\(n_\text{heads}\\) | 24 |
| \\(d_\text{key}\\) | 64 |
| Trained Context | 2048 tokens |
| Trained Memory | 2048 tokens |
| Max Inference Context | 4096 tokens |
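
As a rough sanity check, the parameter counts can be reconstructed from the dimensions in the table. The sketch below assumes untied Q/K/V/output projections and a three-matrix SwiGLU FFN, and ignores biases, norms, and any vocab-to-model projection (all assumptions on my part); it lands within a few percent of the quoted figures.

```python
# Back-of-the-envelope parameter count from the architecture table above.
d_model, d_ffn, n_layers = 1536, 4096, 18
n_heads, d_key = 24, 64
vocab_size, d_vocab = 50272, 768

attn_params = 4 * d_model * (n_heads * d_key)          # Q, K, V and output projections
ffn_params = 3 * d_model * d_ffn                       # gate, up and down projections (SwiGLU)
non_embedding = n_layers * (attn_params + ffn_params)  # ~510M vs. the quoted 512M

embedding = vocab_size * d_vocab                       # factored embedding table, ~38.6M
print(f'non-embedding ~{non_embedding / 1e6:.0f}M, total ~{(non_embedding + embedding) / 1e6:.0f}M')
```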
|
|
|
## Model Collection
|
| Model | Link |
| --- | --- |
| Pre-Trained Model | [lovelace-medium-alpha1](https://huggingface.co/Avelina/lovelace-medium-alpha1) |
| Fine-Tuned Model | [lovelace-medium-alpha1-sft](https://huggingface.co/Avelina/lovelace-medium-alpha1-sft) |
| DPH Aligned Model | [lovelace-medium-alpha1-dph](https://huggingface.co/Avelina/lovelace-medium-alpha1-dph) |