---
license: apache-2.0
datasets:
- wikitext
- ptb_text_only
language:
- en
metrics:
- perplexity
pipeline_tag: text-generation
model-index:
- name: distilgpt2
  results:
  - task:
      type: text-generation
    dataset:
      name: penn_treebank
      type: ptb_text_only
    metrics:
    - name: perplexity@distilgpt2:BASELINE
      type: dmx-perplexity
      value: 63.45857238769531
    - name: perplexity@distilgpt2:BASIC
      type: dmx-perplexity
      value: 64.36720275878906
  - task:
      type: text-generation
    dataset:
      name: wikitext2
      type: wikitext-2-raw-v1
    metrics:
    - name: perplexity@distilgpt2:BASELINE
      type: dmx-perplexity
      value: 46.05925369262695
    - name: perplexity@distilgpt2:BASIC
      type: dmx-perplexity
      value: 46.570838928222656
---

This is a d-Matrix functional reference of the GPT-2 model family, with the following *revisions*:

- [`distilgpt2`](https://huggingface.co/distilbert/distilgpt2)
- [`gpt2`](https://huggingface.co/openai-community/gpt2)
- [`gpt2-medium`](https://huggingface.co/openai-community/gpt2-medium)
- [`gpt2-large`](https://huggingface.co/openai-community/gpt2-large)
- [`gpt2-xl`](https://huggingface.co/openai-community/gpt2-xl)

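On the Hugging Face Hub, *revisions* are git references of a repository, which is what the `revision` argument in the usage example below selects. Assuming each revision listed above is a branch of this repository holding a standard GPT-2 checkpoint (an assumption, not stated by this card), a variant can also be inspected directly with plain `transformers`, without the d-Matrix transformations:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: each revision above is a branch of d-matrix/gpt2
# containing a standard GPT-2 checkpoint.
tokenizer = AutoTokenizer.from_pretrained("d-matrix/gpt2", revision="distilgpt2")
model = AutoModelForCausalLM.from_pretrained("d-matrix/gpt2", revision="distilgpt2")
```
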
The reference provides the following functional *configurations*:

Configuration | Explanation
:-- | :--
**`BASELINE`** | a reference functionally equivalent to the original model
**`BASIC`** | all linear algebraic operands quantized to `BFP16-64`, and all other operations transformed to approximated kernel simulations

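For intuition on the `BASIC` numerics: in a block floating-point (BFP) format, a block of values shares one exponent while each value keeps its own short signed mantissa. The exact definition of `BFP16-64` is given by Dmx_Compressor; the sketch below is illustrative only, assuming a shared power-of-two exponent per block of 64 values and 8-bit signed mantissas (`bfp_quantize` is a hypothetical helper, not the library's implementation).

```python
import torch

def bfp_quantize(x: torch.Tensor, block_size: int = 64, mantissa_bits: int = 8) -> torch.Tensor:
    """Fake-quantize `x` block-wise: each block of `block_size` values
    shares one power-of-two exponent; each value keeps `mantissa_bits`
    bits of signed mantissa. Illustrative, not d-Matrix's numerics."""
    flat = x.flatten()
    pad = (-flat.numel()) % block_size
    blocks = torch.nn.functional.pad(flat, (0, pad)).view(-1, block_size)

    # Shared exponent per block, chosen so the largest magnitude fits
    # into the signed mantissa range.
    max_abs = blocks.abs().amax(dim=1, keepdim=True).clamp_min(1e-38)
    shared_exp = torch.floor(torch.log2(max_abs)) + 1

    # Express values in mantissa units, round away low-order bits, rescale.
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    lim = 2 ** (mantissa_bits - 1)
    q = torch.round(blocks / scale).clamp(-lim, lim - 1)
    return (q * scale).flatten()[: x.numel()].view(x.shape)
```

Under such a format the rounding error scales with the largest magnitude in each block, which is consistent with the small `BASELINE`-to-`BASIC` perplexity deltas in the tables below.
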
### Usage

First, install the d-Matrix [Dmx_Compressor](https://github.com/d-matrix-ai/dmx-compressor) package:

```sh
pip install dmx_compressor
```

The following example instantiates a functional reference and evaluates its perplexity:

```python
from dmx.compressor.dmx import pipeline

pipe = pipeline(
    task="text-generation",
    model="d-matrix/gpt2",
    revision="gpt2-xl",  # see above for other revisions
    dmx_config="BASELINE",  # see above for other configurations
)

results = pipe.evaluate(
    metric="d-matrix/dmx_perplexity",
    dataset="wikitext",
    dataset_version="wikitext-2-raw-v1",
)
```

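The tables below can in principle be reproduced by sweeping the revisions and configurations; a minimal sketch, assuming `pipeline` and `evaluate` behave as in the example above (the structure of `results` is not documented here, so the `print` is only illustrative):

```python
from dmx.compressor.dmx import pipeline

# Sweep every revision/configuration pair documented in this card.
for revision in ["distilgpt2", "gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl"]:
    for dmx_config in ["BASELINE", "BASIC"]:
        pipe = pipeline(
            task="text-generation",
            model="d-matrix/gpt2",
            revision=revision,
            dmx_config=dmx_config,
        )
        results = pipe.evaluate(
            metric="d-matrix/dmx_perplexity",
            dataset="wikitext",
            dataset_version="wikitext-2-raw-v1",
        )
        print(revision, dmx_config, results)  # results structure is an assumption
```
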
### Evaluation results

- `perplexity` on `penn_treebank`

Revision \ Configuration | **`BASELINE`** | **`BASIC`**
:-- | --: | --:
`distilgpt2` | 63.46 | 64.13
`gpt2` | 35.77 | 35.93
`gpt2-medium` | 27.06 | 27.10
`gpt2-large` | 23.03 | 23.04
`gpt2-xl` | 21.01 | 21.02

- `perplexity` on `wikitext2`

Revision \ Configuration | **`BASELINE`** | **`BASIC`**
:-- | --: | --:
`distilgpt2` | 46.06 | 46.44
`gpt2` | 29.94 | 30.08
`gpt2-medium` | 21.71 | 21.73
`gpt2-large` | 19.42 | 19.43
`gpt2-xl` | 17.40 | 17.40

- `perplexity` on `wikitext103`

Revision \ Configuration | **`BASELINE`** | **`BASIC`**
:-- | --: | --:
`distilgpt2` | 46.06 | 46.44
`gpt2` | 29.94 | 30.08
`gpt2-medium` | 21.71 | 21.73
`gpt2-large` | 19.43 | 19.43
`gpt2-xl` | 17.40 | 17.40