---
title: DmxPerplexity
emoji: 🌖
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 4.41.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
- evaluate
- metric
description: >-
  Perplexity metric implemented by d-Matrix. Perplexity (PPL) is one of the most
  common metrics for evaluating language models. It is defined as the
  exponentiated average negative log-likelihood of a sequence, calculated with
  exponent base `e`. Note that this metric is intended for causal language
  models; the perplexity calculation is only correct if the model uses
  cross-entropy loss. For more information, see
  https://huggingface.co/docs/transformers/perplexity
---
# Metric Card for Perplexity
## Metric Description
Perplexity metric implemented by d-Matrix.
Perplexity (PPL) is one of the most common metrics for evaluating language models.
It is defined as the exponentiated average negative log-likelihood of a sequence, calculated with exponent base `e`.
Note that this metric is intended for causal language models; the perplexity calculation is only correct if the model uses cross-entropy loss.
For more information, see https://huggingface.co/docs/transformers/perplexity
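To make the definition concrete, here is a minimal pure-Python sketch (not d-Matrix's implementation) of how perplexity follows from a causal language model's per-position logits. Position `i` predicts token `i + 1`, the cross-entropy of each prediction is `-log p(next token)`, the mean of those values is the loss, and perplexity is `e` raised to that mean. The function names and toy logits are hypothetical, chosen only for illustration.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax over one vocabulary-sized score vector.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def causal_lm_perplexity(logits: Sequence[Sequence[float]], token_ids: Sequence[int]) -> float:
    """Hypothetical helper: perplexity of token_ids under per-position logits.

    logits[i] is the score vector at position i and predicts token_ids[i + 1]
    (next-token prediction), so the last position has no target.
    """
    if len(logits) != len(token_ids) or len(token_ids) < 2:
        raise ValueError("need at least two tokens and one logit vector per token")
    nll = 0.0
    for i in range(len(token_ids) - 1):
        # Cross-entropy of one prediction = -log p(actual next token).
        nll -= log_softmax(logits[i])[token_ids[i + 1]]
    mean_nll = nll / (len(token_ids) - 1)  # the average negative log-likelihood ("loss")
    return math.exp(mean_nll)  # perplexity, exponent base e


# Uniform scores over a 4-token vocabulary: every prediction has p = 1/4.
print(causal_lm_perplexity([[0.0, 0.0, 0.0, 0.0]] * 3, [0, 2, 1]))  # approximately 4.0
```

With uniform logits the perplexity equals the vocabulary size, while a model that puts nearly all probability on the correct next token approaches a perplexity of 1.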
## How to Use
At minimum, this metric requires the model and references as inputs.
```python
>>> import evaluate
>>> perplexity = evaluate.load("d-matrix/dmx_perplexity", module_type="metric")
>>> input_texts = ["lorem ipsum", "Happy Birthday!", "Bienvenue"]
>>> results = perplexity.compute(model='distilgpt2', references=input_texts)
>>> print(results)
{'loss': 4.993086338043213, 'perplexity': 147.390625}
```
### Inputs
- **model** (`Union[str, AutoModelForCausalLM]`): the model used to calculate perplexity, given either as a model name (e.g. `'distilgpt2'`) or as a loaded `AutoModelForCausalLM` instance.
- **references** (`list` of `str`): the input texts; each separate text snippet is one list entry.
- **device** (`str`): the device to run on; defaults to `'cuda'` when available.
- **max_length** (`int`): maximum sequence length, defaults to 2048.
### Output Values
- **loss** (`float`): the average negative log-likelihood (cross-entropy loss) of the model's predictions on the reference texts.
- **perplexity** (`float`): measures how uncertain the model is when predicting the text; lower perplexity indicates better model performance.
Output Example(s):
```python
{'loss': 4.993086338043213, 'perplexity': 147.390625}
```
This metric outputs a dictionary containing the loss and the perplexity score.
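The card does not state it explicitly, but the definition of perplexity implies that the reported perplexity is `e` raised to the reported loss, and the example values in this card are consistent with that. A quick standard-library check, using only numbers copied from the examples:

```python
import math

# (loss, perplexity) pairs copied from the examples in this card.
reported = [
    (4.993086338043213, 147.390625),           # distilgpt2 on the three short strings
    (3.9706921577453613, 53.021217346191406),  # distilgpt2 on 10 wikitext-2 test lines
]

for loss, ppl in reported:
    recomputed = math.exp(loss)  # perplexity = e ** (mean negative log-likelihood)
    print(f"exp({loss:.4f}) = {recomputed:.4f}, reported {ppl:.4f}, "
          f"match: {math.isclose(recomputed, ppl, rel_tol=1e-4)}")
```

The small tolerance only absorbs floating-point rounding; the two values agree to several significant digits.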
### Examples
```python
>>> import evaluate
>>> from datasets import load_dataset
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> perplexity = evaluate.load("d-matrix/dmx_perplexity", module_type="metric")
>>> input_texts = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"][:10]
>>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
>>> results = perplexity.compute(model=model, references=input_texts)
>>> print(list(results.keys()))
['loss', 'perplexity']
>>> print(results['loss'])
3.9706921577453613
>>> print(results['perplexity'])
53.021217346191406
```
## Citation(s)
https://huggingface.co/docs/transformers/perplexity