rdiehlmartinez
committed on
Update README.md
README.md
CHANGED
@@ -3,14 +3,14 @@ title: Perplexity
 emoji: 🤗
 colorFrom: blue
 colorTo: red
-sdk:
-sdk_version: 3.19.1
-app_file: app.py
+sdk: static
 pinned: false
 tags:
 - evaluate
 - metric
 description: >-
+This is a fork of the huggingface evaluate library's implementation of perplexity.
+
 Perplexity (PPL) is one of the most common metrics for evaluating language
 models. It is defined as the exponentiated average negative log-likelihood of
 a sequence, calculated with exponent base `e`.
@@ -21,8 +21,10 @@ description: >-
 
 # Metric Card for Perplexity
 
-
-
+> This is a fork of the huggingface evaluate library's implementation of perplexity.
+
+## Metric Description
+Given a model and an input text sequence, perplexity measures how likely the model is to generate the input text sequence.
 
 As a metric, it can be used to evaluate how well the model has learned the distribution of the text it was trained on.
 
@@ -39,7 +41,7 @@ The metric takes a list of text as input, as well as the name of the model used
 
 ```python
 from evaluate import load
-perplexity = load("perplexity"
+perplexity = load("pico-lm/perplexity")
 results = perplexity.compute(predictions=predictions, model_id='gpt2')
 ```
 
@@ -50,6 +52,7 @@ results = perplexity.compute(predictions=predictions, model_id='gpt2')
 - **batch_size** (int): the batch size to run texts through the model. Defaults to 16.
 - **add_start_token** (bool): whether to add the start token to the texts, so the perplexity can include the probability of the first word. Defaults to True.
 - **device** (str): device to run on, defaults to `cuda` when available
+- **trust_remote_code** (bool): enables running metric on custom models
 
 ### Output Values
 This metric outputs a dictionary with the perplexity scores for the text input in the list, and the average perplexity.
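
For readers of this diff, the definition quoted in the README (the exponentiated average negative log-likelihood, base `e`) and the output shape it describes (one score per input text plus an average) can be illustrated with a small standalone sketch. This is not the fork's implementation: the helper names are hypothetical, the token log-probabilities are supplied by hand rather than produced by a model, and the dictionary keys `perplexities` and `mean_perplexity` are an assumption about the output format that the diff itself does not confirm.

```python
import math
from typing import Dict, List, Union


def perplexity_from_log_probs(token_log_probs: List[float]) -> float:
    """Return exp(average negative log-likelihood) for one text.

    `token_log_probs` holds natural-log probabilities, one per token,
    so the exponent base is `e`, as in the README description.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


def summarize_perplexities(
    per_text_log_probs: List[List[float]],
) -> Dict[str, Union[List[float], float]]:
    """Collect per-text perplexities and their arithmetic mean.

    The key names are assumed for illustration only.
    """
    perplexities = [perplexity_from_log_probs(lp) for lp in per_text_log_probs]
    return {
        "perplexities": perplexities,
        "mean_perplexity": sum(perplexities) / len(perplexities),
    }


# A model that assigns probability 1/4 to every token has perplexity 4;
# one that assigns 1/2 to every token has perplexity 2.
example = summarize_perplexities(
    [[math.log(0.25)] * 3, [math.log(0.5)] * 5]
)
```

In this toy case the per-text scores are 4 and 2 (up to floating-point rounding), and their arithmetic mean is 3. With a real model, the log-probabilities would come from the model's predicted distribution over each token given its preceding context.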