---
title: DmxPerplexity
emoji: 🌖
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 4.41.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - evaluate
  - metric
description: >-
  Perplexity metric implemented by d-Matrix. Perplexity (PPL) is one of the most
  common metrics for evaluating language models. It is defined as the
  exponentiated average negative log-likelihood of a sequence, calculated with
  exponent base `e`. Note that this metric is intended for Causal Language
  Models; the perplexity calculation is only correct if the model uses Cross
  Entropy Loss. For more information, see
  https://huggingface.co/docs/transformers/perplexity
---

# Metric Card for Perplexity

## Metric Description

Perplexity metric implemented by d-Matrix. Perplexity (PPL) is one of the most common metrics for evaluating language models. It is defined as the exponentiated average negative log-likelihood of a sequence, calculated with exponent base `e`.

Note that this metric is intended for Causal Language Models; the perplexity calculation is only correct if the model uses Cross Entropy Loss.

For more information, see https://huggingface.co/docs/transformers/perplexity

## How to Use

At minimum, this metric requires the model and references as inputs.

```python
>>> import evaluate
>>> perplexity = evaluate.load("d-matrix/dmx_perplexity", module_type="metric")
>>> input_texts = ["lorem ipsum", "Happy Birthday!", "Bienvenue"]
>>> results = perplexity.compute(model="distilgpt2", references=input_texts)
>>> print(results)
{'loss': 4.993086338043213, 'perplexity': 147.390625}
```

### Inputs

- **model** (`Union[str, AutoModelForCausalLM]`): model used for calculating perplexity.
- **references** (`list` of `str`): input text, where each separate text snippet is one list entry.
- **device** (`str`): device to run on; defaults to `'cuda'` when available.
- **max_length** (`int`): maximum sequence length; defaults to 2048.

### Output Values

- **loss** (`float`): the loss of the model predictions compared to the references.
- **perplexity** (`float`): measures the uncertainty of a model predicting text. Model performance is better when perplexity is lower.

Output Example(s):

```python
{'loss': 4.993086338043213, 'perplexity': 147.390625}
```

This metric outputs a dictionary containing the loss and the perplexity score.

### Examples

```python
>>> import evaluate
>>> from datasets import load_dataset
>>> from transformers import AutoModelForCausalLM
>>> perplexity = evaluate.load("d-matrix/dmx_perplexity", module_type="metric")
>>> input_texts = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"][:10]
>>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
>>> results = perplexity.compute(model=model, references=input_texts)
>>> print(list(results.keys()))
['loss', 'perplexity']
>>> print(results['loss'])
3.9706921577453613
>>> print(results['perplexity'])
53.021217346191406
```

## Citation(s)

https://huggingface.co/docs/transformers/perplexity
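
## Notes

Since perplexity is defined above as the exponentiated average negative log-likelihood with base `e`, the returned `perplexity` should match `exp(loss)` up to floating-point precision. The sketch below is a minimal sanity check of that relationship; it assumes the optional `device` and `max_length` inputs listed above are passed as keyword arguments to `compute`, and the exact numbers will depend on the model and inputs.

```python
>>> import math
>>> import evaluate
>>> perplexity = evaluate.load("d-matrix/dmx_perplexity", module_type="metric")
>>> input_texts = ["lorem ipsum", "Happy Birthday!", "Bienvenue"]
>>> # Optional inputs (assumed to be keyword arguments of compute): force CPU and cap the sequence length.
>>> results = perplexity.compute(model="distilgpt2", references=input_texts, device="cpu", max_length=1024)
>>> # Perplexity is the exponentiated average negative log-likelihood (base e),
>>> # so it should agree with exp(loss) up to floating-point precision.
>>> math.isclose(results["perplexity"], math.exp(results["loss"]), rel_tol=1e-3)
True
```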