---
title: vendiscore
datasets:
-
tags:
- evaluate
- metric
description: "The Vendi Score is a metric for evaluating diversity in machine learning. See the project's README at https://github.com/vertaix/Vendi-Score for more information."
sdk: gradio
sdk_version: 3.0.2
app_file: app.py
pinned: false
---
# Metric Card for VendiScore
The Vendi Score (VS) is a metric for evaluating diversity in machine learning.
The input to the metric is a collection of samples and a pairwise similarity function, and the output is a number that can be interpreted as the effective number of unique elements in the sample.
See the project's README at https://github.com/vertaix/Vendi-Score for more information.
## Metric Description
The Vendi Score (VS) is a metric for evaluating diversity in machine learning.
The input to the metric is a collection of samples and a pairwise similarity function, and the output is a number that can be interpreted as the effective number of unique elements in the sample.
Specifically, given an `n x n` positive semi-definite matrix `K` of similarity scores, the score is defined as:
```
VS(K) = exp(-tr(K/n @ log(K/n))) = exp(-sum_i lambda_i log lambda_i),
```
where `lambda_i` are the eigenvalues of `K/n` and `0 log 0 = 0`.
That is, the Vendi Score is the exponential of the von Neumann entropy of `K/n`, or equivalently the exponential of the Shannon entropy of the eigenvalues of `K/n`, a quantity also known as the effective rank.
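The definition above can be checked directly with NumPy. The following is a minimal sketch, not the package's implementation: the helper name `vendi_score_from_K` and the `1e-12` cutoff for treating an eigenvalue as zero are assumptions made for illustration.

```python
import numpy as np

def vendi_score_from_K(K):
    """Exponential of the Shannon entropy of the eigenvalues of K/n."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    # eigvalsh assumes a symmetric matrix and returns real eigenvalues.
    lam = np.linalg.eigvalsh(K / n)
    # Drop zero (or round-off negative) eigenvalues, using 0 log 0 = 0.
    lam = lam[lam > 1e-12]
    return float(np.exp(-np.sum(lam * np.log(lam))))

K = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
vs = vendi_score_from_K(K)
print(round(vs, 2))  # 2.16
```

With an identity matrix (all samples completely dissimilar) this helper returns `n`, and with an all-ones matrix (all samples identical) it returns 1.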
## How to Use
The Vendi Score is available as a Python package or in HuggingFace `evaluate`.
To use the Python package, see the instructions at https://github.com/vertaix/Vendi-Score.
The `evaluate` module supports text, numbers, and precomputed similarity scores or feature embeddings.
For images and other data types, use the Python package, which offers broader support.
To use the `evaluate` module, first install the requirements:
```
pip install evaluate
pip install vendi_score[all]
```
To calculate the score, pass a list of samples and a similarity function or a string identifying a predefined class of similarity functions (see below).
```
>>> import evaluate
>>> vendiscore = evaluate.load("Vertaix/vendiscore", "text")
>>> sents = ["Look, Jane.", "See Spot.", "See Spot run.", "Run, Spot, run.", "Jane sees Spot run."]
>>> results = vendiscore.compute(samples=sents, k="ngram_overlap", ns=[1, 2])
>>> print(results)
{'VS': 3.90657...}
```
### Inputs
- **samples**: an iterable containing n samples to score, an n x n similarity
matrix K, or an n x d feature matrix X.
- **k**: a pairwise similarity function, or a string identifying a predefined
similarity function. If k is a pairwise similarity function, it should
be symmetric and satisfy k(x, x) = 1.
Options: ngram_overlap, text_embeddings.
- **score_K**: if true, samples is an n x n similarity matrix K.
- **score_X**: if true, samples is an n x d feature matrix X.
- **score_dual**: if true, samples is an n x d feature matrix X and we will
compute the diversity score using the d x d covariance matrix X.T @ X.
- **normalize**: if true, normalize the similarity scores.
- **model (optional)**: if k is "text_embeddings", a model mapping sentences to
embeddings (output should be an object with an attribute called
`pooler_output` or `last_hidden_state`).
- **tokenizer (optional)**: if k is "text_embeddings" or "ngram_overlap", a
tokenizer mapping strings to lists.
- **model_path (optional)**: if k is "text_embeddings", the name of a model on
the HuggingFace hub.
- **ns (optional)**: if k is "ngram_overlap", the values of n to calculate.
- **batch_size (optional)**: batch size to use if k is "text_embeddings".
- **device (optional)**: a string (e.g. "cuda", "cpu") or torch.device
identifying the device to use if k is "text_embeddings".
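To make the roles of `samples` and `k` concrete, here is a rough sketch of how a pairwise similarity function turns a list of samples into an `n x n` matrix that can then be scored. It illustrates the idea only and does not reproduce the module's internals; the helper name `similarity_matrix` and the `1e-12` eigenvalue cutoff are hypothetical.

```python
import numpy as np

def similarity_matrix(samples, k):
    """Evaluate k on every pair of samples to get an n x n matrix."""
    n = len(samples)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = k(samples[i], samples[j])
    return K

samples = [0, 0, 10, 10, 20, 20]
# Symmetric kernel with k(x, x) = 1, as the inputs above require.
k = lambda a, b: np.exp(-np.abs(a - b))
K = similarity_matrix(samples, k)

# Score K as exp of the Shannon entropy of the eigenvalues of K/n.
lam = np.linalg.eigvalsh(K / len(samples))
lam = lam[lam > 1e-12]  # treat 0 log 0 as 0
vs = float(np.exp(-np.sum(lam * np.log(lam))))
print(vs)  # close to 3: three distinct values, each repeated twice
```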
### Output Values
The output is a dictionary with one key, "VS".
Given n samples, the value of the Vendi Score ranges between 1 and n, with higher numbers indicating that the sample is more diverse.
### Examples
```
>>> import evaluate
>>> import numpy as np
>>> vendiscore = evaluate.load("Vertaix/vendiscore", "int")
>>> samples = [0, 0, 10, 10, 20, 20]
>>> k = lambda a, b: np.exp(-np.abs(a - b))
>>> vendiscore.compute(samples=samples, k=k)
{'VS': 2.9999...}
```
If you have already precomputed a similarity matrix:
```
>>> vendiscore = evaluate.load("Vertaix/vendiscore", "K")
>>> K = np.array([[1.0, 0.9, 0.0],
...               [0.9, 1.0, 0.0],
...               [0.0, 0.0, 1.0]])
>>> vendiscore.compute(samples=K, score_K=True)
{'VS': 2.1573...}
```
If your similarity function is a dot product between `n` normalized
`d`-dimensional embeddings `X`, and `d` < `n`, it is faster
to compute the Vendi Score using the `d x d` covariance matrix, `X.T @ X`.
(If the rows of `X` are not normalized, set `normalize=True`.)
```
>>> vendiscore = evaluate.load("Vertaix/vendiscore", "X")
>>> X = np.array([[100, 0], [99, 1], [1, 99], [0, 100]])
>>> vendiscore.compute(samples=X, score_dual=True, normalize=True)
{'VS': 1.99989...}
```
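The dual computation works because the `n x n` matrix `X @ X.T / n` and the `d x d` matrix `X.T @ X / n` share the same nonzero eigenvalues, so both give the same entropy. The sketch below checks this on the embeddings from the example above; the helper name `entropy_exp` and the `1e-12` cutoff are assumptions for illustration, not part of the module.

```python
import numpy as np

def entropy_exp(eigenvalues):
    """exp of the Shannon entropy, skipping (near-)zero eigenvalues."""
    lam = eigenvalues[eigenvalues > 1e-12]
    return float(np.exp(-np.sum(lam * np.log(lam))))

X = np.array([[100, 0], [99, 1], [1, 99], [0, 100]], dtype=float)
# Normalize each row to unit length so that the dot product of a row with itself is 1.
X = X / np.linalg.norm(X, axis=1, keepdims=True)
n = X.shape[0]

# Primal: eigenvalues of the n x n similarity matrix.
vs_primal = entropy_exp(np.linalg.eigvalsh(X @ X.T / n))
# Dual: eigenvalues of the smaller d x d matrix.
vs_dual = entropy_exp(np.linalg.eigvalsh(X.T @ X / n))
print(vs_primal, vs_dual)  # the two values agree up to round-off
```

The saving comes from the eigendecomposition, which runs on a `d x d` matrix instead of an `n x n` one.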
Text similarity can be calculated using n-gram overlap or using inner products between embeddings from a neural network.
```
>>> vendiscore = evaluate.load("Vertaix/vendiscore", "text")
>>> sents = ["Look, Jane.", "See Spot.", "See Spot run.", "Run, Spot, run.", "Jane sees Spot run."]
>>> ngram_vs = vendiscore.compute(samples=sents, k="ngram_overlap", ns=[1, 2])["VS"]
>>> bert_vs = vendiscore.compute(samples=sents, k="text_embeddings", model_path="bert-base-uncased")["VS"]
>>> print(f"N-grams: {ngram_vs:.02f}, BERT: {bert_vs:.02f}")
N-grams: 3.91, BERT: 1.21
```
## Limitations and Bias
The Vendi Score depends on the choice of similarity function. Care should be taken to select a similarity function that reflects the features that are relevant for defining diversity in a given application.
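As a small illustration of this dependence, the sketch below scores the same six integers with two different kernels. The helper `vendi_score`, the two kernel names, and the bandwidth of 100 are hypothetical choices made only for this example.

```python
import numpy as np

def vendi_score(samples, k):
    """Build K from a pairwise kernel k, then score it."""
    n = len(samples)
    K = np.array([[k(a, b) for b in samples] for a in samples], dtype=float)
    lam = np.linalg.eigvalsh(K / n)
    lam = lam[lam > 1e-12]  # treat 0 log 0 as 0
    return float(np.exp(-np.sum(lam * np.log(lam))))

samples = [0, 0, 10, 10, 20, 20]
# Narrow kernel: values 10 apart are treated as almost unrelated.
narrow = lambda a, b: np.exp(-abs(a - b))
# Wide kernel: values 10 or 20 apart are still treated as very similar.
wide = lambda a, b: np.exp(-abs(a - b) / 100.0)

vs_narrow = vendi_score(samples, narrow)
vs_wide = vendi_score(samples, wide)
print(vs_narrow, vs_wide)  # the narrow kernel reports more diversity
```

Neither number is the "right" one: each reflects what its kernel considers similar.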
## Citation