---
title: VendiScore
datasets:
-
tags:
- evaluate
- metric
description: "The Vendi Score is a metric for evaluating diversity in machine learning. See the project's README at https://github.com/vertaix/Vendi-Score for more information."
sdk: gradio
sdk_version: 3.0.2
app_file: app.py
pinned: false
---
# Metric Card for VendiScore
The Vendi Score (VS) is a metric for evaluating diversity in machine learning.
The input to the metric is a collection of samples and a pairwise similarity function; the output is a number that can be interpreted as the effective number of unique elements in the sample.
See the project's README at https://github.com/vertaix/Vendi-Score for more information.
## Metric Description
The Vendi Score takes a collection of samples and a pairwise similarity function as input and returns a single number, which can be interpreted as the effective number of unique elements in the sample.
Specifically, given a positive semi-definite matrix $K \in \mathbb{R}^{n \times n}$ of similarity scores, the score is defined as:
$$\mathrm{VS}(K) = \exp\left(-\mathrm{tr}\left(\frac{K}{n} \log \frac{K}{n}\right)\right) = \exp\left(-\sum_{i=1}^n \lambda_i \log \lambda_i\right),$$
where $\lambda_i$ are the eigenvalues of $K/n$ and $0 \log 0 = 0$.
That is, the Vendi Score is the exponential of the von Neumann entropy of $K/n$, or equivalently the exponential of the Shannon entropy of its eigenvalues, a quantity also known as the effective rank.
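The definition can be sketched directly in NumPy. This is a minimal illustration of the formula, not the package's implementation; the helper name `vendi_score_from_K` and the `1e-12` cutoff for treating eigenvalues as zero are assumptions made for this example.

```python
import numpy as np

def vendi_score_from_K(K):
    """Vendi Score of an n x n PSD similarity matrix K (illustrative helper)."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    # Eigenvalues of the symmetric matrix K / n.
    eigvals = np.linalg.eigvalsh(K / n)
    # Drop zero eigenvalues (and round-off noise), following 0 log 0 = 0.
    # The 1e-12 threshold is an assumption of this sketch.
    eigvals = eigvals[eigvals > 1e-12]
    # Shannon entropy of the eigenvalues, then exponentiate.
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))

K = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
score = vendi_score_from_K(K)
print(round(score, 4))
```

For this matrix the eigenvalues of $K/n$ are 1.9/3, 1/3, and 0.1/3, so the score comes out at roughly 2.157: two of the three items are nearly identical and count as little more than one.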
## How to Use
The Vendi Score is available as a Python package or in HuggingFace `evaluate`.
To use the Python package, see the instructions at https://github.com/vertaix/Vendi-Score.
To use the `evaluate` module, pass a list of samples and a similarity function or a string identifying a predefined class of similarity functions (see below).
```
>>> import evaluate
>>> vendiscore = evaluate.load("danf0/vendiscore")
>>> samples = ["Look, Jane.",
...            "See Spot.",
...            "See Spot run.",
...            "Run, Spot, run.",
...            "Jane sees Spot run."]
>>> results = vendiscore.compute(samples, k="ngram_overlap", ns=[1, 2])
>>> print(results)
{'VS': 3.90657...}
```
### Inputs
- **samples**: an iterable containing n samples to score, an n x n similarity
  matrix K, or an n x d feature matrix X.
- **k**: a pairwise similarity function, or a string identifying a predefined
  similarity function. If k is a pairwise similarity function, it should
  be symmetric and satisfy k(x, x) = 1.
  Predefined options: "ngram_overlap", "text_embeddings", "pixels", "image_embeddings".
- **score_K**: if true, samples is an n x n similarity matrix K.
- **score_X**: if true, samples is an n x d feature matrix X.
- **score_dual**: if true, samples is an n x d feature matrix X and the
  diversity score is computed using the d x d covariance matrix X.T @ X.
- **normalize**: if true, normalize the similarity scores.
- **model (optional)**: if k is "text_embeddings", a model mapping sentences to
  embeddings (the output should be an object with an attribute called
  `pooler_output` or `last_hidden_state`). If k is "image_embeddings", a
  model mapping images to embeddings.
- **tokenizer (optional)**: if k is "text_embeddings" or "ngram_overlap", a
  tokenizer mapping strings to lists.
- **transform (optional)**: if k is "image_embeddings", a torchvision transform
  to apply to the samples.
- **model_path (optional)**: if k is "text_embeddings", the name of a model on
  the HuggingFace hub.
- **ns (optional)**: if k is "ngram_overlap", the values of n to calculate.
- **batch_size (optional)**: batch size to use if k is "text_embeddings" or
  "image_embeddings".
- **device (optional)**: a string (e.g. "cuda", "cpu") or torch.device
  identifying the device to use if k is "text_embeddings"
  or "image_embeddings".
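To make the requirements on a custom `k` concrete, the sketch below builds the n x n similarity matrix from a list of samples and a pairwise function, then checks the two properties stated above (symmetry and k(x, x) = 1). The helper `similarity_matrix` is hypothetical and written only for illustration; it is not part of the module.

```python
import numpy as np

def similarity_matrix(samples, k):
    """Build K with K[i, j] = k(samples[i], samples[j]) (illustrative helper)."""
    n = len(samples)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            # Evaluate each pair once and mirror it, which assumes k is symmetric.
            K[i, j] = K[j, i] = k(samples[i], samples[j])
    return K

samples = [0, 0, 10, 10, 20, 20]
k = lambda a, b: np.exp(-np.abs(a - b))
K = similarity_matrix(samples, k)

# Sanity checks on the properties a similarity function should satisfy.
diagonal_is_one = bool(np.allclose(np.diag(K), 1.0))
is_symmetric = bool(np.allclose(K, K.T))
print(diagonal_is_one, is_symmetric)
```

A matrix built this way is the kind of input that `score_K` is described as accepting.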
### Output Values
The output is a dictionary with one key, "VS".
Given n samples, the value of the Vendi Score ranges between 1 and n, with higher numbers indicating that the sample is more diverse.
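The two extremes of this range can be checked numerically. The sketch below assumes a small helper, `vendi_score_from_K`, that applies the eigenvalue formula from the Metric Description; the helper and its `1e-12` cutoff are illustrative, not the module's code.

```python
import numpy as np

def vendi_score_from_K(K):
    """Exponential of the Shannon entropy of the eigenvalues of K / n."""
    n = K.shape[0]
    eigvals = np.linalg.eigvalsh(K / n)
    eigvals = eigvals[eigvals > 1e-12]  # 0 log 0 = 0 (threshold is an assumption)
    return float(np.exp(-np.sum(eigvals * np.log(eigvals))))

n = 5
# All samples identical: every pairwise similarity is 1.
vs_identical = vendi_score_from_K(np.ones((n, n)))
# All samples completely dissimilar: K is the identity matrix.
vs_distinct = vendi_score_from_K(np.eye(n))
print(vs_identical, vs_distinct)
```

With identical samples, $K/n$ has a single nonzero eigenvalue equal to 1, so the score is 1. With fully dissimilar samples, all n eigenvalues equal 1/n, so the score is n.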
### Examples
```
>>> import evaluate
>>> import numpy as np
>>> vendiscore = evaluate.load("danf0/vendiscore")
>>> samples = [0, 0, 10, 10, 20, 20]
>>> k = lambda a, b: np.exp(-np.abs(a - b))
>>> vendiscore.compute(samples, k)
{'VS': 2.9999...}
```
If you have already precomputed a similarity matrix:
```
>>> K = np.array([[1.0, 0.9, 0.0],
...               [0.9, 1.0, 0.0],
...               [0.0, 0.0, 1.0]])
>>> vendiscore.compute(K, score_K=True)
{'VS': 2.1573...}
```
If your similarity function is a dot product between `n` normalized
`d`-dimensional embeddings `X`, and `d` < `n`, it is faster
to compute the Vendi Score using the `d` x `d` covariance matrix, `X.T @ X`,
which has the same nonzero eigenvalues as the `n` x `n` similarity matrix `X @ X.T`.
(If the rows of `X` are not normalized, set `normalize=True`.)
```
>>> X = np.array([[100, 0], [99, 1], [1, 99], [0, 100]])
>>> vendiscore.compute(X, score_dual=True, normalize=True)
{'VS': 1.9989...}
```
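The reason the smaller matrix suffices is that the n x n matrix `X @ X.T` and the d x d matrix `X.T @ X` share the same nonzero eigenvalues, so the entropy of the spectrum is unchanged. The sketch below checks this on random data; the helper `exp_entropy`, the random embeddings, and the `1e-12` cutoff are assumptions for illustration and do not reproduce the module's code.

```python
import numpy as np

def exp_entropy(M):
    """Exponential of the Shannon entropy of the eigenvalues of a symmetric matrix."""
    eigvals = np.linalg.eigvalsh(M)
    eigvals = eigvals[eigvals > 1e-12]  # ignore zero eigenvalues and round-off
    return float(np.exp(-np.sum(eigvals * np.log(eigvals))))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                       # n = 50 samples, d = 4 features
X = X / np.linalg.norm(X, axis=1, keepdims=True)   # normalize each row to unit length
n = X.shape[0]

vs_primal = exp_entropy(X @ X.T / n)  # eigendecomposition of a 50 x 50 matrix
vs_dual = exp_entropy(X.T @ X / n)    # eigendecomposition of a 4 x 4 matrix
print(vs_primal, vs_dual)
```

Both routes give the same score up to floating-point error, and the score cannot exceed `d` because at most `d` eigenvalues are nonzero.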
Image similarity can be calculated using inner products between pixel vectors or between embeddings from a neural network.
The default embeddings are from the pool-2048 layer of the torchvision version of the Inception v3 model; other embedding functions can be passed to the `model` argument.
```
>>> from torchvision import datasets
>>> mnist = datasets.MNIST("data/mnist", train=False, download=True)
>>> digits = [[x for x, y in mnist if y == c] for c in range(10)]
>>> pixel_vs = [vendiscore.compute(imgs, k="pixels")["VS"] for imgs in digits]
>>> inception_vs = [vendiscore.compute(imgs, k="image_embeddings", batch_size=64, device="cuda")["VS"] for imgs in digits]
>>> for y, (pvs, ivs) in enumerate(zip(pixel_vs, inception_vs)): print(f"{y}\t{pvs:.02f}\t{ivs:.02f}")
0 7.68 3.45
1 5.31 3.50
2 12.18 3.62
3 9.97 2.97
4 11.10 3.75
5 13.51 3.16
6 9.06 3.63
7 9.58 4.07
8 9.69 3.74
9 8.56 3.43
```
Text similarity can be calculated using n-gram overlap or using inner products between embeddings from a neural network.
```
>>> sents = ["Look, Jane.",
...          "See Spot.",
...          "See Spot run.",
...          "Run, Spot, run.",
...          "Jane sees Spot run."]
>>> ngram_vs = vendiscore.compute(sents, k="ngram_overlap", ns=[1, 2])["VS"]
>>> bert_vs = vendiscore.compute(sents, k="text_embeddings", model_path="bert-base-uncased")["VS"]
>>> simcse_vs = vendiscore.compute(sents, k="text_embeddings", model_path="princeton-nlp/unsup-simcse-bert-base-uncased")["VS"]
>>> print(f"N-grams: {ngram_vs:.02f}, BERT: {bert_vs:.02f}, SimCSE: {simcse_vs:.02f}")
N-grams: 3.91, BERT: 1.21, SimCSE: 2.81
```
## Limitations and Bias
The Vendi Score depends on the choice of similarity function. Care should be taken to select a similarity function that reflects the features that are relevant for defining diversity in a given application.
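As a small illustration of this dependence, the sketch below scores the same six numbers with two kernels that differ only in bandwidth. The helper `vendi_score`, the sample values, and both kernels are hypothetical choices made for this example; the calculation uses plain NumPy rather than the module.

```python
import numpy as np

def vendi_score(samples, k):
    """Vendi Score from samples and a pairwise similarity function (illustrative)."""
    n = len(samples)
    K = np.array([[k(a, b) for b in samples] for a in samples], dtype=float)
    eigvals = np.linalg.eigvalsh(K / n)
    eigvals = eigvals[eigvals > 1e-12]  # 0 log 0 = 0 (threshold is an assumption)
    return float(np.exp(-np.sum(eigvals * np.log(eigvals))))

samples = [0, 1, 2, 10, 11, 12]
# Narrow kernel: values one unit apart already count as fairly different.
narrow = lambda a, b: np.exp(-abs(a - b))
# Wide kernel: only the gap between the two clusters matters much.
wide = lambda a, b: np.exp(-abs(a - b) / 10.0)

vs_narrow = vendi_score(samples, narrow)
vs_wide = vendi_score(samples, wide)
print(f"narrow: {vs_narrow:.2f}, wide: {vs_wide:.2f}")
```

The narrow kernel yields a noticeably higher score than the wide one on identical data, so scores are only comparable when they are computed with the same similarity function.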
## Citation