---
title: vendiscore
datasets:
-  
tags:
- evaluate
- metric
description: "The Vendi Score is a metric for evaluating diversity in machine learning. See the project's README at https://github.com/vertaix/Vendi-Score for more information."
sdk: gradio
sdk_version: 3.0.2
app_file: app.py
pinned: false
---

# Metric Card for VendiScore

The Vendi Score (VS) is a metric for evaluating diversity in machine learning.
The input to the metric is a collection of samples and a pairwise similarity function, and the output is a number that can be interpreted as the effective number of unique elements in the sample.
See the project's README at https://github.com/vertaix/Vendi-Score for more information.

## Metric Description
Given an `n x n` positive semi-definite matrix `K` of similarity scores, the Vendi Score is defined as:
```
VS(K) = exp(-tr(K/n @ log(K/n))) = exp(-sum_i lambda_i log lambda_i),
```
where `lambda_i` are the eigenvalues of `K/n` and `0 log 0 = 0`.
That is, the Vendi Score is the exponential of the von Neumann entropy of `K/n`, or equivalently the exponential of the Shannon entropy of its eigenvalues, a quantity also known as the effective rank.
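
The definition can be checked directly with NumPy. The following is a minimal sketch, not the package's implementation; the helper name `vendi_score_from_K` and the `1e-12` cutoff used to apply the `0 log 0 = 0` convention are assumptions made for illustration.

```python
import numpy as np

def vendi_score_from_K(K):
    """Exponential of the Shannon entropy of the eigenvalues of K / n.

    Hypothetical helper for illustration; not part of vendi_score or evaluate.
    """
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    # K is symmetric, so eigvalsh applies and returns real eigenvalues.
    eigvals = np.linalg.eigvalsh(K / n)
    # Drop zero (or round-off negative) eigenvalues, i.e. treat 0 log 0 as 0.
    eigvals = eigvals[eigvals > 1e-12]
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))

# Four copies of the same sample: every pairwise similarity is 1.
identical = vendi_score_from_K(np.ones((4, 4)))
# Four mutually dissimilar samples: K is the identity matrix.
distinct = vendi_score_from_K(np.eye(4))
print(identical, distinct)
```

The two calls hit the extremes of the score: identical samples give an effective count of 1, and completely dissimilar samples give `n`.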

## How to Use
The Vendi Score is available as a Python package or in HuggingFace `evaluate`.
To use the Python package, see the instructions at https://github.com/vertaix/Vendi-Score.
The `evaluate` module supports text, numbers, and precomputed similarity scores or feature embeddings.
For images and other data types, use the Python package, which offers broader support.

To use the `evaluate` module, first install the requirements:
```
pip install evaluate
pip install vendi_score[all]
```
To calculate the score, pass a list of samples and either a similarity function or a string identifying a predefined class of similarity functions (see below).
```
>>> import evaluate
>>> vendiscore = evaluate.load("Vertaix/vendiscore", "text")
>>> sents = ["Look, Jane.", "See Spot.", "See Spot run.", "Run, Spot, run.", "Jane sees Spot run."]
>>> results = vendiscore.compute(samples=sents, k="ngram_overlap", ns=[1, 2])
>>> print(results)
{'VS': 3.90657...}
```

### Inputs
- **samples**: an iterable containing n samples to score, an n x n similarity
       matrix K, or an n x d feature matrix X.
- **k**: a pairwise similarity function, or a string identifying a predefined
       similarity function. If k is a pairwise similarity function, it should
       be symmetric and satisfy k(x, x) = 1.
       Predefined options: ngram_overlap, text_embeddings.
- **score_K**: if true, samples is an n x n similarity matrix K.
- **score_X**: if true, samples is an n x d feature matrix X.
- **score_dual**: if true, samples is an n x d feature matrix X and we will
       compute the diversity score using the d x d covariance matrix X.T @ X.
- **normalize**: if true, normalize the similarity scores.
- **model (optional)**: if k is "text_embeddings", a model mapping sentences to
       embeddings (output should be an object with an attribute called
       `pooler_output` or `last_hidden_state`).
- **tokenizer (optional)**: if k is "text_embeddings" or "ngram_overlap", a
       tokenizer mapping strings to lists.
- **model_path (optional)**: if k is "text_embeddings", the name of a model on
       the HuggingFace hub.
- **ns (optional)**: if k is "ngram_overlap", the values of n to calculate.
- **batch_size (optional)**: batch size to use if k is "text_embeddings".
- **device (optional)**: a string (e.g. "cuda", "cpu") or torch.device
       identifying the device to use if k is "text_embeddings".


### Output Values

The output is a dictionary with one key, "VS".
Given n samples, the value of the Vendi Score ranges between 1 and n, with higher numbers indicating that the sample is more diverse.

### Examples

```
>>> import evaluate
>>> import numpy as np
>>> vendiscore = evaluate.load("Vertaix/vendiscore", "int")
>>> samples = [0, 0, 10, 10, 20, 20]
>>> k = lambda a, b: np.exp(-np.abs(a - b))
>>> vendiscore.compute(samples=samples, k=k)
{'VS': 2.9999...}
```
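
To see where this number comes from, the similarity function can be evaluated on every pair of samples to build `K` by hand. This is a sketch under the definition in the Metric Description section, not the module's internal code path; the `1e-12` cutoff is an assumption used to apply `0 log 0 = 0`.

```python
import numpy as np

samples = [0, 0, 10, 10, 20, 20]
k = lambda a, b: np.exp(-np.abs(a - b))

# Evaluate the similarity function on every pair to get the n x n matrix K.
K = np.array([[k(a, b) for b in samples] for a in samples])

# Eigenvalues of K / n, dropping (near-)zero values so that 0 log 0 = 0.
eigvals = np.linalg.eigvalsh(K / len(samples))
eigvals = eigvals[eigvals > 1e-12]

# Exponential of the Shannon entropy of the eigenvalues.
vs = float(np.exp(-np.sum(eigvals * np.log(eigvals))))
print(vs)
```

The six samples form three tight pairs that are far apart from each other, so the effective number of unique elements comes out just under 3.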

If you have already precomputed a similarity matrix:
```
>>> vendiscore = evaluate.load("Vertaix/vendiscore", "K")
>>> K = np.array([[1.0, 0.9, 0.0],
                  [0.9, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
>>> vendiscore.compute(samples=K, score_K=True)
{'VS': 2.1573...}
```

If your similarity function is a dot product between `n` normalized
`d`-dimensional embeddings `X`, and `d` < `n`, it is faster
to compute the Vendi Score using the `d x d` covariance matrix, `X.T @ X`.
(If the rows of `X` are not normalized, set `normalize=True`.)
```
>>> vendiscore = evaluate.load("Vertaix/vendiscore", "X")
>>> X = np.array([[100, 0], [99, 1], [1, 99], [0, 100]])
>>> vendiscore.compute(samples=X, score_dual=True, normalize=True)
{'VS': 1.99989...}
```
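
The shortcut works because `X @ X.T` (size `n x n`) and `X.T @ X` (size `d x d`) share the same non-zero eigenvalues, so both give the same entropy. The sketch below checks this with plain NumPy; the helper name `exp_entropy` is hypothetical, and the code illustrates the identity rather than reproducing the module's implementation.

```python
import numpy as np

def exp_entropy(M):
    """Exponential of the Shannon entropy of the eigenvalues of a symmetric matrix."""
    eigvals = np.linalg.eigvalsh(M)
    eigvals = eigvals[eigvals > 1e-12]  # treat 0 log 0 as 0
    return float(np.exp(-np.sum(eigvals * np.log(eigvals))))

X = np.array([[100, 0], [99, 1], [1, 99], [0, 100]], dtype=float)
# Normalize each row to unit length so that dot products are cosine similarities.
X = X / np.linalg.norm(X, axis=1, keepdims=True)
n = X.shape[0]

vs_primal = exp_entropy(X @ X.T / n)  # eigendecomposition of an n x n matrix
vs_dual = exp_entropy(X.T @ X / n)    # eigendecomposition of a d x d matrix
print(vs_primal, vs_dual)
```

Both values agree up to floating-point error, while the second call decomposes only a 2 x 2 matrix instead of a 4 x 4 one.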

Text similarity can be calculated using n-gram overlap or using inner products between embeddings from a neural network.
```
>>> vendiscore = evaluate.load("Vertaix/vendiscore", "text")
>>> sents = ["Look, Jane.", "See Spot.", "See Spot run.", "Run, Spot, run.", "Jane sees Spot run."]
>>> ngram_vs = vendiscore.compute(samples=sents, k="ngram_overlap", ns=[1, 2])["VS"]
>>> bert_vs = vendiscore.compute(samples=sents, k="text_embeddings", model_path="bert-base-uncased")["VS"]
>>> print(f"N-grams: {ngram_vs:.02f}, BERT: {bert_vs:.02f}")
N-grams: 3.91, BERT: 1.21
```

## Limitations and Bias
The Vendi Score depends on the choice of similarity function. Care should be taken to select a similarity function that reflects the features that are relevant for defining diversity in a given application.
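
As a small illustration of this dependence, the sketch below scores the same four numbers with two versions of an exponential kernel. The `scale` parameter and the helper names are assumptions chosen for this example, not options of the `evaluate` module.

```python
import numpy as np

def vendi_score(samples, k):
    """Hypothetical helper: Vendi Score of samples under similarity function k."""
    K = np.array([[k(a, b) for b in samples] for a in samples])
    eigvals = np.linalg.eigvalsh(K / len(samples))
    eigvals = eigvals[eigvals > 1e-12]  # treat 0 log 0 as 0
    return float(np.exp(-np.sum(eigvals * np.log(eigvals))))

samples = [0.0, 1.0, 2.0, 3.0]

# A narrow kernel treats samples one unit apart as almost unrelated.
narrow = vendi_score(samples, lambda a, b: np.exp(-abs(a - b) / 0.1))
# A wide kernel treats the same samples as close to interchangeable.
wide = vendi_score(samples, lambda a, b: np.exp(-abs(a - b) / 10.0))
print(narrow, wide)
```

The narrow kernel yields a score close to 4, and the wide kernel yields a score much closer to 1, even though the samples are unchanged.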

## Citation