add other metrics
Files changed:
- README.md (+19 -18)
- matching_series.py (+59 -44)
README.md
@@ -13,7 +13,7 @@
 # Metric Card for matching_series

 ## Metric Description
+Matching Series is a metric for evaluating time-series generation models. It is based on the idea of matching the generated time-series with the original time-series. The metric calculates the distance (Mean Squared Error by default) between the generated time-series and the original time-series over matched instances. The metric outputs a score greater than or equal to 0, where 0 indicates a perfect generation.

 ## How to Use
 At minimum, the metric requires the original time-series and the generated time-series as input. The metric can be used to evaluate the performance of time-series generation models.
@@ -28,7 +28,7 @@
 >>> metric = evaluate.load("bowdbeg/matching_series")
 >>> results = metric.compute(references=references, predictions=predictions, batch_size=1000)
 >>> print(results)
+{'precision_distance': 0.15843592698313289, 'f1_distance': 0.155065974239652, 'recall_distance': 0.1518363944110798, 'index_distance': 0.17040952035850207, 'precision_distance_features': [0.13823438020409948, 0.13795530908046955, 0.13737011148651265, 0.14067189082974238, 0.1364122789352347, 0.1436081670647643, 0.14458237409706912, 0.13806270434163667, 0.1409687410230486, 0.14361925950728213], 'f1_distance_features': [0.1296088638995658, 0.1321776706161825, 0.13029775314091577, 0.13175439826605778, 0.12737279060587542, 0.1356699896603108, 0.13397234988746393, 0.12775081706715302, 0.1315612879575721, 0.13479662354178928], 'recall_distance_features': [0.12199655178880468, 0.12686452003437784, 0.12391796468320122, 0.12390010513296679, 0.11945686853897312, 0.12856343456552471, 0.12481307474748718, 0.11887226171295895, 0.12333088520535256, 0.1269952147807759], 'index_distance_features': [0.1675969516703118, 0.1670366499114896, 0.1671737398882021, 0.17176917018356727, 0.1648541323369367, 0.1719173137987784, 0.1718364937170575, 0.16298119493341198, 0.17348958360035996, 0.18543997354490532], 'macro_precision_distance': 0.14014852165698596, 'macro_recall_distance': 0.1238710881190423, 'macro_f1_distance': 0.13149625446428864, 'macro_index_distance': 0.17040952035850207, 'matching_precision': 0.1, 'matching_recall': 1.0, 'matching_f1': 0.18181818181818182, 'matching_precision_features': [0.9, 0.9, 0.8, 0.9, 0.9, 0.9, 1.0, 0.8, 1.0, 1.0], 'matching_recall_features': [0.1, 0.09, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1], 'matching_f1_features': [0.18, 0.16363636363636364, 0.17777777777777778, 0.18, 0.18, 0.18, 0.18181818181818182, 0.17777777777777778, 0.18181818181818182, 0.18181818181818182], 'macro_matching_precision': 0.91, 'macro_matching_recall': 0.099, 'macro_matching_f1': 0.17846464646464646, 'cuc': 0.12364285714285712, 'coverages': [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.4666666666666666, 0.7666666666666666, 0.9333333333333332, 1.0], 'macro_cuc': 0.12047857142857143, 'macro_coverages': [0.10000000000000002, 0.19000000000000003, 0.32666666666666666, 0.51, 0.72, 0.8966666666666667, 0.99], 'cuc_features': [0.1175, 0.11607142857142858, 0.12214285714285712, 0.12507142857142856, 0.1202142857142857, 0.11735714285714285, 0.12042857142857144, 0.12028571428571429, 0.12864285714285717, 0.11707142857142858], 'coverages_features': [[0.10000000000000002, 0.20000000000000004, 0.3, 0.43333333333333335, 0.6666666666666666, 0.8666666666666667, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3666666666666667, 0.5666666666666667, 0.6666666666666666, 0.9, 0.9], [0.10000000000000002, 0.16666666666666666, 0.3333333333333333, 0.5, 0.6666666666666666, 0.9333333333333332, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.5666666666666667, 0.7999999999999999, 0.9333333333333332, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.43333333333333335, 0.6999999999999998, 0.9, 1.0], [0.10000000000000002, 0.20000000000000004, 0.26666666666666666, 0.43333333333333335, 0.6666666666666666, 0.8666666666666667, 1.0], [0.10000000000000002, 0.16666666666666666, 0.4000000000000001, 0.6, 0.7333333333333334, 0.8666666666666667, 1.0], [0.10000000000000002, 0.16666666666666666, 0.3, 0.5666666666666667, 0.7666666666666666, 0.8666666666666667, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3, 0.5333333333333333, 0.8000000000000002, 1.0, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.4666666666666666, 0.7333333333333334, 0.8333333333333334, 1.0]]}
 ```

 ### Inputs
@@ -37,25 +37,26 @@
 - **batch_size**: (int, optional): The batch size for computing the metric. Memory consumption grows quadratically with this value. Default is None.
 - **cuc_n_calculation**: (int, optional): The number of times the coverage is computed, since it involves random sampling. Default is 3.
 - **cuc_n_samples**: (list of int, optional): The sample counts used to compute the coverage. Default is $[2^i \;\text{for}\; i \leq \log_2 n] + [n]$.
+- **metric**: (str, optional): The metric used to measure the distance between examples. Default is "mse". Available options are "mse", "mae", "rmse".
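Putting the inputs together, a minimal end-to-end call might look as follows (the toy `references`/`predictions` arrays and their shapes are illustrative assumptions, not taken from the repository):

```python
import evaluate
import numpy as np

# toy data: 10 generated and 10 reference series, 100 time steps, 10 features
predictions = np.random.rand(10, 100, 10)
references = np.random.rand(10, 100, 10)

metric = evaluate.load("bowdbeg/matching_series")
results = metric.compute(
    references=references,
    predictions=predictions,
    batch_size=1000,       # larger batches are faster but use more memory
    metric="rmse",         # distance function: "mse" (default), "mae", or "rmse"
    cuc_n_calculation=3,   # repetitions of the coverage sampling
    cuc_n_samples="auto",  # or an explicit list such as [2, 4, 8, 10]
)
print(results["f1_distance"], results["matching_f1"], results["cuc"])
```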

 ### Output Values

 Let prediction instances be $P = \{p_1, p_2, \ldots, p_n\}$ and reference instances be $R = \{r_1, r_2, \ldots, r_m\}$. (A worked NumPy sketch of the quantities below follows this list.)

+- **precision_distance**: (float): Average distance between each generated instance and its closest reference instance. Intuitively, this is similar to precision in classification. In the equation, $\frac{1}{n} \sum_{i=1}^{n} \min_{j} \mathrm{distance}(p_i, r_j)$.
+- **recall_distance**: (float): Average distance between each reference instance and its closest generated instance. Intuitively, this is similar to recall in classification. In the equation, $\frac{1}{m} \sum_{j=1}^{m} \min_{i} \mathrm{distance}(p_i, r_j)$.
+- **f1_distance**: (float): Harmonic mean of precision_distance and recall_distance. This is similar to the F1-score in classification.
+- **index_distance**: (float): Average distance between the generated and reference instances that share the same index. In the equation, $\frac{1}{n} \sum_{i=1}^{n} \mathrm{distance}(p_i, r_i)$.
+- **precision_distance_features**: (list of float): precision_distance computed individually for each feature.
+- **recall_distance_features**: (list of float): recall_distance computed individually for each feature.
+- **f1_distance_features**: (list of float): f1_distance computed individually for each feature.
+- **index_distance_features**: (list of float): index_distance computed individually for each feature.
+- **macro_precision_distance**: (float): Average of the precision_distance_features.
+- **macro_recall_distance**: (float): Average of the recall_distance_features.
+- **macro_f1_distance**: (float): Average of the f1_distance_features.
+- **macro_index_distance**: (float): Average of the index_distance_features.
+- **matching_precision**: (float): Precision of the matching: the fraction of generated instances that are the best match of at least one reference instance. In the equation, $\frac{ \left| \{ \arg\min_{i} \mathrm{distance}(p_i, r_j) \mid j \} \right| }{n}$.
+- **matching_recall**: (float): Recall of the matching: the fraction of reference instances that are the best match of at least one generated instance. In the equation, $\frac{ \left| \{ \arg\min_{j} \mathrm{distance}(p_i, r_j) \mid i \} \right| }{m}$.
 - **matching_f1**: (float): F1-score (harmonic mean) of matching_precision and matching_recall.
 - **matching_precision_features**: (list of float): matching_precision computed individually for each feature.
 - **matching_recall_features**: (list of float): matching_recall computed individually for each feature.
@@ -63,8 +64,8 @@
 - **macro_matching_precision**: (float): Average of the matching_precision_features.
 - **macro_matching_recall**: (float): Average of the matching_recall_features.
 - **macro_matching_f1**: (float): Average of the matching_f1_features.
+- **coverages**: (list of float): Coverage of the reference instances by random subsets of the generated instances, computed for each sample count in cuc_n_samples. In the equation, $\left[ \frac{ \left| \{ \arg\min_{j} \mathrm{distance}(p_i, r_j) \mid p_i \in \mathrm{sample}(P, s) \} \right| }{m} \;\text{for}\; s \in \mathrm{cuc\_n\_samples} \right]$.
+- **cuc**: (float): Area under the coverage curve: the coverages are integrated over cuc_n_samples with the trapezoidal rule and normalized, i.e. $\mathrm{trapz}(\mathrm{coverages}, \mathrm{cuc\_n\_samples}) \,/\, \left( |\mathrm{cuc\_n\_samples}| \cdot \max(\mathrm{cuc\_n\_samples}) \right)$.
 - **coverages_features**: (list of list of float): coverages computed individually for each feature.
 - **cuc_features**: (list of float): cuc computed individually for each feature.
 - **macro_coverages**: (list of float): Average of the coverages_features.
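To make the definitions above concrete, here is a standalone NumPy sketch that reproduces the main quantities from a toy pairwise distance matrix. The random data and the normalizations by $n$ and $m$ are assumptions for illustration; the module computes all of this internally from the raw series:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 5                        # number of generated / reference instances
distance = rng.random((n, m))      # distance[i, j] = distance(p_i, r_j), e.g. MSE

# distance-based scores
precision_distance = distance.min(axis=1).mean()  # (1/n) * sum_i min_j d(p_i, r_j)
recall_distance = distance.min(axis=0).mean()     # (1/m) * sum_j min_i d(p_i, r_j)
f1_distance = 2 / (1 / precision_distance + 1 / recall_distance)

# matching scores: count distinct instances that take part in a best match
best_match = distance.argmin(axis=1)       # best reference for each generation
best_match_inv = distance.argmin(axis=0)   # best generation for each reference
matching_recall = np.unique(best_match).size / m
matching_precision = np.unique(best_match_inv).size / n

# coverage curve over growing subsets of generations, and its normalized area (cuc)
n_samples = [1, 2, 4]
coverages = [
    np.unique(distance[rng.choice(n, size=s, replace=False)].argmin(axis=1)).size / m
    for s in n_samples
]
cuc = np.trapz(coverages, n_samples) / len(n_samples) / max(n_samples)
```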
matching_series.py
@@ -133,6 +133,7 @@ class matching_series(evaluate.Metric):
         batch_size: Optional[int] = None,
         cuc_n_calculation: int = 3,
         cuc_n_samples: Union[List[int], str] = "auto",
+        metric: str = "mse",
     ):
         """
         Compute the scores of the module given the predictions and references

@@ -157,37 +158,41 @@

         # at first, convert the inputs to numpy arrays

+        # distance between predictions and references for all example combinations, per feature
         # shape: (num_generation, num_reference, num_features)
         if batch_size is not None:
+            distance = np.zeros((len(predictions), len(references), predictions.shape[-1]))
             # iterate over the predictions and references in batches
             for i in range(0, len(predictions), batch_size):
                 for j in range(0, len(references), batch_size):
+                    d = self._compute_metric(
+                        predictions[i : i + batch_size, None],
+                        references[None, j : j + batch_size],
+                        metric=metric,
+                        axis=-2,
                     )
+                    distance[i : i + batch_size, j : j + batch_size] = d
         else:
+            distance = self._compute_metric(predictions[:, None], references, metric=metric, axis=-2)

+        index_distance = distance.diagonal(axis1=0, axis2=1).mean()

         # matching scores
+        distance_mean = distance.mean(axis=-1)
         # best match for each generated time series
         # shape: (num_generation,)
+        best_match = np.argmin(distance_mean, axis=-1)

+        # matching distance
         # shape: (num_generation,)
+        precision_distance = distance_mean[np.arange(len(best_match)), best_match].mean()

         # best match for each reference time series
         # shape: (num_reference,)
+        best_match_inv = np.argmin(distance_mean, axis=0)
+        recall_distance = distance_mean[best_match_inv, np.arange(len(best_match_inv))].mean()

+        f1_distance = 2 / (1 / precision_distance + 1 / recall_distance)

         # matching precision, recall and f1
         matching_recall = np.unique(best_match).size / len(best_match_inv)

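The batched loop fills the same tensor that the single broadcast in the `else` branch produces in one shot. A standalone sketch of that broadcasting trick with toy shapes (independent of the class, shapes chosen for illustration):

```python
import numpy as np

n, m, t, f = 4, 5, 50, 3  # generations, references, time steps, features
predictions = np.random.rand(n, t, f)
references = np.random.rand(m, t, f)

# (n, 1, t, f) - (1, m, t, f) broadcasts to (n, m, t, f); averaging over the
# time axis (-2) leaves one MSE per (generation, reference, feature) triple
distance = np.mean((predictions[:, None] - references[None]) ** 2, axis=-2)
assert distance.shape == (n, m, f)
```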
@@ -195,27 +200,27 @@
         matching_f1 = 2 / (1 / matching_precision + 1 / matching_recall)

         # take matching for each feature and compute metrics for them
+        precision_distance_features = []
+        recall_distance_features = []
+        f1_distance_features = []
         matching_precision_features = []
         matching_recall_features = []
         matching_f1_features = []
+        index_distance_features = []
         coverages_features = []
         cuc_features = []
         for f in range(predictions.shape[-1]):
+            distance_f = distance[:, :, f]
+            index_distance_f = distance_f.diagonal(axis1=0, axis2=1).mean()
+            best_match_f = np.argmin(distance_f, axis=-1)
+            precision_distance_f = distance_f[np.arange(len(best_match_f)), best_match_f].mean()
+            best_match_inv_f = np.argmin(distance_f, axis=0)
+            recall_distance_f = distance_f[best_match_inv_f, np.arange(len(best_match_inv_f))].mean()
+            f1_distance_f = 2 / (1 / precision_distance_f + 1 / recall_distance_f)
+            precision_distance_features.append(precision_distance_f)
+            recall_distance_features.append(recall_distance_f)
+            f1_distance_features.append(f1_distance_f)
+            index_distance_features.append(index_distance_f)

             matching_recall_f = np.unique(best_match_f).size / len(best_match_f)
             matching_precision_f = np.unique(best_match_inv_f).size / len(best_match_inv_f)

@@ -228,10 +233,10 @@
             coverages_features.append(coverages_f)
             cuc_features.append(cuc_f)

+        macro_precision_distance = statistics.mean(precision_distance_features)
+        macro_recall_distance = statistics.mean(recall_distance_features)
+        macro_f1_distance = statistics.mean(f1_distance_features)
+        macro_index_distance = statistics.mean(index_distance_features)

         macro_matching_precision = statistics.mean(matching_precision_features)
         macro_matching_recall = statistics.mean(matching_recall_features)

@@ -244,18 +249,18 @@
         macro_coverages = [statistics.mean(c) for c in zip(*coverages_features)]

         return {
+            "precision_distance": precision_distance,
+            "f1_distance": f1_distance,
+            "recall_distance": recall_distance,
+            "index_distance": index_distance,
+            "precision_distance_features": precision_distance_features,
+            "f1_distance_features": f1_distance_features,
+            "recall_distance_features": recall_distance_features,
+            "index_distance_features": index_distance_features,
+            "macro_precision_distance": macro_precision_distance,
+            "macro_recall_distance": macro_recall_distance,
+            "macro_f1_distance": macro_f1_distance,
+            "macro_index_distance": macro_index_distance,
             "matching_precision": matching_precision,
             "matching_recall": matching_recall,
             "matching_f1": matching_f1,

@@ -305,3 +310,13 @@
             coverages.append(coverage / n_calculation)
         cuc = np.trapz(coverages, n_samples) / len(n_samples) / max(n_samples)
         return coverages, cuc
+
+    def _compute_metric(self, x, y, metric: str = "mse", axis: int = -1):
+        if metric.lower() == "mse":
+            return np.mean((x - y) ** 2, axis=axis)
+        elif metric.lower() == "mae":
+            return np.mean(np.abs(x - y), axis=axis)
+        elif metric.lower() == "rmse":
+            return np.sqrt(self._compute_metric(x, y, metric="mse", axis=axis))
+        else:
+            raise ValueError("Unknown metric: {}".format(metric))