bowdbeg committed on
Commit
774aee4
1 Parent(s): b81d9c7

add other metrics

Files changed (2)
  1. README.md +19 -18
  2. matching_series.py +59 -44
README.md CHANGED
@@ -13,7 +13,7 @@ pinned: false
13
  # Metric Card for matching_series
14
 
15
  ## Metric Description
16
- Matching Series is a metric for evaluating time-series generation models. It is based on matching the generated time-series with the original time-series. The metric calculates the Mean Squared Error (MSE) between matched pairs of generated and original time-series instances. It outputs a score greater than or equal to 0, where 0 indicates a perfect generation.
17
 
18
  ## How to Use
19
  At minimum, the metric requires the original time-series and the generated time-series as input. The metric can be used to evaluate the performance of time-series generation models.
@@ -28,7 +28,7 @@ At minimum, the metric requires the original time-series and the generated time-s
28
  >>> metric = evaluate.load("bowdbeg/matching_series")
29
  >>> results = metric.compute(references=references, predictions=predictions, batch_size=1000)
30
  >>> print(results)
31
- {'precision_mse': 0.15843592698313289, 'f1_mse': 0.155065974239652, 'recall_mse': 0.1518363944110798, 'index_mse': 0.17040952035850207, 'precision_mse_features': [0.13823438020409948, 0.13795530908046955, 0.13737011148651265, 0.14067189082974238, 0.1364122789352347, 0.1436081670647643, 0.14458237409706912, 0.13806270434163667, 0.1409687410230486, 0.14361925950728213], 'f1_mse_features': [0.1296088638995658, 0.1321776706161825, 0.13029775314091577, 0.13175439826605778, 0.12737279060587542, 0.1356699896603108, 0.13397234988746393, 0.12775081706715302, 0.1315612879575721, 0.13479662354178928], 'recall_mse_features': [0.12199655178880468, 0.12686452003437784, 0.12391796468320122, 0.12390010513296679, 0.11945686853897312, 0.12856343456552471, 0.12481307474748718, 0.11887226171295895, 0.12333088520535256, 0.1269952147807759], 'index_mse_features': [0.1675969516703118, 0.1670366499114896, 0.1671737398882021, 0.17176917018356727, 0.1648541323369367, 0.1719173137987784, 0.1718364937170575, 0.16298119493341198, 0.17348958360035996, 0.18543997354490532], 'macro_precision_mse': 0.14014852165698596, 'macro_recall_mse': 0.1238710881190423, 'macro_f1_mse': 0.13149625446428864, 'macro_index_mse': 0.17040952035850207, 'matching_precision': 0.1, 'matching_recall': 1.0, 'matching_f1': 0.18181818181818182, 'matching_precision_features': [0.9, 0.9, 0.8, 0.9, 0.9, 0.9, 1.0, 0.8, 1.0, 1.0], 'matching_recall_features': [0.1, 0.09, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1], 'matching_f1_features': [0.18, 0.16363636363636364, 0.17777777777777778, 0.18, 0.18, 0.18, 0.18181818181818182, 0.17777777777777778, 0.18181818181818182, 0.18181818181818182], 'macro_matching_precision': 0.91, 'macro_matching_recall': 0.099, 'macro_matching_f1': 0.17846464646464646, 'cuc': 0.12364285714285712, 'coverages': [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.4666666666666666, 0.7666666666666666, 0.9333333333333332, 1.0], 'macro_cuc': 0.12047857142857143, 'macro_coverages': [0.10000000000000002, 0.19000000000000003, 0.32666666666666666, 0.51, 0.72, 0.8966666666666667, 0.99], 'cuc_features': [0.1175, 0.11607142857142858, 0.12214285714285712, 0.12507142857142856, 0.1202142857142857, 0.11735714285714285, 0.12042857142857144, 0.12028571428571429, 0.12864285714285717, 0.11707142857142858], 'coverages_features': [[0.10000000000000002, 0.20000000000000004, 0.3, 0.43333333333333335, 0.6666666666666666, 0.8666666666666667, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3666666666666667, 0.5666666666666667, 0.6666666666666666, 0.9, 0.9], [0.10000000000000002, 0.16666666666666666, 0.3333333333333333, 0.5, 0.6666666666666666, 0.9333333333333332, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.5666666666666667, 0.7999999999999999, 0.9333333333333332, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.43333333333333335, 0.6999999999999998, 0.9, 1.0], [0.10000000000000002, 0.20000000000000004, 0.26666666666666666, 0.43333333333333335, 0.6666666666666666, 0.8666666666666667, 1.0], [0.10000000000000002, 0.16666666666666666, 0.4000000000000001, 0.6, 0.7333333333333334, 0.8666666666666667, 1.0], [0.10000000000000002, 0.16666666666666666, 0.3, 0.5666666666666667, 0.7666666666666666, 0.8666666666666667, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3, 0.5333333333333333, 0.8000000000000002, 1.0, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.4666666666666666, 0.7333333333333334, 0.8333333333333334, 1.0]]}
32
  ```
33
 
34
  ### Inputs
@@ -37,25 +37,26 @@ At minimum, the metric requires the original time-series and the generated time-s
37
  - **batch_size**: (int, optional): The batch size used when computing the pairwise distances; the computation scales quadratically with the number of instances, and batching bounds the peak memory usage. Default is None (no batching).
38
  - **cuc_n_calculation**: (int, optional): The number of times the coverage is computed and averaged, since it relies on random sampling. Default is 3.
39
  - **cuc_n_samples**: (list of int, optional): The sample sizes at which the coverage is computed. Default is "auto", which uses $[2^i \ \text{for}\ i \leq \log_2 n] + [n]$.
 
40
 
41
  ### Output Values
42
 
43
  Let prediction instances be $P = \{p_1, p_2, \ldots, p_n\}$ and reference instances be $R = \{r_1, r_2, \ldots, r_m\}$.
44
 
45
- - **precision_mse**: (float): Average, over the generated instances, of the MSE to the closest reference instance. Intuitively, this is similar to precision in classification. In the equation, $\frac{1}{n} \sum_{i=1}^{n} \min_{j} \mathrm{MSE}(p_i, r_j)$.
46
- - **recall_mse**: (float): Average, over the reference instances, of the MSE to the closest generated instance. Intuitively, this is similar to recall in classification. In the equation, $\frac{1}{m} \sum_{j=1}^{m} \min_{i} \mathrm{MSE}(p_i, r_j)$.
47
- - **f1_mse**: (float): Harmonic mean of the precision_mse and recall_mse. This is similar to F1-score in classification.
48
- - **index_mse**: (float): Average of the MSE between the generated instance and the reference instance with the same index. In the equation, $\frac{1}{n} \sum_{i=1}^{n} \mathrm{MSE}(p_i, r_i)$.
49
- - **precision_mse_features**: (list of float): precision_mse computed individually for each feature.
50
- - **recall_mse_features**: (list of float): recall_mse computed individually for each feature.
51
- - **f1_mse_features**: (list of float): f1_mse computed individually for each feature.
52
- - **index_mse_features**: (list of float): index_mse computed individually for each feature.
53
- - **macro_precision_mse**: (float): Average of the precision_mse_features.
54
- - **macro_recall_mse**: (float): Average of the recall_mse_features.
55
- - **macro_f1_mse**: (float): Average of the f1_mse_features.
56
- - **macro_index_mse**: (float): Average of the index_mse_features.
57
- - **matching_precision**: (float): Precision of the matching, i.e. the fraction of generated instances that are the closest match of at least one reference instance. In the equation, $\frac{ | \{ i \mid \exists j,\ i = \arg\min_{i'} \mathrm{MSE}(p_{i'}, r_j) \} | }{n}$.
58
- - **matching_recall**: (float): Recall of the matching, i.e. the fraction of reference instances that are the closest match of at least one generated instance. In the equation, $\frac{ | \{ j \mid \exists i,\ j = \arg\min_{j'} \mathrm{MSE}(p_i, r_{j'}) \} | }{m}$.
59
  - **matching_f1**: (float): F1-score of the matching instances.
60
  - **matching_precision_features**: (list of float): matching_precision computed individually for each feature.
61
  - **matching_recall_features**: (list of float): matching_recall computed individually for each feature.
@@ -63,8 +64,8 @@ Let prediction instances be $P = \{p_1, p_2, \ldots, p_n\}$ and reference instan
63
  - **macro_matching_precision**: (float): Average of the matching_precision_features.
64
  - **macro_matching_recall**: (float): Average of the matching_recall_features.
65
  - **macro_matching_f1**: (float): Average of the matching_f1_features.
66
- - **coverages**: (list of float): Coverage of the reference instances by random subsamples of the generated data, one value per sample size in cuc_n_samples. In the equation, $\left[ \frac{ | \{ j \mid \exists\, p_i \in \mathrm{sample}(P, s),\ j = \arg\min_{j'} \mathrm{MSE}(p_i, r_{j'}) \} | }{m} \ \text{for}\ s \in \mathrm{cuc\_n\_samples} \right]$.
67
- - **cuc**: (float): Area under the coverage curve (Coverage Under Curve), computed from coverages over cuc_n_samples with the trapezoidal rule and normalized by the number of sample sizes and the maximum sample size.
68
  - **coverages_features**: (list of list of float): coverages computed individually for each feature.
69
  - **cuc_features**: (list of float): cuc computed individually for each feature.
70
  - **macro_coverages**: (list of float): Average of the coverages_features.
 
13
  # Metric Card for matching_series
14
 
15
  ## Metric Description
16
+ Matching Series is a metric for evaluating time-series generation models. It is based on matching the generated time-series with the original time-series. The metric calculates a distance (Mean Squared Error by default) between matched pairs of generated and original time-series instances. It outputs a score greater than or equal to 0, where 0 indicates a perfect generation.
17
 
18
  ## How to Use
19
  At minimum, the metric requires the original time-series and the generated time-series as input. The metric can be used to evaluate the performance of time-series generation models.
 
28
  >>> metric = evaluate.load("bowdbeg/matching_series")
29
  >>> results = metric.compute(references=references, predictions=predictions, batch_size=1000)
30
  >>> print(results)
31
+ {'precision_distance': 0.15843592698313289, 'f1_distance': 0.155065974239652, 'recall_distance': 0.1518363944110798, 'index_distance': 0.17040952035850207, 'precision_distance_features': [0.13823438020409948, 0.13795530908046955, 0.13737011148651265, 0.14067189082974238, 0.1364122789352347, 0.1436081670647643, 0.14458237409706912, 0.13806270434163667, 0.1409687410230486, 0.14361925950728213], 'f1_distance_features': [0.1296088638995658, 0.1321776706161825, 0.13029775314091577, 0.13175439826605778, 0.12737279060587542, 0.1356699896603108, 0.13397234988746393, 0.12775081706715302, 0.1315612879575721, 0.13479662354178928], 'recall_distance_features': [0.12199655178880468, 0.12686452003437784, 0.12391796468320122, 0.12390010513296679, 0.11945686853897312, 0.12856343456552471, 0.12481307474748718, 0.11887226171295895, 0.12333088520535256, 0.1269952147807759], 'index_distance_features': [0.1675969516703118, 0.1670366499114896, 0.1671737398882021, 0.17176917018356727, 0.1648541323369367, 0.1719173137987784, 0.1718364937170575, 0.16298119493341198, 0.17348958360035996, 0.18543997354490532], 'macro_precision_distance': 0.14014852165698596, 'macro_recall_distance': 0.1238710881190423, 'macro_f1_distance': 0.13149625446428864, 'macro_index_distance': 0.17040952035850207, 'matching_precision': 0.1, 'matching_recall': 1.0, 'matching_f1': 0.18181818181818182, 'matching_precision_features': [0.9, 0.9, 0.8, 0.9, 0.9, 0.9, 1.0, 0.8, 1.0, 1.0], 'matching_recall_features': [0.1, 0.09, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1], 'matching_f1_features': [0.18, 0.16363636363636364, 0.17777777777777778, 0.18, 0.18, 0.18, 0.18181818181818182, 0.17777777777777778, 0.18181818181818182, 0.18181818181818182], 'macro_matching_precision': 0.91, 'macro_matching_recall': 0.099, 'macro_matching_f1': 0.17846464646464646, 'cuc': 0.12364285714285712, 'coverages': [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.4666666666666666, 0.7666666666666666, 0.9333333333333332, 1.0], 'macro_cuc': 0.12047857142857143, 'macro_coverages': [0.10000000000000002, 0.19000000000000003, 0.32666666666666666, 0.51, 0.72, 0.8966666666666667, 0.99], 'cuc_features': [0.1175, 0.11607142857142858, 0.12214285714285712, 0.12507142857142856, 0.1202142857142857, 0.11735714285714285, 0.12042857142857144, 0.12028571428571429, 0.12864285714285717, 0.11707142857142858], 'coverages_features': [[0.10000000000000002, 0.20000000000000004, 0.3, 0.43333333333333335, 0.6666666666666666, 0.8666666666666667, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3666666666666667, 0.5666666666666667, 0.6666666666666666, 0.9, 0.9], [0.10000000000000002, 0.16666666666666666, 0.3333333333333333, 0.5, 0.6666666666666666, 0.9333333333333332, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.5666666666666667, 0.7999999999999999, 0.9333333333333332, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.43333333333333335, 0.6999999999999998, 0.9, 1.0], [0.10000000000000002, 0.20000000000000004, 0.26666666666666666, 0.43333333333333335, 0.6666666666666666, 0.8666666666666667, 1.0], [0.10000000000000002, 0.16666666666666666, 0.4000000000000001, 0.6, 0.7333333333333334, 0.8666666666666667, 1.0], [0.10000000000000002, 0.16666666666666666, 0.3, 0.5666666666666667, 0.7666666666666666, 0.8666666666666667, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3, 0.5333333333333333, 0.8000000000000002, 1.0, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.4666666666666666, 0.7333333333333334, 0.8333333333333334, 1.0]]}
32
  ```
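The snippet above assumes `references` and `predictions` are already defined. A minimal sketch of preparing them is shown below; it assumes both are 3-D arrays shaped `(num_instances, sequence_length, num_features)` (the layout implied by the per-feature outputs), and the sizes are illustrative only.

```python
import numpy as np
import evaluate

# Illustrative data: 100 generated and 10 reference series,
# each 50 steps long with 10 features (sizes are assumptions).
predictions = np.random.rand(100, 50, 10)
references = np.random.rand(10, 50, 10)

metric = evaluate.load("bowdbeg/matching_series")
results = metric.compute(references=references, predictions=predictions, batch_size=1000)
print(results["matching_f1"], results["cuc"])
```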
33
 
34
  ### Inputs
 
37
  - **batch_size**: (int, optional): The batch size used when computing the pairwise distances; the computation scales quadratically with the number of instances, and batching bounds the peak memory usage. Default is None (no batching).
38
  - **cuc_n_calculation**: (int, optional): The number of times the coverage is computed and averaged, since it relies on random sampling. Default is 3.
39
  - **cuc_n_samples**: (list of int, optional): The sample sizes at which the coverage is computed. Default is "auto", which uses $[2^i \ \text{for}\ i \leq \log_2 n] + [n]$.
40
+ - **metric**: (str, optional): The distance measure used to compare instances; one of "mse", "mae", or "rmse". Default is "mse". See the example below.
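As a rough illustration of these options, the call below switches the distance to MAE and overrides the coverage settings; the parameter names follow the list above and the specific values are arbitrary.

```python
results = metric.compute(
    references=references,
    predictions=predictions,
    metric="mae",                      # one of "mse", "mae", "rmse"
    batch_size=500,                    # batch the pairwise distance computation
    cuc_n_calculation=5,               # average the coverage over 5 random samplings
    cuc_n_samples=[2, 4, 8, 16, 32],   # sample sizes for the coverage curve
)
```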
41
 
42
  ### Output Values
43
 
44
  Let prediction instances be $P = \{p_1, p_2, \ldots, p_n\}$ and reference instances be $R = \{r_1, r_2, \ldots, r_m\}$.
45
 
46
+ - **precision_distance**: (float): Average, over the generated instances, of the distance to the closest reference instance. Intuitively, this is similar to precision in classification. In the equation, $\frac{1}{n} \sum_{i=1}^{n} \min_{j} \mathrm{distance}(p_i, r_j)$ (see the sketch after this list).
47
+ - **recall_distance**: (float): Average, over the reference instances, of the distance to the closest generated instance. Intuitively, this is similar to recall in classification. In the equation, $\frac{1}{m} \sum_{j=1}^{m} \min_{i} \mathrm{distance}(p_i, r_j)$.
48
+ - **f1_distance**: (float): Harmonic mean of the precision_distance and recall_distance. This is similar to F1-score in classification.
49
+ - **index_distance**: (float): Average of the distance between the generated instance and the reference instance with the same index. In the equation, $\frac{1}{n} \sum_{i=1}^{n} \mathrm{distance}(p_i, r_i)$.
50
+ - **precision_distance_features**: (list of float): precision_distance computed individually for each feature.
51
+ - **recall_distance_features**: (list of float): recall_distance computed individually for each feature.
52
+ - **f1_distance_features**: (list of float): f1_distance computed individually for each feature.
53
+ - **index_distance_features**: (list of float): index_distance computed individually for each feature.
54
+ - **macro_precision_distance**: (float): Average of the precision_distance_features.
55
+ - **macro_recall_distance**: (float): Average of the recall_distance_features.
56
+ - **macro_f1_distance**: (float): Average of the f1_distance_features.
57
+ - **macro_index_distance**: (float): Average of the index_distance_features.
58
+ - **matching_precision**: (float): Precision of the matching, i.e. the fraction of generated instances that are the closest match of at least one reference instance. In the equation, $\frac{ | \{ i \mid \exists j,\ i = \arg\min_{i'} \mathrm{distance}(p_{i'}, r_j) \} | }{n}$.
59
+ - **matching_recall**: (float): Recall of the matching, i.e. the fraction of reference instances that are the closest match of at least one generated instance. In the equation, $\frac{ | \{ j \mid \exists i,\ j = \arg\min_{j'} \mathrm{distance}(p_i, r_{j'}) \} | }{m}$.
60
  - **matching_f1**: (float): F1-score of the matching instances.
61
  - **matching_precision_features**: (list of float): matching_precision computed individually for each feature.
62
  - **matching_recall_features**: (list of float): matching_recall computed individually for each feature.
 
64
  - **macro_matching_precision**: (float): Average of the matching_precision_features.
65
  - **macro_matching_recall**: (float): Average of the matching_recall_features.
66
  - **macro_matching_f1**: (float): Average of the matching_f1_features.
67
+ - **coverages**: (list of float): Coverage of the reference instances by random subsamples of the generated data, one value per sample size in cuc_n_samples. In the equation, $\left[ \frac{ | \{ j \mid \exists\, p_i \in \mathrm{sample}(P, s),\ j = \arg\min_{j'} \mathrm{distance}(p_i, r_{j'}) \} | }{m} \ \text{for}\ s \in \mathrm{cuc\_n\_samples} \right]$.
68
+ - **cuc**: (float): Area under the coverage curve (Coverage Under Curve), computed from coverages over cuc_n_samples with the trapezoidal rule and normalized by the number of sample sizes and the maximum sample size.
69
  - **coverages_features**: (list of list of float): coverages computed individually for each feature.
70
  - **cuc_features**: (list of float): cuc computed individually for each feature.
71
  - **macro_coverages**: (list of float): Average of the coverages_features.
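To make the definitions above concrete, the following self-contained sketch recomputes the main quantities from a small pairwise distance matrix. It mirrors the formulas above rather than calling the module, and the array sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 4                               # generated (P) and reference (R) instances
distance = rng.random((n, m))             # distance[i, j] = distance(p_i, r_j)

# precision/recall distance: mean distance to the closest counterpart
best_match = distance.argmin(axis=1)      # closest reference for each p_i
best_match_inv = distance.argmin(axis=0)  # closest generated instance for each r_j
precision_distance = distance[np.arange(n), best_match].mean()
recall_distance = distance[best_match_inv, np.arange(m)].mean()
f1_distance = 2 / (1 / precision_distance + 1 / recall_distance)

# matching precision/recall: fraction of instances that are someone's closest match
matching_precision = np.unique(best_match_inv).size / n
matching_recall = np.unique(best_match).size / m

# coverage for one sample size s: fraction of references matched by a random
# subsample of the generated instances (the module averages this over
# cuc_n_calculation samplings and integrates over cuc_n_samples to get cuc)
s = 3
subsample = rng.choice(n, size=s, replace=False)
coverage_s = np.unique(distance[subsample].argmin(axis=1)).size / m
```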
matching_series.py CHANGED
@@ -133,6 +133,7 @@ class matching_series(evaluate.Metric):
133
  batch_size: Optional[int] = None,
134
  cuc_n_calculation: int = 3,
135
  cuc_n_samples: Union[List[int], str] = "auto",
 
136
  ):
137
  """
138
  Compute the scores of the module given the predictions and references
@@ -157,37 +158,41 @@ class matching_series(evaluate.Metric):
157
 
158
  # at first, convert the inputs to numpy arrays
159
 
160
- # MSE between predictions and references for all example combinations for each features
161
  # shape: (num_generation, num_reference, num_features)
162
  if batch_size is not None:
163
- mse = np.zeros((len(predictions), len(references), predictions.shape[-1]))
164
  # iterate over the predictions and references in batches
165
  for i in range(0, len(predictions), batch_size):
166
  for j in range(0, len(references), batch_size):
167
- mse[i : i + batch_size, j : j + batch_size] = np.mean(
168
- (predictions[i : i + batch_size, None] - references[None, j : j + batch_size]) ** 2, axis=-2
 
 
 
169
  )
 
170
  else:
171
- mse = np.mean((predictions[:, None] - references) ** 2, axis=-2)
172
 
173
- index_mse = mse.diagonal(axis1=0, axis2=1).mean()
174
 
175
  # matching scores
176
- mse_mean = mse.mean(axis=-1)
177
  # best match for each generated time series
178
  # shape: (num_generation,)
179
- best_match = np.argmin(mse_mean, axis=-1)
180
 
181
- # matching mse
182
  # shape: (num_generation,)
183
- precision_mse = mse_mean[np.arange(len(best_match)), best_match].mean()
184
 
185
  # best match for each reference time series
186
  # shape: (num_reference,)
187
- best_match_inv = np.argmin(mse_mean, axis=0)
188
- recall_mse = mse_mean[best_match_inv, np.arange(len(best_match_inv))].mean()
189
 
190
- f1_mse = 2 / (1 / precision_mse + 1 / recall_mse)
191
 
192
  # matching precision, recall and f1
193
  matching_recall = np.unique(best_match).size / len(best_match_inv)
@@ -195,27 +200,27 @@ class matching_series(evaluate.Metric):
195
  matching_f1 = 2 / (1 / matching_precision + 1 / matching_recall)
196
 
197
  # take matching for each feature and compute metrics for them
198
- precision_mse_features = []
199
- recall_mse_features = []
200
- f1_mse_features = []
201
  matching_precision_features = []
202
  matching_recall_features = []
203
  matching_f1_features = []
204
- index_mse_features = []
205
  coverages_features = []
206
  cuc_features = []
207
  for f in range(predictions.shape[-1]):
208
- mse_f = mse[:, :, f]
209
- index_mse_f = mse_f.diagonal(axis1=0, axis2=1).mean()
210
- best_match_f = np.argmin(mse_f, axis=-1)
211
- precision_mse_f = mse_f[np.arange(len(best_match_f)), best_match_f].mean()
212
- best_match_inv_f = np.argmin(mse_f, axis=0)
213
- recall_mse_f = mse_f[best_match_inv_f, np.arange(len(best_match_inv_f))].mean()
214
- f1_mse_f = 2 / (1 / precision_mse_f + 1 / recall_mse_f)
215
- precision_mse_features.append(precision_mse_f)
216
- recall_mse_features.append(recall_mse_f)
217
- f1_mse_features.append(f1_mse_f)
218
- index_mse_features.append(index_mse_f)
219
 
220
  matching_recall_f = np.unique(best_match_f).size / len(best_match_f)
221
  matching_precision_f = np.unique(best_match_inv_f).size / len(best_match_inv_f)
@@ -228,10 +233,10 @@ class matching_series(evaluate.Metric):
228
  coverages_features.append(coverages_f)
229
  cuc_features.append(cuc_f)
230
 
231
- macro_precision_mse = statistics.mean(precision_mse_features)
232
- macro_recall_mse = statistics.mean(recall_mse_features)
233
- macro_f1_mse = statistics.mean(f1_mse_features)
234
- macro_index_mse = statistics.mean(index_mse_features)
235
 
236
  macro_matching_precision = statistics.mean(matching_precision_features)
237
  macro_matching_recall = statistics.mean(matching_recall_features)
@@ -244,18 +249,18 @@ class matching_series(evaluate.Metric):
244
  macro_coverages = [statistics.mean(c) for c in zip(*coverages_features)]
245
 
246
  return {
247
- "precision_mse": precision_mse,
248
- "f1_mse": f1_mse,
249
- "recall_mse": recall_mse,
250
- "index_mse": index_mse,
251
- "precision_mse_features": precision_mse_features,
252
- "f1_mse_features": f1_mse_features,
253
- "recall_mse_features": recall_mse_features,
254
- "index_mse_features": index_mse_features,
255
- "macro_precision_mse": macro_precision_mse,
256
- "macro_recall_mse": macro_recall_mse,
257
- "macro_f1_mse": macro_f1_mse,
258
- "macro_index_mse": macro_index_mse,
259
  "matching_precision": matching_precision,
260
  "matching_recall": matching_recall,
261
  "matching_f1": matching_f1,
@@ -305,3 +310,13 @@ class matching_series(evaluate.Metric):
305
  coverages.append(coverage / n_calculation)
306
  cuc = np.trapz(coverages, n_samples) / len(n_samples) / max(n_samples)
307
   return coverages, cuc

133
  batch_size: Optional[int] = None,
134
  cuc_n_calculation: int = 3,
135
  cuc_n_samples: Union[List[int], str] = "auto",
136
+ metric: str = "mse",
137
  ):
138
  """
139
  Compute the scores of the module given the predictions and references
 
158
 
159
  # at first, convert the inputs to numpy arrays
160
 
161
+ # distance between predictions and references for all example combinations for each features
162
  # shape: (num_generation, num_reference, num_features)
163
  if batch_size is not None:
164
+ distance = np.zeros((len(predictions), len(references), predictions.shape[-1]))
165
  # iterate over the predictions and references in batches
166
  for i in range(0, len(predictions), batch_size):
167
  for j in range(0, len(references), batch_size):
168
+ d = self._compute_metric(
169
+ predictions[i : i + batch_size, None],
170
+ references[None, j : j + batch_size],
171
+ metric=metric,
172
+ axis=-2,
173
  )
174
+ distance[i : i + batch_size, j : j + batch_size] = d
175
  else:
176
+ distance = self._compute_metric(predictions[:, None], references, metric=metric, axis=-2)
177
 
178
+ index_distance = distance.diagonal(axis1=0, axis2=1).mean()
179
 
180
  # matching scores
181
+ distance_mean = distance.mean(axis=-1)
182
  # best match for each generated time series
183
  # shape: (num_generation,)
184
+ best_match = np.argmin(distance_mean, axis=-1)
185
 
186
+ # matching distance
187
  # shape: (num_generation,)
188
+ precision_distance = distance_mean[np.arange(len(best_match)), best_match].mean()
189
 
190
  # best match for each reference time series
191
  # shape: (num_reference,)
192
+ best_match_inv = np.argmin(distance_mean, axis=0)
193
+ recall_distance = distance_mean[best_match_inv, np.arange(len(best_match_inv))].mean()
194
 
195
+ f1_distance = 2 / (1 / precision_distance + 1 / recall_distance)
196
 
197
  # matching precision, recall and f1
198
  matching_recall = np.unique(best_match).size / len(best_match_inv)
 
200
  matching_f1 = 2 / (1 / matching_precision + 1 / matching_recall)
201
 
202
  # take matching for each feature and compute metrics for them
203
+ precision_distance_features = []
204
+ recall_distance_features = []
205
+ f1_distance_features = []
206
  matching_precision_features = []
207
  matching_recall_features = []
208
  matching_f1_features = []
209
+ index_distance_features = []
210
  coverages_features = []
211
  cuc_features = []
212
  for f in range(predictions.shape[-1]):
213
+ distance_f = distance[:, :, f]
214
+ index_distance_f = distance_f.diagonal(axis1=0, axis2=1).mean()
215
+ best_match_f = np.argmin(distance_f, axis=-1)
216
+ precision_distance_f = distance_f[np.arange(len(best_match_f)), best_match_f].mean()
217
+ best_match_inv_f = np.argmin(distance_f, axis=0)
218
+ recall_distance_f = distance_f[best_match_inv_f, np.arange(len(best_match_inv_f))].mean()
219
+ f1_distance_f = 2 / (1 / precision_distance_f + 1 / recall_distance_f)
220
+ precision_distance_features.append(precision_distance_f)
221
+ recall_distance_features.append(recall_distance_f)
222
+ f1_distance_features.append(f1_distance_f)
223
+ index_distance_features.append(index_distance_f)
224
 
225
  matching_recall_f = np.unique(best_match_f).size / len(best_match_f)
226
  matching_precision_f = np.unique(best_match_inv_f).size / len(best_match_inv_f)
 
233
  coverages_features.append(coverages_f)
234
  cuc_features.append(cuc_f)
235
 
236
+ macro_precision_distance = statistics.mean(precision_distance_features)
237
+ macro_recall_distance = statistics.mean(recall_distance_features)
238
+ macro_f1_distance = statistics.mean(f1_distance_features)
239
+ macro_index_distance = statistics.mean(index_distance_features)
240
 
241
  macro_matching_precision = statistics.mean(matching_precision_features)
242
  macro_matching_recall = statistics.mean(matching_recall_features)
 
249
  macro_coverages = [statistics.mean(c) for c in zip(*coverages_features)]
250
 
251
  return {
252
+ "precision_distance": precision_distance,
253
+ "f1_distance": f1_distance,
254
+ "recall_distance": recall_distance,
255
+ "index_distance": index_distance,
256
+ "precision_distance_features": precision_distance_features,
257
+ "f1_distance_features": f1_distance_features,
258
+ "recall_distance_features": recall_distance_features,
259
+ "index_distance_features": index_distance_features,
260
+ "macro_precision_distance": macro_precision_distance,
261
+ "macro_recall_distance": macro_recall_distance,
262
+ "macro_f1_distance": macro_f1_distance,
263
+ "macro_index_distance": macro_index_distance,
264
  "matching_precision": matching_precision,
265
  "matching_recall": matching_recall,
266
  "matching_f1": matching_f1,
 
310
  coverages.append(coverage / n_calculation)
311
  cuc = np.trapz(coverages, n_samples) / len(n_samples) / max(n_samples)
312
  return coverages, cuc
313
+
314
+ def _compute_metric(self, x, y, metric: str = "mse", axis: int = -1):
315
+ if metric.lower() == "mse":
316
+ return np.mean((x - y) ** 2, axis=axis)
317
+ elif metric.lower() == "mae":
318
+ return np.mean(np.abs(x - y), axis=axis)
319
+ elif metric.lower() == "rmse":
320
+ return np.sqrt(self._compute_metric(x, y, metric="mse", axis=axis))
321
+ else:
322
+ raise ValueError("Unknown metric: {}".format(metric))
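As a standalone sanity check (not part of the module), the sketch below re-implements the pairwise distance for the three metric options and verifies that the batched loop above reproduces the full computation; every name here is local to the example.

```python
import numpy as np

# Toy data: 6 generated and 4 reference series, 20 steps, 3 features (illustrative sizes).
rng = np.random.default_rng(0)
predictions = rng.random((6, 20, 3))
references = rng.random((4, 20, 3))

def pairwise(metric: str) -> np.ndarray:
    # full pairwise distance, averaged over the time axis -> shape (6, 4, 3)
    diff = predictions[:, None] - references[None, :]
    if metric == "mse":
        return np.mean(diff**2, axis=-2)
    if metric == "mae":
        return np.mean(np.abs(diff), axis=-2)
    if metric == "rmse":
        return np.sqrt(np.mean(diff**2, axis=-2))
    raise ValueError(f"Unknown metric: {metric}")

# the batched computation should match the full one
batch_size = 2
full = pairwise("mse")
batched = np.zeros_like(full)
for i in range(0, len(predictions), batch_size):
    for j in range(0, len(references), batch_size):
        batched[i : i + batch_size, j : j + batch_size] = np.mean(
            (predictions[i : i + batch_size, None] - references[None, j : j + batch_size]) ** 2,
            axis=-2,
        )

assert np.allclose(full, batched)
assert np.allclose(pairwise("rmse"), np.sqrt(full))
```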