add other metrics
Files changed:
- README.md (+19 -18)
- matching_series.py (+59 -44)
README.md
@@ -13,7 +13,7 @@
 # Metric Card for matching_series

 ## Metric Description
+Matching Series is a metric for evaluating time-series generation models. It is based on the idea of matching the generated time-series with the original time-series. The metric calculates the distance (Mean Squared Error by default) between the generated time-series and the original time-series over matched instances. The metric outputs a score greater than or equal to 0, where 0 indicates a perfect generation.

 ## How to Use
 At minimum, the metric requires the original time-series and the generated time-series as input. The metric can be used to evaluate the performance of time-series generation models.
@@ -28,7 +28,7 @@
 >>> metric = evaluate.load("bowdbeg/matching_series")
 >>> results = metric.compute(references=references, predictions=predictions, batch_size=1000)
 >>> print(results)
+{'precision_distance': 0.15843592698313289, 'f1_distance': 0.155065974239652, 'recall_distance': 0.1518363944110798, 'index_distance': 0.17040952035850207, 'precision_distance_features': [0.13823438020409948, 0.13795530908046955, 0.13737011148651265, 0.14067189082974238, 0.1364122789352347, 0.1436081670647643, 0.14458237409706912, 0.13806270434163667, 0.1409687410230486, 0.14361925950728213], 'f1_distance_features': [0.1296088638995658, 0.1321776706161825, 0.13029775314091577, 0.13175439826605778, 0.12737279060587542, 0.1356699896603108, 0.13397234988746393, 0.12775081706715302, 0.1315612879575721, 0.13479662354178928], 'recall_distance_features': [0.12199655178880468, 0.12686452003437784, 0.12391796468320122, 0.12390010513296679, 0.11945686853897312, 0.12856343456552471, 0.12481307474748718, 0.11887226171295895, 0.12333088520535256, 0.1269952147807759], 'index_distance_features': [0.1675969516703118, 0.1670366499114896, 0.1671737398882021, 0.17176917018356727, 0.1648541323369367, 0.1719173137987784, 0.1718364937170575, 0.16298119493341198, 0.17348958360035996, 0.18543997354490532], 'macro_precision_distance': 0.14014852165698596, 'macro_recall_distance': 0.1238710881190423, 'macro_f1_distance': 0.13149625446428864, 'macro_index_distance': 0.17040952035850207, 'matching_precision': 0.1, 'matching_recall': 1.0, 'matching_f1': 0.18181818181818182, 'matching_precision_features': [0.9, 0.9, 0.8, 0.9, 0.9, 0.9, 1.0, 0.8, 1.0, 1.0], 'matching_recall_features': [0.1, 0.09, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1], 'matching_f1_features': [0.18, 0.16363636363636364, 0.17777777777777778, 0.18, 0.18, 0.18, 0.18181818181818182, 0.17777777777777778, 0.18181818181818182, 0.18181818181818182], 'macro_matching_precision': 0.91, 'macro_matching_recall': 0.099, 'macro_matching_f1': 0.17846464646464646, 'cuc': 0.12364285714285712, 'coverages': [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.4666666666666666, 0.7666666666666666, 0.9333333333333332, 1.0], 'macro_cuc': 0.12047857142857143, 'macro_coverages': [0.10000000000000002, 0.19000000000000003, 0.32666666666666666, 0.51, 0.72, 0.8966666666666667, 0.99], 'cuc_features': [0.1175, 0.11607142857142858, 0.12214285714285712, 0.12507142857142856, 0.1202142857142857, 0.11735714285714285, 0.12042857142857144, 0.12028571428571429, 0.12864285714285717, 0.11707142857142858], 'coverages_features': [[0.10000000000000002, 0.20000000000000004, 0.3, 0.43333333333333335, 0.6666666666666666, 0.8666666666666667, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3666666666666667, 0.5666666666666667, 0.6666666666666666, 0.9, 0.9], [0.10000000000000002, 0.16666666666666666, 0.3333333333333333, 0.5, 0.6666666666666666, 0.9333333333333332, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.5666666666666667, 0.7999999999999999, 0.9333333333333332, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.43333333333333335, 0.6999999999999998, 0.9, 1.0], [0.10000000000000002, 0.20000000000000004, 0.26666666666666666, 0.43333333333333335, 0.6666666666666666, 0.8666666666666667, 1.0], [0.10000000000000002, 0.16666666666666666, 0.4000000000000001, 0.6, 0.7333333333333334, 0.8666666666666667, 1.0], [0.10000000000000002, 0.16666666666666666, 0.3, 0.5666666666666667, 0.7666666666666666, 0.8666666666666667, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3, 0.5333333333333333, 0.8000000000000002, 1.0, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.4666666666666666, 0.7333333333333334, 0.8333333333333334, 1.0]]}
 ```

 ### Inputs
@@ -37,25 +37,26 @@
 - **batch_size**: (int, optional): The batch size for computing the metric. Memory consumption grows quadratically with this value. Default is None.
 - **cuc_n_calculation**: (int, optional): The number of times the coverage is computed, since it involves random sampling. Default is 3.
 - **cuc_n_samples**: (list of int, optional): The sample counts used to compute the coverage. Default is $[2^i \;\text{for}\; i \leq \log_2 n] + [n]$.
+- **metric**: (str, optional): The metric used to measure the distance between examples. Default is "mse". Available options are "mse", "mae", "rmse".
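Putting the inputs together, a minimal end-to-end call might look as follows (the toy `references`/`predictions` arrays and their shapes are illustrative assumptions, not taken from the repository):

```python
import evaluate
import numpy as np

# toy data: 10 generated and 10 reference series, 100 time steps, 10 features
predictions = np.random.rand(10, 100, 10)
references = np.random.rand(10, 100, 10)

metric = evaluate.load("bowdbeg/matching_series")
results = metric.compute(
    references=references,
    predictions=predictions,
    batch_size=1000,       # larger batches are faster but use more memory
    metric="rmse",         # distance function: "mse" (default), "mae", or "rmse"
    cuc_n_calculation=3,   # repetitions of the coverage sampling
    cuc_n_samples="auto",  # or an explicit list such as [2, 4, 8, 10]
)
print(results["f1_distance"], results["matching_f1"], results["cuc"])
```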

 ### Output Values

 Let prediction instances be $P = \{p_1, p_2, \ldots, p_n\}$ and reference instances be $R = \{r_1, r_2, \ldots, r_m\}$. (A worked NumPy sketch of the quantities below follows this list.)

+- **precision_distance**: (float): Average distance between each generated instance and its closest reference instance. Intuitively, this is similar to precision in classification. In the equation, $\frac{1}{n} \sum_{i=1}^{n} \min_{j} \mathrm{distance}(p_i, r_j)$.
+- **recall_distance**: (float): Average distance between each reference instance and its closest generated instance. Intuitively, this is similar to recall in classification. In the equation, $\frac{1}{m} \sum_{j=1}^{m} \min_{i} \mathrm{distance}(p_i, r_j)$.
+- **f1_distance**: (float): Harmonic mean of precision_distance and recall_distance. This is similar to the F1-score in classification.
+- **index_distance**: (float): Average distance between the generated and reference instances that share the same index. In the equation, $\frac{1}{n} \sum_{i=1}^{n} \mathrm{distance}(p_i, r_i)$.
+- **precision_distance_features**: (list of float): precision_distance computed individually for each feature.
+- **recall_distance_features**: (list of float): recall_distance computed individually for each feature.
+- **f1_distance_features**: (list of float): f1_distance computed individually for each feature.
+- **index_distance_features**: (list of float): index_distance computed individually for each feature.
+- **macro_precision_distance**: (float): Average of the precision_distance_features.
+- **macro_recall_distance**: (float): Average of the recall_distance_features.
+- **macro_f1_distance**: (float): Average of the f1_distance_features.
+- **macro_index_distance**: (float): Average of the index_distance_features.
+- **matching_precision**: (float): Precision of the matching: the fraction of generated instances that are the best match of at least one reference instance. In the equation, $\frac{ \left| \{ \arg\min_{i} \mathrm{distance}(p_i, r_j) \mid j \} \right| }{n}$.
+- **matching_recall**: (float): Recall of the matching: the fraction of reference instances that are the best match of at least one generated instance. In the equation, $\frac{ \left| \{ \arg\min_{j} \mathrm{distance}(p_i, r_j) \mid i \} \right| }{m}$.
 - **matching_f1**: (float): F1-score (harmonic mean) of matching_precision and matching_recall.
 - **matching_precision_features**: (list of float): matching_precision computed individually for each feature.
 - **matching_recall_features**: (list of float): matching_recall computed individually for each feature.
@@ -63,8 +64,8 @@
 - **macro_matching_precision**: (float): Average of the matching_precision_features.
 - **macro_matching_recall**: (float): Average of the matching_recall_features.
 - **macro_matching_f1**: (float): Average of the matching_f1_features.
+- **coverages**: (list of float): Coverage of the reference instances by random subsets of the generated instances, computed for each sample count in cuc_n_samples. In the equation, $\left[ \frac{ \left| \{ \arg\min_{j} \mathrm{distance}(p_i, r_j) \mid p_i \in \mathrm{sample}(P, s) \} \right| }{m} \;\text{for}\; s \in \mathrm{cuc\_n\_samples} \right]$.
+- **cuc**: (float): Area under the coverage curve: the coverages are integrated over cuc_n_samples with the trapezoidal rule and normalized, i.e. $\mathrm{trapz}(\mathrm{coverages}, \mathrm{cuc\_n\_samples}) \,/\, \left( |\mathrm{cuc\_n\_samples}| \cdot \max(\mathrm{cuc\_n\_samples}) \right)$.
 - **coverages_features**: (list of list of float): coverages computed individually for each feature.
 - **cuc_features**: (list of float): cuc computed individually for each feature.
 - **macro_coverages**: (list of float): Average of the coverages_features.
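To make the definitions above concrete, here is a standalone NumPy sketch that reproduces the main quantities from a toy pairwise distance matrix. The random data and the normalizations by $n$ and $m$ are assumptions for illustration; the module computes all of this internally from the raw series:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 5                        # number of generated / reference instances
distance = rng.random((n, m))      # distance[i, j] = distance(p_i, r_j), e.g. MSE

# distance-based scores
precision_distance = distance.min(axis=1).mean()  # (1/n) * sum_i min_j d(p_i, r_j)
recall_distance = distance.min(axis=0).mean()     # (1/m) * sum_j min_i d(p_i, r_j)
f1_distance = 2 / (1 / precision_distance + 1 / recall_distance)

# matching scores: count distinct instances that take part in a best match
best_match = distance.argmin(axis=1)       # best reference for each generation
best_match_inv = distance.argmin(axis=0)   # best generation for each reference
matching_recall = np.unique(best_match).size / m
matching_precision = np.unique(best_match_inv).size / n

# coverage curve over growing subsets of generations, and its normalized area (cuc)
n_samples = [1, 2, 4]
coverages = [
    np.unique(distance[rng.choice(n, size=s, replace=False)].argmin(axis=1)).size / m
    for s in n_samples
]
cuc = np.trapz(coverages, n_samples) / len(n_samples) / max(n_samples)
```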
matching_series.py
@@ -133,6 +133,7 @@ class matching_series(evaluate.Metric):
         batch_size: Optional[int] = None,
         cuc_n_calculation: int = 3,
         cuc_n_samples: Union[List[int], str] = "auto",
+        metric: str = "mse",
     ):
         """
         Compute the scores of the module given the predictions and references

@@ -157,37 +158,41 @@

         # at first, convert the inputs to numpy arrays

+        # distance between predictions and references for all example combinations, per feature
         # shape: (num_generation, num_reference, num_features)
         if batch_size is not None:
+            distance = np.zeros((len(predictions), len(references), predictions.shape[-1]))
             # iterate over the predictions and references in batches
             for i in range(0, len(predictions), batch_size):
                 for j in range(0, len(references), batch_size):
+                    d = self._compute_metric(
+                        predictions[i : i + batch_size, None],
+                        references[None, j : j + batch_size],
+                        metric=metric,
+                        axis=-2,
                     )
+                    distance[i : i + batch_size, j : j + batch_size] = d
         else:
+            distance = self._compute_metric(predictions[:, None], references, metric=metric, axis=-2)

+        index_distance = distance.diagonal(axis1=0, axis2=1).mean()

         # matching scores
+        distance_mean = distance.mean(axis=-1)
         # best match for each generated time series
         # shape: (num_generation,)
+        best_match = np.argmin(distance_mean, axis=-1)

+        # matching distance
         # shape: (num_generation,)
+        precision_distance = distance_mean[np.arange(len(best_match)), best_match].mean()

         # best match for each reference time series
         # shape: (num_reference,)
+        best_match_inv = np.argmin(distance_mean, axis=0)
+        recall_distance = distance_mean[best_match_inv, np.arange(len(best_match_inv))].mean()

+        f1_distance = 2 / (1 / precision_distance + 1 / recall_distance)

         # matching precision, recall and f1
         matching_recall = np.unique(best_match).size / len(best_match_inv)

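The batched loop fills the same tensor that the single broadcast in the `else` branch produces in one shot. A standalone sketch of that broadcasting trick with toy shapes (independent of the class, shapes chosen for illustration):

```python
import numpy as np

n, m, t, f = 4, 5, 50, 3  # generations, references, time steps, features
predictions = np.random.rand(n, t, f)
references = np.random.rand(m, t, f)

# (n, 1, t, f) - (1, m, t, f) broadcasts to (n, m, t, f); averaging over the
# time axis (-2) leaves one MSE per (generation, reference, feature) triple
distance = np.mean((predictions[:, None] - references[None]) ** 2, axis=-2)
assert distance.shape == (n, m, f)
```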
@@ -195,27 +200,27 @@
         matching_f1 = 2 / (1 / matching_precision + 1 / matching_recall)

         # take matching for each feature and compute metrics for them
+        precision_distance_features = []
+        recall_distance_features = []
+        f1_distance_features = []
         matching_precision_features = []
         matching_recall_features = []
         matching_f1_features = []
+        index_distance_features = []
         coverages_features = []
         cuc_features = []
         for f in range(predictions.shape[-1]):
+            distance_f = distance[:, :, f]
+            index_distance_f = distance_f.diagonal(axis1=0, axis2=1).mean()
+            best_match_f = np.argmin(distance_f, axis=-1)
+            precision_distance_f = distance_f[np.arange(len(best_match_f)), best_match_f].mean()
+            best_match_inv_f = np.argmin(distance_f, axis=0)
+            recall_distance_f = distance_f[best_match_inv_f, np.arange(len(best_match_inv_f))].mean()
+            f1_distance_f = 2 / (1 / precision_distance_f + 1 / recall_distance_f)
+            precision_distance_features.append(precision_distance_f)
+            recall_distance_features.append(recall_distance_f)
+            f1_distance_features.append(f1_distance_f)
+            index_distance_features.append(index_distance_f)

             matching_recall_f = np.unique(best_match_f).size / len(best_match_f)
             matching_precision_f = np.unique(best_match_inv_f).size / len(best_match_inv_f)

@@ -228,10 +233,10 @@
             coverages_features.append(coverages_f)
             cuc_features.append(cuc_f)

+        macro_precision_distance = statistics.mean(precision_distance_features)
+        macro_recall_distance = statistics.mean(recall_distance_features)
+        macro_f1_distance = statistics.mean(f1_distance_features)
+        macro_index_distance = statistics.mean(index_distance_features)

         macro_matching_precision = statistics.mean(matching_precision_features)
         macro_matching_recall = statistics.mean(matching_recall_features)

@@ -244,18 +249,18 @@
         macro_coverages = [statistics.mean(c) for c in zip(*coverages_features)]

         return {
+            "precision_distance": precision_distance,
+            "f1_distance": f1_distance,
+            "recall_distance": recall_distance,
+            "index_distance": index_distance,
+            "precision_distance_features": precision_distance_features,
+            "f1_distance_features": f1_distance_features,
+            "recall_distance_features": recall_distance_features,
+            "index_distance_features": index_distance_features,
+            "macro_precision_distance": macro_precision_distance,
+            "macro_recall_distance": macro_recall_distance,
+            "macro_f1_distance": macro_f1_distance,
+            "macro_index_distance": macro_index_distance,
             "matching_precision": matching_precision,
             "matching_recall": matching_recall,
             "matching_f1": matching_f1,

@@ -305,3 +310,13 @@
             coverages.append(coverage / n_calculation)
         cuc = np.trapz(coverages, n_samples) / len(n_samples) / max(n_samples)
         return coverages, cuc
+
+    def _compute_metric(self, x, y, metric: str = "mse", axis: int = -1):
+        if metric.lower() == "mse":
+            return np.mean((x - y) ** 2, axis=axis)
+        elif metric.lower() == "mae":
+            return np.mean(np.abs(x - y), axis=axis)
+        elif metric.lower() == "rmse":
+            return np.sqrt(self._compute_metric(x, y, metric="mse", axis=axis))
+        else:
+            raise ValueError("Unknown metric: {}".format(metric))