---
title: matching_series
tags:
- evaluate
- metric
description: "Matching-based time-series generation metric"
sdk: gradio
sdk_version: 3.50
app_file: app.py
pinned: false
---

# Metric Card for matching_series

## Metric Description
Matching Series is a metric for evaluating time-series generation models. It matches each generated time-series to an original time-series and computes the distance (Mean Squared Error by default) between the matched instances. The distance-based scores are greater than or equal to 0, where 0 indicates a perfect generation.
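
The matching idea can be illustrated with a minimal NumPy sketch. This is not the metric's actual implementation: the helper name `pairwise_mse` is hypothetical, and averaging the squared error over both the time and feature axes is an assumption made for illustration.

```python
import numpy as np

def pairwise_mse(predictions, references):
    """Return an (n, m) matrix of MSE distances.

    Assumption: the squared error is averaged over time and features.
    """
    # Broadcast to shape (n, m, seq_len, num_features).
    diff = predictions[:, None, :, :] - references[None, :, :, :]
    return (diff ** 2).mean(axis=(2, 3))

rng = np.random.default_rng(0)
references = rng.random((4, 20, 3))
# Three generated series: slightly noisy copies of references 0, 2 and 2.
predictions = references[[0, 2, 2]] + 0.01 * rng.standard_normal((3, 20, 3))

distance = pairwise_mse(predictions, references)
match = distance.argmin(axis=1)  # closest reference for each generated series
print(distance.shape)  # (3, 4)
print(match)           # [0 2 2]
```

Each generated series is matched to the reference it was copied from, and the distance to that reference is close to 0.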

## How to Use
At minimum, the metric requires the original time-series and the generated time-series as input. The metric can be used to evaluate the performance of time-series generation models.

```python
>>> import evaluate
>>> import numpy as np
>>> num_generation = 100
>>> num_reference = 10
>>> seq_len = 100
>>> num_features = 10
>>> references = np.random.rand(num_reference, seq_len, num_features)
>>> predictions = np.random.rand(num_generation, seq_len, num_features)
>>> metric = evaluate.load("bowdbeg/matching_series")
>>> results = metric.compute(references=references, predictions=predictions, batch_size=1000, return_all=True)
>>> print(results)
{'precision_distance': 0.1573285013437271, 'recall_distance': 0.15106813609600067, 'mean_distance': 0.1541983187198639, 'index_distance': 0.16858606040477753, 'matching_precision': 0.06, 'matching_recall': 1.0, 'matching_f1': 0.11320756503381972, 'cuc': 0.12428571428571429, 'macro_precision_distance': 0.13803552389144896, 'macro_recall_distance': 0.12179495096206665, 'macro_mean_distance': 0.1299152374267578, 'macro_index_distance': 0.16858604848384856, 'macro_matching_precision': 0.094, 'macro_matching_recall': 0.97, 'macro_matching_f1': 0.17132608782381706, 'macro_cuc': 0.11419285714285714, 'distance': array([[[0.20763363, 0.16514072, 0.18695284, ..., 0.15037987,
         0.19424284, 0.15943716],
        [0.17150438, 0.18020014, 0.17024504, ..., 0.18492931,
         0.18814348, 0.204207  ],
        [0.1769202 , 0.15609328, 0.17568389, ..., 0.17731658,
         0.2027854 , 0.13216409],
        ...,
        [0.1838122 , 0.19475608, 0.14176111, ..., 0.1635111 ,
         0.1652672 , 0.17145865],
        [0.16084194, 0.14208058, 0.17567575, ..., 0.15595785,
         0.16614595, 0.17834347],
        [0.16388315, 0.14126392, 0.18021484, ..., 0.16791071,
         0.18403953, 0.16666758]],

       [[0.16838932, 0.18878576, 0.17654441, ..., 0.1747057 ,
         0.16590554, 0.16901629],
        [0.16553226, 0.1882645 , 0.17863466, ..., 0.19269662,
         0.20451452, 0.19941731],
        [0.16502398, 0.16619626, 0.18069996, ..., 0.16124909,
         0.18933088, 0.1495165 ],
        ...,
        [0.15946846, 0.19988221, 0.17965002, ..., 0.12951666,
         0.2067793 , 0.13811146],
        [0.16227122, 0.17736743, 0.18641905, ..., 0.15038314,
         0.20186146, 0.17849396],
        [0.16410898, 0.18323919, 0.16945514, ..., 0.15783694,
         0.21556957, 0.17172968]],

       [[0.18094379, 0.1364854 , 0.18436092, ..., 0.187335  ,
         0.16240291, 0.13713893],
        [0.18005298, 0.15323727, 0.15788248, ..., 0.19451861,
         0.12822135, 0.14064161],
        [0.1564556 , 0.17312287, 0.1856657 , ..., 0.17237219,
         0.1596888 , 0.16547912],
        ...,
        [0.15611127, 0.16121496, 0.15533476, ..., 0.16520709,
         0.1427248 , 0.19455005],
        [0.17268528, 0.17360437, 0.15962966, ..., 0.18134868,
         0.15509704, 0.20222983],
        [0.18704675, 0.15934442, 0.14928888, ..., 0.18904984,
         0.16192877, 0.18576236]],

       ...,

       [[0.13717972, 0.15645625, 0.16123378, ..., 0.19453087,
         0.14441733, 0.1487963 ],
        [0.1454296 , 0.13368016, 0.18665504, ..., 0.16096605,
         0.15130125, 0.18332979],
        [0.14654924, 0.19097947, 0.19629759, ..., 0.15887487,
         0.19266474, 0.17430782],
        ...,
        [0.161704  , 0.16357127, 0.18512094, ..., 0.16441964,
         0.13961458, 0.17298506],
        [0.1366249 , 0.15852758, 0.1982772 , ..., 0.18822236,
         0.16153064, 0.19617072],
        [0.14570995, 0.15005183, 0.19667573, ..., 0.1856473 ,
         0.18603194, 0.19179863]],

       [[0.17813908, 0.176182  , 0.16847256, ..., 0.16903524,
         0.17150073, 0.15068175],
        [0.17632519, 0.1404587 , 0.16388708, ..., 0.16873878,
         0.15744762, 0.198475  ],
        [0.14986345, 0.1517829 , 0.17624639, ..., 0.18365957,
         0.17399347, 0.15581599],
        ...,
        [0.16128553, 0.1974935 , 0.13766351, ..., 0.14026196,
         0.15450196, 0.16110381],
        [0.16281141, 0.14699166, 0.16935429, ..., 0.1394466 ,
         0.1717883 , 0.16191883],
        [0.14886455, 0.1603608 , 0.15172943, ..., 0.12851712,
         0.19859877, 0.15576601]],

       [[0.20230632, 0.19680001, 0.17143433, ..., 0.18601838,
         0.15998998, 0.16043548],
        [0.19753966, 0.19073424, 0.15046756, ..., 0.18833323,
         0.16755773, 0.20127842],
        [0.16012056, 0.16638812, 0.16493171, ..., 0.15849902,
         0.20269662, 0.1857642 ],
        ...,
        [0.16341361, 0.19168772, 0.16597596, ..., 0.15715535,
         0.18122095, 0.17266828],
        [0.1570099 , 0.18294124, 0.16713732, ..., 0.17442709,
         0.17020254, 0.18804537],
        [0.16752282, 0.1295177 , 0.18792175, ..., 0.13976808,
         0.21054329, 0.18118018]]], dtype=float32), 'match': array([4, 7, 3, 9, 4, 0, 7, 5, 4, 7, 9, 7, 7, 5, 7, 0, 0, 7, 4, 3, 3, 2,
       8, 9, 4, 4, 5, 1, 4, 9, 0, 2, 7, 3, 6, 5, 6, 3, 2, 2, 2, 6, 9, 4,
       4, 9, 1, 6, 0, 6, 9, 2, 0, 6, 7, 2, 0, 4, 5, 2, 3, 9, 2, 3, 9, 1,
       6, 4, 8, 9, 7, 4, 6, 5, 5, 6, 9, 5, 6, 2, 9, 4, 9, 3, 2, 9, 9, 7,
       9, 5, 9, 1, 7, 6, 4, 4, 5, 4, 7, 5]), 'match_inv': array([15, 91, 79,  4,  4,  4, 49,  4, 49, 45]), 'coverages': [0.10000000000000002, 0.16666666666666666, 0.3666666666666667, 0.6333333333333333, 0.8333333333333334, 0.9, 1.0], 'precision_distance_features': [0.1383965164422989, 0.13804036378860474, 0.1388234943151474, 0.1392393559217453, 0.1357768476009369, 0.1364508718252182, 0.14039862155914307, 0.13417008519172668, 0.1368638128042221, 0.14219526946544647], 'recall_distance_features': [0.11730053275823593, 0.12232911586761475, 0.12200610339641571, 0.12571024894714355, 0.12081331014633179, 0.11693283170461655, 0.12660981714725494, 0.12248671054840088, 0.11726576089859009, 0.12649507820606232], 'mean_distance_features': [0.1278485246002674, 0.13018473982810974, 0.13041479885578156, 0.13247480243444443, 0.12829507887363434, 0.12669185176491737, 0.133504219353199, 0.12832839787006378, 0.1270647868514061, 0.1343451738357544], 'index_distance_features': [0.17064405977725983, 0.17019756138324738, 0.17373089492321014, 0.17575454711914062, 0.15942324697971344, 0.1615942418575287, 0.16519878804683685, 0.1714271903038025, 0.17072594165802002, 0.16716401278972626], 'matching_precision_features': [0.1, 0.09, 0.1, 0.1, 0.09, 0.09, 0.1, 0.08, 0.09, 0.1], 'matching_recall_features': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.9, 0.9, 0.9, 1.0], 'matching_f1_features': [0.18181819851239656, 0.16513763164885095, 0.18181819851239656, 0.18181819851239656, 0.16513763164885095, 0.16513763164885095, 0.18000001639999985, 0.14693879251145342, 0.16363638033057834, 0.18181819851239656], 'cuc_features': [0.11935714285714286, 0.11578571428571431, 0.11814285714285715, 0.12407142857142857, 0.11207142857142856, 0.11821428571428572, 0.10807142857142855, 0.09635714285714285, 0.10700000000000001, 0.12285714285714286], 'coverages_features': [[0.10000000000000002, 0.20000000000000004, 0.26666666666666666, 0.4666666666666666, 0.7666666666666666, 0.8666666666666667, 1.0], [0.10000000000000002, 
0.20000000000000004, 0.3666666666666667, 0.5666666666666668, 0.6, 0.8333333333333334, 1.0], [0.10000000000000002, 0.16666666666666666, 0.26666666666666666, 0.4666666666666666, 0.6999999999999998, 0.8666666666666667, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3, 0.6, 0.7333333333333333, 0.9333333333333332, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3, 0.5, 0.6666666666666666, 0.7666666666666666, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.5333333333333333, 0.7666666666666666, 0.8333333333333334, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3, 0.5333333333333333, 0.6999999999999998, 0.7666666666666666, 0.9], [0.10000000000000002, 0.20000000000000004, 0.2333333333333333, 0.4666666666666666, 0.5333333333333333, 0.6333333333333333, 0.9], [0.10000000000000002, 0.16666666666666666, 0.26666666666666666, 0.4666666666666666, 0.5666666666666667, 0.8000000000000002, 0.9], [0.10000000000000002, 0.16666666666666666, 0.30000000000000004, 0.5666666666666667, 0.7999999999999999, 0.9, 1.0]]}
```

### Inputs
- **predictions**: (list of list of list of float or numpy.ndarray): The generated time-series. The shape of the array should be `(num_generation, seq_len, num_features)`.
- **references**: (list of list of list of float or numpy.ndarray): The original time-series. The shape of the array should be `(num_reference, seq_len, num_features)`.
- **batch_size**: (int, optional): The batch size for computing the metric. Its effect on the computation is quadratic, since distances are computed between pairs of instances. Default is None.
- **cuc_n_calculation**: (int, optional): The number of times the coverage computation is repeated, since it relies on random sampling. Default is 3.
- **cuc_n_samples**: (list of int, optional): The numbers of samples used to compute the coverage. Default is $[2^i ~\text{for}~ i \leq \log_2 n] + [n]$.
- **metric**: (str, optional): The metric to measure distance between examples. Default is "mse". Available options are "mse", "mae", "rmse".
- **num_processes**: (int, optional): The number of processes to use for computing the distance. Default is 1.
- **instance_normalization**: (bool, optional): Whether to normalize the instances along the time axis. Default is False.
- **return_distance**: (bool, optional): Whether to return the distance matrix. Default is False.
- **return_matching**: (bool, optional): Whether to return the matching matrix. Default is False.
- **return_each_features**: (bool, optional): Whether to return the results for each feature. Default is False.
- **return_coverages**: (bool, optional): Whether to return the coverages. Default is False.
- **return_all**: (bool, optional): Whether to return all the results. Default is False.
- **dtype**: (str, optional): The data type used for computation. Default is "float32".
- **eps**: (float, optional): The epsilon value to avoid division by zero. Default is 1e-8.
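
As a rough illustration of the `metric` and `instance_normalization` options, the sketch below shows what the three distance choices compute for a single pair of series, and one plausible form of normalization along the time axis. The helper names `series_distance` and `instance_normalize` are hypothetical, and the z-score formula using `eps` is an assumption; the metric's own implementation may differ.

```python
import numpy as np

def series_distance(a, b, metric="mse"):
    """Distance between two series of shape (seq_len, num_features)."""
    diff = np.asarray(a, dtype="float32") - np.asarray(b, dtype="float32")
    if metric == "mse":
        return float(np.mean(diff ** 2))
    if metric == "mae":
        return float(np.mean(np.abs(diff)))
    if metric == "rmse":
        return float(np.sqrt(np.mean(diff ** 2)))
    raise ValueError(f"unknown metric: {metric}")

def instance_normalize(x, eps=1e-8):
    """Assumed form: z-score each instance along the time axis (axis 1)."""
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    return (x - mean) / (std + eps)  # eps avoids division by zero

# Nested lists are accepted as well as arrays: seq_len=2, num_features=2.
a = [[0.0, 0.0], [0.0, 0.0]]
b = [[3.0, 3.0], [3.0, 3.0]]
print(series_distance(a, b, "mse"))   # 9.0
print(series_distance(a, b, "mae"))   # 3.0
print(series_distance(a, b, "rmse"))  # 3.0

x = np.arange(12, dtype="float32").reshape(1, 6, 2)
normalized = instance_normalize(x)
print(normalized.mean(axis=1))  # approximately 0 for every feature
```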

### Output Values

Let prediction instances be $P = \{p_1, p_2, \ldots, p_n\}$ and reference instances be $R = \{r_1, r_2, \ldots, r_m\}$.

- **precision_distance**: (float): Average of the distance between the generated instance and the reference instance with the lowest distance. Intuitively, this is similar to precision in classification. In the equation, $\frac{1}{n} \sum_{i=1}^{n} \min_{j} \mathrm{distance}(p_i, r_j)$.
- **recall_distance**: (float): Average of the distance between the reference instance and the generated instance with the lowest distance. Intuitively, this is similar to recall in classification. In the equation, $\frac{1}{m} \sum_{j=1}^{m} \min_{i} \mathrm{distance}(p_i, r_j)$.
- **mean_distance**: (float): Average of the precision_distance and recall_distance.
- **index_distance**: (float): Average of the distance between the generated instance and the reference instance with the same index. In the equation, $\frac{1}{n} \sum_{i=1}^{n} \mathrm{distance}(p_i, r_i)$.
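- **matching_precision**: (float): Precision of the matching instances, which means how predictions are covered by references, i.e., how accurate the predictions are. It is the fraction of generated instances that are the closest match of at least one reference instance. In the equation, $\frac{ | \{ \operatorname{argmin}_{i} \mathrm{distance}(p_i, r_j) \mid j = 1, \ldots, m \} | }{n}$.
- **matching_recall**: (float): Recall of the matching instances, which means how predictions cover references. It is the fraction of reference instances that are the closest match of at least one generated instance. In the equation, $\frac{ | \{ \operatorname{argmin}_{j} \mathrm{distance}(p_i, r_j) \mid i = 1, \ldots, n \} | }{m}$.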
- **matching_f1**: (float): F1-score of the matching instances, harmonic mean of the matching_precision and matching_recall.
- **coverages**: (list of float): Coverage of the matching instances computed on generated data sampled with each size in cuc_n_samples. In the equation, $\left[ \frac{1}{m} \left| \{ \operatorname{argmin}_{j} \mathrm{distance}(p_i, r_j) \mid p_i \in \mathrm{sample}(P, \mathrm{n\_sample}) \} \right| ~\text{for}~ \mathrm{n\_sample} \in \mathrm{cuc\_n\_samples} \right]$.
- **cuc**: (float): Area under the coverage curve. In the equation, $\int_{0}^{n} \mathrm{coverage}(x) dx$. As an approximation, the trapezoidal rule is used.
- **.\*_features**: (list of float): The values computed individually for each feature.
- **macro_.\***: (float): Macro-averaged values, i.e., the average of the corresponding \*\_features values over all features.
- **distance**: (numpy.ndarray): The distance matrix between the generated instances and the reference instances.
- **match**: (numpy.ndarray): The matching from generated instances to reference instances: for each generated instance, the index of the closest reference instance.
- **match_inv**: (numpy.ndarray): The matching from reference instances to generated instances: for each reference instance, the index of the closest generated instance.
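
The definitions above can be checked on a small hand-made distance matrix. The following is a minimal sketch, not the metric's implementation: the function names `matching_summary` and `coverage` are hypothetical, the input is assumed to be a 2-D matrix with one row per generated instance and one column per reference instance, sampling is assumed to be without replacement, and no `eps` term is added to the F1 denominator.

```python
import numpy as np

def matching_summary(distance):
    """Compute the matching-based scores from an (n, m) distance matrix."""
    distance = np.asarray(distance, dtype=float)
    n, m = distance.shape
    match = distance.argmin(axis=1)      # closest reference per generated instance
    match_inv = distance.argmin(axis=0)  # closest generated instance per reference
    precision_distance = distance.min(axis=1).mean()
    recall_distance = distance.min(axis=0).mean()
    matching_precision = np.unique(match_inv).size / n
    matching_recall = np.unique(match).size / m
    matching_f1 = (2 * matching_precision * matching_recall
                   / (matching_precision + matching_recall))
    return {
        "precision_distance": precision_distance,
        "recall_distance": recall_distance,
        "mean_distance": (precision_distance + recall_distance) / 2,
        "matching_precision": matching_precision,
        "matching_recall": matching_recall,
        "matching_f1": matching_f1,
        "match": match,
        "match_inv": match_inv,
    }

def coverage(distance, n_sample, n_calculation=3, seed=0):
    """Average fraction of references matched by n_sample sampled predictions."""
    distance = np.asarray(distance, dtype=float)
    n, m = distance.shape
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_calculation):
        # Assumption: predictions are sampled without replacement.
        idx = rng.choice(n, size=n_sample, replace=False)
        matched = np.unique(distance[idx].argmin(axis=1))
        values.append(matched.size / m)
    return float(np.mean(values))

# 3 generated instances (rows) and 2 reference instances (columns).
distance = [[0.1, 0.9],
            [0.2, 0.8],
            [0.7, 0.3]]
results = matching_summary(distance)
print(results["match"])      # [0 0 1]
print(results["match_inv"])  # [0 2]
print(results["matching_precision"], results["matching_recall"])
print(coverage(distance, n_sample=1), coverage(distance, n_sample=3))
```

In this example two of the three generated instances are the closest match of some reference, so matching_precision is 2/3, while both references are matched, so matching_recall is 1. With a single sampled prediction the coverage is 1/m, and with all predictions sampled it equals matching_recall.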

<!-- #### Values from Popular Papers -->
<!-- *Give examples, preferably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.* -->

<!-- ### Examples -->
<!-- *Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.* -->

## Limitations and Bias
This metric assumes that the generated time-series should match the original time-series. This may not hold in some scenarios, so the metric may not be suitable for evaluating time-series generation models that are not required to reproduce the original time-series.

<!-- ## Citation -->
<!-- *Cite the source where this metric was introduced.* -->

<!-- ## Further References -->
<!-- *Add any useful further references.* -->