In decision-making, machine learning models need to not only make predictions but also quantify how uncertain those predictions are. A point prediction can differ dramatically from the true value because of the stochasticity of the real world. If, on the other hand, the model can estimate a range that is guaranteed to cover the true value with high probability, it can reason about best- and worst-case outcomes and make more sensible decisions.
For example:
* When buying a house, the upper bound of a price prediction tells a buyer whether they can be confident the house will stay within their budget.
* When identifying an object, thresholding the softmax predictions yields a set of classes the object could plausibly belong to.
Conformal prediction is a technique for quantifying such uncertainty in AI systems. Given an input, conformal prediction produces a prediction interval in regression problems and a set of candidate classes in classification problems. Both the interval and the set are guaranteed to cover the true value with high probability.
# Theory | |
### 1. Prediction Regions | |
Prediction regions in conformal prediction are intervals that provide a range of possible values for the prediction. For a regression task, this is often referred to as a prediction interval. Let's denote the prediction region as $[a, b]$, where $a$ and $b$ represent the lower and upper bounds, respectively. The significance level is denoted by $\alpha$, so the confidence level is $(1 - \alpha)$. The prediction region is constructed in such a way that it contains the true value with probability at least $(1 - \alpha)$.
Mathematically, for a prediction $\hat{y}$, the prediction region is defined as:
$$ P(a \leq y \leq b) \geq 1 - \alpha $$
This ensures that the true value $y$ falls within the predicted interval with a confidence level of at least $(1 - \alpha)$.
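As a concrete illustration, here is a minimal sketch in Python. The residuals and the point prediction below are made-up placeholders; the interval is built by widening the prediction by a quantile of absolute calibration errors, one common recipe that reappears in the algorithm below.

```python
import numpy as np

# Minimal sketch of a regression prediction interval. The residuals and the
# point prediction are made-up placeholders; the half-width q is the
# (1 - alpha) quantile of absolute calibration errors.
rng = np.random.default_rng(0)
alpha = 0.1
residuals = np.abs(rng.normal(size=500))   # stand-in for |y_i - y_hat_i|
q = np.quantile(residuals, 1 - alpha)

y_hat = 3.2                                # point prediction for a new input
a, b = y_hat - q, y_hat + q                # prediction region [a, b]
print(f"P({a:.2f} <= y <= {b:.2f}) >= {1 - alpha}")
```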
For a classification task, the prediction region is the set of classes whose scores exceed a certain threshold, where the threshold is derived from $\alpha$ (typically using calibration data, as described in the algorithm below). Mathematically, for a predicted set of classes $\hat{C}$, the prediction region satisfies:
$$ P(y \in \hat{C}) \geq 1 - \alpha $$
This ensures that the true value $y$ falls within the predicted set of classes with a confidence level of at least $(1 - \alpha)$.
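A similarly minimal sketch for classification, where the prediction region is a set of classes. The softmax scores and the cutoff below are illustrative stand-ins; a properly calibrated threshold would be derived from calibration data.

```python
import numpy as np

# Illustrative sketch: the prediction set C_hat keeps every class whose
# softmax score clears a threshold. Scores and threshold are made-up
# stand-ins, not calibrated values.
softmax_scores = np.array([0.05, 0.62, 0.25, 0.08])  # one sample, 4 classes
threshold = 0.10                                     # hypothetical cutoff

C_hat = {c for c, s in enumerate(softmax_scores) if s >= threshold}
print(C_hat)  # {1, 2}: the classes confident enough to include
```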
### 2. Validity | |
The validity of a conformal predictor is a crucial aspect. It ensures that, over repeated experiments, the true value falls within the predicted region with the specified confidence level. Mathematically, for a given prediction $\hat{y}$ and a true outcome $y$, the validity condition is expressed as:
$$ P(y \in [a, b]) \geq 1 - \alpha $$
This means that the probability of the true value $y$ lying within the predicted interval $[a, b]$ is greater than or equal to $(1 - \alpha)$.
For a classification task, the validity condition is expressed as:
$$ P(y \in \hat{C}) \geq 1 - \alpha $$
This means that the probability of the true value $y$ lying within the predicted set of classes $\hat{C}$ is greater than or equal to $(1 - \alpha)$.
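We can check the validity condition empirically. The sketch below uses a synthetic data-generating process (purely an assumption for illustration) and measures how often the true value lands inside the interval over repeated draws.

```python
import numpy as np

# Empirical check of validity: over many trials, the true value should land
# in [a, b] at least (1 - alpha) of the time. The data-generating process
# here is synthetic and purely illustrative.
rng = np.random.default_rng(0)
alpha = 0.1

residuals = np.abs(rng.normal(size=1000))   # stand-in calibration errors
q = np.quantile(residuals, 1 - alpha)       # interval half-width

y_true = rng.normal(size=10_000)            # repeated experiments
y_hat = np.zeros_like(y_true)               # stand-in point predictions
covered = np.mean(np.abs(y_true - y_hat) <= q)
print(f"empirical coverage: {covered:.3f} (target >= {1 - alpha})")
```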
### 3. Inductive Conformal Prediction | |
Inductive Conformal Prediction is characterized by its adaptability to the data at hand: it relies on no specific assumptions about the underlying distribution beyond exchangeability, which makes it flexible across various types of problems. The algorithm is as follows (a code sketch follows the list):
1. Given a dataset $D = \{(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)\}$, split it into a training set $D_{train}$ and a calibration set $D_{cal}$.
2. Train a machine learning model $M$ on $D_{train}$.
3. For each sample $(x_i, y_i)$ in the calibration set $D_{cal}$, compute the prediction $\hat{y}_i = M(x_i)$ and the nonconformity score $e_i = |y_i - \hat{y}_i|$ (the absolute error, in the case of regression).
4. Sort the nonconformity scores in ascending order and take the $\lceil (1 - \alpha)(n_{cal} + 1) \rceil$-th smallest score as the threshold $q$, where $n_{cal}$ is the number of samples in the calibration set $D_{cal}$.
5. For each test sample $x_i$ in the test set $D_{test}$, compute the prediction $\hat{y}_i = M(x_i)$ and output the prediction region $R_i = [\hat{y}_i - q, \hat{y}_i + q]$.
6. To validate the predictor empirically, count how often the true value $y_i$ falls within $R_i$ over repeated experiments; this empirical coverage should be at least $(1 - \alpha)$.
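Here is one possible end-to-end sketch of steps 1-5 for regression. It assumes scikit-learn is available; the synthetic dataset, the random-forest model, and $\alpha = 0.1$ are placeholder choices, not prescriptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Sketch of inductive conformal regression following steps 1-5 above.
# The dataset and model are placeholders.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X[:, 0] * 2 + rng.normal(size=1000)

# 1. Split into a proper training set and a calibration set.
X_train, X_cal, y_train, y_cal = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# 2. Train the underlying model M.
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# 3. Nonconformity scores: absolute errors on the calibration set.
scores = np.abs(y_cal - model.predict(X_cal))

# 4. Threshold: the ceil((1 - alpha)(n_cal + 1))-th smallest score.
alpha = 0.1
n = len(scores)
k = int(np.ceil((1 - alpha) * (n + 1)))
q = np.sort(scores)[min(k, n) - 1]

# 5. Prediction region for new inputs: [y_hat - q, y_hat + q].
X_new = rng.normal(size=(3, 5))
y_hat = model.predict(X_new)
for lo, hi in zip(y_hat - q, y_hat + q):
    print(f"[{lo:.2f}, {hi:.2f}]")
```

With the absolute-error score used here, every test point gets an interval of the same width $2q$; adaptive widths require normalized nonconformity scores.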