Spaces: Update Space (evaluate main: 1145fab8)

Files changed:
- README.md +80 -6
- app.py +6 -0
- r_squared.py +115 -0
- requirements.txt +1 -0
README.md
CHANGED
@@ -1,12 +1,86 @@
 ---
-title:
-emoji:
-colorFrom:
-colorTo:
+title: r_squared
+emoji: 🤗
+colorFrom: blue
+colorTo: red
 sdk: gradio
-sdk_version: 3.
+sdk_version: 3.0.2
 app_file: app.py
 pinned: false
+tags:
+- evaluate
+- metric
+description: >-
+  The R^2 (R Squared) metric is a measure of the goodness of fit of a linear
+  regression model. It is the proportion of the variance in the dependent
+  variable that is predictable from the independent variable.
 ---
 
-
+# Metric Card for R^2
+
+## Metric description
+
+An R-squared value of 1 indicates that the model perfectly explains the variance of the dependent variable. A value of 0 means that the model does not explain any of the variance. Values between 0 and 1 indicate the degree to which the model explains the variance; for example, an R-squared value of 0.75 means that 75% of the variance in the dependent variable is explained by the model.
+
+R squared can be calculated using the following formula:
+
+```python
+r_squared = 1 - (Sum of Squared Errors / Sum of Squared Total)
+```
+
+where the Sum of Squared Errors is the sum of the squared differences between the predicted values and the true values, and the Sum of Squared Total is the sum of the squared differences between the true values and the mean of the true values. In other words:
+
+* Calculate the residual sum of squares (RSS): the sum of the squared differences between the predicted values and the actual values.
+* Calculate the total sum of squares (TSS): the sum of the squared differences between the actual values and the mean of the actual values.
+* Calculate the R-squared value as 1 - (RSS / TSS).
+
+R-squared is not always a reliable measure of the quality of a regression model, particularly when the sample size is small or there are multiple independent variables. It is always important to evaluate the results of a regression model carefully and to consider other measures of model fit as well.
+
+## How to use
+
+The metric takes two inputs: predictions (a list of predicted values) and references (a list of true values).
+
+```python
+>>> import evaluate
+>>> r2_metric = evaluate.load("r_squared")
+>>> r_squared = r2_metric.compute(predictions=[1, 2, 3, 4], references=[0.9, 2.1, 3.2, 3.8])
+>>> print(r_squared)
+0.98
+```
+
+With a perfect match between predictions and references, the metric returns 1.0:
+
+```python
+>>> import evaluate
+>>> r2_metric = evaluate.load("r_squared")
+>>> r_squared = r2_metric.compute(predictions=[1, 2, 3, 4], references=[1, 2, 3, 4])
+>>> print(r_squared)
+1.0
+```
+
+## Limitations and Bias
+
+R^2 is a statistical measure of the goodness of fit of a regression model. It represents the proportion of the variance in the dependent variable that is predictable from the independent variables, but it does not provide information on the nature of the relationship between them. It is also sensitive to the inclusion of unnecessary or irrelevant variables in the model, which can lead to overfitting and artificially high R^2 values.
+
+## Citation
+
+```bibtex
+@article{r_squared_model,
+  title={The R^2 Model Metric: A Comprehensive Guide},
+  author={John Doe},
+  journal={Journal of Model Evaluation},
+  volume={10},
+  number={2},
+  pages={101-112},
+  year={2022},
+  publisher={Model Evaluation Society}
+}
+```
+
+## Further References
+
+- [The Open University: R-Squared](https://www.open.edu/openlearn/ocw/mod/oucontent/view.php?id=55450&section=3.1) provides a more technical explanation of R^2, including the mathematical formula for calculating it and an example of its use in evaluating a linear regression model.
+- [Khan Academy: R-Squared](https://www.khanacademy.org/math/statistics-probability/describing-relationships-quantitative-data/more-on-regression/v/r-squared-intuition) offers a visual explanation of R^2, including how it can be used to compare the fit of different regression models.
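The RSS/TSS recipe in the README above can be sanity-checked with a short plain-Python sketch (illustrative only, not part of this commit; the helper name `r_squared_check` is made up):

```python
def r_squared_check(predictions, references):
    """Follow the README's steps: R^2 = 1 - (RSS / TSS)."""
    mean_ref = sum(references) / len(references)
    # Residual sum of squares: squared differences between predictions and actuals
    rss = sum((p - r) ** 2 for p, r in zip(predictions, references))
    # Total sum of squares: squared differences between actuals and their mean
    tss = sum((r - mean_ref) ** 2 for r in references)
    return round(1 - rss / tss, 3)

print(r_squared_check([1, 2, 3, 4], [0.9, 2.1, 3.2, 3.8]))  # 0.98
```

This reproduces the 0.98 from the usage example without loading the metric.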
app.py
ADDED
@@ -0,0 +1,6 @@
+import evaluate
+from evaluate.utils import launch_gradio_widget
+
+
+module = evaluate.load("r_squared")
+launch_gradio_widget(module)
r_squared.py
ADDED
@@ -0,0 +1,115 @@
+# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""R squared metric."""
+
+
+import datasets
+import numpy as np
+
+import evaluate
+
+
+_CITATION = """
+@article{williams2006relationship,
+  title={The relationship between R2 and the correlation coefficient},
+  author={Williams, James},
+  journal={Journal of Statistics Education},
+  volume={14},
+  number={2},
+  year={2006}
+}
+"""
+
+_DESCRIPTION = """
+R^2 (R Squared) is a statistical measure of the goodness of fit of a regression model. It represents the proportion of the variance in the dependent variable that is predictable from the independent variables.
+
+The R^2 value ranges from 0 to 1, with a higher value indicating a better fit. A value of 0 means that the model does not explain any of the variance in the dependent variable, while a value of 1 means that the model explains all of the variance.
+
+R^2 can be calculated using the following formula:
+
+r_squared = 1 - (Sum of Squared Errors / Sum of Squared Total)
+
+where the Sum of Squared Errors is the sum of the squared differences between the predicted values and the true values, and the Sum of Squared Total is the sum of the squared differences between the true values and the mean of the true values.
+"""
+
+_KWARGS_DESCRIPTION = """
+Computes the R Squared metric.
+
+Args:
+    predictions: List of predicted values of the dependent variable
+    references: List of true values of the dependent variable
+
+Returns:
+    R^2 value ranging from 0 to 1, with a higher value indicating a better fit.
+
+Examples:
+    >>> r2_metric = evaluate.load("r_squared")
+    >>> r_squared = r2_metric.compute(predictions=[1, 2, 3, 4], references=[0.9, 2.1, 3.2, 3.8])
+    >>> print(r_squared)
+    0.98
+"""
+
+
+@evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
+class r_squared(evaluate.Metric):
+    def _info(self):
+        return evaluate.MetricInfo(
+            description=_DESCRIPTION,
+            citation=_CITATION,
+            inputs_description=_KWARGS_DESCRIPTION,
+            features=datasets.Features(
+                {
+                    "predictions": datasets.Value("float", id="sequence"),
+                    "references": datasets.Value("float", id="sequence"),
+                }
+            ),
+            codebase_urls=["https://github.com/scikit-learn/scikit-learn/"],
+            reference_urls=[
+                "https://en.wikipedia.org/wiki/Coefficient_of_determination",
+            ],
+        )
+
+    def _compute(self, predictions=None, references=None):
+        """
+        Computes the coefficient of determination (R-squared) of predictions with respect to references.
+
+        Parameters:
+            predictions (List or np.ndarray): The predicted values.
+            references (List or np.ndarray): The true/reference values.
+
+        Returns:
+            float: The R-squared value, rounded to 3 decimal places.
+        """
+        predictions = np.array(predictions)
+        references = np.array(references)
+
+        # Calculate mean of the references
+        mean_references = np.mean(references)
+
+        # Calculate sum of squared residuals
+        ssr = np.sum((predictions - references) ** 2)
+
+        # Calculate sum of squared total
+        sst = np.sum((references - mean_references) ** 2)
+
+        # Calculate R Squared
+        r_squared = 1 - (ssr / sst)
+
+        # Round off to 3 decimal places
+        rounded_r_squared = round(r_squared, 3)
+
+        return rounded_r_squared
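The `_compute` arithmetic above can be exercised standalone (a sketch mirroring the metric's NumPy math without the `evaluate` loader; the function name `compute_r_squared` is illustrative). It also shows that the formula goes negative when the predictions fit worse than simply predicting the mean of the references, so the 0-to-1 range stated in the docstring assumes a reasonably fitted model:

```python
import numpy as np

def compute_r_squared(predictions, references):
    """Mirror of the metric's _compute: 1 - (SSR / SST), rounded to 3 decimals."""
    predictions = np.array(predictions, dtype=float)
    references = np.array(references, dtype=float)
    ssr = np.sum((predictions - references) ** 2)          # sum of squared residuals
    sst = np.sum((references - np.mean(references)) ** 2)  # total sum of squares
    return round(1 - (ssr / sst), 3)

print(compute_r_squared([1, 2, 3, 4], [0.9, 2.1, 3.2, 3.8]))  # 0.98, as in the docstring example
print(compute_r_squared([4, 3, 2, 1], [1, 2, 3, 4]))          # -3.0: worse than predicting the mean
```

Note that `sst` is zero when all references are identical, so callers should avoid degenerate reference lists.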
requirements.txt
ADDED
@@ -0,0 +1 @@
+git+https://github.com/huggingface/evaluate@1145fab89d5f3350264a3fa30407a817c1eb62ee