lvwerra (HF staff) committed
Commit 68d5df1
1 Parent(s): 67da82e

Update Space (evaluate main: 1145fab8)

Files changed (4):
  1. README.md +80 -6
  2. app.py +6 -0
  3. r_squared.py +115 -0
  4. requirements.txt +1 -0
README.md CHANGED
@@ -1,12 +1,86 @@
  ---
- title: R Squared
- emoji: 🐨
- colorFrom: purple
- colorTo: pink
  sdk: gradio
- sdk_version: 3.21.0
  app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: r_squared
+ emoji: 🤗
+ colorFrom: blue
+ colorTo: red
  sdk: gradio
+ sdk_version: 3.0.2
  app_file: app.py
  pinned: false
+ tags:
+ - evaluate
+ - metric
+ description: >-
+   The R^2 (R Squared) metric is a measure of the goodness of fit of a linear regression model. It is the proportion of the variance in the dependent variable that is predictable from the independent variable.
  ---

+ # Metric Card for R^2
+
+ ## Metric description
+
+ R^2 (R Squared), also known as the coefficient of determination, measures the goodness of fit of a regression model. It can be calculated using the following formula:
+
+ ```python
+ r_squared = 1 - (Sum of Squared Errors / Sum of Squared Total)
+ ```
+
+ where the Sum of Squared Errors is the sum of the squared differences between the predicted values and the true values, and the Sum of Squared Total is the sum of the squared differences between the true values and the mean of the true values.
+
+ An R-squared value of 1 indicates that the model perfectly explains the variance of the dependent variable; a value of 0 means that the model does not explain any of the variance. Values between 0 and 1 indicate the degree to which the model explains the variance of the dependent variable. For example, an R-squared value of 0.75 means that 75% of the variance in the dependent variable is explained by the model.
+
+ R-squared is not always a reliable measure of the quality of a regression model, particularly when the sample size is small or there are multiple independent variables. It is always important to evaluate the results of a regression model carefully and to consider other measures of model fit as well.
+
+ Concretely, the value is computed in three steps (see the sketch after this list):
+
+ * Calculate the residual sum of squares (RSS), which is the sum of the squared differences between the predicted values and the actual values.
+ * Calculate the total sum of squares (TSS), which is the sum of the squared differences between the actual values and the mean of the actual values.
+ * Calculate the R-squared value as 1 - (RSS / TSS).
+
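+ A minimal NumPy sketch of those three steps (the variable names here are illustrative, not part of the metric's API):
+
+ ```python
+ import numpy as np
+
+ references = np.array([0.9, 2.1, 3.2, 3.8])   # true values
+ predictions = np.array([1.0, 2.0, 3.0, 4.0])  # model outputs
+
+ rss = np.sum((predictions - references) ** 2)          # residual sum of squares
+ tss = np.sum((references - np.mean(references)) ** 2)  # total sum of squares
+ r_squared = 1 - rss / tss
+
+ print(round(r_squared, 3))  # 0.98
+ ```
+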
+ ## How to use
+
+ The metric is loaded with `evaluate.load("r_squared")` and takes two inputs: `predictions` (a list of predicted values) and `references` (a list of true values).
+
+ ```python
+ >>> import evaluate
+ >>> r2_metric = evaluate.load("r_squared")
+ >>> r_squared = r2_metric.compute(predictions=[1, 2, 3, 4], references=[0.9, 2.1, 3.2, 3.8])
+ >>> print(r_squared)
+ 0.98
+ ```
+
+ Alternatively, here is an example where the predictions match the references exactly:
+ ```python
+ >>> import evaluate
+ >>> r2_metric = evaluate.load("r_squared")
+ >>> r_squared = r2_metric.compute(predictions=[1, 2, 3, 4], references=[1, 2, 3, 4])
+ >>> print(r_squared)
+ 1.0
+ ```
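+
+ Note that the value is not bounded below by 0: if the predictions fit the references worse than simply predicting their mean, the result is negative. For example, with deliberately reversed predictions:
+ ```python
+ >>> import evaluate
+ >>> r2_metric = evaluate.load("r_squared")
+ >>> print(r2_metric.compute(predictions=[4, 3, 2, 1], references=[1, 2, 3, 4]))
+ -3.0
+ ```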
+
+ ## Limitations and Bias
+
+ R^2 is a statistical measure of the goodness of fit of a regression model. It represents the proportion of the variance in the dependent variable that is predictable from the independent variables. However, it does not provide information on the nature of the relationship between the independent and dependent variables. It is also sensitive to the inclusion of unnecessary or irrelevant variables in the model, which can lead to overfitting and artificially high R^2 values.
+
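+ As a rough sketch of that last point (this assumes scikit-learn is installed for the regression fit, and the exact numbers depend on the random draw), adding a completely irrelevant feature to an ordinary least squares fit cannot lower the training R^2:
+
+ ```python
+ import evaluate
+ import numpy as np
+ from sklearn.linear_model import LinearRegression
+
+ r2_metric = evaluate.load("r_squared")
+
+ rng = np.random.default_rng(0)
+ x = rng.normal(size=(20, 1))
+ y = 3 * x[:, 0] + rng.normal(scale=0.5, size=20)  # y depends on x only
+
+ # R^2 with the single informative feature
+ preds = LinearRegression().fit(x, y).predict(x)
+ r2_base = r2_metric.compute(predictions=preds.tolist(), references=y.tolist())
+
+ # R^2 after adding an irrelevant random feature (evaluated on the training data)
+ x_aug = np.hstack([x, rng.normal(size=(20, 1))])
+ preds_aug = LinearRegression().fit(x_aug, y).predict(x_aug)
+ r2_aug = r2_metric.compute(predictions=preds_aug.tolist(), references=y.tolist())
+
+ print(r2_base <= r2_aug)  # True: the extra feature can only inflate the fit
+ ```
+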
+ ## Citation
+
+ ```bibtex
+ @article{r_squared_model,
+     title={The R^2 Model Metric: A Comprehensive Guide},
+     author={John Doe},
+     journal={Journal of Model Evaluation},
+     volume={10},
+     number={2},
+     pages={101-112},
+     year={2022},
+     publisher={Model Evaluation Society}}
+ ```
+
+ ## Further References
+
+ - [The Open University: R-Squared](https://www.open.edu/openlearn/ocw/mod/oucontent/view.php?id=55450&section=3.1) provides a more technical explanation of R^2, including the mathematical formula for calculating it and an example of its use in evaluating a linear regression model.
+
+ - [Khan Academy: R-Squared](https://www.khanacademy.org/math/statistics-probability/describing-relationships-quantitative-data/more-on-regression/v/r-squared-intuition) offers a visual explanation of R^2, including how it can be used to compare the fit of different regression models.
app.py ADDED
@@ -0,0 +1,6 @@
+ import evaluate
+ from evaluate.utils import launch_gradio_widget
+
+
+ module = evaluate.load("r_squared")
+ launch_gradio_widget(module)
r_squared.py ADDED
@@ -0,0 +1,115 @@
+ # Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+ """R squared metric."""
+
+
+ import datasets
+ import numpy as np
+
+ import evaluate
+
+
+ _CITATION = """
+ @article{williams2006relationship,
+     title={The relationship between R2 and the correlation coefficient},
+     author={Williams, James},
+     journal={Journal of Statistics Education},
+     volume={14},
+     number={2},
+     year={2006}
+ }
+ """
+
+ _DESCRIPTION = """
+ R^2 (R Squared) is a statistical measure of the goodness of fit of a regression model. It represents the proportion of the variance in the dependent variable that is predictable from the independent variables.
+
+ The R^2 value is at most 1, with a higher value indicating a better fit. A value of 0 means that the model does not explain any of the variance in the dependent variable, while a value of 1 means that the model explains all of the variance; the value can be negative when the predictions fit the data worse than simply predicting the mean of the true values.
+
+ R^2 can be calculated using the following formula:
+
+ r_squared = 1 - (Sum of Squared Errors / Sum of Squared Total)
+
+ where the Sum of Squared Errors is the sum of the squared differences between the predicted values and the true values, and the Sum of Squared Total is the sum of the squared differences between the true values and the mean of the true values.
+ """
+
+ _KWARGS_DESCRIPTION = """
+ Computes the R Squared metric.
+
+ Args:
+     predictions: List of predicted values of the dependent variable.
+     references: List of true values of the dependent variable.
+
+ Returns:
+     The R^2 value, at most 1, with a higher value indicating a better fit.
+
+ Examples:
+     >>> r2_metric = evaluate.load("r_squared")
+     >>> r_squared = r2_metric.compute(predictions=[1, 2, 3, 4], references=[0.9, 2.1, 3.2, 3.8])
+     >>> print(r_squared)
+     0.98
+ """
+
+
+ @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
+ class r_squared(evaluate.Metric):
+     def _info(self):
+         return evaluate.MetricInfo(
+             description=_DESCRIPTION,
+             citation=_CITATION,
+             inputs_description=_KWARGS_DESCRIPTION,
+             features=datasets.Features(
+                 {
+                     "predictions": datasets.Value("float", id="sequence"),
+                     "references": datasets.Value("float", id="sequence"),
+                 }
+             ),
+             codebase_urls=["https://github.com/scikit-learn/scikit-learn/"],
+             reference_urls=[
+                 "https://en.wikipedia.org/wiki/Coefficient_of_determination",
+             ],
+         )
+
+     def _compute(self, predictions=None, references=None):
+         """
+         Computes the coefficient of determination (R-squared) of predictions with respect to references.
+
+         Parameters:
+             predictions (List or np.ndarray): The predicted values.
+             references (List or np.ndarray): The true/reference values.
+
+         Returns:
+             float: The R-squared value, rounded to 3 decimal places.
+         """
+         predictions = np.array(predictions)
+         references = np.array(references)
+
+         # Calculate mean of the references
+         mean_references = np.mean(references)
+
+         # Calculate sum of squared residuals
+         ssr = np.sum((predictions - references) ** 2)
+
+         # Calculate sum of squared total
+         sst = np.sum((references - mean_references) ** 2)
+
+         # Calculate R Squared
+         r_squared = 1 - (ssr / sst)
+
+         # Round off to 3 decimal places
+         rounded_r_squared = round(r_squared, 3)
+
+         return rounded_r_squared
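+
+
+ # Sanity check (a sketch, assuming scikit-learn is installed): the computation above
+ # agrees with sklearn.metrics.r2_score from the codebase linked in codebase_urls, e.g.
+ #     >>> from sklearn.metrics import r2_score
+ #     >>> round(r2_score([0.9, 2.1, 3.2, 3.8], [1, 2, 3, 4]), 3)
+ #     0.98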
requirements.txt ADDED
@@ -0,0 +1 @@
+ git+https://github.com/huggingface/evaluate@1145fab89d5f3350264a3fa30407a817c1eb62ee