John Graham Reynolds committed
Commit · bcbab79
1 Parent(s): 86e4acf
change output to text, try adding example
app.py CHANGED
@@ -5,7 +5,7 @@ import evaluate
 import gradio as gr
 import pandas as pd
 
-title = "
+title = "'Combine' multiple metrics with this 🤗 Evaluate 🪲 Fix!"
 
 description = """<p style='text-align: center'>
 As I introduce myself to the entirety of the 🤗 ecosystem, I've put together this Space to show off a temporary fix for a current 🪲 in the 🤗 Evaluate library. \n
@@ -13,7 +13,8 @@ As I introduce myself to the entirety of the 🤗 ecosystem, I've put together t
 Check out the original, longstanding issue [here](https://github.com/huggingface/evaluate/issues/234). This details how it is currently impossible to \
 `evaluate.combine()` multiple metrics related to multilabel text classification. Particularly, one cannot `combine` the `f1`, `precision`, and `recall` scores for \
 evaluation. I encountered this issue specifically while training [RoBERTa-base-DReiFT](https://huggingface.co/MarioBarbeque/RoBERTa-base-DReiFT) for multilabel \
-text classification of 805 labeled medical conditions based on drug reviews.
+text classification of 805 labeled medical conditions based on drug reviews. The [following workaround](https://github.com/johngrahamreynolds/FixedMetricsForHF) was
+configured. \n
 
 This Space shows how one can instantiate these custom `evaluate.Metric`s, each with their own unique methodology for averaging across labels, before `combine`-ing them into a
 HF `evaluate.CombinedEvaluations` object. From here, we can easily compute each of the metrics simultaneously using `compute`.</p>
@@ -80,17 +81,23 @@ space = gr.Interface(
             datatype=["number", "number"],
             row_count=5,
             col_count=(2, "fixed"),
+            label_name="Table of Predicted vs Actual Class Labels"
         ),
         gr.Dataframe(
             headers=["Metric", "Averaging Type"],
             datatype=["str", "str"],
-            row_count=3,
+            row_count=(3, "fixed"),
             col_count=(2, "fixed"),
+            label_name="Table of Metrics and Averaging Method across Labels "
         )
     ],
-    outputs="
+    outputs="text",
     title=title,
     description=description,
     article=article,
+    examples=[
+        [[[1,1],[1,0],[2,0],[1,2],[2,2]], [["f1", "weighted"], ["precision", "micro"], ["recall", "weighted"]]],
+        # [[["precision", "micro"], ["recall", "weighted"], ["f1", "macro"]]],
+    ],
     cache_examples=False
 ).launch()
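For context on what the combined-metrics pattern described in the Space's description looks like in practice, here is a minimal sketch. The `fixed_metrics` module and the `FixedF1`, `FixedPrecision`, and `FixedRecall` class names (and their `average=` constructor argument) are assumptions standing in for whatever the linked FixedMetricsForHF workaround actually defines; only `evaluate.combine` and `compute` are the 🤗 Evaluate calls named in the description, and the toy labels mirror the example table wired into the interface above.

```python
import evaluate

# Hypothetical subclasses of evaluate.Metric from the linked workaround repo:
# each instance bakes in its own averaging strategy, so `combine` never has to
# route a shared `average` kwarg to every metric (the failure behind issue #234).
from fixed_metrics import FixedF1, FixedPrecision, FixedRecall  # assumed names

f1 = FixedF1(average="weighted")
precision = FixedPrecision(average="micro")
recall = FixedRecall(average="weighted")

# Combine the pre-configured metrics into a single CombinedEvaluations object ...
clf_metrics = evaluate.combine([f1, precision, recall])

# ... and compute all three scores at once on a toy batch of predicted vs. actual
# class labels, similar to the example DataFrame rows above.
predicted = [1, 1, 2, 1, 2]
actual = [1, 0, 0, 2, 2]
print(clf_metrics.compute(predictions=predicted, references=actual))
```

Because each pre-configured metric already carries its own averaging mode, no extra keyword arguments need to be passed at `compute` time, which is the point of the workaround.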