|
---
title: action_generation
datasets:
- none
tags:
- evaluate
- metric
description: "Evaluates action generation outputs of the form /class/phrase by scoring the class and phrase components separately and combining them with a weighted sum."
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
---
|
|
|
# Metric Card for action_generation |
|
|
|
## Metric Description |
|
Evaluates the output of an action generation task. Each generated action follows the format `/class/phrase`. The metric scores the `/class` and `phrase` components separately, then combines the two with a weighted sum.
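
The weights of that sum are not documented on this card, but the sample output under *How to Use* is consistent with a class weight of 0.8 and a phrase weight of 0.2. A minimal sketch of the combination step under that assumption (not the module's actual source):

```python
# Weights are inferred from the sample output below (class ≈ 0.8,
# phrase ≈ 0.2); they are an assumption, not taken from the module.
def combine_scores(class_scores: dict, phrase_scores: dict,
                   class_weight: float = 0.8, phrase_weight: float = 0.2) -> dict:
    return {
        key: round(class_weight * class_scores[key] + phrase_weight * phrase_scores[key], 4)
        for key in ("precision", "recall", "f1")
    }

# Reproduces the weighted_sum entry of the sample output:
print(combine_scores(
    {"precision": 0.7143, "recall": 0.8333, "f1": 0.7692},
    {"precision": 0.8571, "recall": 1.0, "f1": 0.9231},
))
# {'precision': 0.7429, 'recall': 0.8666, 'f1': 0.8}
```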
|
|
|
## How to Use |
|
```python
import evaluate

# Valid /class labels used when scoring the class component.
valid_labels = [
    "/開箱",
    "/教學",
    "/表達",
    "/分享/外部資訊",
    "/分享/個人資訊",
    "/推薦/產品",
    "/推薦/服務",
    "/推薦/其他",
    "",
]

# Each inner list holds the generated actions for one example,
# in "/class/phrase" format.
predictions = [
    ["/開箱/xxx", "/教學/yyy", "/表達/zzz"],
    ["/分享/外部資訊/aaa", "/教學/yyy", "/表達/zzz", "/分享/個人資訊/bbb"],
]
references = [
    ["/開箱/xxx", "/教學/yyy", "/表達/zzz"],
    ["/推薦/產品/bbb", "/教學/yyy", "/表達/zzz"],
]

metric = evaluate.load("DarrenChensformer/action_generation")
result = metric.compute(
    predictions=predictions,
    references=references,
    valid_labels=valid_labels,
    detailed_scores=True,
)
print(result)
```
|
|
|
```
{'class': {'precision': 0.7143, 'recall': 0.8333, 'f1': 0.7692},
 'phrase': {'precision': 0.8571, 'recall': 1.0, 'f1': 0.9231},
 'weighted_sum': {'precision': 0.7429, 'recall': 0.8666, 'f1': 0.8}}
```
|
|
|
### Inputs |
|
- **predictions** *(list of list of str)*: generated actions for each example, each in the format `/class/phrase`.
- **references** *(list of list of str)*: reference actions for each example, in the same format.
- **valid_labels** *(list of str)*: the `/class` labels recognized when scoring the class component.
- **detailed_scores** *(bool)*: if `True`, per-component `class` and `phrase` scores are returned alongside `weighted_sum`.
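
Because class labels can themselves contain slashes (e.g. `/分享/外部資訊`), splitting an action into its class and phrase parts presumably requires matching against `valid_labels` rather than cutting at the first slash. A minimal sketch of one way to do this with longest-prefix matching (an assumption, not the module's actual code):

```python
def split_action(action: str, valid_labels: list[str]) -> tuple[str, str]:
    """Split '/class/phrase' into (class, phrase) by longest matching label."""
    # Check longer labels first so '/分享/外部資訊' wins over a shorter '/分享'.
    for label in sorted(valid_labels, key=len, reverse=True):
        if label and action.startswith(label + "/"):
            return label, action[len(label) + 1:]
    return "", action  # no recognized class prefix

print(split_action("/分享/外部資訊/aaa", ["/分享/外部資訊", "/分享"]))
# ('/分享/外部資訊', 'aaa')
```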
|
|
|
### Output Values |
|
|
|
The metric returns a dictionary of precision, recall, and F1 scores. With `detailed_scores=True`, the dictionary contains three entries, each mapping to `{'precision': ..., 'recall': ..., 'f1': ...}`: `class` (scores on the `/class` component), `phrase` (scores on the `phrase` component), and `weighted_sum` (their weighted combination), as shown in the example above.

All scores range from 0 to 1, inclusive, and higher is better; identical predictions and references should score 1.0 throughout.
|
|
|
|
|
### Examples |
|
The *How to Use* section above shows a typical call in which the predictions only partially match the references.
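
As a sanity check, passing identical predictions and references should produce perfect scores for every component. A short sketch of that call (the expected values follow from the precision/recall definitions, not from a recorded run):

```python
import evaluate

metric = evaluate.load("DarrenChensformer/action_generation")

# Predictions identical to references: every score should be 1.0.
actions = [["/開箱/xxx", "/教學/yyy"]]
result = metric.compute(
    predictions=actions,
    references=actions,
    valid_labels=["/開箱", "/教學"],
    detailed_scores=True,
)
print(result)
# Expected: {'precision': 1.0, 'recall': 1.0, 'f1': 1.0} for 'class',
# 'phrase', and 'weighted_sum' alike.
```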
|
|
|
## Limitations and Bias |
|
The metric assumes every action string follows the `/class/phrase` format and that the full set of expected class labels is supplied via `valid_labels`; predictions whose class prefix is not listed may not be scored as intended. The weights used for the weighted sum are not documented on this card.
|
|
|
## Citation |
|
*Cite the source where this metric was introduced.* |
|
|
|
## Further References |
|
- [Hugging Face `evaluate` documentation](https://huggingface.co/docs/evaluate)
|
|