---
title: action_generation
datasets:
- none
tags:
- evaluate
- metric
description: "Evaluate action generation outputs of the form /class/phrase, scoring the class and phrase parts separately and combining them with a weighted sum."
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
---
# Metric Card for action_generation
## Metric Description
Evaluates the output of an action generation task.
Each output follows the format `/class/phrase`. The metric scores the `/class` and `phrase` parts separately, then combines the two scores with a weighted sum.
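The split into `/class` and `phrase` can be sketched as longest-prefix matching against the list of valid class labels; this `split_action` helper is a hypothetical illustration, and the metric's actual implementation may differ:

```python
def split_action(action, valid_labels):
    """Split an action like "/分享/外部資訊/aaa" into (class, phrase) by
    matching the longest valid class label that prefixes the action."""
    for label in sorted(valid_labels, key=len, reverse=True):
        if label and action.startswith(label):
            return label, action[len(label):].lstrip("/")
    # No known class label matched: class is empty, keep the rest as phrase.
    return "", action.lstrip("/")

valid_labels = ["/開箱", "/分享/外部資訊", "/分享/個人資訊"]
print(split_action("/分享/外部資訊/aaa", valid_labels))
print(split_action("/開箱/xxx", valid_labels))
```

Sorting candidates by length ensures a nested label such as `/分享/外部資訊` wins over a shorter prefix.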
## How to Use
```python
import evaluate
valid_labels = [
"/開箱",
"/教學",
"/表達",
"/分享/外部資訊",
"/分享/個人資訊",
"/推薦/產品",
"/推薦/服務",
"/推薦/其他",
""
]
predictions = [
["/開箱/xxx", "/教學/yyy", "/表達/zzz"],
["/分享/外部資訊/aaa", "/教學/yyy", "/表達/zzz", "/分享/個人資訊/bbb"]
]
references = [
["/開箱/xxx", "/教學/yyy", "/表達/zzz"],
["/推薦/產品/bbb", "/教學/yyy", "/表達/zzz"]
]
metric = evaluate.load("DarrenChensformer/action_generation")
result = metric.compute(predictions=predictions, references=references, valid_labels=valid_labels, detailed_scores=True)
print(result)
```
```
{'class': {'precision': 0.7143, 'recall': 0.8333, 'f1': 0.7692}, 'phrase': {'precision': 0.8571, 'recall': 1.0, 'f1': 0.9231}, 'weighted_sum': {'precision': 0.7429, 'recall': 0.8666, 'f1': 0.8}}
```
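The `weighted_sum` entry in the example output is consistent with weighting the class score by 0.8 and the phrase score by 0.2. Those weights are inferred from the example numbers, not documented defaults, so treat this sketch as illustrative:

```python
def weighted_sum(class_scores, phrase_scores, class_weight=0.8):
    """Combine per-part scores with a weighted sum. The 0.8/0.2 split
    is inferred from the example output and may not match the metric's
    actual default weights."""
    phrase_weight = 1 - class_weight
    return {
        k: round(class_weight * class_scores[k] + phrase_weight * phrase_scores[k], 4)
        for k in ("precision", "recall", "f1")
    }

class_scores = {"precision": 0.7143, "recall": 0.8333, "f1": 0.7692}
phrase_scores = {"precision": 0.8571, "recall": 1.0, "f1": 0.9231}
print(weighted_sum(class_scores, phrase_scores))
# → {'precision': 0.7429, 'recall': 0.8666, 'f1': 0.8}
```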
### Inputs
- **predictions** *(list of list of str)*: Generated actions for each example, each in the `/class/phrase` format.
- **references** *(list of list of str)*: Ground-truth actions for each example, in the same format.
- **valid_labels** *(list of str)*: The set of valid class labels used to split each action into its class and phrase parts.
- **detailed_scores** *(bool)*: If `True`, report per-part scores under `class` and `phrase` in addition to the weighted sum.
### Output Values
The metric returns a dictionary of `precision`, `recall`, and `f1` scores. With `detailed_scores=True`, scores are reported separately for the `class` and `phrase` parts alongside their `weighted_sum`, as in the example output above. All values range from 0 to 1, inclusive; higher scores are better.
### Examples
See the usage example under "How to Use" above, which shows predictions and references with partially matching classes and phrases and the resulting per-part and weighted scores.
## Limitations and Bias
*Note any known limitations or biases that the metric has, with links and references if possible.*
## Citation
*Cite the source where this metric was introduced.*
## Further References
*Add any useful further references.*