Commit 594067e ("Auto files update [main]") committed by github-actions, parent 48fa02d

Files changed:
- README.md (+8 -8)
- app.py (+0 -10)
- codebleu.py (+0 -3)
README.md
CHANGED

````diff
@@ -5,7 +5,7 @@ tags:
 - metric
 - code
 - codebleu
-description: "Unofficial `CodeBLEU` implementation
+description: "Unofficial `CodeBLEU` implementation that supports Linux and MacOS."
 sdk: gradio
 sdk_version: 3.19.1
 app_file: app.py
@@ -14,10 +14,13 @@ pinned: false
 
 # Metric Card for codebleu
 
-
+This repository contains an unofficial `CodeBLEU` implementation that supports Linux and MacOS. It is available through `PyPI` and the `evaluate` library.
+
+The code is based on the original [CodeXGLUE/CodeBLEU](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/code-to-code-trans/evaluator/CodeBLEU) and the updated version by [XLCoST/CodeBLEU](https://github.com/reddy-lab-code-research/XLCoST/tree/main/code/translation/evaluator/CodeBLEU). It has been refactored, tested, built for macOS, and improved in multiple ways to enhance usability.
+
+Available for: `Python`, `C`, `C#`, `C++`, `Java`, `JavaScript`, `PHP`.
 
 ## Metric Description
-Unofficial `CodeBLEU` implementation with Linux and MacOS supports available with PyPI and HF HUB.
 
 > An ideal evaluation metric should consider the grammatical correctness and the logic correctness.
 > We propose weighted n-gram match and syntactic AST match to measure grammatical correctness, and introduce semantic data-flow match to calculate logic correctness.
@@ -29,9 +32,6 @@ In a nutshell, `CodeBLEU` is a weighted combination of `n-gram match (BLEU)`, `w
 The metric has shown higher correlation with human evaluation than `BLEU` and `accuracy` metrics.
 
 ## How to Use
-*Give general statement of how to use the metric*
-
-*Provide simplest possible example for using the metric*
 
 ### Inputs
 
@@ -80,7 +80,7 @@ print(result)
 # }
 ```
 
-Or using `evaluate` library (package required):
+Or using `evaluate` library (`codebleu` package required):
 ```python
 import evaluate
 metric = evaluate.load("k4black/codebleu")
@@ -91,7 +91,7 @@ reference = "def sum ( first , second ) :\n return second + first"
 result = metric.compute([reference], [prediction], lang="python", weights=(0.25, 0.25, 0.25, 0.25), tokenizer=None)
 ```
 
-Note: `
+Note: `lang` is required;
 
 
 ## Limitations and Bias
````
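For context, here is the `evaluate` usage from the hunks above stitched into one self-contained sketch. The `prediction` string is hypothetical (only the `reference` line is visible in this diff), and `predictions`/`references` are passed as keyword arguments, which is how `evaluate`'s `compute` accepts them; treat the printed keys as illustrative.

```python
import evaluate

# Load the metric module from the HF Hub (the `codebleu` package must be installed).
metric = evaluate.load("k4black/codebleu")

reference = "def sum ( first , second ) :\n return second + first"
prediction = "def sum ( first , second ) :\n return first + second"  # hypothetical

# `lang` is required; equal weights over the four CodeBLEU components.
result = metric.compute(
    references=[reference],
    predictions=[prediction],
    lang="python",
    weights=(0.25, 0.25, 0.25, 0.25),
    tokenizer=None,
)
print(result)  # e.g. {'codebleu': ..., 'ngram_match_score': ..., ...}
```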
app.py
CHANGED

```diff
@@ -1,15 +1,5 @@
-import importlib
-import subprocess
-import sys
-
 import evaluate
 from evaluate.utils import launch_gradio_widget
 
-
-# hotfix: somehow codebleu is not installed in the docker image
-subprocess.run([sys.executable, "-m", "pip", "install", "codebleu"], check=True)
-globals()["codebleu"] = importlib.import_module("codebleu")
-
-
 module = evaluate.load("k4black/codebleu")
 launch_gradio_widget(module)
```
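The deleted hotfix installed `codebleu` at startup because it was missing from the Space's Docker image; presumably the dependency is now declared in the Space's requirements instead (an assumption, since no requirements file appears in this commit). For reference, a generalized sketch of that install-at-startup pattern, with a hypothetical helper name:

```python
import importlib
import subprocess
import sys

def ensure_package(name: str) -> None:
    """Import `name`, pip-installing it into the current interpreter first if needed."""
    try:
        importlib.import_module(name)
    except ImportError:
        # Same workaround the removed hotfix used: install via the running interpreter.
        subprocess.run([sys.executable, "-m", "pip", "install", name], check=True)
        importlib.import_module(name)

ensure_package("codebleu")  # hypothetical usage mirroring the removed lines
```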
codebleu.py
CHANGED

```diff
@@ -18,7 +18,6 @@ import datasets
 import evaluate
 
 
-# TODO: Add BibTeX citation
 _CITATION = """\
 @misc{ren2020codebleu,
     title={CodeBLEU: a Method for Automatic Evaluation of Code Synthesis},
@@ -30,7 +29,6 @@ _CITATION = """\
 }
 """
 
-# TODO: Add description of the module here
 _DESCRIPTION = """\
 Unofficial `CodeBLEU` implementation with Linux and MacOS supports available with PyPI and HF HUB.
 
@@ -38,7 +36,6 @@ Based on original [CodeXGLUE/CodeBLEU](https://github.com/microsoft/CodeXGLUE/tr
 """
 
 
-# TODO: Add description of the arguments of the module here
 _KWARGS_DESCRIPTION = """
 Calculate a weighted combination of `n-gram match (BLEU)`, `weighted n-gram match (BLEU-weighted)`, `AST match` and `data-flow match` scores.
 
```
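The `_KWARGS_DESCRIPTION` kept by this hunk describes the metric as a weighted combination of four scores. Per the CodeBLEU paper cited in `_CITATION` (Ren et al., 2020), the final value is a plain weighted sum; a minimal sketch with hypothetical names (the real component scores are computed inside the `codebleu` package):

```python
def combine_codebleu(
    ngram_match: float,           # BLEU over code tokens
    weighted_ngram_match: float,  # BLEU with language keywords up-weighted
    ast_match: float,             # matched-subtree ratio of the syntax trees
    dataflow_match: float,        # matched-edge ratio of the data-flow graphs
    weights: tuple = (0.25, 0.25, 0.25, 0.25),
) -> float:
    """CodeBLEU = alpha*BLEU + beta*BLEU_weighted + gamma*Match_ast + delta*Match_df."""
    alpha, beta, gamma, delta = weights
    return (alpha * ngram_match + beta * weighted_ngram_match
            + gamma * ast_match + delta * dataflow_match)

# Equal weights, matching the README's weights=(0.25, 0.25, 0.25, 0.25):
print(combine_codebleu(0.7, 0.6, 0.8, 0.5))  # 0.65
```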