---
title: codebleu
tags:
- evaluate
- metric
description: CodeBLEU
sdk: gradio
sdk_version: 3.0.2
app_file: app.py
pinned: false
---
# Metric Card for CodeBLEU
## Metric Description
CodeBLEU from [CodeXGLUE](https://github.com/microsoft/CodeXGLUE), as described in the article [CodeBLEU: a Method for Automatic Evaluation of Code Synthesis](https://arxiv.org/abs/2009.10297).

NOTE: currently works on Linux machines only, due to a dependency on a prebuilt language parser shared library (`.so`).
## How to Use
```python
import evaluate

module = evaluate.load("dvitel/codebleu")
src = 'class AcidicSwampOoze(MinionCard):§ def __init__(self):§ super().__init__("Acidic Swamp Ooze", 2, CHARACTER_CLASS.ALL, CARD_RARITY.COMMON, battlecry=Battlecry(Destroy(), WeaponSelector(EnemyPlayer())))§§ def create_minion(self, player):§ return Minion(3, 2)§'
tgt = 'class AcidSwampOoze(MinionCard):§ def __init__(self):§ super().__init__("Acidic Swamp Ooze", 2, CHARACTER_CLASS.ALL, CARD_RARITY.COMMON, battlecry=Battlecry(Destroy(), WeaponSelector(EnemyPlayer())))§§ def create_minion(self, player):§ return Minion(3, 2)§'
src = src.replace("§", "\n")
tgt = tgt.replace("§", "\n")
res = module.compute(predictions=[tgt], references=[[src]])
print(res)
# {'CodeBLEU': 0.9473264567644872, 'ngram_match_score': 0.8915993127600096, 'weighted_ngram_match_score': 0.8977065142979394, 'syntax_match_score': 1.0, 'dataflow_match_score': 1.0}
```
### Inputs
- `predictions` (`list` of `str`): translations to score.
- `references` (`list` of `list`s of `str`): references for each translation.
- `lang` (`str`): programming language, one of `['java', 'js', 'c_sharp', 'php', 'go', 'python', 'ruby']`.
- `tokenizer`: approach used for standardizing `predictions` and `references`. The default tokenizer is `tokenizer_13a`, a relatively minimal tokenization approach that is however equivalent to `mteval-v13a`, used by WMT. This can be replaced by another tokenizer from a source such as SacreBLEU.
- `params` (`str`): weights for averaging (see the CodeBLEU paper). Defaults to equal weights `"0.25,0.25,0.25,0.25"`. A usage sketch with these optional parameters follows this list.
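The snippet below is a minimal sketch of passing the optional parameters, assuming `lang` and `params` are accepted as keyword arguments to `compute` as the list above suggests; the code snippets and weight values are illustrative only.

```python
import evaluate

module = evaluate.load("dvitel/codebleu")

# Hypothetical prediction/reference pair, for illustration only.
prediction = "def add(a, b):\n    return a + b\n"
reference = "def add(x, y):\n    return x + y\n"

res = module.compute(
    predictions=[prediction],
    references=[[reference]],
    lang="python",             # one of the supported languages listed above
    params="0.1,0.1,0.4,0.4",  # custom weights; the default is "0.25,0.25,0.25,0.25"
)
print(res)
```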
### Output Values
- `CodeBLEU`: the resulting overall score,
- `ngram_match_score`: see the CodeBLEU paper,
- `weighted_ngram_match_score`: see the CodeBLEU paper,
- `syntax_match_score`: see the CodeBLEU paper,
- `dataflow_match_score`: see the CodeBLEU paper.
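Per the CodeBLEU paper, the overall score is a weighted average of the four component scores, with the weights taken from `params`. As a sanity check (a sketch, assuming that formula), recombining the component values from the example output in "How to Use" with the default equal weights reproduces the reported `CodeBLEU` value:

```python
# Component scores from the example output above.
components = [0.8915993127600096, 0.8977065142979394, 1.0, 1.0]
weights = [0.25, 0.25, 0.25, 0.25]  # default "0.25,0.25,0.25,0.25"

codebleu = sum(w * c for w, c in zip(weights, components))
print(codebleu)  # ~0.9473, matching the reported CodeBLEU score
```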
#### Values from Popular Papers
Give examples, preferably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.
## Examples
Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.
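As one illustrative sketch (not taken from the original card): comparing an exact match against a structurally different prediction, assuming Python as the target language. The snippets and expectations in the comments are assumptions, not reported results.

```python
import evaluate

module = evaluate.load("dvitel/codebleu")

reference = "def square(x):\n    return x * x\n"
exact = "def square(x):\n    return x * x\n"
different = "def square(x):\n    y = 0\n    for _ in range(x):\n        y += x\n    return y\n"

# An exact match is expected to score 1.0 on every component ...
print(module.compute(predictions=[exact], references=[[reference]], lang="python"))

# ... while a structurally different implementation (even if functionally
# equivalent) is expected to score lower on the n-gram, syntax,
# and dataflow components.
print(module.compute(predictions=[different], references=[[reference]], lang="python"))
```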
## Limitations and Bias

Linux OS only (see the note above about the prebuilt parser library). Only the programming languages listed under Inputs are supported.
## Citation

```bibtex
@InProceedings{huggingface:module,
  title = {CodeBLEU: A Metric for Evaluating Code Generation},
  authors = {Sedykh, Ivan},
  year = {2022}
}
```
## Further References
- [CodeBLEU: a Method for Automatic Evaluation of Code Synthesis](https://arxiv.org/abs/2009.10297)
- [CodeXGLUE](https://github.com/microsoft/CodeXGLUE)