---
title: codebleu
tags:
- evaluate
- metric
description: "CodeBLEU"
sdk: gradio
sdk_version: 3.0.2
app_file: app.py
pinned: false
---
# Metric Card for CodeBLEU

## Metric Description
CodeBLEU is adapted from the [CodeXGLUE evaluator](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/code-to-code-trans/evaluator) and is described in the paper [CodeBLEU: a Method for Automatic Evaluation of Code Synthesis](https://arxiv.org/abs/2009.10297).

NOTE: the metric currently works on Linux machines only, due to a dependency on the prebuilt `languages.so` parser library.
## How to Use
```python
import evaluate

# load this metric module from the Hub (the short name below is an assumption;
# replace it with the full path of this Space if needed)
module = evaluate.load("codebleu")

# "§" stands in for newlines in the snippets below and is restored before scoring
src = 'class AcidicSwampOoze(MinionCard):§ def __init__(self):§ super().__init__("Acidic Swamp Ooze", 2, CHARACTER_CLASS.ALL, CARD_RARITY.COMMON, battlecry=Battlecry(Destroy(), WeaponSelector(EnemyPlayer())))§§ def create_minion(self, player):§ return Minion(3, 2)§'
tgt = 'class AcidSwampOoze(MinionCard):§ def __init__(self):§ super().__init__("Acidic Swamp Ooze", 2, CHARACTER_CLASS.ALL, CARD_RARITY.COMMON, battlecry=Battlecry(Destroy(), WeaponSelector(EnemyPlayer())))§§ def create_minion(self, player):§ return Minion(3, 2)§'
src = src.replace("§", "\n")
tgt = tgt.replace("§", "\n")

res = module.compute(predictions=[tgt], references=[[src]])
print(res)
# {'CodeBLEU': 0.9473264567644872, 'ngram_match_score': 0.8915993127600096, 'weighted_ngram_match_score': 0.8977065142979394, 'syntax_match_score': 1.0, 'dataflow_match_score': 1.0}
```
### Inputs
- **predictions** (`list` of `str`): translations to score.
- **references** (`list` of `list` of `str`): references for each translation.
- **lang** (`str`): programming language of the inputs, one of `['java', 'js', 'c_sharp', 'php', 'go', 'python', 'ruby']`.
- **tokenizer**: approach used for standardizing `predictions` and `references`. The default tokenizer is `tokenizer_13a`, a relatively minimal tokenization approach that is equivalent to `mteval-v13a`, used by WMT. It can be replaced by another tokenizer from a source such as [SacreBLEU](https://github.com/mjpost/sacrebleu/tree/master/sacrebleu/tokenizers).
- **params** (`str`): comma-separated weights for averaging the four component scores (see the CodeBLEU paper). Defaults to equal weights `"0.25,0.25,0.25,0.25"`. An example that passes `lang` and `params` explicitly is sketched below.
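
A minimal sketch of passing `lang` and `params` explicitly, assuming the metric has been loaded as `module` as in the example above; the weight string here is purely illustrative:

```python
# score a Java prediction while shifting weight toward the syntax and data-flow components
java_preds = ["public static int add(int a, int b) { return a + b; }"]
java_refs = [["public static int add(int x, int y) { return x + y; }"]]

res = module.compute(
    predictions=java_preds,
    references=java_refs,
    lang="java",               # one of the supported languages listed above
    params="0.1,0.1,0.4,0.4",  # weights for ngram, weighted ngram, syntax, and dataflow matches
)
print(res)
```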
### Output Values
The metric returns a dictionary with the following keys (a short snippet for reading it follows this list):
- **CodeBLEU**: the resulting overall score.
- **ngram_match_score**: n-gram match component (see the CodeBLEU paper).
- **weighted_ngram_match_score**: weighted n-gram match component (see the CodeBLEU paper).
- **syntax_match_score**: syntactic AST match component (see the CodeBLEU paper).
- **dataflow_match_score**: semantic data-flow match component (see the CodeBLEU paper).
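
A minimal sketch of reading the returned dictionary, reusing `res` from the example above:

```python
# each component is a float in [0, 1]; CodeBLEU is their weighted average (weights set via `params`)
print(f"CodeBLEU:        {res['CodeBLEU']:.4f}")
print(f"n-gram match:    {res['ngram_match_score']:.4f}")
print(f"weighted n-gram: {res['weighted_ngram_match_score']:.4f}")
print(f"syntax match:    {res['syntax_match_score']:.4f}")
print(f"dataflow match:  {res['dataflow_match_score']:.4f}")
```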
#### Values from Popular Papers
*Give examples, preferably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*
### Examples
*Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*
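
As a rough sketch of a less favorable case, scoring a prediction that differs structurally from the reference should yield lower component scores than the near-identical pair shown in *How to Use*; the exact values depend on the language and weights:

```python
# compare a structurally different prediction against the reference
pred = ["def add(a, b):\n    return a + b\n"]
refs = [["def add(a, b):\n    result = a + b\n    return result\n"]]

res = module.compute(predictions=pred, references=refs, lang="python")
print(res)  # expect lower syntax/data-flow match than for near-identical code
```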
## Limitations and Bias
CodeBLEU currently runs on Linux only (due to the prebuilt `languages.so` dependency) and supports only the programming languages listed in the Inputs section above.
## Citation
```bibtex
@InProceedings{huggingface:module,
  title  = {CodeBLEU: A Metric for Evaluating Code Generation},
  author = {Sedykh, Ivan},
  year   = {2022}
}
```
## Further References
- [CodeXGLUE code-to-code translation evaluator](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/code-to-code-trans/evaluator)
- [CodeBLEU: a Method for Automatic Evaluation of Code Synthesis](https://arxiv.org/abs/2009.10297)