Spaces:

evaluate-metric
/

sari

Running

App Files Files Community

sari / README.md

lvwerra HF staff

Update Space (evaluate main: 05209ece)

cf41915 over 2 years ago

preview code

raw

history blame

5.99 kB

	---
	title: SARI
	emoji: 🤗
	colorFrom: blue
	colorTo: red
	sdk: gradio
	sdk_version: 3.0.2
	app_file: app.py
	pinned: false
	tags:
	- evaluate
	- metric
	description: >-
	SARI is a metric used for evaluating automatic text simplification systems.
	The metric compares the predicted simplified sentences against the reference
	and the source sentences. It explicitly measures the goodness of words that are
	added, deleted and kept by the system.
	Sari = (F1_add + F1_keep + P_del) / 3
	where
	F1_add: n-gram F1 score for add operation
	F1_keep: n-gram F1 score for keep operation
	P_del: n-gram precision score for delete operation
	n = 4, as in the original paper.

	This implementation is adapted from Tensorflow's tensor2tensor implementation [3].
	It has two differences with the original GitHub [1] implementation:
	(1) Defines 0/0=1 instead of 0 to give higher scores for predictions that match
	a target exactly.
	(2) Fixes an alleged bug [2] in the keep score computation.
	[1] https://github.com/cocoxu/simplification/blob/master/SARI.py
	(commit 0210f15)
	[2] https://github.com/cocoxu/simplification/issues/6
	[3] https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/sari_hook.py
	---

	# Metric Card for SARI


	## Metric description
	SARI (*system output against references and against the input sentence*) is a metric used for evaluating automatic text simplification systems.

	The metric compares the predicted simplified sentences against the reference and the source sentences. It explicitly measures the goodness of words that are added, deleted and kept by the system.

	SARI can be computed as:

	`sari = ( F1_add + F1_keep + P_del) / 3`

	where

	`F1_add` is the n-gram F1 score for add operations

	`F1_keep` is the n-gram F1 score for keep operations

	`P_del` is the n-gram precision score for delete operations

	The number of n grams, `n`, is equal to 4, as in the original paper.

	This implementation is adapted from [Tensorflow's tensor2tensor implementation](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/sari_hook.py).
	It has two differences with the [original GitHub implementation](https://github.com/cocoxu/simplification/blob/master/SARI.py):

	1) It defines 0/0=1 instead of 0 to give higher scores for predictions that match a target exactly.
	2) It fixes an [alleged bug](https://github.com/cocoxu/simplification/issues/6) in the keep score computation.



	## How to use

	The metric takes 3 inputs: sources (a list of source sentence strings), predictions (a list of predicted sentence strings) and references (a list of lists of reference sentence strings)

	```python
	from evaluate import load
	sari = load("sari")
	sources=["About 95 species are currently accepted."]
	predictions=["About 95 you now get in."]
	references=[["About 95 species are currently known.","About 95 species are now accepted.","95 species are now accepted."]]
	sari_score = sari.compute(sources=sources, predictions=predictions, references=references)
	```
	## Output values

	This metric outputs a dictionary with the SARI score:

	```
	print(sari_score)
	{'sari': 26.953601953601954}
	```

	The range of values for the SARI score is between 0 and 100 -- the higher the value, the better the performance of the model being evaluated, with a SARI of 100 being a perfect score.

	### Values from popular papers

	The [original paper that proposes the SARI metric](https://aclanthology.org/Q16-1029.pdf) reports scores ranging from 26 to 43 for different simplification systems and different datasets. They also find that the metric ranks all of the simplification systems and human references in the same order as the human assessment used as a comparison, and that it correlates reasonably with human judgments.

	More recent SARI scores for text simplification can be found on leaderboards for datasets such as [TurkCorpus](https://paperswithcode.com/sota/text-simplification-on-turkcorpus) and [Newsela](https://paperswithcode.com/sota/text-simplification-on-newsela).

	## Examples

	Perfect match between prediction and reference:

	```python
	from evaluate import load
	sari = load("sari")
	sources=["About 95 species are currently accepted ."]
	predictions=["About 95 species are currently accepted ."]
	references=[["About 95 species are currently accepted ."]]
	sari_score = sari.compute(sources=sources, predictions=predictions, references=references)
	print(sari_score)
	{'sari': 100.0}
	```

	Partial match between prediction and reference:

	```python
	from evaluate import load
	sari = load("sari")
	sources=["About 95 species are currently accepted ."]
	predictions=["About 95 you now get in ."]
	references=[["About 95 species are currently known .","About 95 species are now accepted .","95 species are now accepted ."]]
	sari_score = sari.compute(sources=sources, predictions=predictions, references=references)
	print(sari_score)
	{'sari': 26.953601953601954}
	```

	## Limitations and bias

	SARI is a valuable measure for comparing different text simplification systems as well as one that can assist the iterative development of a system.

	However, while the [original paper presenting SARI](https://aclanthology.org/Q16-1029.pdf) states that it captures "the notion of grammaticality and meaning preservation", this is a difficult claim to empirically validate.

	## Citation

	```bibtex
	@inproceedings{xu-etal-2016-optimizing,
	title = {Optimizing Statistical Machine Translation for Text Simplification},
	authors={Xu, Wei and Napoles, Courtney and Pavlick, Ellie and Chen, Quanze and Callison-Burch, Chris},
	journal = {Transactions of the Association for Computational Linguistics},
	volume = {4},
	year={2016},
	url = {https://www.aclweb.org/anthology/Q16-1029},
	pages = {401--415},
	}
	```

	## Further References

	- [NLP Progress -- Text Simplification](http://nlpprogress.com/english/simplification.html)
	- [Hugging Face Hub -- Text Simplification Models](https://huggingface.co/datasets?filter=task_ids:text-simplification)