---
title: Levenshtein distance
emoji: ✍️
colorFrom: blue
colorTo: green
tags:
- evaluate
- metric
description: Levenshtein (edit) distance
sdk: gradio
sdk_version: 5.24.0
app_file: app.py
pinned: false
---

# Metric Card for the Levenshtein (edit) distance

## Metric Description

This metric computes the Levenshtein distance, also commonly called "edit distance". The Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to transform one string into another. It is a popular metric for text similarity.
This module directly calls the [Levenshtein package](https://github.com/rapidfuzz/Levenshtein) for fast execution speed.
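To make the definition concrete, here is a minimal pure-Python sketch of the classic Wagner-Fischer dynamic programme. It is only an illustration of what the distance counts, not the code path this module uses (the module delegates to the Levenshtein package); the function name `levenshtein_distance` is hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (illustrative sketch)."""
    # prev[j] holds the distance between the current prefix of `a` and b[:j];
    # the first row is the cost of inserting the first j characters of `b`.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Column 0: deleting the first i characters of `a`.
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            )
        prev = curr
    return prev[-1]


print(levenshtein_distance("baroo", "bar"))       # 2
print(levenshtein_distance("kitten", "sitting"))  # 3
```

The sketch keeps only two rows of the table, so memory is linear in the length of `b` while time stays proportional to `len(a) * len(b)`.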

## How to Use

### Inputs

- **predictions** *(list of str): sequence of prediction strings.*
- **references** *(list of str): sequence of reference strings, paired element-wise with `predictions`.*
- **kwargs** *keyword arguments passed to the [Levenshtein.distance](https://rapidfuzz.github.io/Levenshtein/levenshtein.html#Levenshtein.distance) method.*

### Output Values

A dictionary containing the Levenshtein distance averaged over all prediction/reference pairs (`levenshtein`, lower is better) and the average similarity ratio in [0, 1] (`levenshtein_ratio`, higher is better).
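The sketch below shows one way these two averages can be obtained, assuming the module averages per-pair `Levenshtein.distance` and `Levenshtein.ratio` scores (this is consistent with the example output further down, but it is an assumption). `Levenshtein.ratio` is documented as a normalized Indel similarity, `1 - indel / (len(a) + len(b))`, and the Indel distance equals a weighted edit distance with substitution cost 2. The helper names `edit_distance`, `ratio` and `compute` are hypothetical and pure Python, so no external package is needed.

```python
from statistics import mean


def edit_distance(a: str, b: str, sub_cost: int = 1) -> int:
    # Unit-cost insertions/deletions; substitutions cost `sub_cost`.
    # sub_cost=1 gives the Levenshtein distance, sub_cost=2 the Indel distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,                                # deletion
                curr[j - 1] + 1,                            # insertion
                prev[j - 1] + (0 if ca == cb else sub_cost),  # substitution
            )
        prev = curr
    return prev[-1]


def ratio(a: str, b: str) -> float:
    # Normalized Indel similarity; two empty strings are treated as identical.
    total = len(a) + len(b)
    return 1.0 if total == 0 else 1 - edit_distance(a, b, sub_cost=2) / total


def compute(predictions: list[str], references: list[str]) -> dict:
    # Hypothetical helper: average both scores over element-wise pairs.
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    pairs = list(zip(predictions, references))
    return {
        "levenshtein": mean(edit_distance(p, r) for p, r in pairs),
        "levenshtein_ratio": mean(ratio(p, r) for p, r in pairs),
    }


print(compute(["foo", "baroo"], ["foo", "bar"]))
```

For `"baroo"` vs `"bar"` the Indel distance is 2 over 8 characters, giving a ratio of 0.75; averaged with the exact match (1.0) this yields 0.875.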

### Examples

#### Levenshtein distance

```Python
import evaluate

levenshtein = evaluate.load("Natooz/Levenshtein")
results = levenshtein.compute(
    predictions=[
        "foo", "baroo"  # 0 and 2 edits
    ],
    references=[
        "foo", "bar"
    ],
)
print(results)
# {"levenshtein": 1, "levenshtein_ratio": 0.875}
```

#### Indel (insertion-deletion) distance

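The weight of each operation can be passed through the `weights` argument, given as `(insertion, deletion, substitution)`, to customize the score. For example, setting the substitution weight to 2 computes the "indel" distance, in which each substitution counts as two operations (one deletion plus one insertion). In the example below no substitution is needed (`"baroo"` becomes `"bar"` with two deletions), so the result is the same as with the default weights.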

```Python
import evaluate

levenshtein = evaluate.load("Natooz/Levenshtein")
results = levenshtein.compute(
    predictions=[
        "foo", "baroo"  # 0 and 2 edits
    ],
    references=[
        "foo", "bar"
    ],
    weights=(1, 1, 2),  # weight of 2 for substitutions
)
print(results)
# {"levenshtein": 1, "levenshtein_ratio": 0.875}
```
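To see how the `weights` tuple changes the score when substitutions are involved, here is a small pure-Python sketch of a weighted edit distance. It follows the `(insertion, deletion, substitution)` order documented for `Levenshtein.distance`, but the function `weighted_levenshtein` is a hypothetical illustration, not the library's implementation.

```python
def weighted_levenshtein(a: str, b: str, weights=(1, 1, 1)) -> int:
    """Cost of turning `a` into `b`, with per-operation weights given as
    (insertion, deletion, substitution). Illustrative sketch only."""
    ins, dele, sub = weights
    # prev[j]: cost of turning the current prefix of `a` into b[:j];
    # the first row inserts the first j characters of `b`.
    prev = [j * ins for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        # Column 0: delete the first i characters of `a`.
        curr = [i * dele] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + dele,                          # delete ca
                curr[j - 1] + ins,                       # insert cb
                prev[j - 1] + (0 if ca == cb else sub),  # substitute ca -> cb
            )
        prev = curr
    return prev[-1]


print(weighted_levenshtein("kitten", "sitting"))             # 3
print(weighted_levenshtein("kitten", "sitting", (1, 1, 2)))  # 5 (indel distance)
print(weighted_levenshtein("baroo", "bar", (1, 1, 2)))       # 2 (only deletions)
```

With a substitution weight of 2, replacing a character is never cheaper than deleting it and inserting the new one, which is why the result equals the indel distance: the two substitutions in `"kitten"` → `"sitting"` each cost 2 instead of 1.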

## Citation

```bibtex
@ARTICLE{1966SPhD...10..707L,
       author = {{Levenshtein}, V.~I.},
        title = "{Binary Codes Capable of Correcting Deletions, Insertions and Reversals}",
      journal = {Soviet Physics Doklady},
         year = 1966,
        month = feb,
       volume = {10},
        pages = {707},
       adsurl = {https://ui.adsabs.harvard.edu/abs/1966SPhD...10..707L},
      adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}
```