Spaces: ginic/phone_errors

ginic committed on Mar 28, 2024

Commit 14b55be

1 Parent(s): 2b6b085

Changed pluralization for mean keys

Files changed (2)

README.md +9 -5
phone_distance.py +6 -6

README.md CHANGED

@@ -39,9 +39,9 @@ The computation returns a dictionary with the following key and values:
  - **phone_error_rates** (`list` of `float`): Phone error rate (PER) gives edit distance in terms of phones for each prediction-reference pair, rather than Unicode characters, since phones can consist of multiple characters. It is normalized by the number of phones of the reference string. The result with have the same length as the input prediction and reference lists.
  - **mean_phone_error_rate** (`float`): Overall mean of PER.
  - **phone_feature_error_rates** (`list` of `float`): Phone feature error rate (PFER) is Levenshtein distance between strings where distance between individual phones is computed using Hamming distance between phonetic features for each prediction-reference pair. By default it is a metric that obeys the triangle equality, but can also be normalized by number of phones.
- - **mean_phone_feature_error_rates** (`float`):  Overall mean of PFER.
+ - **mean_phone_feature_error_rate** (`float`):  Overall mean of PFER.
  - **feature_error_rates** (`list` of `float`): Feature error rate (FER) is the edit distance in terms of articulatory features normalized by the number of phones in the reference, computed for each prediction-reference pair.
- - **mean_feature_error_rates** (`float`): Overall mean of FER.
+ - **mean_feature_error_rate** (`float`): Overall mean of FER.
 #### Values from Popular Papers
@@ -52,19 +52,23 @@ The computation returns a dictionary with the following key and values:
 Simplest use case to compute phone error rates between two IPA strings:
 ```python
 >>> phone_distance.compute(predictions=["bob", "ði", "spin"], references=["pop", "ðə", "spʰin"])
-{'phone_error_rates': [0.6666666666666666, 0.5, 0.25], 'mean_phone_error_rate': 0.47222222222222215, 'phone_feature_error_rates': [0.08333333333333333, 0.125, 0.041666666666666664], 'mean_phone_feature_error_rates': 0.08333333333333333, 'feature_error_rates': [0.027777777777777776, 0.0625, 0.30208333333333337], 'mean_feature_error_rates': 0.13078703703703706}
+{'phone_error_rates': [0.6666666666666666, 0.5, 0.25], 'mean_phone_error_rate': 0.47222222222222215, 'phone_feature_error_rates': [0.08333333333333333, 0.125, 0.041666666666666664], 'mean_phone_feature_error_rate': 0.08333333333333333, 'feature_error_rates': [0.027777777777777776, 0.0625, 0.30208333333333337], 'mean_feature_error_rate': 0.13078703703703706}
 ```
 Normalize phone feature error rate by the length of the reference string:
 ```python
 >>> phone_distance.compute(predictions=["bob", "ði"], references=["pop", "ðə"], is_normalize_pfer=True)
-{'phone_error_rates': [0.6666666666666666, 0.5], 'mean_phone_error_rate': 0.5833333333333333, 'phone_feature_error_rates': [0.027777777777777776, 0.0625], 'mean_phone_feature_error_rates': 0.04513888888888889, 'feature_error_rates': [0.027777777777777776, 0.0625], 'mean_feature_error_rates': 0.04513888888888889}
+{'phone_error_rates': [0.6666666666666666, 0.5], 'mean_phone_error_rate': 0.5833333333333333,
+ 'phone_feature_error_rates': [0.027777777777777776, 0.0625], 'mean_phone_feature_error_rate': 0.04513888888888889,
+ 'feature_error_rates': [0.027777777777777776, 0.0625], 'mean_feature_error_rate': 0.04513888888888889}
 ```
 Error rates may be greater than 1.0 if the reference string is shorter than the prediction string:
 ```python
 >>> phone_distance.compute(predictions=["bob"], references=["po"])
-{'phone_error_rates': [1.0], 'mean_phone_error_rate': 1.0, 'phone_feature_error_rates': [1.0416666666666667], 'mean_phone_feature_error_rates': 1.0416666666666667, 'feature_error_rates': [0.020833333333333332], 'mean_feature_error_rates': 0.020833333333333332}
+{'phone_error_rates': [1.0], 'mean_phone_error_rate': 1.0,
+ 'phone_feature_error_rates': [1.0416666666666667], 'mean_phone_feature_error_rate': 1.0416666666666667,
+ 'feature_error_rates': [0.020833333333333332], 'mean_feature_error_rate': 0.020833333333333332}
 ```
 Empty reference strings will cause an ValueError, you should handle them separately:
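
Review note on the README hunk: the PER definition and the empty-reference caveat can be illustrated without loading the metric. This is a minimal sketch, assuming the phones are already segmented by hand (the metric itself relies on panphon for that). `edit_distance` and `phone_error_rate` are hypothetical helper names for illustration, not functions from `phone_errors`.

```python
def edit_distance(pred, ref):
    """Levenshtein distance between two sequences of phones."""
    prev = list(range(len(ref) + 1))
    for i, p in enumerate(pred, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if p == r else 1
            curr.append(min(prev[j] + 1,          # drop a predicted phone
                            curr[j - 1] + 1,      # add a missing reference phone
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def phone_error_rate(pred, ref):
    """Edit distance normalized by the number of reference phones."""
    if not ref:
        # Mirrors the README caveat: an empty reference cannot be normalized.
        raise ValueError("reference must contain at least one phone")
    return edit_distance(pred, ref) / len(ref)


# Hand-segmented versions of the README inputs (segmentation is an assumption).
pairs = [
    (["b", "o", "b"], ["p", "o", "p"]),
    (["ð", "i"], ["ð", "ə"]),
    (["s", "p", "i", "n"], ["s", "pʰ", "i", "n"]),
    (["b", "o", "b"], []),  # empty reference, filtered out below
]

# Handle empty references separately by skipping them before scoring.
scorable = [(p, r) for p, r in pairs if r]
rates = [phone_error_rate(p, r) for p, r in scorable]
mean_rate = sum(rates) / len(rates)
```

Under these assumptions `rates` comes out as 2/3, 0.5 and 0.25, in line with the `phone_error_rates` shown in the first README example, and `mean_rate` corresponds to `mean_phone_error_rate`.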

phone_distance.py CHANGED

@@ -65,15 +65,15 @@ Returns:
     phone_error_rates: list of floats giving PER for each prediction, reference pair
     mean_phone_error_rate: float, average PER across all examples
     phone_feature_error_rates: list of floats giving PFER for each prediction, reference pair
-    mean_phone_feature_error_rates: float, average PFER across all examples
+    mean_phone_feature_error_rate: float, average PFER across all examples
     feature_error_rates: list of floats giving FER for each prediction, reference pair
-    mean_feature_error_rates: float, average FER across all examples
+    mean_feature_error_rate: float, average FER across all examples
 Examples:
     Compare articulatory differences in voicing in "bob" vs. "pop" and different pronunciations of "the":
 >>> phone_distance = evaluate.load("ginic/phone_errors")
 >>> phone_distance.compute(predictions=["bob", "ði"], references=["pop", "ðə"])
-{'phone_error_rates': [0.6666666666666666, 0.5], 'mean_phone_error_rate': 0.5833333333333333, 'phone_feature_error_rates': [0.08333333333333333, 0.125], 'mean_phone_feature_error_rates': 0.10416666666666666, 'feature_error_rates': [0.027777777777777776, 0.0625], 'mean_feature_error_rates': 0.04513888888888889}
+{'phone_error_rates': [0.6666666666666666, 0.5], 'mean_phone_error_rate': 0.5833333333333333, 'phone_feature_error_rates': [0.08333333333333333, 0.125], 'mean_phone_feature_error_rate': 0.10416666666666666, 'feature_error_rates': [0.027777777777777776, 0.0625], 'mean_feature_error_rate': 0.04513888888888889}
     Normalize PFER by the length of string with largest number of phones:
 >>> phone_distance = evaluate.load("ginic/phone_errors")
@@ -147,7 +147,7 @@ class PhoneDistance(evaluate.Metric):
         Returns:
             dict:  {"phone_error_rates": list[float], "mean_phone_error_rate": float, "phone_feature_error_rates": list[float], "mean_phone_feature_error_rates": float,
-                    "feature_error_rates": list[float], "mean_feature_error_rates": float}
+                    "feature_error_rates": list[float], "mean_feature_error_rate": float}
         """
         distance_computer = panphon.distance.Distance(feature_model=feature_model)
         phone_error_rates = []
@@ -168,8 +168,8 @@ class PhoneDistance(evaluate.Metric):
             "phone_error_rates": phone_error_rates,
             "mean_phone_error_rate": np.mean(phone_error_rates),
             "phone_feature_error_rates": hamming_distances,
-            "mean_phone_feature_error_rates": np.mean(hamming_distances),
+            "mean_phone_feature_error_rate": np.mean(hamming_distances),
             "feature_error_rates": feature_error_rates,
-            "mean_feature_error_rates": np.mean(feature_error_rates)
+            "mean_feature_error_rate": np.mean(feature_error_rates)
         }
         }
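
Review note on the rename: because this commit changes two keys of the returned dictionary, downstream code that indexes the old plural names would raise a `KeyError` against the new output. One possible way for callers to bridge both versions is sketched below. `get_mean` is a hypothetical helper, not part of `phone_errors`, and the sample dictionaries only reuse the mean values from the docstring example above.

```python
def get_mean(results, key):
    """Look up a mean under its singular key, falling back to the
    plural spelling that was used before this commit."""
    for candidate in (key, key + "s"):
        if candidate in results:
            return results[candidate]
    raise KeyError(key)


# Illustrative result fragments in the post-commit and pre-commit shapes.
new_style = {
    "mean_phone_feature_error_rate": 0.10416666666666666,
    "mean_feature_error_rate": 0.04513888888888889,
}
old_style = {
    "mean_phone_feature_error_rates": 0.10416666666666666,
    "mean_feature_error_rates": 0.04513888888888889,
}

pfer_new = get_mean(new_style, "mean_phone_feature_error_rate")
pfer_old = get_mean(old_style, "mean_phone_feature_error_rate")
fer_old = get_mean(old_style, "mean_feature_error_rate")
```

`mean_phone_error_rate` was already singular before the commit, so it needs no fallback.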