rbawden
/

modern_french_normalisation

Text2Text Generation

Model card Files Files and versions Community

rbawden commited on Jan 11, 2023

Commit

a599d18

·

1 Parent(s): ece6db6

Update README.md

Files changed (1) hide show

README.md +6 -0

README.md CHANGED Viewed

@@ -29,6 +29,12 @@ print(list_outputs)
 >> [{'text': 'Elle haïssait particulièrement le Cardinal de Lorraine; ', 'alignment': [([0, 3], [0, 3]), ([5, 12], [5, 12]), ([14, 29], [14, 29]), ([31, 32], [31, 32]), ([34, 41], [34, 41]), ([43, 44], [43, 44]), ([46, 53], [46, 53]), ([54, 54], [54, 54])]}, {'text': "Adieu, j'irai chez vous tantôt vous rendre grâce. ", 'alignment': [([0, 4], [0, 4]), ([5, 5], [5, 5]), ([7, 8], [7, 8]), ([9, 12], [9, 12]), ([14, 17], [14, 17]), ([19, 22], [19, 22]), ([24, 30], [24, 29]), ([32, 35], [31, 34]), ([37, 42], [36, 41]), ([44, 48], [43, 47]), ([49, 49], [48, 48])]}]
 ```
 ### Limitations and bias
 The model has been learnt in a supervised fashion and therefore like any such model is likely to perform well on texts similar to those used for training and less well on other texts. Whilst care was taken to include a range of different domains from different periods in the 17th c. in the training data, there are nevertheless imbalances, notably with some decades (e.g. 1610s) being underrepresented.

 >> [{'text': 'Elle haïssait particulièrement le Cardinal de Lorraine; ', 'alignment': [([0, 3], [0, 3]), ([5, 12], [5, 12]), ([14, 29], [14, 29]), ([31, 32], [31, 32]), ([34, 41], [34, 41]), ([43, 44], [43, 44]), ([46, 53], [46, 53]), ([54, 54], [54, 54])]}, {'text': "Adieu, j'irai chez vous tantôt vous rendre grâce. ", 'alignment': [([0, 4], [0, 4]), ([5, 5], [5, 5]), ([7, 8], [7, 8]), ([9, 12], [9, 12]), ([14, 17], [14, 17]), ([19, 22], [19, 22]), ([24, 30], [24, 29]), ([32, 35], [31, 34]), ([37, 42], [36, 41]), ([44, 48], [43, 47]), ([49, 49], [48, 48])]}]
 ```
+To disable postprocessing (faster but less good normalisation), set the arguments `no_postproc_lex` and `no_post_clean` to True when instantiating the pipeline:
+```
+normaliser = pipeline(model="rbawden/modern_french_normalisation", no_postproc_lex=True, no_post_clean=True, batch_size=32, beam_size=5, cache_file="./cache.pickle", trust_remote_code=True)
+```
 ### Limitations and bias
 The model has been learnt in a supervised fashion and therefore like any such model is likely to perform well on texts similar to those used for training and less well on other texts. Whilst care was taken to include a range of different domains from different periods in the 17th c. in the training data, there are nevertheless imbalances, notably with some decades (e.g. 1610s) being underrepresented.