uvegesistvan
committed on
Commit f14e674
1 Parent(s): 1d9f8cf
Update README.md
README.md
CHANGED
@@ -26,8 +26,8 @@ widget:
 Cased fine-tuned BERT model for Hungarian, trained on a dataset provided by National Tax and Customs Administration - Hungary (NAV): Public Accessibility Programme.
 Refined version of the huBERTPlain ('uvegesistvan/huBERTPlain') model.
 Training data cleaned further:
-
-
+* Minor corrections in sentence segmentation results.
+* Training data filtered: within each document, sentence pairs (original - rephrased) whose Levenshtein distance was less than 3 were removed. These were assumed to be spelling corrections and therefore not helpful for Plain Language classification.
 
 ## Intended uses & limitations
 
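The filtering rule added in this commit (drop original/rephrased sentence pairs whose Levenshtein distance is below 3) could look roughly like the sketch below. It is not the author's preprocessing code: the function names `levenshtein` and `filter_pairs`, the `min_distance` parameter, and the choice of a character-level (rather than token-level) distance are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance using two-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def filter_pairs(
    pairs: Iterable[Tuple[str, str]],
    min_distance: int = 3,  # assumed threshold, taken from "less than 3"
) -> List[Tuple[str, str]]:
    """Keep only (original, rephrased) pairs that differ by at least min_distance edits."""
    return [
        (original, rephrased)
        for original, rephrased in pairs
        if levenshtein(original, rephrased) >= min_distance
    ]
```

Under these assumptions, a pair that differs only by one corrected character is treated as a spelling fix and dropped, while a genuinely rephrased sentence is kept for training.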