---
license: cc-by-nc-4.0
language:
- hu
- en
metrics:
- accuracy
- f1
model-index:
- name: Hun_Eng_RoBERTa_base_Plain
  results:
  - task:
      type: text-classification
    metrics:
    - type: accuracy
      value: 0.75 (hu) / 0.65 (en)
    - type: f1
      value: 0.74 (hu) / 0.64 (en)
widget:
- text: "A tanúsítvány meghatározott adatainak a 2008/118/EK irányelv IV. fejezete szerinti szállításához szükséges adminisztratív okmányban..."
  example_title: "Incomprehensible"
- text: "Az AEO-engedély birtokosainak listáján – keresésre – megjelenő információk: az engedélyes neve, az engedélyt kibocsátó ország..."
  example_title: "Comprehensible"
---
## Model description
A cased `XLM-RoBERTa-base` model fine-tuned for Hungarian and English. It was trained on datasets provided by the National Tax and Customs Administration of Hungary (NAV) and on English translations of the same data produced with the Google Translate API.
## Intended uses & limitations
The model is designed to classify sentences as either "comprehensible" or "not comprehensible" (according to Plain Language guidelines):
* **Label_0** - "comprehensible" - The sentence is in Plain Language.
* **Label_1** - "not comprehensible" - The sentence is **not** in Plain Language.
## Training
The model is a fine-tuned version of the original `xlm-roberta-base` checkpoint, trained on a dataset of Hungarian legal and administrative texts. To support English classification, it was also trained on an English translation of the same dataset, produced with the Google Translate API.
## Eval results
### Hungarian Results:
| Class | Precision | Recall | F1-Score |
| ----- | --------- | ------ | -------- |
| **Comprehensible / Label_0** | **0.82** | **0.62** | **0.70** |
| **Not comprehensible / Label_1** | **0.71** | **0.88** | **0.78** |
| **accuracy** | | | **0.75** |
| **macro avg** | **0.77** | **0.75** | **0.74** |
| **weighted avg** | **0.76** | **0.75** | **0.74** |
### English Results:
| Class | Precision | Recall | F1-Score |
| ----- | --------- | ------ | -------- |
| **Comprehensible / Label_0** | **0.70** | **0.50** | **0.58** |
| **Not comprehensible / Label_1** | **0.63** | **0.80** | **0.70** |
| **accuracy** | | | **0.65** |
| **macro avg** | **0.66** | **0.65** | **0.64** |
| **weighted avg** | **0.66** | **0.65** | **0.64** |
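To make the relationship between the table rows explicit, the sketch below recomputes the per-class F1 and the macro-averaged F1 from the precision and recall values in the English table. It is only an illustration of the standard formulas; the helper names `f1` and `macro_avg` are made up for this example. Because the published precision and recall are already rounded to two decimals, repeating the calculation for the Hungarian table can differ from the reported F1 in the last digit.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_avg(values: list[float]) -> float:
    """Unweighted mean over classes (the 'macro avg' row)."""
    return sum(values) / len(values)


# Precision / recall copied from the English results table.
f1_label_0 = f1(0.70, 0.50)  # Comprehensible / Label_0
f1_label_1 = f1(0.63, 0.80)  # Not comprehensible / Label_1

print(round(f1_label_0, 2))                           # 0.58
print(round(f1_label_1, 2))                           # 0.7
print(round(macro_avg([f1_label_0, f1_label_1]), 2))  # 0.64
```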
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Download (or load from the local cache) the tokenizer and the fine-tuned classifier.
tokenizer = AutoTokenizer.from_pretrained("uvegesistvan/Hun_Eng_RoBERTa_base_Plain")
model = AutoModelForSequenceClassification.from_pretrained("uvegesistvan/Hun_Eng_RoBERTa_base_Plain")
```
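The snippet above only loads the tokenizer and the model. A minimal sketch of turning a sentence into one of the two labels could look like the following. It assumes PyTorch is installed and that class index 0 / 1 corresponds to Label_0 / Label_1 as described in this card; the helper names `softmax`, `decode`, `classify`, and `load_and_classify` are hypothetical and not part of the model repository.

```python
import math

# Label meanings as documented in this model card. The checkpoint's own
# config.id2label may spell them differently (assumption).
LABELS = {0: "comprehensible", 1: "not comprehensible"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]


def classify(text, tokenizer, model):
    """Run one sentence through an already loaded tokenizer and model."""
    import torch  # imported lazily so the helpers above work without it

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return decode(logits)


def load_and_classify(text):
    """Load the checkpoint from the Hub and classify a single sentence."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    name = "uvegesistvan/Hun_Eng_RoBERTa_base_Plain"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name)
    return classify(text, tokenizer, model)
```

Calling `load_and_classify("...")` with a Hungarian or English sentence should return a `(label, probability)` pair. For many sentences, load the tokenizer and model once and call `classify` repeatedly.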