---
license: apache-2.0
datasets:
- papluca/language-identification
language:
- en
- de
- fr
- es
metrics:
- precision
- recall
- f1
- accuracy
pipeline_tag: text-classification
---
# German, English, French and Spanish Language Detector
The GEFS-language-detector model identifies German, English, French and Spanish text. In the testing results reported below, it reaches an F1 score above 0.999 for each of the four languages.
This model was fine-tuned from the base model [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) on papluca's [Language Identification](https://huggingface.co/datasets/papluca/language-identification#additional-information) dataset.
## Predicted output
The model returns the detected language as one of the following language codes:
```
- de as German
- en as English
- fr as French
- es as Spanish
```
## Supported languages
The model currently supports 4 languages; more languages will be added in the future.
The following languages are supported:
- German (de)
- English (en)
- French (fr)
- Spanish (es)
## Use a pipeline as a high-level helper
```python
from transformers import pipeline

# One example sentence per supported language.
text = [
    "Mir gefällt die Art und Weise, Sprachen zu erkennen",
    "I like the way to detect languages",
    "Me gusta la forma de detectar idiomas",
    "J'aime la façon de détecter les langues",
]
# Load the model from the Hub as a text-classification pipeline.
pipe = pipeline("text-classification", model="ImranzamanML/GEFS-language-detector")
# top_k=1 keeps only the highest-scoring language for each input text.
lang_detect = pipe(text, top_k=1)
print("The detected languages are", lang_detect)
```
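The raw pipeline output can be turned into a more readable summary. The sketch below is a minimal, pure-Python example: `summarize_predictions` and `LANGUAGE_NAMES` are hypothetical names introduced here, and the code assumes the pipeline labels are the language codes listed in this card (`de`, `en`, `fr`, `es`) and that `pipe(text, top_k=1)` returns one list of `{"label", "score"}` dicts per input. The example scores are made up for illustration and are not model output.

```python
# Map the language codes listed in this card to readable names.
LANGUAGE_NAMES = {"de": "German", "en": "English", "fr": "French", "es": "Spanish"}


def summarize_predictions(texts, predictions):
    """Pair each input text with its top predicted language.

    `predictions` is assumed to have the shape returned by `pipe(texts, top_k=1)`:
    one list per input, each holding a single {"label", "score"} dict.
    A bare dict per input is accepted as well.
    """
    results = []
    for text, candidates in zip(texts, predictions):
        # With top_k=1 there is exactly one candidate per text.
        top = candidates[0] if isinstance(candidates, list) else candidates
        code = top["label"]
        results.append({
            "text": text,
            "code": code,
            # Fall back to the raw label if it is not one of the four codes.
            "language": LANGUAGE_NAMES.get(code, code),
            "score": round(top["score"], 4),
        })
    return results


# Illustrative input only: this score is invented, not produced by the model.
example_texts = ["I like the way to detect languages"]
example_predictions = [[{"label": "en", "score": 0.9987}]]
for row in summarize_predictions(example_texts, example_predictions):
    print(f'{row["language"]} ({row["code"]}, score={row["score"]}): {row["text"]}')
```

With the real pipeline, the call would be `summarize_predictions(text, lang_detect)`.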
## Load model directly
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("ImranzamanML/GEFS-language-detector")
model = AutoModelForSequenceClassification.from_pretrained("ImranzamanML/GEFS-language-detector")
```
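Loading the tokenizer and model directly leaves the prediction step to the caller. The following is one possible sketch of that step, not the author's own inference code: `softmax`, `decode_logits` and `detect_languages` are hypothetical helpers, and the label names are assumed to come from the checkpoint's `config.id2label` and to match the language codes listed in this card.

```python
import math


def softmax(logits):
    """Convert one row of raw logits into probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def decode_logits(logit_rows, id2label):
    """Return (label, probability) for the highest-scoring class of each row."""
    decoded = []
    for row in logit_rows:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        decoded.append((id2label[best], probs[best]))
    return decoded


def detect_languages(texts, model_id="ImranzamanML/GEFS-language-detector"):
    """Hypothetical helper: load the model, run it, and decode the logits."""
    # Imported lazily so the pure-Python helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()  # disable dropout for inference

    # Pad and truncate so texts of different lengths fit in one batch.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():  # gradients are not needed for prediction
        logits = model(**inputs).logits
    # id2label is read from the checkpoint config (assumed to hold language codes).
    return decode_logits(logits.tolist(), model.config.id2label)
```

Calling `detect_languages(["Me gusta la forma de detectar idiomas"])` downloads the checkpoint on first use and returns a list of `(label, probability)` tuples, one per input text.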
## Model Training
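| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1 | 0.002600 | 0.000148 |
| 2 | 0.001000 | 0.000015 |
| 3 | 0.000000 | 0.000011 |
| 4 | 0.001800 | 0.000009 |
| 5 | 0.002700 | 0.000016 |
| 6 | 0.001600 | 0.000012 |
| 7 | 0.001300 | 0.000009 |
| 8 | 0.001200 | 0.000008 |
| 9 | 0.000900 | 0.000007 |
| 10 | 0.000900 | 0.000007 |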
## Testing Results
```
Language Precision Recall F1 Accuracy
de 0.9997 0.9998 0.9998 0.9999
en 1.0000 1.0000 1.0000 1.0000
fr 0.9995 0.9996 0.9996 0.9996
es 0.9994 0.9996 0.9995 0.9996
```
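This card does not state how the per-language figures were computed. One common reading is a one-vs-rest evaluation, where each language in turn is treated as the positive class. The sketch below shows that calculation under this assumption; `per_language_metrics` is a hypothetical helper and the toy labels are invented for illustration, not taken from the test set.

```python
def per_language_metrics(y_true, y_pred, labels):
    """Compute one-vs-rest precision, recall, F1 and accuracy for each label."""
    pairs = list(zip(y_true, y_pred))
    total = len(pairs)
    metrics = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        tn = total - tp - fp - fn  # everything not involving this label
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        accuracy = (tp + tn) / total if total else 0.0
        metrics[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "accuracy": accuracy,
        }
    return metrics


# Toy example with invented labels: one German sample is misread as English.
y_true = ["de", "de", "en", "fr", "es", "es"]
y_pred = ["de", "en", "en", "fr", "es", "es"]
for lang, scores in per_language_metrics(y_true, y_pred, ["de", "en", "fr", "es"]).items():
    print(lang, {name: round(value, 4) for name, value in scores.items()})
```

Under this reading, per-language accuracy counts a sample as correct for a language whenever the model agrees with the ground truth on whether the sample belongs to that language.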
## About Author
**Name**: Muhammad Imran Zaman
**Company**: [Theum AG](https://theum.com/en/index.htm?t=)
**Role**: Lead Machine Learning Engineer
**Professional Links**:
- Kaggle: [Profile](https://www.kaggle.com/muhammadimran112233)
- LinkedIn: [Profile](https://linkedin.com/in/muhammad-imran-zaman)
- Google Scholar: [Profile](https://scholar.google.com/citations?user=ulVFpy8AAAAJ&hl=en)
- YouTube: [Channel](https://www.youtube.com/@consolioo)
- GitHub: [Profile](https://github.com/Imran-ml)