Description

This model was developed by Kundyz Maksutova, PhD Candidate, as part of research on question-answering systems for the Kazakh language. It is a version of FacebookAI/xlm-roberta-large fine-tuned on the Kundyzka/informatics_kaz dataset to answer questions in the computer science domain.

Key Features:

- Base Model: FacebookAI/xlm-roberta-large
- Dataset: Kundyzka/informatics_kaz
- Language: Kazakh (kk)
- Task: Question Answering
- Performance:
  - Before Training:
    - F1 Score: 26.950
    - Exact Match: 13.116
  - After Training:
    - F1 Score: 70.127
    - Exact Match: 49.740
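
This card does not state how F1 and Exact Match were computed. For extractive question answering, these are commonly the SQuAD-style metrics, so the following is a minimal sketch under that assumption, not the author's evaluation script. The normalization (lowercasing and stripping ASCII punctuation) and all function names are illustrative choices; the English article removal used in the original SQuAD script is omitted because it does not apply to Kazakh.

```python
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation, and split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return text.split()


def exact_match(prediction, reference):
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def corpus_scores(pairs):
    """Average both metrics over (prediction, reference) pairs, on a 0-100 scale."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (prediction, reference) pair")
    em = sum(exact_match(p, r) for p, r in pairs) / len(pairs)
    f1 = sum(f1_score(p, r) for p, r in pairs) / len(pairs)
    return {"exact_match": 100 * em, "f1": 100 * f1}
```

Under these definitions a prediction that contains the reference plus one extra word scores 0 on Exact Match but still earns partial F1 credit, which is one reason F1 is usually the higher of the two figures, as it is in the numbers above.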

Dataset:

The Kundyzka/informatics_kaz dataset provides a diverse set of questions and answers in Kazakh covering topics in computer science. Fine-tuning on it is intended to help the model handle domain-specific queries and terminology.

Intended Use:

This model is intended for answering questions in the Kazakh language, with potential applications in:

- Educational Platforms: Assisting students with computer science-related questions.
- Research Projects: Supporting the study and development of Kazakh natural language processing tools.
- AI Applications: Enhancing chatbots or intelligent systems requiring domain-specific question-answering capabilities.
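
For any of these applications, one way to query the model is the `question-answering` pipeline from the Hugging Face `transformers` library. The sketch below assumes that the repository `Kundyzka/XLM-Roberta-large-informatics-kaz` contains weights with an extractive QA head that the pipeline can load. The helper names and the Kazakh example sentence are illustrative and are not taken from the dataset.

```python
MODEL_ID = "Kundyzka/XLM-Roberta-large-informatics-kaz"  # repository name shown on this card


def load_qa_pipeline(model_id=MODEL_ID):
    """Build an extractive QA pipeline; weights are downloaded on first use."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    return pipeline("question-answering", model=model_id)


def answer(qa, question, context):
    """Return the best span as a dict with 'answer', 'score', 'start', 'end'."""
    return qa(question=question, context=context)


def demo():
    """Illustrative call; requires transformers, a backend such as PyTorch, and network access."""
    qa = load_qa_pipeline()
    # "An algorithm is an ordered sequence of actions for solving a problem."
    context = "Алгоритм дегеніміз есепті шешуге арналған әрекеттердің реттелген тізбегі."
    # "What is an algorithm?"
    question = "Алгоритм дегеніміз не?"
    return answer(qa, question, context)
```

Calling `demo()` returns the extracted span together with its score and character offsets into the context. Because the model is extractive, the answer is always a substring of the supplied context, so the calling application has to provide a passage that actually contains it.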

Limitations and Ethical Considerations:

- Domain-Specific Bias: The model performs best on computer science queries and may not generalize well to other domains.
- Dataset Bias: Biases in the training dataset may carry over into the model's predictions.
- Language Support: The model is optimized for Kazakh; it has not been fine-tuned for other languages.
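
One common way to reduce the impact of these limitations in an application is to discard low-confidence answers instead of showing them. The sketch below assumes predictions in the dict format returned by the `transformers` question-answering pipeline (keys `answer` and `score`). The function name and the 0.3 threshold are hypothetical and would need tuning on held-out data; this card does not describe such a mechanism.

```python
def answer_or_abstain(prediction, min_score=0.3):
    """Return the answer text, or None when the model is not confident enough.

    `prediction` is a dict with at least 'answer' (str) and 'score' (float in [0, 1]).
    `min_score` is a hypothetical threshold, not a value recommended by the card.
    """
    text = prediction["answer"].strip()
    if not text or prediction["score"] < min_score:
        return None
    return text
```

An application can then show a fallback message whenever the function returns `None`, for example when a question falls outside computer science.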

Tags:

computerscience
question-answering
Kazakh

This model is intended as a contribution to natural language processing tools for low-resource languages such as Kazakh. For further details or customization, refer to the model repository.

Model repository: Kundyzka/XLM-Roberta-large-informatics-kaz