---
license: llama3
language:
- ko
tags:
- korean
- llama3
- instruction-tuning
- dora
datasets:
- Acyrl
- llm-kr-eval
- Counter-MT-bench
base_model:
- meta-llama/Meta-Llama-3-8B
pipeline_tag: text-generation
---

# A-LLM: Korean Language Model based on Llama-3

## Introduction
A-LLM is a Korean language model built on Meta's Llama-3-8B and optimized for Korean language understanding and generation. It was trained with DoRA (Weight-Decomposed Low-Rank Adaptation) on a Korean dataset. On the Horangi Korean LLM Leaderboard it reaches a total score of 0.6675, the highest among the models compared in this card other than GPT-4.

## Performance Benchmarks
### Horangi Korean LLM Leaderboard
The model was evaluated on the Horangi Korean LLM Leaderboard, which normalizes the scores of two major evaluation frameworks to a 1.0 scale and averages them.

#### 1. LLM-KR-EVAL
A comprehensive benchmark that measures fundamental NLP capabilities across 5 core tasks:
- Natural Language Inference (NLI)
- Question Answering (QA)
- Reading Comprehension (RC)
- Entity Linking (EL)
- Fundamental Analysis (FA)

The benchmark comprises 10 different datasets distributed across these tasks, providing a thorough assessment of Korean language understanding and processing capabilities.

#### 2. MT-Bench
A diverse evaluation framework consisting of 80 questions (10 from each of 8 categories), scored with GPT-4 as the judge. The categories are:
- Writing
- Roleplay
- Extraction
- Reasoning
- Math
- Coding
- Knowledge (STEM)
- Knowledge (Humanities/social science)

### Performance Results

| Model | Total Score | AVG_llm_kr_eval | AVG_mtbench |
|-------|-------------|-----------------|-------------|
| A-LLM (Ours) | 0.6675 | 0.5937 | 7.413 |
| GPT-4 | 0.7363 | 0.6158 | 8.569 |
| Mixtral-8x7B | 0.5843 | 0.4304 | 7.381 |
| KULLM3 | 0.5764 | 0.5204 | 6.325 |
| SOLAR-1-mini | 0.5173 | 0.37 | 6.647 |

Our model ranks second only to GPT-4 and leads the other models compared on both general language understanding (LLM-KR-EVAL) and diverse task-specific applications (MT-Bench).

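The Total Score column can be reproduced from the other two columns. The sketch below assumes that MT-Bench averages are on a 10-point scale (so dividing by 10 puts them on the 1.0 scale) and that LLM-KR-EVAL averages are already in the 0 to 1 range; the function name `horangi_total` is illustrative and not part of any leaderboard tooling.

```python
def horangi_total(avg_llm_kr_eval: float, avg_mtbench: float,
                  mtbench_max: float = 10.0) -> float:
    """Average the two benchmark scores after putting both on a 0-1 scale.

    Assumption: MT-Bench averages are on a 10-point scale, so dividing by
    `mtbench_max` normalizes them; LLM-KR-EVAL averages are already 0-1.
    """
    return (avg_llm_kr_eval + avg_mtbench / mtbench_max) / 2


# (AVG_llm_kr_eval, AVG_mtbench, reported Total Score) from the table above
reported = {
    "A-LLM (Ours)": (0.5937, 7.413, 0.6675),
    "GPT-4": (0.6158, 8.569, 0.7363),
    "Mixtral-8x7B": (0.4304, 7.381, 0.5843),
    "KULLM3": (0.5204, 6.325, 0.5764),
    "SOLAR-1-mini": (0.37, 6.647, 0.5173),
}

for name, (kr_eval, mtbench, total) in reported.items():
    computed = horangi_total(kr_eval, mtbench)
    print(f"{name}: computed={computed:.4f} reported={total}")
```

Under these assumptions the computed values agree with the reported Total Score for every row to within rounding of the last digit.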
### Model Components
This repository provides:
- Tokenizer configuration
- Model weights in safetensors format

## Usage Instructions

### Prerequisites
- Python 3.8 or higher
- PyTorch 2.0 or higher
- Transformers library
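A quick way to confirm that an environment meets these prerequisites is sketched below. It uses only the standard library (`importlib.metadata`, itself available from Python 3.8) and assumes the packages are installed under their usual PyPI distribution names, `torch` and `transformers`. The helper names `major_minor` and `check_prerequisites` are illustrative.

```python
import re
import sys
from importlib import metadata


def major_minor(version: str) -> tuple:
    """Extract (major, minor) from strings such as '2.1.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_prerequisites() -> dict:
    """Return a mapping of requirement name -> whether it is satisfied."""
    results = {"python>=3.8": sys.version_info >= (3, 8)}

    # PyTorch must be installed and at least version 2.0.
    try:
        results["torch>=2.0"] = major_minor(metadata.version("torch")) >= (2, 0)
    except metadata.PackageNotFoundError:
        results["torch>=2.0"] = False

    # Transformers only needs to be installed; no minimum version is stated.
    try:
        metadata.version("transformers")
        results["transformers"] = True
    except metadata.PackageNotFoundError:
        results["transformers"] = False

    return results


for requirement, ok in check_prerequisites().items():
    print(f"{requirement}: {'ok' if ok else 'MISSING'}")
```

Any requirement reported as `MISSING` should be installed or upgraded before loading the model.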