CallMeDaniel committed: Update README.md

README.md CHANGED

language:
- vi
---

# Vietnamese Fine-tuned Llama-2-7b-chat-hf

This repository contains a Vietnamese-tuned version of the `Llama-2-7b-chat-hf` model, which has been fine-tuned on Vietnamese datasets using LoRA (Low-Rank Adaptation).

## Model Details

This model is a fine-tuned version of the Llama-2-7b-chat-hf model, specifically adapted for improved performance on Vietnamese language tasks. It uses LoRA fine-tuning to efficiently adapt the large language model to Vietnamese data while maintaining much of the original model's general knowledge and capabilities.

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [Daniel Du](https://github.com/danghoangnhan)
- **Model type:** Large Language Model
- **Language(s) (NLP):** Vietnamese
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)

### Direct Use

You can use this model directly with the Hugging Face Transformers library:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel, PeftConfig

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Load the LoRA configuration and model
peft_model_id = "CallMeMrFern/Llama-2-7b-chat-hf_vn"
config = PeftConfig.from_pretrained(peft_model_id)
model = PeftModel.from_pretrained(base_model, peft_model_id)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Example usage
input_text = "Xin chào, hôm nay thời tiết thế nào?"  # "Hello, how is the weather today?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
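
Since the base model is a chat model, prompts generally work better when wrapped in the Llama 2 chat format (`[INST] ... [/INST]`, optionally with a `<<SYS>>` system prompt). A minimal sketch building on the example above; the `build_prompt` helper and the Vietnamese system prompt ("You are a helpful assistant.") are illustrative assumptions, not part of this repository:

```python
# Reuses `model` and `tokenizer` from the example above.
def build_prompt(user_message: str,
                 system_prompt: str = "Bạn là một trợ lý hữu ích.") -> str:
    # Llama 2 chat format: [INST] <<SYS>> system <</SYS>> user [/INST]
    return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"

prompt = build_prompt("Xin chào, hôm nay thời tiết thế nào?")
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```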

### Downstream Use [optional]

## Bias, Risks, and Limitations

- This model is specifically fine-tuned for Vietnamese and may not perform as well on other languages.
- The model inherits limitations from the base Llama-2-7b-chat-hf model.
- Performance may vary depending on the specific task and domain.

### Recommendations

### Training Data

Dataset: `alpaca_translate_GPT_35_10_20k.json` (Vietnamese translation of the Alpaca dataset)
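
Alpaca-style files are typically a JSON list of records with `instruction`, `input`, and `output` fields. A minimal sketch of inspecting such a file with the `datasets` library; the field names follow the standard Alpaca convention and the local path is taken from the fine-tuning command further down, neither is confirmed by this card:

```python
from datasets import load_dataset

# Path from the fine-tuning command below; schema assumed to be the
# standard Alpaca format: {"instruction": ..., "input": ..., "output": ...}.
data = load_dataset(
    "json",
    data_files="data/general/alpaca_translate_GPT_35_10_20k.json",
)["train"]

def to_prompt(example: dict) -> str:
    # Fold instruction, optional input, and target output into one training string.
    if example.get("input"):
        return f"{example['instruction']}\n\n{example['input']}\n\n{example['output']}"
    return f"{example['instruction']}\n\n{example['output']}"

print(to_prompt(data[0]))
```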

#### Training Hyperparameters

### Results

#### Summary

- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

### Model Architecture and Objective

[More Information Needed]

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{vietnamese_llama2_7b_chat,
  author       = {[Your Name]},
  title        = {Vietnamese Fine-tuned Llama-2-7b-chat-hf},
  year         = {2023},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://huggingface.co/CallMeMrFern/Llama-2-7b-chat-hf_vn}}
}
```

## Training procedure

The following `bitsandbytes` quantization config was used during training:

- quant_method: bitsandbytes
- load_in_8bit: True
- load_in_4bit: False
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: fp4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: float32
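
For reference, the settings listed above map onto the Transformers `BitsAndBytesConfig` API roughly as in the sketch below; this is a hedged reconstruction, not the exact training script:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Mirrors the quantization settings listed above; with load_in_4bit=False,
# only the 8-bit (llm_int8_*) options take effect.
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    load_in_4bit=False,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="fp4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.float32,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```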

### Framework versions

- PEFT 0.6.3.dev0

## Fine-tuning Details

- **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
- **LoRA Config:**
  - Target Modules: `["q_proj", "v_proj"]`
  - Precision: 8-bit
- **Dataset:** `alpaca_translate_GPT_35_10_20k.json` (Vietnamese translation of the Alpaca dataset)
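
Expressed with the PEFT API, a matching LoRA configuration would look roughly like the sketch below; the target modules come from this card, while `r`, `lora_alpha`, and `lora_dropout` are illustrative assumptions:

```python
from peft import LoraConfig, get_peft_model

# Target modules taken from the card; r, lora_alpha, and lora_dropout are
# illustrative assumptions, not the values used for this checkpoint.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# `base_model` refers to an 8-bit base model loaded as in the sketch above.
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()
```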

## Training Procedure

The model was fine-tuned using the following command:

```bash
python finetune/lora.py \
    --base_model meta-llama/Llama-2-7b-chat-hf \
    --model_type llama \
    --data_dir data/general/alpaca_translate_GPT_35_10_20k.json \
    --output_dir finetuned/meta-llama/Llama-2-7b-chat-hf \
    --lora_target_modules '["q_proj", "v_proj"]' \
    --micro_batch_size 1
```

For multi-GPU training, a distributed training approach was used.

## Evaluation Results

[Include any evaluation results, perplexity scores, or benchmark performances here]

## Acknowledgements

- This project is part of the TF07 Course offered by ProtonX.
- We thank the creators of the original Llama-2-7b-chat-hf model and the Hugging Face team for their tools and resources.
- Thanks to [VietnamAIHub/Vietnamese_LLMs](https://github.com/VietnamAIHub/Vietnamese_LLMs) for the translated dataset.