CallMeDaniel committed
Commit 79dbd50 · verified · 1 Parent(s): b3b247d

Update README.md

Files changed (1)
  1. README.md +85 -74
README.md CHANGED
@@ -5,45 +5,56 @@ language:
  - vi
  ---
 
- # Model Card for Model ID
 
- <!-- Provide a quick summary of what the model is/does. -->
 
 
 
  ## Model Details
 
  ### Model Description
 
  <!-- Provide a longer summary of what this model is. -->
 
 
 
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
  - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
 
- ### Model Sources [optional]
 
- <!-- Provide the basic links for the model. -->
 
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
 
- ## Uses
 
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 
- ### Direct Use
 
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
 
- [More Information Needed]
 
  ### Downstream Use [optional]
 
@@ -59,9 +70,9 @@ language:
 
  ## Bias, Risks, and Limitations
 
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
 
  ### Recommendations
 
@@ -79,17 +90,8 @@ Use the code below to get started with the model.
 
  ### Training Data
 
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
 
 
  #### Training Hyperparameters
@@ -128,7 +130,6 @@ Use the code below to get started with the model.
 
  ### Results
 
- [More Information Needed]
 
  #### Summary
 
@@ -152,71 +153,81 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
  - **Compute Region:** [More Information Needed]
  - **Carbon Emitted:** [More Information Needed]
 
- ## Technical Specifications [optional]
 
  ### Model Architecture and Objective
 
  [More Information Needed]
 
- ### Compute Infrastructure
 
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
 
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
 
- [More Information Needed]
 
- **APA:**
 
- [More Information Needed]
 
- ## Glossary [optional]
 
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
 
- [More Information Needed]
 
- ## More Information [optional]
 
- [More Information Needed]
 
- ## Model Card Authors [optional]
 
- [More Information Needed]
 
- ## Model Card Contact
 
- [More Information Needed]
 
- ## Training procedure
 
- The following `bitsandbytes` quantization config was used during training:
- - quant_method: bitsandbytes
- - load_in_8bit: True
- - load_in_4bit: False
- - llm_int8_threshold: 6.0
- - llm_int8_skip_modules: None
- - llm_int8_enable_fp32_cpu_offload: False
- - llm_int8_has_fp16_weight: False
- - bnb_4bit_quant_type: fp4
- - bnb_4bit_use_double_quant: False
- - bnb_4bit_compute_dtype: float32
 
- ### Framework versions
 
- - PEFT 0.6.3.dev0
 
  - vi
  ---
 
+ # Vietnamese Fine-tuned Llama-2-7b-chat-hf
 
+ This repository contains a version of `Llama-2-7b-chat-hf` fine-tuned on Vietnamese data using LoRA (Low-Rank Adaptation).
 
 
  ## Model Details
 
+ This model is a fine-tuned version of the Llama-2-7b-chat-hf model, specifically adapted for improved performance on Vietnamese language tasks. It uses LoRA fine-tuning to efficiently adapt the large language model to Vietnamese data while maintaining much of the original model's general knowledge and capabilities.
+
  ### Model Description
 
  <!-- Provide a longer summary of what this model is. -->
 
 
+ - **Developed by:** [Daniel Du](https://github.com/danghoangnhan)
+ - **Model type:** Large Language Model
+ - **Language(s) (NLP):** Vietnamese
  - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
 
+ ### Direct Use
 
+ You can use this model directly with the Hugging Face Transformers library:
 
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ from peft import PeftModel, PeftConfig
+
+ # Load the base model
+ base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
+
+ # Load the LoRA configuration and model
+ peft_model_id = "CallMeMrFern/Llama-2-7b-chat-hf_vn"
+ config = PeftConfig.from_pretrained(peft_model_id)
+ model = PeftModel.from_pretrained(base_model, peft_model_id)
+
+ # Load the tokenizer
+ tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
+
+ # Example usage
+ input_text = "Xin chào, hôm nay thời tiết thế nào?"
+ inputs = tokenizer(input_text, return_tensors="pt")
+ outputs = model.generate(**inputs, max_length=100)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
 
  ### Downstream Use [optional]
 
 
  ## Bias, Risks, and Limitations
 
+ - This model is specifically fine-tuned for Vietnamese and may not perform as well on other languages.
+ - The model inherits limitations from the base Llama-2-7b-chat-hf model.
+ - Performance may vary depending on the specific task and domain.
 
  ### Recommendations
 
 
  ### Training Data
 
+ Dataset: `alpaca_translate_GPT_35_10_20k.json` (Vietnamese translation of the Alpaca dataset)
 
 
  #### Training Hyperparameters
 
  ### Results
 
 
  #### Summary
 
 
  - **Compute Region:** [More Information Needed]
  - **Carbon Emitted:** [More Information Needed]
 
 
  ### Model Architecture and Objective
 
  [More Information Needed]
 
+ ## Citation
 
+ If you use this model in your research, please cite:
 
+ ```
+ @misc{vietnamese_llama2_7b_chat,
+   author = {[Your Name]},
+   title = {Vietnamese Fine-tuned Llama-2-7b-chat-hf},
+   year = {2023},
+   publisher = {Hugging Face},
+   howpublished = {\url{https://huggingface.co/CallMeMrFern/Llama-2-7b-chat-hf_vn}}
+ }
+ ```
 
+ ## Training procedure
 
+ The following `bitsandbytes` quantization config was used during training:
+ - quant_method: bitsandbytes
+ - load_in_8bit: True
+ - load_in_4bit: False
+ - llm_int8_threshold: 6.0
+ - llm_int8_skip_modules: None
+ - llm_int8_enable_fp32_cpu_offload: False
+ - llm_int8_has_fp16_weight: False
+ - bnb_4bit_quant_type: fp4
+ - bnb_4bit_use_double_quant: False
+ - bnb_4bit_compute_dtype: float32
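+
+ For reference, the listing above corresponds roughly to the following `transformers` `BitsAndBytesConfig` (a minimal sketch reconstructed from the values listed here; the exact loading code used during training is not part of this card):
+
+ ```python
+ from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+
+ # Sketch of the quantization config listed above: 8-bit loading; the fp4/float32
+ # settings only matter if 4-bit loading were enabled.
+ bnb_config = BitsAndBytesConfig(
+     load_in_8bit=True,
+     load_in_4bit=False,
+     llm_int8_threshold=6.0,
+     llm_int8_skip_modules=None,
+     llm_int8_enable_fp32_cpu_offload=False,
+     llm_int8_has_fp16_weight=False,
+     bnb_4bit_quant_type="fp4",
+     bnb_4bit_use_double_quant=False,
+     bnb_4bit_compute_dtype="float32",
+ )
+
+ base_model = AutoModelForCausalLM.from_pretrained(
+     "meta-llama/Llama-2-7b-chat-hf",
+     quantization_config=bnb_config,
+ )
+ ```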
 
+ ### Framework versions
 
+ - PEFT 0.6.3.dev0
 
+ ## Fine-tuning Details
 
+ - **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
+ - **LoRA Config:**
+   - Target Modules: `["q_proj", "v_proj"]`
+   - Precision: 8-bit
+ - **Dataset:** `alpaca_translate_GPT_35_10_20k.json` (Vietnamese translation of the Alpaca dataset)
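+
+ As a rough illustration, this setup can be expressed with `peft` as follows (a sketch only: the target modules and 8-bit loading come from this card, while `r`, `lora_alpha`, and `lora_dropout` are assumed values that are not recorded here):
+
+ ```python
+ from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+ from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
+
+ # Base model loaded in 8-bit, matching the quantization config above.
+ base_model = AutoModelForCausalLM.from_pretrained(
+     "meta-llama/Llama-2-7b-chat-hf",
+     quantization_config=BitsAndBytesConfig(load_in_8bit=True),
+ )
+ # Commonly used before training LoRA adapters on a quantized base model.
+ base_model = prepare_model_for_kbit_training(base_model)
+
+ # Target modules are documented in this card; r, lora_alpha, and lora_dropout
+ # are illustrative assumptions.
+ lora_config = LoraConfig(
+     r=8,
+     lora_alpha=16,
+     lora_dropout=0.05,
+     target_modules=["q_proj", "v_proj"],
+     bias="none",
+     task_type="CAUSAL_LM",
+ )
+
+ model = get_peft_model(base_model, lora_config)
+ model.print_trainable_parameters()
+ ```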
 
+ ## Training Procedure
 
+ The model was fine-tuned using the following command:
 
+ ```bash
+ python finetune/lora.py \
+     --base_model meta-llama/Llama-2-7b-chat-hf \
+     --model_type llama \
+     --data_dir data/general/alpaca_translate_GPT_35_10_20k.json \
+     --output_dir finetuned/meta-llama/Llama-2-7b-chat-hf \
+     --lora_target_modules '["q_proj", "v_proj"]' \
+     --micro_batch_size 1
+ ```
 
+ For multi-GPU training, a distributed training approach was used.
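+
+ The exact launcher is not recorded in this card; one common way to distribute the same script across several GPUs is `torchrun`, sketched below as an assumed example rather than the command actually used:
+
+ ```bash
+ # Assumed example: launch the same fine-tuning script on 4 GPUs with torchrun
+ # (requires the script to support distributed training).
+ torchrun --nproc_per_node 4 finetune/lora.py \
+     --base_model meta-llama/Llama-2-7b-chat-hf \
+     --model_type llama \
+     --data_dir data/general/alpaca_translate_GPT_35_10_20k.json \
+     --output_dir finetuned/meta-llama/Llama-2-7b-chat-hf \
+     --lora_target_modules '["q_proj", "v_proj"]' \
+     --micro_batch_size 1
+ ```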
 
+ ## Evaluation Results
 
+ [Include any evaluation results, perplexity scores, or benchmark performances here]
 
+ ## Acknowledgements
 
+ - This project is part of the TF07 Course offered by ProtonX.
+ - We thank the creators of the original Llama-2-7b-chat-hf model and the Hugging Face team for their tools and resources.
+ - Appreciation to [VietnamAIHub/Vietnamese_LLMs](https://github.com/VietnamAIHub/Vietnamese_LLMs) for the translated dataset.