Carlos Rosas committed
Commit dc49182
Parent: 3e546be

Update README.md

Files changed (1)
  1. README.md +29 -21
README.md CHANGED
@@ -10,27 +10,35 @@ The model was fine-tuned on a specialized corpus consisting of:
  2. Retrieved documents: For each synthetic query, relevant documents were retrieved using the BM25 ranking algorithm.
  3. Generated answers: Responses to the synthetic queries were created based on the retrieved documents.

- ### Training Hyperparameters
- - Max Steps: 3000
- - Learning Rate: 3e-4
- - Batch Size: 2 per device
- - Gradient Accumulation Steps: 4
- - Max Sequence Length: 8192
- - Weight Decay: 0.001
- - Warmup Ratio: 0.03
- - LR Scheduler: Linear
- - Optimizer: paged_adamw_32bit
-
- ### LoRA Configuration
- - LoRA Alpha: 16
- - LoRA Dropout: 0.1
- - LoRA R: 64
- - Target Modules: ["gate_proj", "down_proj", "up_proj", "q_proj", "v_proj", "k_proj", "o_proj"]
-
- ### Quantization
- - Quantization: 4-bit
- - Quantization Type: nf4
- - Compute Dtype: float16
+ ```yaml
+ Training Hyperparameters:
+   Max Steps: 3000
+   Learning Rate: 3e-4
+   Batch Size: 2 per device
+   Gradient Accumulation Steps: 4
+   Max Sequence Length: 8192
+   Weight Decay: 0.001
+   Warmup Ratio: 0.03
+   LR Scheduler: Linear
+   Optimizer: paged_adamw_32bit
+
+ LoRA Configuration:
+   LoRA Alpha: 16
+   LoRA Dropout: 0.1
+   LoRA R: 64
+   Target Modules:
+     - gate_proj
+     - down_proj
+     - up_proj
+     - q_proj
+     - v_proj
+     - k_proj
+     - o_proj
+
+ Quantization:
+   Quantization: 4-bit
+   Quantization Type: nf4
+   Compute Dtype: float16

  ## Usage
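
The corpus-construction step described in the diff context (retrieving relevant documents for each synthetic query with BM25) can be illustrated along the following lines. This is a minimal sketch using the `rank_bm25` package; the corpus and query strings are placeholders, not the actual training data.

```python
# Minimal sketch of BM25 retrieval for synthetic queries.
# Assumes the `rank_bm25` package; documents and queries are illustrative placeholders.
from rank_bm25 import BM25Okapi

corpus = [
    "Placeholder document about topic A ...",
    "Placeholder document about topic B ...",
    "Placeholder document about topic C ...",
]
synthetic_queries = ["example query about topic A", "example query about topic C"]

# BM25 operates on tokenized text; simple whitespace tokenization for the sketch.
tokenized_corpus = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)

for query in synthetic_queries:
    # Retrieve the top-k documents for each synthetic query.
    top_docs = bm25.get_top_n(query.lower().split(), corpus, n=2)
    print(query, "->", top_docs)
```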
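The hyperparameters in the new YAML block map naturally onto a standard QLoRA-style setup. The sketch below shows one way those values could be expressed with `transformers`, `peft`, and `bitsandbytes`; it is an illustration of the listed settings, not the actual training script from this commit, and the base-model identifier is a placeholder.

```python
# Illustrative mapping of the listed hyperparameters onto a QLoRA-style setup.
# The base-model ID is a placeholder; the real fine-tuning script is not part of this commit.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

BASE_MODEL = "base-model-id"  # placeholder, not specified in this section

# Quantization: 4-bit nf4 with float16 compute dtype.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA Configuration: alpha 16, dropout 0.1, r 64, listed target modules.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["gate_proj", "down_proj", "up_proj", "q_proj", "v_proj", "k_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Training Hyperparameters from the YAML block (the 8192 max sequence length is
# typically passed to the trainer, e.g. trl's SFTTrainer, rather than set here).
training_args = TrainingArguments(
    output_dir="outputs",
    max_steps=3000,
    learning_rate=3e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    weight_decay=0.001,
    warmup_ratio=0.03,
    lr_scheduler_type="linear",
    optim="paged_adamw_32bit",
)

# Load the base model in 4-bit before attaching the LoRA adapters.
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
)
```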