Update README.md
tags:
- llama
- trl
---

DATASET
------------------------------
- **What's new?** This model uses version 3.2 of the dataset (Langfuse + AWS), which has better quality:
  - Removed all 10- and 15-question variants; only the 5-question count is kept
  - Fixed all Vietnamese quizzes (ensuring the output is in Vietnamese)
  - Fixed some lazily duplicated topics (Biglead, Computing)
  - Removed Paragraph questions, replacing Paragraph with MCQ for all data points
- Trained using the default training config (60 steps, linear LR)

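The cleanup steps above can be sketched as follows (the `question_count` and `question_type` field names are hypothetical; the actual dataset schema may differ):

```python
# Sketch of the v3.2 cleanup: keep only 5-question quizzes and turn
# Paragraph questions into MCQ. Field names are assumptions for illustration.
def clean(records):
    out = []
    for r in records:
        if r["question_count"] != 5:           # drop 10- and 15-question variants
            continue
        if r["question_type"] == "Paragraph":  # replace Paragraph with MCQ
            r = {**r, "question_type": "MCQ"}
        out.append(r)
    return out

sample = [
    {"question_count": 5, "question_type": "Paragraph"},
    {"question_count": 10, "question_type": "MCQ"},
]
print(clean(sample))  # → [{'question_count': 5, 'question_type': 'MCQ'}]
```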
TRAINING
------------------------------
- Overview:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64952a1e5ba8e6c66e1a0fa8/QBR1IUoD7REKoGG_kJtRS.png)

- Uses a low rank of 8 to avoid overfitting and preserve the model's generalization

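As a rough illustration of why a low rank keeps the trainable-parameter count small (the 4096 hidden size here is an assumed example, not a figure from this run):

```python
# LoRA trains two small matrices A (d_in x r) and B (r x d_out) per adapted
# weight instead of the full d_in x d_out matrix, so parameters scale with r.
def lora_params(d_in, d_out, r):
    return d_in * r + r * d_out

full = 4096 * 4096  # a full square projection at an assumed hidden size of 4096
print(f"{100 * lora_params(4096, 4096, 8) / full:.2f}%")  # → 0.39% of the full matrix
```

A higher rank would fit the training set more closely but risks the overfitting the low rank is meant to avoid.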
| Step | Training Loss | Step | Training Loss | Step | Training Loss | Step | Training Loss |
|-----:|--------------:|-----:|--------------:|-----:|--------------:|-----:|--------------:|
|    1 |      1.216600 |   16 |      0.886300 |   31 |      0.717000 |   46 |      0.683500 |
|    2 |      1.181100 |   17 |      0.900000 |   32 |      0.708700 |   47 |      0.673800 |
|    3 |      1.236900 |   18 |      0.792500 |   33 |      0.726800 |   48 |      0.651100 |
|    4 |      1.157100 |   19 |      0.814200 |   34 |      0.724500 |   49 |      0.683700 |
|    5 |      1.184100 |   20 |      0.808900 |   35 |      0.747800 |   50 |      0.702400 |
|    6 |      1.103500 |   21 |      0.815200 |   36 |      0.715600 |   51 |      0.664400 |
|    7 |      1.150900 |   22 |      0.771100 |   37 |      0.708100 |   52 |      0.671800 |
|    8 |      1.112900 |   23 |      0.800000 |   38 |      0.648300 |   53 |      0.673000 |
|    9 |      1.074600 |   24 |      0.782500 |   39 |      0.677900 |   54 |      0.704000 |
|   10 |      1.095700 |   25 |      0.772700 |   40 |      0.685600 |   55 |      0.621100 |
|   11 |      0.966400 |   26 |      0.698300 |   41 |      0.726100 |   56 |      0.668200 |
|   12 |      0.977000 |   27 |      0.759500 |   42 |      0.687300 |   57 |      0.686000 |
|   13 |      1.004500 |   28 |      0.718500 |   43 |      0.663100 |   58 |      0.639500 |
|   14 |      0.931500 |   29 |      0.711400 |   44 |      0.628600 |   59 |      0.665400 |
|   15 |      0.869900 |   30 |      0.759400 |   45 |      0.663300 |   60 |      0.680900 |

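A quick check on the curve above, using the step-1 and step-60 values:

```python
# First and last training-loss values, copied from the table above.
first, last = 1.216600, 0.680900
drop = 100 * (first - last) / first
print(f"{drop:.1f}% reduction over 60 steps")  # → 44.0% reduction over 60 steps
```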
- 4757.667 seconds (79.29 minutes) used for training.
- Peak reserved memory = 13.857 GB.
- Peak reserved memory for training = 12.73 GB.
- Peak reserved memory as % of max memory = 93.959 %.
- Peak reserved memory for training as % of max memory = 86.317 %.
- Final loss = 0.680900
- View the full training run here: https://wandb.ai/vietphuongnguyen2602-rockship/huggingface/runs/ns2ym0hr

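The two percentages are mutually consistent; both imply a device total of about 14.748 GB. That total is inferred from the figures above, not stated in the run:

```python
# Back out the implied max memory from the reported peaks and percentages.
peak, peak_train = 13.857, 12.73  # GB, from the stats above
max_mem = 14.748                  # GB, inferred: 13.857 / 0.93959
print(round(100 * peak / max_mem, 3))        # → 93.959
print(round(100 * peak_train / max_mem, 3))  # → 86.317
```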

FINAL BENCHMARKING
------------------------------
- **Time to First Token (TTFT):** 0.002 s
- **Time Per Output Token (TPOT):** 40.85 ms/token
- **Throughput:** 25.66 tokens/s
- **Average Token Latency:** 40.90 ms/token
- **Total Generation Time:** 63.015 s
- **Input Tokenization Time:** 0.008 s
- **Input Tokens:** 1909
- **Output Tokens:** 984
- **Total Tokens:** 2892
- **Memory Usage (GPU):** 1.49 GB
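For reference, a minimal sketch of how TTFT, TPOT, and throughput can be measured around a streaming generator (`generate_stream` is a stub standing in for the actual model; the real benchmark harness is not shown here):

```python
import time

def generate_stream(prompt):
    # Stub: a real run would stream tokens from the model here.
    for tok in ["Q1:", "What", "is", "...?"]:
        yield tok

def benchmark(prompt):
    start = time.perf_counter()
    ttft, n = None, 0
    for _ in generate_stream(prompt):
        n += 1
        if ttft is None:
            ttft = time.perf_counter() - start    # time to first token
    total = time.perf_counter() - start
    tpot = (total - ttft) / max(n - 1, 1)         # per-token time after the first
    return {"ttft_s": ttft, "tpot_s": tpot, "throughput_tok_s": n / total}
```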

# Uploaded model