GuelGaMesh01
committed on
Commit
•
dcc58c2
1
Parent(s):
ded882c
Update README.md
README.md
CHANGED
@@ -109,22 +109,11 @@ Epochs: 3
 Max Sequence Length: 2500 tokens
 Optimizer: paged_adamw_8bit
 
-#### Training Hyperparameters
-
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
-#### Speeds, Sizes, Times [optional]
-
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
-- **Training Time:** Approximately 30 minutes for 150 steps with fp16 mixed precision.
-- **Checkpoint Size:** The model checkpoints are approximately 15 GB.
 
 ## Evaluation
 
 <!-- This section describes the evaluation protocols and provides the results. -->
 
-### Testing Data, Factors & Metrics
 
 #### Testing Data
 
@@ -133,11 +122,6 @@ Optimizer: paged_adamw_8bit
 The model was evaluated using a split from the training data,
 specifically a 10% test split of the original training dataset.
 
-#### Factors
-
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
-[More Information Needed]
 
 #### Metrics
 