IvanHU committed
Commit 4cd27bd · verified · 1 Parent(s): 0bbde8b

Update README.md

Files changed (1): README.md +3 -3
README.md CHANGED
@@ -35,14 +35,14 @@ This version includes the optimizer, allowing you to resume training using the H
 
 ## Continual Training Tutorial
 
-### Step 1: Modify the `config.json`
+### Step 1: Modify the `trainer_state.json`
 
-Due to the implementation of Hugging Face Trainer, certain parameters are stored in the `config.json` file and cannot be modified through the Trainer's command-line arguments. Therefore, you need to update these parameters in the `config.json` file first, particularly:
+Due to the implementation of Hugging Face Trainer, certain parameters are stored in the `trainer_state.json` file and cannot be modified through the Trainer's command-line arguments. Therefore, you need to update these parameters in the `trainer_state.json` file first, particularly:
 
 - **`save_steps`**: The frequency of saving intermediate checkpoints.
 - **`train_batch_size`**: The batch size per GPU (equivalent to `per_device_train_batch_size` in the Trainer). We used a batch size of 1008 (approximately 4M tokens) during the stable training stage. Maintaining this same batch size is equally important for training effectiveness.
 
-Below is an example of a properly configured `config.json` file:
+Below is an example of a properly configured `trainer_state.json` file:
 
 ```json
 {
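
For orientation, here is a minimal sketch of how the relevant fields in a checkpoint's `trainer_state.json` might look after this edit. The `global_step`, `max_steps`, `save_steps`, and `logging_steps` values are illustrative placeholders, and the other fields the Trainer writes to this file (such as `log_history`) are omitted; only `train_batch_size: 1008` reflects the batch size discussed in the diff above.

```json
{
  "global_step": 20000,
  "max_steps": 100000,
  "save_steps": 1000,
  "train_batch_size": 1008,
  "logging_steps": 10
}
```

When training is resumed via `trainer.train(resume_from_checkpoint=...)`, the Trainer restores its state from this file in the checkpoint directory, which is why these values must be edited here rather than passed on the command line.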