ErrorAI committed
Commit 7df6f33 · verified · 1 Parent(s): fa229ee

Training in progress, step 389, checkpoint

last-checkpoint/README.md ADDED
@@ -0,0 +1,202 @@
1
+ ---
2
+ base_model: unsloth/mistral-7b-instruct-v0.3
3
+ library_name: peft
4
+ ---
5
+
6
+ # Model Card for Model ID
7
+
8
+ <!-- Provide a quick summary of what the model is/does. -->
9
+
10
+
11
+
12
+ ## Model Details
13
+
14
+ ### Model Description
15
+
16
+ <!-- Provide a longer summary of what this model is. -->
17
+
18
+
19
+
20
+ - **Developed by:** [More Information Needed]
21
+ - **Funded by [optional]:** [More Information Needed]
22
+ - **Shared by [optional]:** [More Information Needed]
23
+ - **Model type:** [More Information Needed]
24
+ - **Language(s) (NLP):** [More Information Needed]
25
+ - **License:** [More Information Needed]
26
+ - **Finetuned from model [optional]:** [More Information Needed]
27
+
28
+ ### Model Sources [optional]
29
+
30
+ <!-- Provide the basic links for the model. -->
31
+
32
+ - **Repository:** [More Information Needed]
33
+ - **Paper [optional]:** [More Information Needed]
34
+ - **Demo [optional]:** [More Information Needed]
35
+
36
+ ## Uses
37
+
38
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
+
40
+ ### Direct Use
41
+
42
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
+
44
+ [More Information Needed]
45
+
46
+ ### Downstream Use [optional]
47
+
48
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
+
50
+ [More Information Needed]
51
+
52
+ ### Out-of-Scope Use
53
+
54
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
+
56
+ [More Information Needed]
57
+
58
+ ## Bias, Risks, and Limitations
59
+
60
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
+
62
+ [More Information Needed]
63
+
64
+ ### Recommendations
65
+
66
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
+
68
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
+
70
+ ## How to Get Started with the Model
71
+
72
+ Use the code below to get started with the model.
73
+
74
+ [More Information Needed]
75
+
76
+ ## Training Details
77
+
78
+ ### Training Data
79
+
80
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
+
82
+ [More Information Needed]
83
+
84
+ ### Training Procedure
85
+
86
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
+
88
+ #### Preprocessing [optional]
89
+
90
+ [More Information Needed]
91
+
92
+
93
+ #### Training Hyperparameters
94
+
95
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
+
97
+ #### Speeds, Sizes, Times [optional]
98
+
99
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
+
101
+ [More Information Needed]
102
+
103
+ ## Evaluation
104
+
105
+ <!-- This section describes the evaluation protocols and provides the results. -->
106
+
107
+ ### Testing Data, Factors & Metrics
108
+
109
+ #### Testing Data
110
+
111
+ <!-- This should link to a Dataset Card if possible. -->
112
+
113
+ [More Information Needed]
114
+
115
+ #### Factors
116
+
117
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
+
119
+ [More Information Needed]
120
+
121
+ #### Metrics
122
+
123
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
+
125
+ [More Information Needed]
126
+
127
+ ### Results
128
+
129
+ [More Information Needed]
130
+
131
+ #### Summary
132
+
133
+
134
+
135
+ ## Model Examination [optional]
136
+
137
+ <!-- Relevant interpretability work for the model goes here -->
138
+
139
+ [More Information Needed]
140
+
141
+ ## Environmental Impact
142
+
143
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
+
145
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
+
147
+ - **Hardware Type:** [More Information Needed]
148
+ - **Hours used:** [More Information Needed]
149
+ - **Cloud Provider:** [More Information Needed]
150
+ - **Compute Region:** [More Information Needed]
151
+ - **Carbon Emitted:** [More Information Needed]
152
+
153
+ ## Technical Specifications [optional]
154
+
155
+ ### Model Architecture and Objective
156
+
157
+ [More Information Needed]
158
+
159
+ ### Compute Infrastructure
160
+
161
+ [More Information Needed]
162
+
163
+ #### Hardware
164
+
165
+ [More Information Needed]
166
+
167
+ #### Software
168
+
169
+ [More Information Needed]
170
+
171
+ ## Citation [optional]
172
+
173
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
+
175
+ **BibTeX:**
176
+
177
+ [More Information Needed]
178
+
179
+ **APA:**
180
+
181
+ [More Information Needed]
182
+
183
+ ## Glossary [optional]
184
+
185
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
+
187
+ [More Information Needed]
188
+
189
+ ## More Information [optional]
190
+
191
+ [More Information Needed]
192
+
193
+ ## Model Card Authors [optional]
194
+
195
+ [More Information Needed]
196
+
197
+ ## Model Card Contact
198
+
199
+ [More Information Needed]
200
+ ### Framework versions
201
+
202
+ - PEFT 0.13.2
last-checkpoint/adapter_config.json ADDED
@@ -0,0 +1,34 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "unsloth/mistral-7b-instruct-v0.3",
5
+ "bias": "none",
6
+ "fan_in_fan_out": null,
7
+ "inference_mode": true,
8
+ "init_lora_weights": true,
9
+ "layer_replication": null,
10
+ "layers_pattern": null,
11
+ "layers_to_transform": null,
12
+ "loftq_config": {},
13
+ "lora_alpha": 16,
14
+ "lora_dropout": 0.05,
15
+ "megatron_config": null,
16
+ "megatron_core": "megatron.core",
17
+ "modules_to_save": null,
18
+ "peft_type": "LORA",
19
+ "r": 8,
20
+ "rank_pattern": {},
21
+ "revision": null,
22
+ "target_modules": [
23
+ "v_proj",
24
+ "k_proj",
25
+ "gate_proj",
26
+ "q_proj",
27
+ "o_proj",
28
+ "down_proj",
29
+ "up_proj"
30
+ ],
31
+ "task_type": "CAUSAL_LM",
32
+ "use_dora": false,
33
+ "use_rslora": false
34
+ }
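
The configuration above is a standard PEFT LoRA adapter (r=8, lora_alpha=16, seven attention/MLP projection targets) on top of `unsloth/mistral-7b-instruct-v0.3`. A minimal loading sketch, assuming the `transformers` and `peft` libraries and a local copy of the `last-checkpoint/` directory; the snippet is illustrative and not part of the committed files:

```python
# Minimal sketch: attach this LoRA checkpoint to the base model for inference.
# Assumes `transformers` and `peft` are installed and ./last-checkpoint is downloaded locally.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("unsloth/mistral-7b-instruct-v0.3")
tokenizer = AutoTokenizer.from_pretrained("./last-checkpoint")
model = PeftModel.from_pretrained(base, "./last-checkpoint")  # loads adapter_model.safetensors

inputs = tokenizer("[INST] Hello [/INST]", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

This loads the full-precision 7B base weights; quantized loading (e.g. 4-bit) is a common alternative but is not specified by this checkpoint.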
last-checkpoint/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dcf91469067f9eeced923f09648ab33e895e048202ef13b3186d6e281eb2cc7f
3
+ size 83945296
last-checkpoint/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3c71538bb88197b52ca4e6ed5db3c576ec2d1ecedad4236ec6051f25f42ddc2c
3
+ size 43123028
last-checkpoint/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8a5b5e17a55c3332ea94d79a7c64685c82f7ab052ce21c7c47f5765a76965b0f
3
+ size 14244
last-checkpoint/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:050c7b76044759aeaf3d206323a93cadc9c1671815bdc5293e6c90f494ced106
3
+ size 1064
last-checkpoint/special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<s>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "</s>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[control_768]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "unk_token": {
24
+ "content": "<unk>",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ }
30
+ }
last-checkpoint/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
last-checkpoint/tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
3
+ size 587404
last-checkpoint/tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
last-checkpoint/trainer_state.json ADDED
@@ -0,0 +1,2756 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 0.01589604233496108,
5
+ "eval_steps": 500,
6
+ "global_step": 389,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 4.086386204360174e-05,
13
+ "grad_norm": 32.35245895385742,
14
+ "learning_rate": 2e-05,
15
+ "loss": 8.4638,
16
+ "step": 1
17
+ },
18
+ {
19
+ "epoch": 8.172772408720348e-05,
20
+ "grad_norm": 27.18036651611328,
21
+ "learning_rate": 4e-05,
22
+ "loss": 8.7385,
23
+ "step": 2
24
+ },
25
+ {
26
+ "epoch": 0.00012259158613080523,
27
+ "grad_norm": 22.749631881713867,
28
+ "learning_rate": 6e-05,
29
+ "loss": 7.8372,
30
+ "step": 3
31
+ },
32
+ {
33
+ "epoch": 0.00016345544817440697,
34
+ "grad_norm": 21.95606803894043,
35
+ "learning_rate": 8e-05,
36
+ "loss": 8.1841,
37
+ "step": 4
38
+ },
39
+ {
40
+ "epoch": 0.0002043193102180087,
41
+ "grad_norm": 19.77204704284668,
42
+ "learning_rate": 0.0001,
43
+ "loss": 8.6331,
44
+ "step": 5
45
+ },
46
+ {
47
+ "epoch": 0.00024518317226161047,
48
+ "grad_norm": 25.299869537353516,
49
+ "learning_rate": 9.999989716599041e-05,
50
+ "loss": 7.5998,
51
+ "step": 6
52
+ },
53
+ {
54
+ "epoch": 0.0002860470343052122,
55
+ "grad_norm": 21.570093154907227,
56
+ "learning_rate": 9.999958866438464e-05,
57
+ "loss": 7.2277,
58
+ "step": 7
59
+ },
60
+ {
61
+ "epoch": 0.00032691089634881394,
62
+ "grad_norm": 23.380281448364258,
63
+ "learning_rate": 9.999907449645163e-05,
64
+ "loss": 6.5516,
65
+ "step": 8
66
+ },
67
+ {
68
+ "epoch": 0.0003677747583924157,
69
+ "grad_norm": 14.654683113098145,
70
+ "learning_rate": 9.999835466430639e-05,
71
+ "loss": 6.8196,
72
+ "step": 9
73
+ },
74
+ {
75
+ "epoch": 0.0004086386204360174,
76
+ "grad_norm": 17.326505661010742,
77
+ "learning_rate": 9.99974291709098e-05,
78
+ "loss": 7.6549,
79
+ "step": 10
80
+ },
81
+ {
82
+ "epoch": 0.00044950248247961914,
83
+ "grad_norm": 17.01525115966797,
84
+ "learning_rate": 9.99962980200688e-05,
85
+ "loss": 7.7667,
86
+ "step": 11
87
+ },
88
+ {
89
+ "epoch": 0.0004903663445232209,
90
+ "grad_norm": 16.817630767822266,
91
+ "learning_rate": 9.999496121643616e-05,
92
+ "loss": 8.4052,
93
+ "step": 12
94
+ },
95
+ {
96
+ "epoch": 0.0005312302065668226,
97
+ "grad_norm": 15.02230453491211,
98
+ "learning_rate": 9.999341876551069e-05,
99
+ "loss": 7.5967,
100
+ "step": 13
101
+ },
102
+ {
103
+ "epoch": 0.0005720940686104244,
104
+ "grad_norm": 17.479883193969727,
105
+ "learning_rate": 9.999167067363703e-05,
106
+ "loss": 7.1457,
107
+ "step": 14
108
+ },
109
+ {
110
+ "epoch": 0.0006129579306540261,
111
+ "grad_norm": 22.608917236328125,
112
+ "learning_rate": 9.998971694800569e-05,
113
+ "loss": 6.92,
114
+ "step": 15
115
+ },
116
+ {
117
+ "epoch": 0.0006538217926976279,
118
+ "grad_norm": 15.77768611907959,
119
+ "learning_rate": 9.998755759665308e-05,
120
+ "loss": 6.236,
121
+ "step": 16
122
+ },
123
+ {
124
+ "epoch": 0.0006946856547412296,
125
+ "grad_norm": 16.468963623046875,
126
+ "learning_rate": 9.998519262846136e-05,
127
+ "loss": 6.5991,
128
+ "step": 17
129
+ },
130
+ {
131
+ "epoch": 0.0007355495167848313,
132
+ "grad_norm": 16.331256866455078,
133
+ "learning_rate": 9.998262205315851e-05,
134
+ "loss": 7.5169,
135
+ "step": 18
136
+ },
137
+ {
138
+ "epoch": 0.000776413378828433,
139
+ "grad_norm": 16.179676055908203,
140
+ "learning_rate": 9.997984588131826e-05,
141
+ "loss": 6.0861,
142
+ "step": 19
143
+ },
144
+ {
145
+ "epoch": 0.0008172772408720348,
146
+ "grad_norm": 15.519332885742188,
147
+ "learning_rate": 9.997686412435996e-05,
148
+ "loss": 7.3839,
149
+ "step": 20
150
+ },
151
+ {
152
+ "epoch": 0.0008581411029156366,
153
+ "grad_norm": 17.317489624023438,
154
+ "learning_rate": 9.997367679454865e-05,
155
+ "loss": 7.3581,
156
+ "step": 21
157
+ },
158
+ {
159
+ "epoch": 0.0008990049649592383,
160
+ "grad_norm": 18.62262725830078,
161
+ "learning_rate": 9.9970283904995e-05,
162
+ "loss": 6.2524,
163
+ "step": 22
164
+ },
165
+ {
166
+ "epoch": 0.0009398688270028401,
167
+ "grad_norm": 15.572286605834961,
168
+ "learning_rate": 9.99666854696552e-05,
169
+ "loss": 6.8967,
170
+ "step": 23
171
+ },
172
+ {
173
+ "epoch": 0.0009807326890464419,
174
+ "grad_norm": 15.743536949157715,
175
+ "learning_rate": 9.996288150333086e-05,
176
+ "loss": 6.5211,
177
+ "step": 24
178
+ },
179
+ {
180
+ "epoch": 0.0010215965510900434,
181
+ "grad_norm": 15.656659126281738,
182
+ "learning_rate": 9.995887202166909e-05,
183
+ "loss": 6.3349,
184
+ "step": 25
185
+ },
186
+ {
187
+ "epoch": 0.0010624604131336452,
188
+ "grad_norm": 14.97890853881836,
189
+ "learning_rate": 9.995465704116233e-05,
190
+ "loss": 5.909,
191
+ "step": 26
192
+ },
193
+ {
194
+ "epoch": 0.001103324275177247,
195
+ "grad_norm": 19.323623657226562,
196
+ "learning_rate": 9.995023657914832e-05,
197
+ "loss": 7.5711,
198
+ "step": 27
199
+ },
200
+ {
201
+ "epoch": 0.0011441881372208488,
202
+ "grad_norm": 23.951154708862305,
203
+ "learning_rate": 9.994561065381004e-05,
204
+ "loss": 8.4369,
205
+ "step": 28
206
+ },
207
+ {
208
+ "epoch": 0.0011850519992644504,
209
+ "grad_norm": 18.174833297729492,
210
+ "learning_rate": 9.994077928417551e-05,
211
+ "loss": 6.8328,
212
+ "step": 29
213
+ },
214
+ {
215
+ "epoch": 0.0012259158613080522,
216
+ "grad_norm": 21.768999099731445,
217
+ "learning_rate": 9.993574249011797e-05,
218
+ "loss": 7.5939,
219
+ "step": 30
220
+ },
221
+ {
222
+ "epoch": 0.001266779723351654,
223
+ "grad_norm": 16.0784969329834,
224
+ "learning_rate": 9.993050029235552e-05,
225
+ "loss": 5.6128,
226
+ "step": 31
227
+ },
228
+ {
229
+ "epoch": 0.0013076435853952558,
230
+ "grad_norm": 18.45024299621582,
231
+ "learning_rate": 9.992505271245126e-05,
232
+ "loss": 7.2958,
233
+ "step": 32
234
+ },
235
+ {
236
+ "epoch": 0.0013485074474388575,
237
+ "grad_norm": 20.15422248840332,
238
+ "learning_rate": 9.991939977281299e-05,
239
+ "loss": 6.79,
240
+ "step": 33
241
+ },
242
+ {
243
+ "epoch": 0.0013893713094824591,
244
+ "grad_norm": 20.988454818725586,
245
+ "learning_rate": 9.991354149669331e-05,
246
+ "loss": 7.5329,
247
+ "step": 34
248
+ },
249
+ {
250
+ "epoch": 0.001430235171526061,
251
+ "grad_norm": 17.833402633666992,
252
+ "learning_rate": 9.990747790818947e-05,
253
+ "loss": 7.1789,
254
+ "step": 35
255
+ },
256
+ {
257
+ "epoch": 0.0014710990335696627,
258
+ "grad_norm": 23.832569122314453,
259
+ "learning_rate": 9.990120903224311e-05,
260
+ "loss": 7.9279,
261
+ "step": 36
262
+ },
263
+ {
264
+ "epoch": 0.0015119628956132645,
265
+ "grad_norm": 22.453641891479492,
266
+ "learning_rate": 9.989473489464044e-05,
267
+ "loss": 7.3563,
268
+ "step": 37
269
+ },
270
+ {
271
+ "epoch": 0.001552826757656866,
272
+ "grad_norm": 18.8485050201416,
273
+ "learning_rate": 9.988805552201189e-05,
274
+ "loss": 7.8919,
275
+ "step": 38
276
+ },
277
+ {
278
+ "epoch": 0.0015936906197004678,
279
+ "grad_norm": 20.521846771240234,
280
+ "learning_rate": 9.988117094183214e-05,
281
+ "loss": 7.4264,
282
+ "step": 39
283
+ },
284
+ {
285
+ "epoch": 0.0016345544817440696,
286
+ "grad_norm": 23.515644073486328,
287
+ "learning_rate": 9.987408118241996e-05,
288
+ "loss": 7.8032,
289
+ "step": 40
290
+ },
291
+ {
292
+ "epoch": 0.0016754183437876714,
293
+ "grad_norm": 20.498987197875977,
294
+ "learning_rate": 9.986678627293806e-05,
295
+ "loss": 8.1339,
296
+ "step": 41
297
+ },
298
+ {
299
+ "epoch": 0.0017162822058312732,
300
+ "grad_norm": 18.774856567382812,
301
+ "learning_rate": 9.985928624339304e-05,
302
+ "loss": 7.3659,
303
+ "step": 42
304
+ },
305
+ {
306
+ "epoch": 0.0017571460678748748,
307
+ "grad_norm": 22.373167037963867,
308
+ "learning_rate": 9.985158112463522e-05,
309
+ "loss": 6.934,
310
+ "step": 43
311
+ },
312
+ {
313
+ "epoch": 0.0017980099299184766,
314
+ "grad_norm": 21.635862350463867,
315
+ "learning_rate": 9.984367094835856e-05,
316
+ "loss": 6.6595,
317
+ "step": 44
318
+ },
319
+ {
320
+ "epoch": 0.0018388737919620784,
321
+ "grad_norm": 21.037921905517578,
322
+ "learning_rate": 9.983555574710042e-05,
323
+ "loss": 6.406,
324
+ "step": 45
325
+ },
326
+ {
327
+ "epoch": 0.0018797376540056802,
328
+ "grad_norm": 24.574268341064453,
329
+ "learning_rate": 9.982723555424158e-05,
330
+ "loss": 6.7946,
331
+ "step": 46
332
+ },
333
+ {
334
+ "epoch": 0.0019206015160492817,
335
+ "grad_norm": 30.303829193115234,
336
+ "learning_rate": 9.981871040400599e-05,
337
+ "loss": 6.7379,
338
+ "step": 47
339
+ },
340
+ {
341
+ "epoch": 0.0019614653780928837,
342
+ "grad_norm": 27.70897674560547,
343
+ "learning_rate": 9.980998033146066e-05,
344
+ "loss": 7.0633,
345
+ "step": 48
346
+ },
347
+ {
348
+ "epoch": 0.002002329240136485,
349
+ "grad_norm": 29.71209144592285,
350
+ "learning_rate": 9.980104537251551e-05,
351
+ "loss": 7.7937,
352
+ "step": 49
353
+ },
354
+ {
355
+ "epoch": 0.002043193102180087,
356
+ "grad_norm": 36.392677307128906,
357
+ "learning_rate": 9.979190556392327e-05,
358
+ "loss": 9.7726,
359
+ "step": 50
360
+ },
361
+ {
362
+ "epoch": 0.0020840569642236887,
363
+ "grad_norm": 23.124082565307617,
364
+ "learning_rate": 9.978256094327923e-05,
365
+ "loss": 8.0964,
366
+ "step": 51
367
+ },
368
+ {
369
+ "epoch": 0.0021249208262672905,
370
+ "grad_norm": 18.6306209564209,
371
+ "learning_rate": 9.977301154902123e-05,
372
+ "loss": 6.6695,
373
+ "step": 52
374
+ },
375
+ {
376
+ "epoch": 0.0021657846883108922,
377
+ "grad_norm": 22.567228317260742,
378
+ "learning_rate": 9.976325742042933e-05,
379
+ "loss": 6.5986,
380
+ "step": 53
381
+ },
382
+ {
383
+ "epoch": 0.002206648550354494,
384
+ "grad_norm": 22.24195098876953,
385
+ "learning_rate": 9.975329859762581e-05,
386
+ "loss": 7.8847,
387
+ "step": 54
388
+ },
389
+ {
390
+ "epoch": 0.002247512412398096,
391
+ "grad_norm": 15.674145698547363,
392
+ "learning_rate": 9.974313512157487e-05,
393
+ "loss": 7.108,
394
+ "step": 55
395
+ },
396
+ {
397
+ "epoch": 0.0022883762744416976,
398
+ "grad_norm": 16.254152297973633,
399
+ "learning_rate": 9.973276703408257e-05,
400
+ "loss": 7.618,
401
+ "step": 56
402
+ },
403
+ {
404
+ "epoch": 0.0023292401364852994,
405
+ "grad_norm": 13.447061538696289,
406
+ "learning_rate": 9.972219437779658e-05,
407
+ "loss": 5.418,
408
+ "step": 57
409
+ },
410
+ {
411
+ "epoch": 0.0023701039985289008,
412
+ "grad_norm": 14.919565200805664,
413
+ "learning_rate": 9.971141719620604e-05,
414
+ "loss": 6.4513,
415
+ "step": 58
416
+ },
417
+ {
418
+ "epoch": 0.0024109678605725026,
419
+ "grad_norm": 12.961139678955078,
420
+ "learning_rate": 9.970043553364139e-05,
421
+ "loss": 5.9823,
422
+ "step": 59
423
+ },
424
+ {
425
+ "epoch": 0.0024518317226161043,
426
+ "grad_norm": 13.304807662963867,
427
+ "learning_rate": 9.968924943527417e-05,
428
+ "loss": 6.8065,
429
+ "step": 60
430
+ },
431
+ {
432
+ "epoch": 0.002492695584659706,
433
+ "grad_norm": 19.168649673461914,
434
+ "learning_rate": 9.967785894711682e-05,
435
+ "loss": 6.7415,
436
+ "step": 61
437
+ },
438
+ {
439
+ "epoch": 0.002533559446703308,
440
+ "grad_norm": 14.38369369506836,
441
+ "learning_rate": 9.966626411602253e-05,
442
+ "loss": 7.383,
443
+ "step": 62
444
+ },
445
+ {
446
+ "epoch": 0.0025744233087469097,
447
+ "grad_norm": 13.37614917755127,
448
+ "learning_rate": 9.965446498968503e-05,
449
+ "loss": 6.0576,
450
+ "step": 63
451
+ },
452
+ {
453
+ "epoch": 0.0026152871707905115,
454
+ "grad_norm": 13.369189262390137,
455
+ "learning_rate": 9.964246161663835e-05,
456
+ "loss": 6.5152,
457
+ "step": 64
458
+ },
459
+ {
460
+ "epoch": 0.0026561510328341133,
461
+ "grad_norm": 12.1449613571167,
462
+ "learning_rate": 9.963025404625672e-05,
463
+ "loss": 5.7253,
464
+ "step": 65
465
+ },
466
+ {
467
+ "epoch": 0.002697014894877715,
468
+ "grad_norm": 15.389847755432129,
469
+ "learning_rate": 9.961784232875426e-05,
470
+ "loss": 6.1741,
471
+ "step": 66
472
+ },
473
+ {
474
+ "epoch": 0.0027378787569213164,
475
+ "grad_norm": 11.978641510009766,
476
+ "learning_rate": 9.960522651518484e-05,
477
+ "loss": 5.884,
478
+ "step": 67
479
+ },
480
+ {
481
+ "epoch": 0.0027787426189649182,
482
+ "grad_norm": 15.073978424072266,
483
+ "learning_rate": 9.959240665744186e-05,
484
+ "loss": 7.4091,
485
+ "step": 68
486
+ },
487
+ {
488
+ "epoch": 0.00281960648100852,
489
+ "grad_norm": 15.136378288269043,
490
+ "learning_rate": 9.9579382808258e-05,
491
+ "loss": 6.3326,
492
+ "step": 69
493
+ },
494
+ {
495
+ "epoch": 0.002860470343052122,
496
+ "grad_norm": 15.58938980102539,
497
+ "learning_rate": 9.956615502120504e-05,
498
+ "loss": 5.9821,
499
+ "step": 70
500
+ },
501
+ {
502
+ "epoch": 0.0029013342050957236,
503
+ "grad_norm": 12.80412483215332,
504
+ "learning_rate": 9.955272335069363e-05,
505
+ "loss": 5.5844,
506
+ "step": 71
507
+ },
508
+ {
509
+ "epoch": 0.0029421980671393254,
510
+ "grad_norm": 17.568445205688477,
511
+ "learning_rate": 9.953908785197312e-05,
512
+ "loss": 6.998,
513
+ "step": 72
514
+ },
515
+ {
516
+ "epoch": 0.002983061929182927,
517
+ "grad_norm": 19.638111114501953,
518
+ "learning_rate": 9.952524858113116e-05,
519
+ "loss": 7.2029,
520
+ "step": 73
521
+ },
522
+ {
523
+ "epoch": 0.003023925791226529,
524
+ "grad_norm": 15.820956230163574,
525
+ "learning_rate": 9.951120559509373e-05,
526
+ "loss": 6.6457,
527
+ "step": 74
528
+ },
529
+ {
530
+ "epoch": 0.0030647896532701308,
531
+ "grad_norm": 15.655069351196289,
532
+ "learning_rate": 9.949695895162463e-05,
533
+ "loss": 6.9675,
534
+ "step": 75
535
+ },
536
+ {
537
+ "epoch": 0.003105653515313732,
538
+ "grad_norm": 15.471683502197266,
539
+ "learning_rate": 9.948250870932547e-05,
540
+ "loss": 6.711,
541
+ "step": 76
542
+ },
543
+ {
544
+ "epoch": 0.003146517377357334,
545
+ "grad_norm": 16.202917098999023,
546
+ "learning_rate": 9.94678549276353e-05,
547
+ "loss": 5.8695,
548
+ "step": 77
549
+ },
550
+ {
551
+ "epoch": 0.0031873812394009357,
552
+ "grad_norm": 17.095972061157227,
553
+ "learning_rate": 9.945299766683041e-05,
554
+ "loss": 7.0255,
555
+ "step": 78
556
+ },
557
+ {
558
+ "epoch": 0.0032282451014445375,
559
+ "grad_norm": 13.527962684631348,
560
+ "learning_rate": 9.943793698802407e-05,
561
+ "loss": 5.2299,
562
+ "step": 79
563
+ },
564
+ {
565
+ "epoch": 0.0032691089634881393,
566
+ "grad_norm": 14.281603813171387,
567
+ "learning_rate": 9.942267295316625e-05,
568
+ "loss": 6.2512,
569
+ "step": 80
570
+ },
571
+ {
572
+ "epoch": 0.003309972825531741,
573
+ "grad_norm": 14.560650825500488,
574
+ "learning_rate": 9.940720562504346e-05,
575
+ "loss": 7.0633,
576
+ "step": 81
577
+ },
578
+ {
579
+ "epoch": 0.003350836687575343,
580
+ "grad_norm": 14.220345497131348,
581
+ "learning_rate": 9.939153506727839e-05,
582
+ "loss": 6.0255,
583
+ "step": 82
584
+ },
585
+ {
586
+ "epoch": 0.0033917005496189446,
587
+ "grad_norm": 14.54580020904541,
588
+ "learning_rate": 9.937566134432967e-05,
589
+ "loss": 6.7001,
590
+ "step": 83
591
+ },
592
+ {
593
+ "epoch": 0.0034325644116625464,
594
+ "grad_norm": 13.996087074279785,
595
+ "learning_rate": 9.935958452149168e-05,
596
+ "loss": 6.1261,
597
+ "step": 84
598
+ },
599
+ {
600
+ "epoch": 0.0034734282737061478,
601
+ "grad_norm": 22.87812614440918,
602
+ "learning_rate": 9.934330466489414e-05,
603
+ "loss": 7.9368,
604
+ "step": 85
605
+ },
606
+ {
607
+ "epoch": 0.0035142921357497496,
608
+ "grad_norm": 17.530309677124023,
609
+ "learning_rate": 9.9326821841502e-05,
610
+ "loss": 7.0329,
611
+ "step": 86
612
+ },
613
+ {
614
+ "epoch": 0.0035551559977933514,
615
+ "grad_norm": 14.398816108703613,
616
+ "learning_rate": 9.931013611911505e-05,
617
+ "loss": 5.4819,
618
+ "step": 87
619
+ },
620
+ {
621
+ "epoch": 0.003596019859836953,
622
+ "grad_norm": 17.408000946044922,
623
+ "learning_rate": 9.929324756636766e-05,
624
+ "loss": 6.2423,
625
+ "step": 88
626
+ },
627
+ {
628
+ "epoch": 0.003636883721880555,
629
+ "grad_norm": 19.482257843017578,
630
+ "learning_rate": 9.927615625272856e-05,
631
+ "loss": 7.5902,
632
+ "step": 89
633
+ },
634
+ {
635
+ "epoch": 0.0036777475839241567,
636
+ "grad_norm": 18.543699264526367,
637
+ "learning_rate": 9.925886224850047e-05,
638
+ "loss": 6.5135,
639
+ "step": 90
640
+ },
641
+ {
642
+ "epoch": 0.0037186114459677585,
643
+ "grad_norm": 16.311019897460938,
644
+ "learning_rate": 9.924136562481984e-05,
645
+ "loss": 6.8555,
646
+ "step": 91
647
+ },
648
+ {
649
+ "epoch": 0.0037594753080113603,
650
+ "grad_norm": 22.563644409179688,
651
+ "learning_rate": 9.922366645365663e-05,
652
+ "loss": 7.355,
653
+ "step": 92
654
+ },
655
+ {
656
+ "epoch": 0.003800339170054962,
657
+ "grad_norm": 20.504785537719727,
658
+ "learning_rate": 9.920576480781389e-05,
659
+ "loss": 5.9183,
660
+ "step": 93
661
+ },
662
+ {
663
+ "epoch": 0.0038412030320985635,
664
+ "grad_norm": 25.04275894165039,
665
+ "learning_rate": 9.918766076092754e-05,
666
+ "loss": 6.8876,
667
+ "step": 94
668
+ },
669
+ {
670
+ "epoch": 0.0038820668941421652,
671
+ "grad_norm": 26.663280487060547,
672
+ "learning_rate": 9.916935438746604e-05,
673
+ "loss": 6.1629,
674
+ "step": 95
675
+ },
676
+ {
677
+ "epoch": 0.0039229307561857675,
678
+ "grad_norm": 24.873512268066406,
679
+ "learning_rate": 9.915084576273013e-05,
680
+ "loss": 7.2213,
681
+ "step": 96
682
+ },
683
+ {
684
+ "epoch": 0.003963794618229369,
685
+ "grad_norm": 24.14523696899414,
686
+ "learning_rate": 9.91321349628524e-05,
687
+ "loss": 8.0812,
688
+ "step": 97
689
+ },
690
+ {
691
+ "epoch": 0.00400465848027297,
692
+ "grad_norm": 25.039981842041016,
693
+ "learning_rate": 9.911322206479719e-05,
694
+ "loss": 6.9913,
695
+ "step": 98
696
+ },
697
+ {
698
+ "epoch": 0.004045522342316572,
699
+ "grad_norm": 29.90752601623535,
700
+ "learning_rate": 9.909410714635999e-05,
701
+ "loss": 8.0443,
702
+ "step": 99
703
+ },
704
+ {
705
+ "epoch": 0.004086386204360174,
706
+ "grad_norm": 32.809085845947266,
707
+ "learning_rate": 9.907479028616739e-05,
708
+ "loss": 7.7016,
709
+ "step": 100
710
+ },
711
+ {
712
+ "epoch": 0.0041272500664037755,
713
+ "grad_norm": 14.157573699951172,
714
+ "learning_rate": 9.90552715636766e-05,
715
+ "loss": 6.3563,
716
+ "step": 101
717
+ },
718
+ {
719
+ "epoch": 0.004168113928447377,
720
+ "grad_norm": 15.851997375488281,
721
+ "learning_rate": 9.903555105917514e-05,
722
+ "loss": 7.8688,
723
+ "step": 102
724
+ },
725
+ {
726
+ "epoch": 0.004208977790490979,
727
+ "grad_norm": 15.693613052368164,
728
+ "learning_rate": 9.901562885378057e-05,
729
+ "loss": 6.3947,
730
+ "step": 103
731
+ },
732
+ {
733
+ "epoch": 0.004249841652534581,
734
+ "grad_norm": 15.1848726272583,
735
+ "learning_rate": 9.899550502944009e-05,
736
+ "loss": 6.8641,
737
+ "step": 104
738
+ },
739
+ {
740
+ "epoch": 0.004290705514578183,
741
+ "grad_norm": 14.605127334594727,
742
+ "learning_rate": 9.897517966893023e-05,
743
+ "loss": 6.8249,
744
+ "step": 105
745
+ },
746
+ {
747
+ "epoch": 0.0043315693766217845,
748
+ "grad_norm": 14.972675323486328,
749
+ "learning_rate": 9.895465285585655e-05,
750
+ "loss": 5.3184,
751
+ "step": 106
752
+ },
753
+ {
754
+ "epoch": 0.004372433238665386,
755
+ "grad_norm": 12.17654037475586,
756
+ "learning_rate": 9.89339246746532e-05,
757
+ "loss": 5.4645,
758
+ "step": 107
759
+ },
760
+ {
761
+ "epoch": 0.004413297100708988,
762
+ "grad_norm": 12.122211456298828,
763
+ "learning_rate": 9.891299521058268e-05,
764
+ "loss": 5.6194,
765
+ "step": 108
766
+ },
767
+ {
768
+ "epoch": 0.00445416096275259,
769
+ "grad_norm": 12.825034141540527,
770
+ "learning_rate": 9.889186454973543e-05,
771
+ "loss": 5.8638,
772
+ "step": 109
773
+ },
774
+ {
775
+ "epoch": 0.004495024824796192,
776
+ "grad_norm": 12.313605308532715,
777
+ "learning_rate": 9.887053277902942e-05,
778
+ "loss": 5.9993,
779
+ "step": 110
780
+ },
781
+ {
782
+ "epoch": 0.0045358886868397934,
783
+ "grad_norm": 16.425357818603516,
784
+ "learning_rate": 9.884899998620998e-05,
785
+ "loss": 7.2859,
786
+ "step": 111
787
+ },
788
+ {
789
+ "epoch": 0.004576752548883395,
790
+ "grad_norm": 13.425078392028809,
791
+ "learning_rate": 9.88272662598492e-05,
792
+ "loss": 5.4946,
793
+ "step": 112
794
+ },
795
+ {
796
+ "epoch": 0.004617616410926997,
797
+ "grad_norm": 12.7623872756958,
798
+ "learning_rate": 9.880533168934575e-05,
799
+ "loss": 6.1761,
800
+ "step": 113
801
+ },
802
+ {
803
+ "epoch": 0.004658480272970599,
804
+ "grad_norm": 15.71284294128418,
805
+ "learning_rate": 9.878319636492441e-05,
806
+ "loss": 6.9623,
807
+ "step": 114
808
+ },
809
+ {
810
+ "epoch": 0.004699344135014201,
811
+ "grad_norm": 12.192813873291016,
812
+ "learning_rate": 9.876086037763575e-05,
813
+ "loss": 5.2644,
814
+ "step": 115
815
+ },
816
+ {
817
+ "epoch": 0.0047402079970578015,
818
+ "grad_norm": 14.656291007995605,
819
+ "learning_rate": 9.873832381935575e-05,
820
+ "loss": 6.2134,
821
+ "step": 116
822
+ },
823
+ {
824
+ "epoch": 0.004781071859101403,
825
+ "grad_norm": 13.87183666229248,
826
+ "learning_rate": 9.871558678278537e-05,
827
+ "loss": 6.0312,
828
+ "step": 117
829
+ },
830
+ {
831
+ "epoch": 0.004821935721145005,
832
+ "grad_norm": 14.280610084533691,
833
+ "learning_rate": 9.869264936145027e-05,
834
+ "loss": 6.3556,
835
+ "step": 118
836
+ },
837
+ {
838
+ "epoch": 0.004862799583188607,
839
+ "grad_norm": 13.133746147155762,
840
+ "learning_rate": 9.866951164970028e-05,
841
+ "loss": 6.2905,
842
+ "step": 119
843
+ },
844
+ {
845
+ "epoch": 0.004903663445232209,
846
+ "grad_norm": 32.18925094604492,
847
+ "learning_rate": 9.86461737427092e-05,
848
+ "loss": 6.5927,
849
+ "step": 120
850
+ },
851
+ {
852
+ "epoch": 0.0049445273072758105,
853
+ "grad_norm": 12.16728687286377,
854
+ "learning_rate": 9.862263573647422e-05,
855
+ "loss": 6.272,
856
+ "step": 121
857
+ },
858
+ {
859
+ "epoch": 0.004985391169319412,
860
+ "grad_norm": 14.562088966369629,
861
+ "learning_rate": 9.859889772781565e-05,
862
+ "loss": 6.9461,
863
+ "step": 122
864
+ },
865
+ {
866
+ "epoch": 0.005026255031363014,
867
+ "grad_norm": 18.74805450439453,
868
+ "learning_rate": 9.857495981437648e-05,
869
+ "loss": 6.2463,
870
+ "step": 123
871
+ },
872
+ {
873
+ "epoch": 0.005067118893406616,
874
+ "grad_norm": 19.05704689025879,
875
+ "learning_rate": 9.855082209462197e-05,
876
+ "loss": 7.1299,
877
+ "step": 124
878
+ },
879
+ {
880
+ "epoch": 0.005107982755450218,
881
+ "grad_norm": 14.966846466064453,
882
+ "learning_rate": 9.852648466783927e-05,
883
+ "loss": 6.7958,
884
+ "step": 125
885
+ },
886
+ {
887
+ "epoch": 0.005148846617493819,
888
+ "grad_norm": 14.65507698059082,
889
+ "learning_rate": 9.850194763413696e-05,
890
+ "loss": 6.9319,
891
+ "step": 126
892
+ },
893
+ {
894
+ "epoch": 0.005189710479537421,
895
+ "grad_norm": 13.518552780151367,
896
+ "learning_rate": 9.847721109444473e-05,
897
+ "loss": 5.5841,
898
+ "step": 127
899
+ },
900
+ {
901
+ "epoch": 0.005230574341581023,
902
+ "grad_norm": 15.542891502380371,
903
+ "learning_rate": 9.845227515051286e-05,
904
+ "loss": 6.6425,
905
+ "step": 128
906
+ },
907
+ {
908
+ "epoch": 0.005271438203624625,
909
+ "grad_norm": 20.072572708129883,
910
+ "learning_rate": 9.84271399049119e-05,
911
+ "loss": 6.2362,
912
+ "step": 129
913
+ },
914
+ {
915
+ "epoch": 0.005312302065668227,
916
+ "grad_norm": 15.439754486083984,
917
+ "learning_rate": 9.840180546103215e-05,
918
+ "loss": 7.0394,
919
+ "step": 130
920
+ },
921
+ {
922
+ "epoch": 0.005353165927711828,
923
+ "grad_norm": 15.581273078918457,
924
+ "learning_rate": 9.837627192308332e-05,
925
+ "loss": 7.1262,
926
+ "step": 131
927
+ },
928
+ {
929
+ "epoch": 0.00539402978975543,
930
+ "grad_norm": 14.833819389343262,
931
+ "learning_rate": 9.835053939609407e-05,
932
+ "loss": 6.9087,
933
+ "step": 132
934
+ },
935
+ {
936
+ "epoch": 0.005434893651799032,
937
+ "grad_norm": 15.577768325805664,
938
+ "learning_rate": 9.832460798591151e-05,
939
+ "loss": 5.5607,
940
+ "step": 133
941
+ },
942
+ {
943
+ "epoch": 0.005475757513842633,
944
+ "grad_norm": 20.256847381591797,
945
+ "learning_rate": 9.829847779920092e-05,
946
+ "loss": 7.8506,
947
+ "step": 134
948
+ },
949
+ {
950
+ "epoch": 0.005516621375886235,
951
+ "grad_norm": 16.520177841186523,
952
+ "learning_rate": 9.827214894344514e-05,
953
+ "loss": 7.1765,
954
+ "step": 135
955
+ },
956
+ {
957
+ "epoch": 0.0055574852379298365,
958
+ "grad_norm": 17.26201629638672,
959
+ "learning_rate": 9.824562152694427e-05,
960
+ "loss": 7.3264,
961
+ "step": 136
962
+ },
963
+ {
964
+ "epoch": 0.005598349099973438,
965
+ "grad_norm": 18.10712432861328,
966
+ "learning_rate": 9.821889565881514e-05,
967
+ "loss": 8.0457,
968
+ "step": 137
969
+ },
970
+ {
971
+ "epoch": 0.00563921296201704,
972
+ "grad_norm": 16.80295181274414,
973
+ "learning_rate": 9.819197144899085e-05,
974
+ "loss": 6.7775,
975
+ "step": 138
976
+ },
977
+ {
978
+ "epoch": 0.005680076824060642,
979
+ "grad_norm": 18.972105026245117,
980
+ "learning_rate": 9.816484900822038e-05,
981
+ "loss": 6.746,
982
+ "step": 139
983
+ },
984
+ {
985
+ "epoch": 0.005720940686104244,
986
+ "grad_norm": 34.66987228393555,
987
+ "learning_rate": 9.813752844806813e-05,
988
+ "loss": 7.5608,
989
+ "step": 140
990
+ },
991
+ {
992
+ "epoch": 0.005761804548147845,
993
+ "grad_norm": 18.763248443603516,
994
+ "learning_rate": 9.811000988091338e-05,
995
+ "loss": 7.1654,
996
+ "step": 141
997
+ },
998
+ {
999
+ "epoch": 0.005802668410191447,
1000
+ "grad_norm": 18.545637130737305,
1001
+ "learning_rate": 9.808229341994995e-05,
1002
+ "loss": 7.113,
1003
+ "step": 142
1004
+ },
1005
+ {
1006
+ "epoch": 0.005843532272235049,
1007
+ "grad_norm": 22.365375518798828,
1008
+ "learning_rate": 9.805437917918559e-05,
1009
+ "loss": 7.0313,
1010
+ "step": 143
1011
+ },
1012
+ {
1013
+ "epoch": 0.005884396134278651,
1014
+ "grad_norm": 16.55803871154785,
1015
+ "learning_rate": 9.802626727344165e-05,
1016
+ "loss": 6.1085,
1017
+ "step": 144
1018
+ },
1019
+ {
1020
+ "epoch": 0.0059252599963222526,
1021
+ "grad_norm": 21.393871307373047,
1022
+ "learning_rate": 9.799795781835252e-05,
1023
+ "loss": 7.158,
1024
+ "step": 145
1025
+ },
1026
+ {
1027
+ "epoch": 0.005966123858365854,
1028
+ "grad_norm": 21.92884063720703,
1029
+ "learning_rate": 9.796945093036523e-05,
1030
+ "loss": 6.3873,
1031
+ "step": 146
1032
+ },
1033
+ {
1034
+ "epoch": 0.006006987720409456,
1035
+ "grad_norm": 27.711198806762695,
1036
+ "learning_rate": 9.794074672673883e-05,
1037
+ "loss": 6.8561,
1038
+ "step": 147
1039
+ },
1040
+ {
1041
+ "epoch": 0.006047851582453058,
1042
+ "grad_norm": 23.660367965698242,
1043
+ "learning_rate": 9.791184532554409e-05,
1044
+ "loss": 5.8218,
1045
+ "step": 148
1046
+ },
1047
+ {
1048
+ "epoch": 0.00608871544449666,
1049
+ "grad_norm": 22.727230072021484,
1050
+ "learning_rate": 9.788274684566289e-05,
1051
+ "loss": 6.5743,
1052
+ "step": 149
1053
+ },
1054
+ {
1055
+ "epoch": 0.0061295793065402615,
1056
+ "grad_norm": 36.77485275268555,
1057
+ "learning_rate": 9.785345140678775e-05,
1058
+ "loss": 9.6315,
1059
+ "step": 150
1060
+ },
1061
+ {
1062
+ "epoch": 0.006170443168583862,
1063
+ "grad_norm": 16.30278968811035,
1064
+ "learning_rate": 9.782395912942135e-05,
1065
+ "loss": 7.8331,
1066
+ "step": 151
1067
+ },
1068
+ {
1069
+ "epoch": 0.006211307030627464,
1070
+ "grad_norm": 11.499253273010254,
1071
+ "learning_rate": 9.77942701348761e-05,
1072
+ "loss": 5.9781,
1073
+ "step": 152
1074
+ },
1075
+ {
1076
+ "epoch": 0.006252170892671066,
1077
+ "grad_norm": 14.270246505737305,
1078
+ "learning_rate": 9.776438454527351e-05,
1079
+ "loss": 6.5819,
1080
+ "step": 153
1081
+ },
1082
+ {
1083
+ "epoch": 0.006293034754714668,
1084
+ "grad_norm": 12.351813316345215,
1085
+ "learning_rate": 9.773430248354376e-05,
1086
+ "loss": 5.3348,
1087
+ "step": 154
1088
+ },
1089
+ {
1090
+ "epoch": 0.00633389861675827,
1091
+ "grad_norm": 14.202811241149902,
1092
+ "learning_rate": 9.770402407342523e-05,
1093
+ "loss": 6.7854,
1094
+ "step": 155
1095
+ },
1096
+ {
1097
+ "epoch": 0.006374762478801871,
1098
+ "grad_norm": 15.483548164367676,
1099
+ "learning_rate": 9.767354943946395e-05,
1100
+ "loss": 6.5654,
1101
+ "step": 156
1102
+ },
1103
+ {
1104
+ "epoch": 0.006415626340845473,
1105
+ "grad_norm": 11.232331275939941,
1106
+ "learning_rate": 9.764287870701305e-05,
1107
+ "loss": 5.0283,
1108
+ "step": 157
1109
+ },
1110
+ {
1111
+ "epoch": 0.006456490202889075,
1112
+ "grad_norm": 9.863425254821777,
1113
+ "learning_rate": 9.761201200223231e-05,
1114
+ "loss": 5.3844,
1115
+ "step": 158
1116
+ },
1117
+ {
1118
+ "epoch": 0.006497354064932677,
1119
+ "grad_norm": 13.30897045135498,
1120
+ "learning_rate": 9.758094945208763e-05,
1121
+ "loss": 5.4593,
1122
+ "step": 159
1123
+ },
1124
+ {
1125
+ "epoch": 0.0065382179269762785,
1126
+ "grad_norm": 14.575927734375,
1127
+ "learning_rate": 9.754969118435042e-05,
1128
+ "loss": 6.2094,
1129
+ "step": 160
1130
+ },
1131
+ {
1132
+ "epoch": 0.00657908178901988,
1133
+ "grad_norm": 13.29054069519043,
1134
+ "learning_rate": 9.751823732759726e-05,
1135
+ "loss": 7.1374,
1136
+ "step": 161
1137
+ },
1138
+ {
1139
+ "epoch": 0.006619945651063482,
1140
+ "grad_norm": 12.356842994689941,
1141
+ "learning_rate": 9.748658801120916e-05,
1142
+ "loss": 5.864,
1143
+ "step": 162
1144
+ },
1145
+ {
1146
+ "epoch": 0.006660809513107084,
1147
+ "grad_norm": 14.566633224487305,
1148
+ "learning_rate": 9.745474336537119e-05,
1149
+ "loss": 6.481,
1150
+ "step": 163
1151
+ },
1152
+ {
1153
+ "epoch": 0.006701673375150686,
1154
+ "grad_norm": 14.574312210083008,
1155
+ "learning_rate": 9.742270352107185e-05,
1156
+ "loss": 6.7697,
1157
+ "step": 164
1158
+ },
1159
+ {
1160
+ "epoch": 0.0067425372371942875,
1161
+ "grad_norm": 14.898052215576172,
1162
+ "learning_rate": 9.739046861010255e-05,
1163
+ "loss": 7.1438,
1164
+ "step": 165
1165
+ },
1166
+ {
1167
+ "epoch": 0.006783401099237889,
1168
+ "grad_norm": 16.09490394592285,
1169
+ "learning_rate": 9.735803876505711e-05,
1170
+ "loss": 6.1778,
1171
+ "step": 166
1172
+ },
1173
+ {
1174
+ "epoch": 0.006824264961281491,
1175
+ "grad_norm": 20.63564682006836,
1176
+ "learning_rate": 9.732541411933115e-05,
1177
+ "loss": 6.2028,
1178
+ "step": 167
1179
+ },
1180
+ {
1181
+ "epoch": 0.006865128823325093,
1182
+ "grad_norm": 22.837800979614258,
1183
+ "learning_rate": 9.729259480712162e-05,
1184
+ "loss": 6.828,
1185
+ "step": 168
1186
+ },
1187
+ {
1188
+ "epoch": 0.006905992685368694,
1189
+ "grad_norm": 16.11451530456543,
1190
+ "learning_rate": 9.725958096342616e-05,
1191
+ "loss": 7.4898,
1192
+ "step": 169
1193
+ },
1194
+ {
1195
+ "epoch": 0.0069468565474122956,
1196
+ "grad_norm": 12.31192398071289,
1197
+ "learning_rate": 9.722637272404262e-05,
1198
+ "loss": 6.4009,
1199
+ "step": 170
1200
+ },
1201
+ {
1202
+ "epoch": 0.006987720409455897,
1203
+ "grad_norm": 14.875618934631348,
1204
+ "learning_rate": 9.719297022556845e-05,
1205
+ "loss": 6.2035,
1206
+ "step": 171
1207
+ },
1208
+ {
1209
+ "epoch": 0.007028584271499499,
1210
+ "grad_norm": 13.63744068145752,
1211
+ "learning_rate": 9.715937360540017e-05,
1212
+ "loss": 6.0225,
1213
+ "step": 172
1214
+ },
1215
+ {
1216
+ "epoch": 0.007069448133543101,
1217
+ "grad_norm": 21.234041213989258,
1218
+ "learning_rate": 9.712558300173279e-05,
1219
+ "loss": 7.4878,
1220
+ "step": 173
1221
+ },
1222
+ {
1223
+ "epoch": 0.007110311995586703,
1224
+ "grad_norm": 14.23675537109375,
1225
+ "learning_rate": 9.709159855355922e-05,
1226
+ "loss": 6.515,
1227
+ "step": 174
1228
+ },
1229
+ {
1230
+ "epoch": 0.0071511758576303045,
1231
+ "grad_norm": 17.055084228515625,
1232
+ "learning_rate": 9.705742040066976e-05,
1233
+ "loss": 6.3619,
1234
+ "step": 175
1235
+ },
1236
+ {
1237
+ "epoch": 0.007192039719673906,
1238
+ "grad_norm": 14.412619590759277,
1239
+ "learning_rate": 9.702304868365147e-05,
1240
+ "loss": 7.1881,
1241
+ "step": 176
1242
+ },
1243
+ {
1244
+ "epoch": 0.007232903581717508,
1245
+ "grad_norm": 18.581859588623047,
1246
+ "learning_rate": 9.698848354388759e-05,
1247
+ "loss": 6.9419,
1248
+ "step": 177
1249
+ },
1250
+ {
1251
+ "epoch": 0.00727376744376111,
1252
+ "grad_norm": 13.657668113708496,
1253
+ "learning_rate": 9.695372512355703e-05,
1254
+ "loss": 7.4206,
1255
+ "step": 178
1256
+ },
1257
+ {
1258
+ "epoch": 0.007314631305804712,
1259
+ "grad_norm": 14.230124473571777,
1260
+ "learning_rate": 9.691877356563366e-05,
1261
+ "loss": 5.8499,
1262
+ "step": 179
1263
+ },
1264
+ {
1265
+ "epoch": 0.0073554951678483135,
1266
+ "grad_norm": 16.437936782836914,
1267
+ "learning_rate": 9.688362901388586e-05,
1268
+ "loss": 7.6793,
1269
+ "step": 180
1270
+ },
1271
+ {
1272
+ "epoch": 0.007396359029891915,
1273
+ "grad_norm": 18.00482749938965,
1274
+ "learning_rate": 9.684829161287583e-05,
1275
+ "loss": 7.3289,
1276
+ "step": 181
1277
+ },
1278
+ {
1279
+ "epoch": 0.007437222891935517,
1280
+ "grad_norm": 17.923187255859375,
1281
+ "learning_rate": 9.681276150795903e-05,
1282
+ "loss": 7.1,
1283
+ "step": 182
1284
+ },
1285
+ {
1286
+ "epoch": 0.007478086753979119,
1287
+ "grad_norm": 17.70640754699707,
1288
+ "learning_rate": 9.677703884528362e-05,
1289
+ "loss": 7.6045,
1290
+ "step": 183
1291
+ },
1292
+ {
1293
+ "epoch": 0.007518950616022721,
1294
+ "grad_norm": 13.995911598205566,
1295
+ "learning_rate": 9.674112377178975e-05,
1296
+ "loss": 5.7522,
1297
+ "step": 184
1298
+ },
1299
+ {
1300
+ "epoch": 0.007559814478066322,
1301
+ "grad_norm": 17.230287551879883,
1302
+ "learning_rate": 9.670501643520905e-05,
1303
+ "loss": 7.014,
1304
+ "step": 185
1305
+ },
1306
+ {
1307
+ "epoch": 0.007600678340109924,
1308
+ "grad_norm": 17.801237106323242,
1309
+ "learning_rate": 9.666871698406403e-05,
1310
+ "loss": 7.3874,
1311
+ "step": 186
1312
+ },
1313
+ {
1314
+ "epoch": 0.007641542202153525,
1315
+ "grad_norm": 16.83843421936035,
1316
+ "learning_rate": 9.66322255676674e-05,
1317
+ "loss": 6.0555,
1318
+ "step": 187
1319
+ },
1320
+ {
1321
+ "epoch": 0.007682406064197127,
1322
+ "grad_norm": 16.625722885131836,
1323
+ "learning_rate": 9.659554233612153e-05,
1324
+ "loss": 6.8826,
1325
+ "step": 188
1326
+ },
1327
+ {
1328
+ "epoch": 0.007723269926240729,
1329
+ "grad_norm": 13.392539978027344,
1330
+ "learning_rate": 9.655866744031777e-05,
1331
+ "loss": 5.4981,
1332
+ "step": 189
1333
+ },
1334
+ {
1335
+ "epoch": 0.0077641337882843305,
1336
+ "grad_norm": 15.59808349609375,
1337
+ "learning_rate": 9.652160103193583e-05,
1338
+ "loss": 7.0037,
1339
+ "step": 190
1340
+ },
1341
+ {
1342
+ "epoch": 0.007804997650327932,
1343
+ "grad_norm": 21.588523864746094,
1344
+ "learning_rate": 9.648434326344322e-05,
1345
+ "loss": 7.1183,
1346
+ "step": 191
1347
+ },
1348
+ {
1349
+ "epoch": 0.007845861512371535,
1350
+ "grad_norm": 17.634723663330078,
1351
+ "learning_rate": 9.644689428809456e-05,
1352
+ "loss": 6.4932,
1353
+ "step": 192
1354
+ },
1355
+ {
1356
+ "epoch": 0.007886725374415137,
1357
+ "grad_norm": 18.702896118164062,
1358
+ "learning_rate": 9.640925425993101e-05,
1359
+ "loss": 6.4525,
1360
+ "step": 193
1361
+ },
1362
+ {
1363
+ "epoch": 0.007927589236458739,
1364
+ "grad_norm": 13.623326301574707,
1365
+ "learning_rate": 9.637142333377953e-05,
1366
+ "loss": 5.2151,
1367
+ "step": 194
1368
+ },
1369
+ {
1370
+ "epoch": 0.00796845309850234,
1371
+ "grad_norm": 15.406729698181152,
1372
+ "learning_rate": 9.633340166525238e-05,
1373
+ "loss": 6.0974,
1374
+ "step": 195
1375
+ },
1376
+ {
1377
+ "epoch": 0.00800931696054594,
1378
+ "grad_norm": 18.72486114501953,
1379
+ "learning_rate": 9.629518941074639e-05,
1380
+ "loss": 6.6543,
1381
+ "step": 196
1382
+ },
1383
+ {
1384
+ "epoch": 0.008050180822589542,
1385
+ "grad_norm": 20.652475357055664,
1386
+ "learning_rate": 9.625678672744232e-05,
1387
+ "loss": 6.4423,
1388
+ "step": 197
1389
+ },
1390
+ {
1391
+ "epoch": 0.008091044684633144,
1392
+ "grad_norm": 20.862340927124023,
1393
+ "learning_rate": 9.621819377330424e-05,
1394
+ "loss": 6.86,
1395
+ "step": 198
1396
+ },
1397
+ {
1398
+ "epoch": 0.008131908546676746,
1399
+ "grad_norm": 21.109661102294922,
1400
+ "learning_rate": 9.617941070707889e-05,
1401
+ "loss": 5.8087,
1402
+ "step": 199
1403
+ },
1404
+ {
1405
+ "epoch": 0.008172772408720348,
1406
+ "grad_norm": 52.470829010009766,
1407
+ "learning_rate": 9.614043768829499e-05,
1408
+ "loss": 9.3251,
1409
+ "step": 200
1410
+ },
1411
+ {
1412
+ "epoch": 0.00821363627076395,
1413
+ "grad_norm": 10.258379936218262,
1414
+ "learning_rate": 9.610127487726263e-05,
1415
+ "loss": 5.4524,
1416
+ "step": 201
1417
+ },
1418
+ {
1419
+ "epoch": 0.008254500132807551,
1420
+ "grad_norm": 13.210298538208008,
1421
+ "learning_rate": 9.606192243507254e-05,
1422
+ "loss": 5.9119,
1423
+ "step": 202
1424
+ },
1425
+ {
1426
+ "epoch": 0.008295363994851153,
1427
+ "grad_norm": 12.499059677124023,
1428
+ "learning_rate": 9.602238052359551e-05,
1429
+ "loss": 5.0512,
1430
+ "step": 203
1431
+ },
1432
+ {
1433
+ "epoch": 0.008336227856894755,
1434
+ "grad_norm": 10.580381393432617,
1435
+ "learning_rate": 9.598264930548169e-05,
1436
+ "loss": 5.2318,
1437
+ "step": 204
1438
+ },
1439
+ {
1440
+ "epoch": 0.008377091718938356,
1441
+ "grad_norm": 13.957743644714355,
1442
+ "learning_rate": 9.594272894415986e-05,
1443
+ "loss": 6.543,
1444
+ "step": 205
1445
+ },
1446
+ {
1447
+ "epoch": 0.008417955580981958,
1448
+ "grad_norm": 13.090479850769043,
1449
+ "learning_rate": 9.590261960383686e-05,
1450
+ "loss": 6.1888,
1451
+ "step": 206
1452
+ },
1453
+ {
1454
+ "epoch": 0.00845881944302556,
1455
+ "grad_norm": 13.895710945129395,
1456
+ "learning_rate": 9.58623214494969e-05,
1457
+ "loss": 6.2908,
1458
+ "step": 207
1459
+ },
1460
+ {
1461
+ "epoch": 0.008499683305069162,
1462
+ "grad_norm": 10.926653861999512,
1463
+ "learning_rate": 9.582183464690078e-05,
1464
+ "loss": 5.6924,
1465
+ "step": 208
1466
+ },
1467
+ {
1468
+ "epoch": 0.008540547167112764,
1469
+ "grad_norm": 13.321636199951172,
1470
+ "learning_rate": 9.578115936258531e-05,
1471
+ "loss": 7.1836,
1472
+ "step": 209
1473
+ },
1474
+ {
1475
+ "epoch": 0.008581411029156365,
1476
+ "grad_norm": 11.270588874816895,
1477
+ "learning_rate": 9.574029576386261e-05,
1478
+ "loss": 6.0644,
1479
+ "step": 210
1480
+ },
1481
+ {
1482
+ "epoch": 0.008622274891199967,
1483
+ "grad_norm": 12.152750015258789,
1484
+ "learning_rate": 9.569924401881936e-05,
1485
+ "loss": 5.7929,
1486
+ "step": 211
1487
+ },
1488
+ {
1489
+ "epoch": 0.008663138753243569,
1490
+ "grad_norm": 13.030095100402832,
1491
+ "learning_rate": 9.565800429631619e-05,
1492
+ "loss": 6.2535,
1493
+ "step": 212
1494
+ },
1495
+ {
1496
+ "epoch": 0.00870400261528717,
1497
+ "grad_norm": 14.503040313720703,
1498
+ "learning_rate": 9.561657676598698e-05,
1499
+ "loss": 6.5376,
1500
+ "step": 213
1501
+ },
1502
+ {
1503
+ "epoch": 0.008744866477330773,
1504
+ "grad_norm": 11.103546142578125,
1505
+ "learning_rate": 9.557496159823804e-05,
1506
+ "loss": 5.4463,
1507
+ "step": 214
1508
+ },
1509
+ {
1510
+ "epoch": 0.008785730339374374,
1511
+ "grad_norm": 10.801084518432617,
1512
+ "learning_rate": 9.553315896424758e-05,
1513
+ "loss": 5.309,
1514
+ "step": 215
1515
+ },
1516
+ {
1517
+ "epoch": 0.008826594201417976,
1518
+ "grad_norm": 15.308627128601074,
1519
+ "learning_rate": 9.549116903596488e-05,
1520
+ "loss": 7.3242,
1521
+ "step": 216
1522
+ },
1523
+ {
1524
+ "epoch": 0.008867458063461578,
1525
+ "grad_norm": 14.178114891052246,
1526
+ "learning_rate": 9.544899198610968e-05,
1527
+ "loss": 6.6382,
1528
+ "step": 217
1529
+ },
1530
+ {
1531
+ "epoch": 0.00890832192550518,
1532
+ "grad_norm": 12.048151969909668,
1533
+ "learning_rate": 9.540662798817137e-05,
1534
+ "loss": 4.8189,
1535
+ "step": 218
1536
+ },
1537
+ {
1538
+ "epoch": 0.008949185787548782,
1539
+ "grad_norm": 13.001554489135742,
1540
+ "learning_rate": 9.536407721640832e-05,
1541
+ "loss": 6.0842,
1542
+ "step": 219
1543
+ },
1544
+ {
1545
+ "epoch": 0.008990049649592383,
1546
+ "grad_norm": 13.918134689331055,
1547
+ "learning_rate": 9.532133984584721e-05,
1548
+ "loss": 6.1108,
1549
+ "step": 220
1550
+ },
1551
+ {
1552
+ "epoch": 0.009030913511635985,
1553
+ "grad_norm": 14.916069030761719,
1554
+ "learning_rate": 9.527841605228224e-05,
1555
+ "loss": 6.9456,
1556
+ "step": 221
1557
+ },
1558
+ {
1559
+ "epoch": 0.009071777373679587,
1560
+ "grad_norm": 15.989644050598145,
1561
+ "learning_rate": 9.523530601227445e-05,
1562
+ "loss": 6.5354,
1563
+ "step": 222
1564
+ },
1565
+ {
1566
+ "epoch": 0.009112641235723189,
1567
+ "grad_norm": 15.19133472442627,
1568
+ "learning_rate": 9.519200990315096e-05,
1569
+ "loss": 6.737,
1570
+ "step": 223
1571
+ },
1572
+ {
1573
+ "epoch": 0.00915350509776679,
1574
+ "grad_norm": 12.79016399383545,
1575
+ "learning_rate": 9.514852790300427e-05,
1576
+ "loss": 7.3384,
1577
+ "step": 224
1578
+ },
1579
+ {
1580
+ "epoch": 0.009194368959810392,
1581
+ "grad_norm": 13.54174518585205,
1582
+ "learning_rate": 9.510486019069153e-05,
1583
+ "loss": 6.5956,
1584
+ "step": 225
1585
+ },
1586
+ {
1587
+ "epoch": 0.009235232821853994,
1588
+ "grad_norm": 12.780714988708496,
1589
+ "learning_rate": 9.506100694583378e-05,
1590
+ "loss": 5.4059,
1591
+ "step": 226
1592
+ },
1593
+ {
1594
+ "epoch": 0.009276096683897596,
1595
+ "grad_norm": 15.604456901550293,
1596
+ "learning_rate": 9.501696834881518e-05,
1597
+ "loss": 7.5434,
1598
+ "step": 227
1599
+ },
1600
+ {
1601
+ "epoch": 0.009316960545941198,
1602
+ "grad_norm": 13.501986503601074,
1603
+ "learning_rate": 9.49727445807824e-05,
1604
+ "loss": 5.4853,
1605
+ "step": 228
1606
+ },
1607
+ {
1608
+ "epoch": 0.0093578244079848,
1609
+ "grad_norm": 19.180065155029297,
1610
+ "learning_rate": 9.492833582364371e-05,
1611
+ "loss": 6.7472,
1612
+ "step": 229
1613
+ },
1614
+ {
1615
+ "epoch": 0.009398688270028401,
1616
+ "grad_norm": 14.34424114227295,
1617
+ "learning_rate": 9.488374226006836e-05,
1618
+ "loss": 6.3557,
1619
+ "step": 230
1620
+ },
1621
+ {
1622
+ "epoch": 0.009439552132072001,
1623
+ "grad_norm": 16.343975067138672,
1624
+ "learning_rate": 9.483896407348569e-05,
1625
+ "loss": 6.3415,
1626
+ "step": 231
1627
+ },
1628
+ {
1629
+ "epoch": 0.009480415994115603,
1630
+ "grad_norm": 14.910673141479492,
1631
+ "learning_rate": 9.479400144808457e-05,
1632
+ "loss": 6.0064,
1633
+ "step": 232
1634
+ },
1635
+ {
1636
+ "epoch": 0.009521279856159205,
1637
+ "grad_norm": 14.328391075134277,
1638
+ "learning_rate": 9.474885456881248e-05,
1639
+ "loss": 6.7544,
1640
+ "step": 233
1641
+ },
1642
+ {
1643
+ "epoch": 0.009562143718202807,
1644
+ "grad_norm": 17.46697425842285,
1645
+ "learning_rate": 9.470352362137478e-05,
1646
+ "loss": 6.3016,
1647
+ "step": 234
1648
+ },
1649
+ {
1650
+ "epoch": 0.009603007580246408,
1651
+ "grad_norm": 14.978525161743164,
1652
+ "learning_rate": 9.4658008792234e-05,
1653
+ "loss": 6.0921,
1654
+ "step": 235
1655
+ },
1656
+ {
1657
+ "epoch": 0.00964387144229001,
1658
+ "grad_norm": 14.764670372009277,
1659
+ "learning_rate": 9.461231026860904e-05,
1660
+ "loss": 5.9274,
1661
+ "step": 236
1662
+ },
1663
+ {
1664
+ "epoch": 0.009684735304333612,
1665
+ "grad_norm": 14.451186180114746,
1666
+ "learning_rate": 9.456642823847439e-05,
1667
+ "loss": 6.0484,
1668
+ "step": 237
1669
+ },
1670
+ {
1671
+ "epoch": 0.009725599166377214,
1672
+ "grad_norm": 20.823772430419922,
1673
+ "learning_rate": 9.45203628905594e-05,
1674
+ "loss": 8.1228,
1675
+ "step": 238
1676
+ },
1677
+ {
1678
+ "epoch": 0.009766463028420816,
1679
+ "grad_norm": 22.068574905395508,
1680
+ "learning_rate": 9.447411441434741e-05,
1681
+ "loss": 6.3468,
1682
+ "step": 239
1683
+ },
1684
+ {
1685
+ "epoch": 0.009807326890464417,
1686
+ "grad_norm": 18.896333694458008,
1687
+ "learning_rate": 9.442768300007511e-05,
1688
+ "loss": 7.1779,
1689
+ "step": 240
1690
+ },
1691
+ {
1692
+ "epoch": 0.00984819075250802,
1693
+ "grad_norm": 19.249156951904297,
1694
+ "learning_rate": 9.43810688387316e-05,
1695
+ "loss": 6.4586,
1696
+ "step": 241
1697
+ },
1698
+ {
1699
+ "epoch": 0.009889054614551621,
1700
+ "grad_norm": 17.050535202026367,
1701
+ "learning_rate": 9.433427212205774e-05,
1702
+ "loss": 6.6014,
1703
+ "step": 242
1704
+ },
1705
+ {
1706
+ "epoch": 0.009929918476595223,
1707
+ "grad_norm": 25.83972930908203,
1708
+ "learning_rate": 9.428729304254531e-05,
1709
+ "loss": 6.9396,
1710
+ "step": 243
1711
+ },
1712
+ {
1713
+ "epoch": 0.009970782338638825,
1714
+ "grad_norm": 17.18185806274414,
1715
+ "learning_rate": 9.424013179343617e-05,
1716
+ "loss": 6.1457,
1717
+ "step": 244
1718
+ },
1719
+ {
1720
+ "epoch": 0.010011646200682426,
1721
+ "grad_norm": 22.017518997192383,
1722
+ "learning_rate": 9.419278856872155e-05,
1723
+ "loss": 5.9994,
1724
+ "step": 245
1725
+ },
1726
+ {
1727
+ "epoch": 0.010052510062726028,
1728
+ "grad_norm": 24.557111740112305,
1729
+ "learning_rate": 9.414526356314117e-05,
1730
+ "loss": 7.577,
1731
+ "step": 246
1732
+ },
1733
+ {
1734
+ "epoch": 0.01009337392476963,
1735
+ "grad_norm": 25.834148406982422,
1736
+ "learning_rate": 9.409755697218253e-05,
1737
+ "loss": 7.0775,
1738
+ "step": 247
1739
+ },
1740
+ {
1741
+ "epoch": 0.010134237786813232,
1742
+ "grad_norm": 19.930910110473633,
1743
+ "learning_rate": 9.404966899208003e-05,
1744
+ "loss": 6.1123,
1745
+ "step": 248
1746
+ },
1747
+ {
1748
+ "epoch": 0.010175101648856833,
1749
+ "grad_norm": 23.818077087402344,
1750
+ "learning_rate": 9.400159981981418e-05,
1751
+ "loss": 6.1753,
1752
+ "step": 249
1753
+ },
1754
+ {
1755
+ "epoch": 0.010215965510900435,
1756
+ "grad_norm": 31.135684967041016,
1757
+ "learning_rate": 9.39533496531108e-05,
1758
+ "loss": 7.5009,
1759
+ "step": 250
1760
+ },
1761
+ {
1762
+ "epoch": 0.010256829372944037,
1763
+ "grad_norm": 13.721549034118652,
1764
+ "learning_rate": 9.390491869044024e-05,
1765
+ "loss": 6.5444,
1766
+ "step": 251
1767
+ },
1768
+ {
1769
+ "epoch": 0.010297693234987639,
1770
+ "grad_norm": 12.433771133422852,
1771
+ "learning_rate": 9.385630713101649e-05,
1772
+ "loss": 6.7709,
1773
+ "step": 252
1774
+ },
1775
+ {
1776
+ "epoch": 0.01033855709703124,
1777
+ "grad_norm": 12.28031063079834,
1778
+ "learning_rate": 9.38075151747964e-05,
1779
+ "loss": 5.4516,
1780
+ "step": 253
1781
+ },
1782
+ {
1783
+ "epoch": 0.010379420959074842,
1784
+ "grad_norm": 12.136971473693848,
1785
+ "learning_rate": 9.375854302247889e-05,
1786
+ "loss": 6.0733,
1787
+ "step": 254
1788
+ },
1789
+ {
1790
+ "epoch": 0.010420284821118444,
1791
+ "grad_norm": 13.550774574279785,
1792
+ "learning_rate": 9.370939087550405e-05,
1793
+ "loss": 6.0526,
1794
+ "step": 255
1795
+ },
1796
+ {
1797
+ "epoch": 0.010461148683162046,
1798
+ "grad_norm": 12.804445266723633,
1799
+ "learning_rate": 9.36600589360524e-05,
1800
+ "loss": 6.3769,
1801
+ "step": 256
1802
+ },
1803
+ {
1804
+ "epoch": 0.010502012545205648,
1805
+ "grad_norm": 12.12086296081543,
1806
+ "learning_rate": 9.361054740704398e-05,
1807
+ "loss": 6.3975,
1808
+ "step": 257
1809
+ },
1810
+ {
1811
+ "epoch": 0.01054287640724925,
1812
+ "grad_norm": 12.262513160705566,
1813
+ "learning_rate": 9.356085649213755e-05,
1814
+ "loss": 6.0431,
1815
+ "step": 258
1816
+ },
1817
+ {
1818
+ "epoch": 0.010583740269292851,
1819
+ "grad_norm": 12.090657234191895,
1820
+ "learning_rate": 9.351098639572972e-05,
1821
+ "loss": 6.4593,
1822
+ "step": 259
1823
+ },
1824
+ {
1825
+ "epoch": 0.010624604131336453,
1826
+ "grad_norm": 12.461016654968262,
1827
+ "learning_rate": 9.346093732295423e-05,
1828
+ "loss": 6.9296,
1829
+ "step": 260
1830
+ },
1831
+ {
1832
+ "epoch": 0.010665467993380055,
1833
+ "grad_norm": 14.135153770446777,
1834
+ "learning_rate": 9.341070947968089e-05,
1835
+ "loss": 6.1179,
1836
+ "step": 261
1837
+ },
1838
+ {
1839
+ "epoch": 0.010706331855423657,
1840
+ "grad_norm": 13.785672187805176,
1841
+ "learning_rate": 9.336030307251495e-05,
1842
+ "loss": 7.1686,
1843
+ "step": 262
1844
+ },
1845
+ {
1846
+ "epoch": 0.010747195717467259,
1847
+ "grad_norm": 14.264972686767578,
1848
+ "learning_rate": 9.330971830879614e-05,
1849
+ "loss": 6.9697,
1850
+ "step": 263
1851
+ },
1852
+ {
1853
+ "epoch": 0.01078805957951086,
1854
+ "grad_norm": 14.527408599853516,
1855
+ "learning_rate": 9.32589553965978e-05,
1856
+ "loss": 6.6708,
1857
+ "step": 264
1858
+ },
1859
+ {
1860
+ "epoch": 0.010828923441554462,
1861
+ "grad_norm": 12.697749137878418,
1862
+ "learning_rate": 9.320801454472608e-05,
1863
+ "loss": 6.1965,
1864
+ "step": 265
1865
+ },
1866
+ {
1867
+ "epoch": 0.010869787303598064,
1868
+ "grad_norm": 11.870662689208984,
1869
+ "learning_rate": 9.315689596271908e-05,
1870
+ "loss": 5.8277,
1871
+ "step": 266
1872
+ },
1873
+ {
1874
+ "epoch": 0.010910651165641664,
1875
+ "grad_norm": 12.278385162353516,
1876
+ "learning_rate": 9.310559986084594e-05,
1877
+ "loss": 6.4373,
1878
+ "step": 267
1879
+ },
1880
+ {
1881
+ "epoch": 0.010951515027685266,
1882
+ "grad_norm": 12.940865516662598,
1883
+ "learning_rate": 9.305412645010605e-05,
1884
+ "loss": 5.9068,
1885
+ "step": 268
1886
+ },
1887
+ {
1888
+ "epoch": 0.010992378889728868,
1889
+ "grad_norm": 19.228952407836914,
1890
+ "learning_rate": 9.300247594222804e-05,
1891
+ "loss": 6.7197,
1892
+ "step": 269
1893
+ },
1894
+ {
1895
+ "epoch": 0.01103324275177247,
1896
+ "grad_norm": 13.695125579833984,
1897
+ "learning_rate": 9.29506485496691e-05,
1898
+ "loss": 6.504,
1899
+ "step": 270
1900
+ },
1901
+ {
1902
+ "epoch": 0.011074106613816071,
1903
+ "grad_norm": 15.966194152832031,
1904
+ "learning_rate": 9.289864448561394e-05,
1905
+ "loss": 6.5165,
1906
+ "step": 271
1907
+ },
1908
+ {
1909
+ "epoch": 0.011114970475859673,
1910
+ "grad_norm": 12.456382751464844,
1911
+ "learning_rate": 9.284646396397406e-05,
1912
+ "loss": 5.1311,
1913
+ "step": 272
1914
+ },
1915
+ {
1916
+ "epoch": 0.011155834337903275,
1917
+ "grad_norm": 14.079570770263672,
1918
+ "learning_rate": 9.279410719938673e-05,
1919
+ "loss": 5.6548,
1920
+ "step": 273
1921
+ },
1922
+ {
1923
+ "epoch": 0.011196698199946876,
1924
+ "grad_norm": 15.152005195617676,
1925
+ "learning_rate": 9.274157440721419e-05,
1926
+ "loss": 6.5701,
1927
+ "step": 274
1928
+ },
1929
+ {
1930
+ "epoch": 0.011237562061990478,
1931
+ "grad_norm": 14.330810546875,
1932
+ "learning_rate": 9.268886580354273e-05,
1933
+ "loss": 6.8853,
1934
+ "step": 275
1935
+ },
1936
+ {
1937
+ "epoch": 0.01127842592403408,
1938
+ "grad_norm": 16.78351402282715,
1939
+ "learning_rate": 9.263598160518186e-05,
1940
+ "loss": 6.6548,
1941
+ "step": 276
1942
+ },
1943
+ {
1944
+ "epoch": 0.011319289786077682,
1945
+ "grad_norm": 13.835537910461426,
1946
+ "learning_rate": 9.258292202966333e-05,
1947
+ "loss": 5.5245,
1948
+ "step": 277
1949
+ },
1950
+ {
1951
+ "epoch": 0.011360153648121284,
1952
+ "grad_norm": 13.058305740356445,
1953
+ "learning_rate": 9.252968729524031e-05,
1954
+ "loss": 5.0774,
1955
+ "step": 278
1956
+ },
1957
+ {
1958
+ "epoch": 0.011401017510164885,
1959
+ "grad_norm": 28.719289779663086,
1960
+ "learning_rate": 9.247627762088643e-05,
1961
+ "loss": 7.8642,
1962
+ "step": 279
1963
+ },
1964
+ {
1965
+ "epoch": 0.011441881372208487,
1966
+ "grad_norm": 18.289936065673828,
1967
+ "learning_rate": 9.242269322629495e-05,
1968
+ "loss": 6.0802,
1969
+ "step": 280
1970
+ },
1971
+ {
1972
+ "epoch": 0.011482745234252089,
1973
+ "grad_norm": 17.574129104614258,
1974
+ "learning_rate": 9.236893433187777e-05,
1975
+ "loss": 7.0344,
1976
+ "step": 281
1977
+ },
1978
+ {
1979
+ "epoch": 0.01152360909629569,
1980
+ "grad_norm": 16.07065773010254,
1981
+ "learning_rate": 9.231500115876461e-05,
1982
+ "loss": 6.1134,
1983
+ "step": 282
1984
+ },
1985
+ {
1986
+ "epoch": 0.011564472958339293,
1987
+ "grad_norm": 23.54433822631836,
1988
+ "learning_rate": 9.226089392880206e-05,
1989
+ "loss": 6.2945,
1990
+ "step": 283
1991
+ },
1992
+ {
1993
+ "epoch": 0.011605336820382894,
1994
+ "grad_norm": 15.165502548217773,
1995
+ "learning_rate": 9.220661286455264e-05,
1996
+ "loss": 5.9368,
1997
+ "step": 284
1998
+ },
1999
+ {
2000
+ "epoch": 0.011646200682426496,
2001
+ "grad_norm": 17.532302856445312,
2002
+ "learning_rate": 9.215215818929392e-05,
2003
+ "loss": 7.3923,
2004
+ "step": 285
2005
+ },
2006
+ {
2007
+ "epoch": 0.011687064544470098,
2008
+ "grad_norm": 14.813886642456055,
2009
+ "learning_rate": 9.209753012701764e-05,
2010
+ "loss": 5.4034,
2011
+ "step": 286
2012
+ },
2013
+ {
2014
+ "epoch": 0.0117279284065137,
2015
+ "grad_norm": 14.954307556152344,
2016
+ "learning_rate": 9.204272890242866e-05,
2017
+ "loss": 7.0381,
2018
+ "step": 287
2019
+ },
2020
+ {
2021
+ "epoch": 0.011768792268557302,
2022
+ "grad_norm": 15.881797790527344,
2023
+ "learning_rate": 9.19877547409442e-05,
2024
+ "loss": 6.0234,
2025
+ "step": 288
2026
+ },
2027
+ {
2028
+ "epoch": 0.011809656130600903,
2029
+ "grad_norm": 16.021188735961914,
2030
+ "learning_rate": 9.193260786869281e-05,
2031
+ "loss": 6.2578,
2032
+ "step": 289
2033
+ },
2034
+ {
2035
+ "epoch": 0.011850519992644505,
2036
+ "grad_norm": 17.150177001953125,
2037
+ "learning_rate": 9.18772885125134e-05,
2038
+ "loss": 5.8906,
2039
+ "step": 290
2040
+ },
2041
+ {
2042
+ "epoch": 0.011891383854688107,
2043
+ "grad_norm": 15.94199275970459,
2044
+ "learning_rate": 9.182179689995447e-05,
2045
+ "loss": 6.2107,
2046
+ "step": 291
2047
+ },
2048
+ {
2049
+ "epoch": 0.011932247716731709,
2050
+ "grad_norm": 16.82688331604004,
2051
+ "learning_rate": 9.176613325927298e-05,
2052
+ "loss": 6.5315,
2053
+ "step": 292
2054
+ },
2055
+ {
2056
+ "epoch": 0.01197311157877531,
2057
+ "grad_norm": 27.945167541503906,
2058
+ "learning_rate": 9.171029781943357e-05,
2059
+ "loss": 6.7607,
2060
+ "step": 293
2061
+ },
2062
+ {
2063
+ "epoch": 0.012013975440818912,
2064
+ "grad_norm": 21.483531951904297,
2065
+ "learning_rate": 9.16542908101075e-05,
2066
+ "loss": 8.4702,
2067
+ "step": 294
2068
+ },
2069
+ {
2070
+ "epoch": 0.012054839302862514,
2071
+ "grad_norm": 17.964195251464844,
2072
+ "learning_rate": 9.159811246167181e-05,
2073
+ "loss": 6.2181,
2074
+ "step": 295
2075
+ },
2076
+ {
2077
+ "epoch": 0.012095703164906116,
2078
+ "grad_norm": 16.953454971313477,
2079
+ "learning_rate": 9.154176300520829e-05,
2080
+ "loss": 6.9425,
2081
+ "step": 296
2082
+ },
2083
+ {
2084
+ "epoch": 0.012136567026949718,
2085
+ "grad_norm": 21.33991050720215,
2086
+ "learning_rate": 9.148524267250256e-05,
2087
+ "loss": 6.8474,
2088
+ "step": 297
2089
+ },
2090
+ {
2091
+ "epoch": 0.01217743088899332,
2092
+ "grad_norm": 22.513446807861328,
2093
+ "learning_rate": 9.142855169604309e-05,
2094
+ "loss": 6.7673,
2095
+ "step": 298
2096
+ },
2097
+ {
2098
+ "epoch": 0.012218294751036921,
2099
+ "grad_norm": 21.369958877563477,
2100
+ "learning_rate": 9.137169030902036e-05,
2101
+ "loss": 7.0022,
2102
+ "step": 299
2103
+ },
2104
+ {
2105
+ "epoch": 0.012259158613080523,
2106
+ "grad_norm": 21.280765533447266,
2107
+ "learning_rate": 9.131465874532568e-05,
2108
+ "loss": 7.1624,
2109
+ "step": 300
2110
+ },
2111
+ {
2112
+ "epoch": 0.012300022475124125,
2113
+ "grad_norm": 12.105696678161621,
2114
+ "learning_rate": 9.125745723955045e-05,
2115
+ "loss": 6.1169,
2116
+ "step": 301
2117
+ },
2118
+ {
2119
+ "epoch": 0.012340886337167725,
2120
+ "grad_norm": 13.19536018371582,
2121
+ "learning_rate": 9.120008602698508e-05,
2122
+ "loss": 6.6732,
2123
+ "step": 302
2124
+ },
2125
+ {
2126
+ "epoch": 0.012381750199211327,
2127
+ "grad_norm": 11.505049705505371,
2128
+ "learning_rate": 9.114254534361803e-05,
2129
+ "loss": 5.8818,
2130
+ "step": 303
2131
+ },
2132
+ {
2133
+ "epoch": 0.012422614061254928,
2134
+ "grad_norm": 12.840928077697754,
2135
+ "learning_rate": 9.108483542613491e-05,
2136
+ "loss": 5.8127,
2137
+ "step": 304
2138
+ },
2139
+ {
2140
+ "epoch": 0.01246347792329853,
2141
+ "grad_norm": 13.485807418823242,
2142
+ "learning_rate": 9.102695651191737e-05,
2143
+ "loss": 5.8549,
2144
+ "step": 305
2145
+ },
2146
+ {
2147
+ "epoch": 0.012504341785342132,
2148
+ "grad_norm": 14.3336763381958,
2149
+ "learning_rate": 9.096890883904223e-05,
2150
+ "loss": 6.4981,
2151
+ "step": 306
2152
+ },
2153
+ {
2154
+ "epoch": 0.012545205647385734,
2155
+ "grad_norm": 13.247648239135742,
2156
+ "learning_rate": 9.091069264628052e-05,
2157
+ "loss": 6.1635,
2158
+ "step": 307
2159
+ },
2160
+ {
2161
+ "epoch": 0.012586069509429336,
2162
+ "grad_norm": 18.788253784179688,
2163
+ "learning_rate": 9.085230817309642e-05,
2164
+ "loss": 6.6331,
2165
+ "step": 308
2166
+ },
2167
+ {
2168
+ "epoch": 0.012626933371472937,
2169
+ "grad_norm": 16.10300064086914,
2170
+ "learning_rate": 9.079375565964629e-05,
2171
+ "loss": 6.9041,
2172
+ "step": 309
2173
+ },
2174
+ {
2175
+ "epoch": 0.01266779723351654,
2176
+ "grad_norm": 13.573219299316406,
2177
+ "learning_rate": 9.073503534677771e-05,
2178
+ "loss": 6.8875,
2179
+ "step": 310
2180
+ },
2181
+ {
2182
+ "epoch": 0.012708661095560141,
2183
+ "grad_norm": 13.871342658996582,
2184
+ "learning_rate": 9.067614747602852e-05,
2185
+ "loss": 6.5321,
2186
+ "step": 311
2187
+ },
2188
+ {
2189
+ "epoch": 0.012749524957603743,
2190
+ "grad_norm": 14.61474323272705,
2191
+ "learning_rate": 9.061709228962576e-05,
2192
+ "loss": 5.843,
2193
+ "step": 312
2194
+ },
2195
+ {
2196
+ "epoch": 0.012790388819647345,
2197
+ "grad_norm": 15.99639892578125,
2198
+ "learning_rate": 9.055787003048466e-05,
2199
+ "loss": 5.9015,
2200
+ "step": 313
2201
+ },
2202
+ {
2203
+ "epoch": 0.012831252681690946,
2204
+ "grad_norm": 17.16404914855957,
2205
+ "learning_rate": 9.049848094220773e-05,
2206
+ "loss": 6.125,
2207
+ "step": 314
2208
+ },
2209
+ {
2210
+ "epoch": 0.012872116543734548,
2211
+ "grad_norm": 21.616313934326172,
2212
+ "learning_rate": 9.043892526908369e-05,
2213
+ "loss": 7.4149,
2214
+ "step": 315
2215
+ },
2216
+ {
2217
+ "epoch": 0.01291298040577815,
2218
+ "grad_norm": 14.225442886352539,
2219
+ "learning_rate": 9.037920325608649e-05,
2220
+ "loss": 6.0499,
2221
+ "step": 316
2222
+ },
2223
+ {
2224
+ "epoch": 0.012953844267821752,
2225
+ "grad_norm": 17.34297752380371,
2226
+ "learning_rate": 9.03193151488743e-05,
2227
+ "loss": 7.3117,
2228
+ "step": 317
2229
+ },
2230
+ {
2231
+ "epoch": 0.012994708129865353,
2232
+ "grad_norm": 14.32167911529541,
2233
+ "learning_rate": 9.025926119378847e-05,
2234
+ "loss": 6.0548,
2235
+ "step": 318
2236
+ },
2237
+ {
2238
+ "epoch": 0.013035571991908955,
2239
+ "grad_norm": 15.502433776855469,
2240
+ "learning_rate": 9.019904163785257e-05,
2241
+ "loss": 5.9465,
2242
+ "step": 319
2243
+ },
2244
+ {
2245
+ "epoch": 0.013076435853952557,
2246
+ "grad_norm": 16.106304168701172,
2247
+ "learning_rate": 9.013865672877134e-05,
2248
+ "loss": 6.2242,
2249
+ "step": 320
2250
+ },
2251
+ {
2252
+ "epoch": 0.013117299715996159,
2253
+ "grad_norm": 15.240400314331055,
2254
+ "learning_rate": 9.007810671492966e-05,
2255
+ "loss": 7.162,
2256
+ "step": 321
2257
+ },
2258
+ {
2259
+ "epoch": 0.01315816357803976,
2260
+ "grad_norm": 14.79165267944336,
2261
+ "learning_rate": 9.001739184539156e-05,
2262
+ "loss": 6.4293,
2263
+ "step": 322
2264
+ },
2265
+ {
2266
+ "epoch": 0.013199027440083362,
2267
+ "grad_norm": 15.2999849319458,
2268
+ "learning_rate": 8.99565123698992e-05,
2269
+ "loss": 6.9258,
2270
+ "step": 323
2271
+ },
2272
+ {
2273
+ "epoch": 0.013239891302126964,
2274
+ "grad_norm": 13.13878059387207,
2275
+ "learning_rate": 8.989546853887177e-05,
2276
+ "loss": 6.8198,
2277
+ "step": 324
2278
+ },
2279
+ {
2280
+ "epoch": 0.013280755164170566,
2281
+ "grad_norm": 16.574934005737305,
2282
+ "learning_rate": 8.983426060340459e-05,
2283
+ "loss": 7.6539,
2284
+ "step": 325
2285
+ },
2286
+ {
2287
+ "epoch": 0.013321619026214168,
2288
+ "grad_norm": 13.789299964904785,
2289
+ "learning_rate": 8.977288881526791e-05,
2290
+ "loss": 5.8356,
2291
+ "step": 326
2292
+ },
2293
+ {
2294
+ "epoch": 0.01336248288825777,
2295
+ "grad_norm": 13.275775909423828,
2296
+ "learning_rate": 8.971135342690604e-05,
2297
+ "loss": 5.7641,
2298
+ "step": 327
2299
+ },
2300
+ {
2301
+ "epoch": 0.013403346750301371,
2302
+ "grad_norm": 15.172074317932129,
2303
+ "learning_rate": 8.964965469143618e-05,
2304
+ "loss": 7.4037,
2305
+ "step": 328
2306
+ },
2307
+ {
2308
+ "epoch": 0.013444210612344973,
2309
+ "grad_norm": 14.117846488952637,
2310
+ "learning_rate": 8.95877928626475e-05,
2311
+ "loss": 6.2307,
2312
+ "step": 329
2313
+ },
2314
+ {
2315
+ "epoch": 0.013485074474388575,
2316
+ "grad_norm": 15.833907127380371,
2317
+ "learning_rate": 8.952576819499998e-05,
2318
+ "loss": 6.7644,
2319
+ "step": 330
2320
+ },
2321
+ {
2322
+ "epoch": 0.013525938336432177,
2323
+ "grad_norm": 36.139404296875,
2324
+ "learning_rate": 8.946358094362344e-05,
2325
+ "loss": 6.5228,
2326
+ "step": 331
2327
+ },
2328
+ {
2329
+ "epoch": 0.013566802198475779,
2330
+ "grad_norm": 13.170965194702148,
2331
+ "learning_rate": 8.940123136431645e-05,
2332
+ "loss": 5.3543,
2333
+ "step": 332
2334
+ },
2335
+ {
2336
+ "epoch": 0.01360766606051938,
2337
+ "grad_norm": 15.941726684570312,
2338
+ "learning_rate": 8.933871971354529e-05,
2339
+ "loss": 7.6215,
2340
+ "step": 333
2341
+ },
2342
+ {
2343
+ "epoch": 0.013648529922562982,
2344
+ "grad_norm": 15.199174880981445,
2345
+ "learning_rate": 8.927604624844292e-05,
2346
+ "loss": 5.7218,
2347
+ "step": 334
2348
+ },
2349
+ {
2350
+ "epoch": 0.013689393784606584,
2351
+ "grad_norm": 13.402868270874023,
2352
+ "learning_rate": 8.921321122680788e-05,
2353
+ "loss": 5.9142,
2354
+ "step": 335
2355
+ },
2356
+ {
2357
+ "epoch": 0.013730257646650186,
2358
+ "grad_norm": 17.45403480529785,
2359
+ "learning_rate": 8.915021490710326e-05,
2360
+ "loss": 6.2388,
2361
+ "step": 336
2362
+ },
2363
+ {
2364
+ "epoch": 0.013771121508693787,
2365
+ "grad_norm": 13.131780624389648,
2366
+ "learning_rate": 8.908705754845563e-05,
2367
+ "loss": 5.7358,
2368
+ "step": 337
2369
+ },
2370
+ {
2371
+ "epoch": 0.013811985370737388,
2372
+ "grad_norm": 21.07666778564453,
2373
+ "learning_rate": 8.902373941065397e-05,
2374
+ "loss": 6.0878,
2375
+ "step": 338
2376
+ },
2377
+ {
2378
+ "epoch": 0.01385284923278099,
2379
+ "grad_norm": 16.573720932006836,
2380
+ "learning_rate": 8.896026075414858e-05,
2381
+ "loss": 6.4805,
2382
+ "step": 339
2383
+ },
2384
+ {
2385
+ "epoch": 0.013893713094824591,
2386
+ "grad_norm": 17.990293502807617,
2387
+ "learning_rate": 8.889662184005007e-05,
2388
+ "loss": 6.0373,
2389
+ "step": 340
2390
+ },
2391
+ {
2392
+ "epoch": 0.013934576956868193,
2393
+ "grad_norm": 16.1629581451416,
2394
+ "learning_rate": 8.883282293012824e-05,
2395
+ "loss": 6.3638,
2396
+ "step": 341
2397
+ },
2398
+ {
2399
+ "epoch": 0.013975440818911795,
2400
+ "grad_norm": 15.962685585021973,
2401
+ "learning_rate": 8.876886428681097e-05,
2402
+ "loss": 5.6468,
2403
+ "step": 342
2404
+ },
2405
+ {
2406
+ "epoch": 0.014016304680955396,
2407
+ "grad_norm": 15.806046485900879,
2408
+ "learning_rate": 8.870474617318323e-05,
2409
+ "loss": 5.9261,
2410
+ "step": 343
2411
+ },
2412
+ {
2413
+ "epoch": 0.014057168542998998,
2414
+ "grad_norm": 16.13351058959961,
2415
+ "learning_rate": 8.864046885298591e-05,
2416
+ "loss": 5.4726,
2417
+ "step": 344
2418
+ },
2419
+ {
2420
+ "epoch": 0.0140980324050426,
2421
+ "grad_norm": 19.852676391601562,
2422
+ "learning_rate": 8.85760325906148e-05,
2423
+ "loss": 6.4975,
2424
+ "step": 345
2425
+ },
2426
+ {
2427
+ "epoch": 0.014138896267086202,
2428
+ "grad_norm": 20.317991256713867,
2429
+ "learning_rate": 8.85114376511195e-05,
2430
+ "loss": 7.201,
2431
+ "step": 346
2432
+ },
2433
+ {
2434
+ "epoch": 0.014179760129129804,
2435
+ "grad_norm": 28.479278564453125,
2436
+ "learning_rate": 8.844668430020222e-05,
2437
+ "loss": 8.2285,
2438
+ "step": 347
2439
+ },
2440
+ {
2441
+ "epoch": 0.014220623991173405,
2442
+ "grad_norm": 19.233919143676758,
2443
+ "learning_rate": 8.838177280421684e-05,
2444
+ "loss": 7.8013,
2445
+ "step": 348
2446
+ },
2447
+ {
2448
+ "epoch": 0.014261487853217007,
2449
+ "grad_norm": 21.015350341796875,
2450
+ "learning_rate": 8.831670343016778e-05,
2451
+ "loss": 6.5326,
2452
+ "step": 349
2453
+ },
2454
+ {
2455
+ "epoch": 0.014302351715260609,
2456
+ "grad_norm": 31.803945541381836,
2457
+ "learning_rate": 8.825147644570879e-05,
2458
+ "loss": 7.9139,
2459
+ "step": 350
2460
+ },
2461
+ {
2462
+ "epoch": 0.01434321557730421,
2463
+ "grad_norm": 11.69941520690918,
2464
+ "learning_rate": 8.818609211914197e-05,
2465
+ "loss": 6.2581,
2466
+ "step": 351
2467
+ },
2468
+ {
2469
+ "epoch": 0.014384079439347813,
2470
+ "grad_norm": 11.810105323791504,
2471
+ "learning_rate": 8.812055071941663e-05,
2472
+ "loss": 6.338,
2473
+ "step": 352
2474
+ },
2475
+ {
2476
+ "epoch": 0.014424943301391414,
2477
+ "grad_norm": 13.202411651611328,
2478
+ "learning_rate": 8.805485251612813e-05,
2479
+ "loss": 6.4457,
2480
+ "step": 353
2481
+ },
2482
+ {
2483
+ "epoch": 0.014465807163435016,
2484
+ "grad_norm": 15.12958812713623,
2485
+ "learning_rate": 8.79889977795169e-05,
2486
+ "loss": 6.5046,
2487
+ "step": 354
2488
+ },
2489
+ {
2490
+ "epoch": 0.014506671025478618,
2491
+ "grad_norm": 11.190995216369629,
2492
+ "learning_rate": 8.79229867804672e-05,
2493
+ "loss": 5.6477,
2494
+ "step": 355
2495
+ },
2496
+ {
2497
+ "epoch": 0.01454753488752222,
2498
+ "grad_norm": 12.42231273651123,
2499
+ "learning_rate": 8.785681979050602e-05,
2500
+ "loss": 5.931,
2501
+ "step": 356
2502
+ },
2503
+ {
2504
+ "epoch": 0.014588398749565822,
2505
+ "grad_norm": 13.565031051635742,
2506
+ "learning_rate": 8.779049708180207e-05,
2507
+ "loss": 6.558,
2508
+ "step": 357
2509
+ },
2510
+ {
2511
+ "epoch": 0.014629262611609423,
2512
+ "grad_norm": 12.804165840148926,
2513
+ "learning_rate": 8.772401892716455e-05,
2514
+ "loss": 6.2798,
2515
+ "step": 358
2516
+ },
2517
+ {
2518
+ "epoch": 0.014670126473653025,
2519
+ "grad_norm": 15.116120338439941,
2520
+ "learning_rate": 8.765738560004207e-05,
2521
+ "loss": 7.1642,
2522
+ "step": 359
2523
+ },
2524
+ {
2525
+ "epoch": 0.014710990335696627,
2526
+ "grad_norm": 10.878889083862305,
2527
+ "learning_rate": 8.759059737452149e-05,
2528
+ "loss": 5.3341,
2529
+ "step": 360
2530
+ },
2531
+ {
2532
+ "epoch": 0.014751854197740229,
2533
+ "grad_norm": 13.265491485595703,
2534
+ "learning_rate": 8.752365452532689e-05,
2535
+ "loss": 5.5701,
2536
+ "step": 361
2537
+ },
2538
+ {
2539
+ "epoch": 0.01479271805978383,
2540
+ "grad_norm": 13.64254093170166,
2541
+ "learning_rate": 8.74565573278183e-05,
2542
+ "loss": 6.1891,
2543
+ "step": 362
2544
+ },
2545
+ {
2546
+ "epoch": 0.014833581921827432,
2547
+ "grad_norm": 11.30701732635498,
2548
+ "learning_rate": 8.738930605799069e-05,
2549
+ "loss": 6.0558,
2550
+ "step": 363
2551
+ },
2552
+ {
2553
+ "epoch": 0.014874445783871034,
2554
+ "grad_norm": 12.535035133361816,
2555
+ "learning_rate": 8.732190099247277e-05,
2556
+ "loss": 6.8535,
2557
+ "step": 364
2558
+ },
2559
+ {
2560
+ "epoch": 0.014915309645914636,
2561
+ "grad_norm": 12.90744686126709,
2562
+ "learning_rate": 8.725434240852585e-05,
2563
+ "loss": 5.9282,
2564
+ "step": 365
2565
+ },
2566
+ {
2567
+ "epoch": 0.014956173507958238,
2568
+ "grad_norm": 13.355973243713379,
2569
+ "learning_rate": 8.718663058404277e-05,
2570
+ "loss": 6.1722,
2571
+ "step": 366
2572
+ },
2573
+ {
2574
+ "epoch": 0.01499703737000184,
2575
+ "grad_norm": 15.44559097290039,
2576
+ "learning_rate": 8.711876579754663e-05,
2577
+ "loss": 7.0216,
2578
+ "step": 367
2579
+ },
2580
+ {
2581
+ "epoch": 0.015037901232045441,
2582
+ "grad_norm": 11.964495658874512,
2583
+ "learning_rate": 8.705074832818977e-05,
2584
+ "loss": 5.5576,
2585
+ "step": 368
2586
+ },
2587
+ {
2588
+ "epoch": 0.015078765094089043,
2589
+ "grad_norm": 15.331159591674805,
2590
+ "learning_rate": 8.698257845575255e-05,
2591
+ "loss": 6.7291,
2592
+ "step": 369
2593
+ },
2594
+ {
2595
+ "epoch": 0.015119628956132645,
2596
+ "grad_norm": 17.661767959594727,
2597
+ "learning_rate": 8.691425646064222e-05,
2598
+ "loss": 7.2161,
2599
+ "step": 370
2600
+ },
2601
+ {
2602
+ "epoch": 0.015160492818176247,
2603
+ "grad_norm": 13.057313919067383,
2604
+ "learning_rate": 8.684578262389179e-05,
2605
+ "loss": 6.1985,
2606
+ "step": 371
2607
+ },
2608
+ {
2609
+ "epoch": 0.015201356680219848,
2610
+ "grad_norm": 11.956735610961914,
2611
+ "learning_rate": 8.677715722715878e-05,
2612
+ "loss": 5.1447,
2613
+ "step": 372
2614
+ },
2615
+ {
2616
+ "epoch": 0.015242220542263448,
2617
+ "grad_norm": 12.75511360168457,
2618
+ "learning_rate": 8.670838055272422e-05,
2619
+ "loss": 5.63,
2620
+ "step": 373
2621
+ },
2622
+ {
2623
+ "epoch": 0.01528308440430705,
2624
+ "grad_norm": 11.913833618164062,
2625
+ "learning_rate": 8.663945288349134e-05,
2626
+ "loss": 5.7201,
2627
+ "step": 374
2628
+ },
2629
+ {
2630
+ "epoch": 0.015323948266350652,
2631
+ "grad_norm": 12.547759056091309,
2632
+ "learning_rate": 8.657037450298448e-05,
2633
+ "loss": 6.0541,
2634
+ "step": 375
2635
+ },
2636
+ {
2637
+ "epoch": 0.015364812128394254,
2638
+ "grad_norm": 14.531859397888184,
2639
+ "learning_rate": 8.650114569534795e-05,
2640
+ "loss": 6.176,
2641
+ "step": 376
2642
+ },
2643
+ {
2644
+ "epoch": 0.015405675990437856,
2645
+ "grad_norm": 12.826741218566895,
2646
+ "learning_rate": 8.643176674534475e-05,
2647
+ "loss": 6.4219,
2648
+ "step": 377
2649
+ },
2650
+ {
2651
+ "epoch": 0.015446539852481457,
2652
+ "grad_norm": 15.265447616577148,
2653
+ "learning_rate": 8.63622379383555e-05,
2654
+ "loss": 7.6793,
2655
+ "step": 378
2656
+ },
2657
+ {
2658
+ "epoch": 0.01548740371452506,
2659
+ "grad_norm": 13.909723281860352,
2660
+ "learning_rate": 8.629255956037725e-05,
2661
+ "loss": 6.5717,
2662
+ "step": 379
2663
+ },
2664
+ {
2665
+ "epoch": 0.015528267576568661,
2666
+ "grad_norm": 14.383438110351562,
2667
+ "learning_rate": 8.62227318980223e-05,
2668
+ "loss": 5.8065,
2669
+ "step": 380
2670
+ },
2671
+ {
2672
+ "epoch": 0.015569131438612263,
2673
+ "grad_norm": 20.557435989379883,
2674
+ "learning_rate": 8.615275523851696e-05,
2675
+ "loss": 6.6797,
2676
+ "step": 381
2677
+ },
2678
+ {
2679
+ "epoch": 0.015609995300655865,
2680
+ "grad_norm": 16.86141586303711,
2681
+ "learning_rate": 8.608262986970046e-05,
2682
+ "loss": 6.3276,
2683
+ "step": 382
2684
+ },
2685
+ {
2686
+ "epoch": 0.015650859162699468,
2687
+ "grad_norm": 14.10315990447998,
2688
+ "learning_rate": 8.601235608002372e-05,
2689
+ "loss": 6.3538,
2690
+ "step": 383
2691
+ },
2692
+ {
2693
+ "epoch": 0.01569172302474307,
2694
+ "grad_norm": 13.469561576843262,
2695
+ "learning_rate": 8.594193415854816e-05,
2696
+ "loss": 5.5875,
2697
+ "step": 384
2698
+ },
2699
+ {
2700
+ "epoch": 0.01573258688678667,
2701
+ "grad_norm": 12.106831550598145,
2702
+ "learning_rate": 8.58713643949445e-05,
2703
+ "loss": 5.4271,
2704
+ "step": 385
2705
+ },
2706
+ {
2707
+ "epoch": 0.015773450748830273,
2708
+ "grad_norm": 15.01386833190918,
2709
+ "learning_rate": 8.580064707949164e-05,
2710
+ "loss": 5.6764,
2711
+ "step": 386
2712
+ },
2713
+ {
2714
+ "epoch": 0.015814314610873875,
2715
+ "grad_norm": 15.852811813354492,
2716
+ "learning_rate": 8.572978250307538e-05,
2717
+ "loss": 6.5961,
2718
+ "step": 387
2719
+ },
2720
+ {
2721
+ "epoch": 0.015855178472917477,
2722
+ "grad_norm": 11.857748031616211,
2723
+ "learning_rate": 8.565877095718724e-05,
2724
+ "loss": 4.2244,
2725
+ "step": 388
2726
+ },
2727
+ {
2728
+ "epoch": 0.01589604233496108,
2729
+ "grad_norm": 19.0756893157959,
2730
+ "learning_rate": 8.558761273392332e-05,
2731
+ "loss": 7.479,
2732
+ "step": 389
2733
+ }
2734
+ ],
2735
+ "logging_steps": 1,
2736
+ "max_steps": 1554,
2737
+ "num_input_tokens_seen": 0,
2738
+ "num_train_epochs": 1,
2739
+ "save_steps": 389,
2740
+ "stateful_callbacks": {
2741
+ "TrainerControl": {
2742
+ "args": {
2743
+ "should_epoch_stop": false,
2744
+ "should_evaluate": false,
2745
+ "should_log": false,
2746
+ "should_save": true,
2747
+ "should_training_stop": false
2748
+ },
2749
+ "attributes": {}
2750
+ }
2751
+ },
2752
+ "total_flos": 2.7283571213756006e+17,
2753
+ "train_batch_size": 4,
2754
+ "trial_name": null,
2755
+ "trial_params": null
2756
+ }
last-checkpoint/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a9219a648712915e4e5b45eeefd7434b8aaf0ec95b985b3db3b0860e1a37f919
+ size 6776
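
For quick inspection of a checkpoint like this one, the per-step records logged above can be read back and plotted. Below is a minimal sketch, assuming the log is the checkpoint's standard `trainer_state.json` (the `last-checkpoint/trainer_state.json` path and the use of matplotlib are assumptions for illustration, not part of this commit):

```python
# Minimal sketch: load the Trainer state saved with this checkpoint and plot the
# per-step training loss. The file path below is an assumption based on the
# checkpoint layout in this commit ("last-checkpoint/").
import json

import matplotlib.pyplot as plt

with open("last-checkpoint/trainer_state.json") as f:
    state = json.load(f)

# Each log_history entry mirrors the records above:
# {"epoch": ..., "grad_norm": ..., "learning_rate": ..., "loss": ..., "step": ...}
records = [entry for entry in state["log_history"] if "loss" in entry]
steps = [entry["step"] for entry in records]
losses = [entry["loss"] for entry in records]

plt.plot(steps, losses)
plt.xlabel("step")
plt.ylabel("training loss")
plt.title(f"Training loss (logged every {state['logging_steps']} step)")
plt.savefig("loss_curve.png")
```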