Training in progress, step 500

Files changed (13)

README.md +202 -0
adapter_config.json +29 -0
adapter_model.safetensors +3 -0
all_results.json +13 -0
eval_results.json +8 -0
merges.txt +0 -0
special_tokens_map.json +6 -0
tokenizer.json +0 -0
tokenizer_config.json +20 -0
train_results.json +8 -0
trainer_state.json +428 -0
training_args.bin +3 -0
vocab.json +0 -0

README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: openai-community/gpt2-xl
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.0

adapter_config.json ADDED

	@@ -0,0 +1,29 @@

+{
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "openai-community/gpt2-xl",
+ "bias": "none",
+ "fan_in_fan_out": true,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 16,
+ "lora_dropout": 0.1,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 8,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "c_proj",
+ "c_attn"
+ ],
+ "task_type": "CAUSAL_LM",
+ "use_dora": false,
+ "use_rslora": false
+}
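
As a sanity check on the adapter configuration above, the sketch below estimates how many trainable LoRA parameters it implies. The GPT-2 XL dimensions (48 blocks, hidden size 1600, MLP width 6400) are assumptions taken from the public base-model configuration rather than from this commit, and the helper name `lora_params` is hypothetical. PEFT matches `target_modules` by name suffix, so `c_proj` is assumed to cover both the attention and the MLP output projections.

```python
# Rough count of trainable LoRA parameters implied by adapter_config.json.
# Assumption: GPT-2 XL has 48 blocks, hidden size 1600 and MLP width 4 * 1600
# (public base-model config, not stated in this commit).
N_LAYERS = 48
D_MODEL = 1600
D_MLP = 4 * D_MODEL

R = 8             # "r" in adapter_config.json
LORA_ALPHA = 16   # "lora_alpha" in adapter_config.json

# (in_features, out_features) of each targeted layer inside one block.
# "c_proj" is assumed to match both attn.c_proj and mlp.c_proj.
targeted = {
    "attn.c_attn": (D_MODEL, 3 * D_MODEL),
    "attn.c_proj": (D_MODEL, D_MODEL),
    "mlp.c_proj": (D_MLP, D_MODEL),
}


def lora_params(in_features, out_features, r):
    """Parameters in one LoRA pair: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


per_block = sum(lora_params(i, o, R) for i, o in targeted.values())
total_params = N_LAYERS * per_block
fp32_bytes = 4 * total_params   # assumption: adapter weights stored in fp32
scaling = LORA_ALPHA / R        # factor applied to the low-rank update

print(per_block, total_params, fp32_bytes, scaling)
```

Under these assumptions the fp32 payload comes to about 27.03 MB, slightly below the 27,070,688 bytes recorded for `adapter_model.safetensors` in this commit. The remaining gap of roughly 37 KB would be consistent with the safetensors header, although that is not verified here.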

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d2d7a002444aeda64444cae48fae98b31cce7cea7f8eae898b5e8f79bdd199ef
+size 27070688

all_results.json ADDED

	@@ -0,0 +1,13 @@

+{
+ "epoch": 1.0,
+ "eval_loss": 1.9224168062210083,
+ "eval_runtime": 63.692,
+ "eval_samples_per_second": 15.591,
+ "eval_steps_per_second": 1.963,
+ "perplexity": 6.837463340656789,
+ "total_flos": 1.0003539689472e+17,
+ "train_loss": 2.061040549059463,
+ "train_runtime": 1615.6801,
+ "train_samples_per_second": 6.798,
+ "train_steps_per_second": 3.399
+}

eval_results.json ADDED

	@@ -0,0 +1,8 @@

+{
+ "epoch": 1.0,
+ "eval_loss": 1.9224168062210083,
+ "eval_runtime": 63.692,
+ "eval_samples_per_second": 15.591,
+ "eval_steps_per_second": 1.963,
+ "perplexity": 6.837463340656789
+}
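
The `perplexity` field above appears to be the exponential of `eval_loss` (the mean per-token negative log-likelihood in nats). A minimal check, with the hypothetical helper `perplexity_from_loss` and both constants copied from the JSON above:

```python
import math

eval_loss = 1.9224168062210083           # "eval_loss" in eval_results.json
reported_perplexity = 6.837463340656789  # "perplexity" in eval_results.json


def perplexity_from_loss(mean_nll):
    """Perplexity is exp of the mean negative log-likelihood (natural log)."""
    return math.exp(mean_nll)


perplexity = perplexity_from_loss(eval_loss)
print(perplexity, reported_perplexity)
```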

merges.txt ADDED

The diff for this file is too large to render. See raw diff

special_tokens_map.json ADDED

	@@ -0,0 +1,6 @@

+{
+ "bos_token": "<|endoftext|>",
+ "eos_token": "<|endoftext|>",
+ "pad_token": "<|endoftext|>",
+ "unk_token": "<|endoftext|>"
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,20 @@

+{
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "50256": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<|endoftext|>",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|endoftext|>",
+ "model_max_length": 1024,
+ "pad_token": "<|endoftext|>",
+ "tokenizer_class": "GPT2Tokenizer",
+ "unk_token": "<|endoftext|>"
+}

train_results.json ADDED

	@@ -0,0 +1,8 @@

+{
+ "epoch": 1.0,
+ "total_flos": 1.0003539689472e+17,
+ "train_loss": 2.061040549059463,
+ "train_runtime": 1615.6801,
+ "train_samples_per_second": 6.798,
+ "train_steps_per_second": 3.399
+}
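
The throughput figures above can be cross-checked against `global_step` (5492) and `train_batch_size` (2) from `trainer_state.json` in this commit. The sketch assumes a single device and no gradient accumulation, which is consistent with the reported samples-to-steps ratio of 2 but is not stated anywhere in the commit.

```python
# Values copied from train_results.json and trainer_state.json in this commit.
global_step = 5492          # "global_step"
train_batch_size = 2        # "train_batch_size" (per device)
train_runtime = 1615.6801   # "train_runtime", in seconds

# Assumption: one device, no gradient accumulation, so each optimizer step
# consumes exactly train_batch_size samples.
steps_per_second = global_step / train_runtime
samples_per_second = global_step * train_batch_size / train_runtime

print(round(steps_per_second, 3), round(samples_per_second, 3))
```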

trainer_state.json ADDED

	@@ -0,0 +1,428 @@

+{
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 1.0,
+ "eval_steps": 500,
+ "global_step": 5492,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.01820830298616169,
+ "grad_norm": 0.22019147872924805,
+ "learning_rate": 3.642987249544627e-06,
+ "loss": 2.4398,
+ "step": 100
+ },
+ {
+ "epoch": 0.03641660597232338,
+ "grad_norm": 0.1777569204568863,
+ "learning_rate": 7.285974499089254e-06,
+ "loss": 2.4058,
+ "step": 200
+ },
+ {
+ "epoch": 0.05462490895848507,
+ "grad_norm": 0.2692304253578186,
+ "learning_rate": 1.0928961748633882e-05,
+ "loss": 2.3914,
+ "step": 300
+ },
+ {
+ "epoch": 0.07283321194464676,
+ "grad_norm": 0.36335742473602295,
+ "learning_rate": 1.4571948998178507e-05,
+ "loss": 2.3211,
+ "step": 400
+ },
+ {
+ "epoch": 0.09104151493080845,
+ "grad_norm": 0.5098339319229126,
+ "learning_rate": 1.8214936247723133e-05,
+ "loss": 2.2797,
+ "step": 500
+ },
+ {
+ "epoch": 0.10924981791697014,
+ "grad_norm": 0.5050227642059326,
+ "learning_rate": 1.999474720010985e-05,
+ "loss": 2.2523,
+ "step": 600
+ },
+ {
+ "epoch": 0.12745812090313183,
+ "grad_norm": 0.6238083839416504,
+ "learning_rate": 1.9953983978532914e-05,
+ "loss": 2.2214,
+ "step": 700
+ },
+ {
+ "epoch": 0.14566642388929352,
+ "grad_norm": 0.6451362371444702,
+ "learning_rate": 1.987302601308333e-05,
+ "loss": 2.1591,
+ "step": 800
+ },
+ {
+ "epoch": 0.1638747268754552,
+ "grad_norm": 0.6711136102676392,
+ "learning_rate": 1.9752200216552278e-05,
+ "loss": 2.1624,
+ "step": 900
+ },
+ {
+ "epoch": 0.1820830298616169,
+ "grad_norm": 0.7889479994773865,
+ "learning_rate": 1.9591994490261997e-05,
+ "loss": 2.0842,
+ "step": 1000
+ },
+ {
+ "epoch": 0.20029133284777859,
+ "grad_norm": 0.8510493040084839,
+ "learning_rate": 1.9393055753893e-05,
+ "loss": 2.1171,
+ "step": 1100
+ },
+ {
+ "epoch": 0.21849963583394028,
+ "grad_norm": 0.741717517375946,
+ "learning_rate": 1.915618733318621e-05,
+ "loss": 2.1071,
+ "step": 1200
+ },
+ {
+ "epoch": 0.23670793882010197,
+ "grad_norm": 0.7924832701683044,
+ "learning_rate": 1.8882345716068708e-05,
+ "loss": 2.0552,
+ "step": 1300
+ },
+ {
+ "epoch": 0.25491624180626365,
+ "grad_norm": 0.8211630582809448,
+ "learning_rate": 1.8572636690301997e-05,
+ "loss": 2.0649,
+ "step": 1400
+ },
+ {
+ "epoch": 0.27312454479242537,
+ "grad_norm": 0.7808334231376648,
+ "learning_rate": 1.8228310878249212e-05,
+ "loss": 2.0604,
+ "step": 1500
+ },
+ {
+ "epoch": 0.29133284777858703,
+ "grad_norm": 1.0906625986099243,
+ "learning_rate": 1.7850758686792054e-05,
+ "loss": 2.08,
+ "step": 1600
+ },
+ {
+ "epoch": 0.30954115076474875,
+ "grad_norm": 0.7606624960899353,
+ "learning_rate": 1.7441504692790104e-05,
+ "loss": 2.0447,
+ "step": 1700
+ },
+ {
+ "epoch": 0.3277494537509104,
+ "grad_norm": 0.8482190370559692,
+ "learning_rate": 1.700220148675417e-05,
+ "loss": 2.0584,
+ "step": 1800
+ },
+ {
+ "epoch": 0.34595775673707213,
+ "grad_norm": 0.8170859813690186,
+ "learning_rate": 1.6534622999593437e-05,
+ "loss": 2.0788,
+ "step": 1900
+ },
+ {
+ "epoch": 0.3641660597232338,
+ "grad_norm": 0.9151561260223389,
+ "learning_rate": 1.6040657339383255e-05,
+ "loss": 2.0458,
+ "step": 2000
+ },
+ {
+ "epoch": 0.3823743627093955,
+ "grad_norm": 0.7852152585983276,
+ "learning_rate": 1.5522299167079173e-05,
+ "loss": 2.0271,
+ "step": 2100
+ },
+ {
+ "epoch": 0.40058266569555717,
+ "grad_norm": 1.1423839330673218,
+ "learning_rate": 1.4981641641964437e-05,
+ "loss": 2.0153,
+ "step": 2200
+ },
+ {
+ "epoch": 0.4187909686817189,
+ "grad_norm": 0.9260895252227783,
+ "learning_rate": 1.44208679693558e-05,
+ "loss": 2.038,
+ "step": 2300
+ },
+ {
+ "epoch": 0.43699927166788055,
+ "grad_norm": 0.8138246536254883,
+ "learning_rate": 1.384224258469838e-05,
+ "loss": 2.0325,
+ "step": 2400
+ },
+ {
+ "epoch": 0.45520757465404227,
+ "grad_norm": 0.8293213844299316,
+ "learning_rate": 1.3248102009648686e-05,
+ "loss": 1.9852,
+ "step": 2500
+ },
+ {
+ "epoch": 0.47341587764020393,
+ "grad_norm": 1.0487293004989624,
+ "learning_rate": 1.2640845417069571e-05,
+ "loss": 2.0304,
+ "step": 2600
+ },
+ {
+ "epoch": 0.49162418062636565,
+ "grad_norm": 1.4424030780792236,
+ "learning_rate": 1.2022924943036024e-05,
+ "loss": 2.0349,
+ "step": 2700
+ },
+ {
+ "epoch": 0.5098324836125273,
+ "grad_norm": 0.930513322353363,
+ "learning_rate": 1.139683578497262e-05,
+ "loss": 2.0298,
+ "step": 2800
+ },
+ {
+ "epoch": 0.528040786598689,
+ "grad_norm": 1.3939330577850342,
+ "learning_rate": 1.0765106125906782e-05,
+ "loss": 2.0071,
+ "step": 2900
+ },
+ {
+ "epoch": 0.5462490895848507,
+ "grad_norm": 1.0249812602996826,
+ "learning_rate": 1.0130286925524367e-05,
+ "loss": 1.9692,
+ "step": 3000
+ },
+ {
+ "epoch": 0.5644573925710124,
+ "grad_norm": 0.9683905243873596,
+ "learning_rate": 9.494941619251817e-06,
+ "loss": 2.0243,
+ "step": 3100
+ },
+ {
+ "epoch": 0.5826656955571741,
+ "grad_norm": 1.1064997911453247,
+ "learning_rate": 8.861635766960579e-06,
+ "loss": 1.9983,
+ "step": 3200
+ },
+ {
+ "epoch": 0.6008739985433358,
+ "grad_norm": 0.9611035585403442,
+ "learning_rate": 8.232926693092881e-06,
+ "loss": 1.9898,
+ "step": 3300
+ },
+ {
+ "epoch": 0.6190823015294975,
+ "grad_norm": 1.2558187246322632,
+ "learning_rate": 7.611353160042658e-06,
+ "loss": 1.9698,
+ "step": 3400
+ },
+ {
+ "epoch": 0.6372906045156591,
+ "grad_norm": 1.1066709756851196,
+ "learning_rate": 6.99942511649105e-06,
+ "loss": 2.0468,
+ "step": 3500
+ },
+ {
+ "epoch": 0.6554989075018208,
+ "grad_norm": 1.0138076543807983,
+ "learning_rate": 6.399613562093272e-06,
+ "loss": 2.0535,
+ "step": 3600
+ },
+ {
+ "epoch": 0.6737072104879825,
+ "grad_norm": 1.125917673110962,
+ "learning_rate": 5.814340569443867e-06,
+ "loss": 2.009,
+ "step": 3700
+ },
+ {
+ "epoch": 0.6919155134741443,
+ "grad_norm": 0.7838294506072998,
+ "learning_rate": 5.245969503612125e-06,
+ "loss": 1.9229,
+ "step": 3800
+ },
+ {
+ "epoch": 0.7101238164603059,
+ "grad_norm": 1.051643967628479,
+ "learning_rate": 4.696795478741786e-06,
+ "loss": 1.9857,
+ "step": 3900
+ },
+ {
+ "epoch": 0.7283321194464676,
+ "grad_norm": 0.9563839435577393,
+ "learning_rate": 4.169036090251809e-06,
+ "loss": 2.0503,
+ "step": 4000
+ },
+ {
+ "epoch": 0.7465404224326293,
+ "grad_norm": 0.8592619299888611,
+ "learning_rate": 3.6648224600620653e-06,
+ "loss": 2.0066,
+ "step": 4100
+ },
+ {
+ "epoch": 0.764748725418791,
+ "grad_norm": 0.9203991293907166,
+ "learning_rate": 3.1861906310038825e-06,
+ "loss": 1.9719,
+ "step": 4200
+ },
+ {
+ "epoch": 0.7829570284049526,
+ "grad_norm": 0.9755929112434387,
+ "learning_rate": 2.735073345165228e-06,
+ "loss": 1.9785,
+ "step": 4300
+ },
+ {
+ "epoch": 0.8011653313911143,
+ "grad_norm": 1.030912160873413,
+ "learning_rate": 2.313292239370102e-06,
+ "loss": 2.0122,
+ "step": 4400
+ },
+ {
+ "epoch": 0.8193736343772761,
+ "grad_norm": 1.2122974395751953,
+ "learning_rate": 1.9225504893071823e-06,
+ "loss": 1.9747,
+ "step": 4500
+ },
+ {
+ "epoch": 0.8375819373634378,
+ "grad_norm": 1.0663396120071411,
+ "learning_rate": 1.5644259320111733e-06,
+ "loss": 1.9379,
+ "step": 4600
+ },
+ {
+ "epoch": 0.8557902403495994,
+ "grad_norm": 0.9615539908409119,
+ "learning_rate": 1.2403646944686198e-06,
+ "loss": 1.9893,
+ "step": 4700
+ },
+ {
+ "epoch": 0.8739985433357611,
+ "grad_norm": 1.184589147567749,
+ "learning_rate": 9.516753540762868e-07,
+ "loss": 1.9812,
+ "step": 4800
+ },
+ {
+ "epoch": 0.8922068463219228,
+ "grad_norm": 1.2580232620239258,
+ "learning_rate": 6.995236545324624e-07,
+ "loss": 1.948,
+ "step": 4900
+ },
+ {
+ "epoch": 0.9104151493080845,
+ "grad_norm": 0.958463728427887,
+ "learning_rate": 4.849277984987221e-07,
+ "loss": 1.9752,
+ "step": 5000
+ },
+ {
+ "epoch": 0.9286234522942461,
+ "grad_norm": 1.0817508697509766,
+ "learning_rate": 3.0875433604064976e-07,
+ "loss": 2.0005,
+ "step": 5100
+ },
+ {
+ "epoch": 0.9468317552804079,
+ "grad_norm": 1.1332553625106812,
+ "learning_rate": 1.7171466545021665e-07,
+ "loss": 1.9255,
+ "step": 5200
+ },
+ {
+ "epoch": 0.9650400582665696,
+ "grad_norm": 0.9218988418579102,
+ "learning_rate": 7.436216057970735e-08,
+ "loss": 2.0052,
+ "step": 5300
+ },
+ {
+ "epoch": 0.9832483612527313,
+ "grad_norm": 1.2976170778274536,
+ "learning_rate": 1.708993628716016e-08,
+ "loss": 2.0214,
+ "step": 5400
+ },
+ {
+ "epoch": 1.0,
+ "eval_loss": 1.9224168062210083,
+ "eval_runtime": 63.6773,
+ "eval_samples_per_second": 15.594,
+ "eval_steps_per_second": 1.963,
+ "step": 5492
+ },
+ {
+ "epoch": 1.0,
+ "step": 5492,
+ "total_flos": 1.0003539689472e+17,
+ "train_loss": 2.061040549059463,
+ "train_runtime": 1615.6801,
+ "train_samples_per_second": 6.798,
+ "train_steps_per_second": 3.399
+ }
+ ],
+ "logging_steps": 100,
+ "max_steps": 5492,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 1,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 1.0003539689472e+17,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+}
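
The `learning_rate` values in `log_history` above look like a linear warmup followed by cosine decay to zero. The sketch below reproduces them under two inferred assumptions that are not stated in the commit: a peak learning rate of 2e-5 and 549 warmup steps (about 10% of `max_steps`). The function name `lr_at` is hypothetical; the real schedule is defined by `training_args.bin`, which is not rendered here.

```python
import math

PEAK_LR = 2e-5       # assumption: inferred from the logged values
WARMUP_STEPS = 549   # assumption: inferred from the logged values
MAX_STEPS = 5492     # "max_steps" in trainer_state.json


def lr_at(step):
    """Linear warmup to PEAK_LR, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# A few (step, learning_rate) pairs copied from log_history.
logged = {
    100: 3.642987249544627e-06,
    600: 1.999474720010985e-05,
    3000: 1.0130286925524367e-05,
    5400: 1.708993628716016e-08,
}

for step, lr in logged.items():
    print(step, lr, lr_at(step))
```

If the inferred constants are right, the two printed columns should agree closely at every logged step.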

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:562b76a7ab07d8605ae9ca4d2dd69562d1da363f04720fe69c7f093b76ce281c
+size 5304

vocab.json ADDED

The diff for this file is too large to render. See raw diff