The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
0it [00:00, ?it/s]
/opt/conda/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/opt/conda/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'. If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-10-28 15:22:21.126391: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-28 15:22:21.126494: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-28 15:22:21.332900: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
/opt/conda/lib/python3.10/site-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/opt/conda/lib/python3.10/site-packages/transformers/training_args.py:1525: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
  warnings.warn(
last_checkpoint=None
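Two of the startup FutureWarnings above prescribe drop-in renames, and the cache migration can be resumed by hand if it is interrupted. A minimal sketch of the fixes, assuming transformers >= 4.41 (where the `eval_strategy` kwarg exists); the `output_dir` value is illustrative, not taken from this run:

```python
from transformers import TrainingArguments
from transformers.utils import move_cache

# Resume the one-time v4.22 cache migration if it was interrupted.
move_cache()

# `evaluation_strategy` is deprecated (removed in v4.46); the new kwarg
# is a drop-in rename.
args = TrainingArguments(
    output_dir="outputs/example-run",  # illustrative path
    eval_strategy="steps",             # was: evaluation_strategy="steps"
)
```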
Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 92867 examples [00:05, 17531.75 examples/s]
Generating validation split: 1722 examples [00:00, 17097.36 examples/s]
Running tokenizer on train dataset:   0%|          | 0/92867 [00:00<?, ? examples/s]
There were missing keys in the checkpoint model loaded: ['model.encoder.embed_tokens.weight', 'model.decoder.embed_tokens.weight', 'lm_head.weight'].
/opt/conda/lib/python3.10/site-packages/transformers/optimization.py:591: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
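The missing-keys notice and the AdamW deprecation both have direct remedies. For mBART-style models the encoder/decoder embeddings and `lm_head` are typically tied to one shared parameter, so the notice is often benign, but it can be checked explicitly. A hedged sketch, assuming a seq2seq checkpoint; the model name and hyperparameters are illustrative, not the ones used in this run:

```python
import torch
from transformers import AutoModelForSeq2SeqLM

# `output_loading_info=True` exposes the missing/unexpected key lists,
# so the notice above can be inspected rather than guessed at.
model, info = AutoModelForSeq2SeqLM.from_pretrained(
    "facebook/mbart-large-50",  # illustrative checkpoint
    output_loading_info=True,
)
print(info["missing_keys"])

# Swap the deprecated transformers AdamW for the PyTorch implementation,
# as the FutureWarning above suggests.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5, weight_decay=0.01)
```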
All 61904 steps, warm_up steps: 200
/opt/conda/lib/python3.10/site-packages/transformers/trainer.py:3108: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  torch.load(os.path.join(checkpoint, OPTIMIZER_NAME), map_location=map_location)
wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
wandb: Currently logged in as: abdiharyadi. Use `wandb login --relogin` to force relogin
wandb: wandb version 0.18.5 is available! To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.17.5
wandb: Run data is saved locally in /kaggle/tmp/amr-tst-indo/AMRBART-id/fine-tune/wandb/run-20241028_152523-2rzpheht
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run /kaggle/tmp/amr-tst-indo/AMRBART-id/fine-tune/../outputs/mbart-en-id-smaller-fted
wandb: ⭐️ View project at https://wandb.ai/abdiharyadi/amr-tst
wandb: 🚀 View run at https://wandb.ai/abdiharyadi/amr-tst/runs/2rzpheht
  0%|          | 0/61904 [00:00<?, ?it/s]
Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41. Non-default generation parameters: {'max_length': 200, 'early_stopping': True, 'num_beams': 5, 'forced_eos_token_id': 2}
/opt/conda/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  self.pid = os.fork()
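The `torch.load` warning and the generation-parameters warning are both fixed at the call site, and the wandb warning goes away once `TrainingArguments.run_name` is set to something other than `output_dir`. A sketch of the two code fixes; the paths are illustrative, while the generation values come from the warning itself:

```python
import torch
from transformers import GenerationConfig

# Safer checkpoint loading, as the FutureWarning recommends; plain
# tensor/optimizer state dicts load fine with weights_only=True.
state = torch.load("checkpoint/optimizer.pt",  # illustrative path
                   map_location="cpu", weights_only=True)

# Move the non-default generation parameters out of the model config
# into a GenerationConfig file (an exception from v4.41 otherwise).
gen_config = GenerationConfig(
    max_length=200,
    early_stopping=True,
    num_beams=5,
    forced_eos_token_id=2,
)
gen_config.save_pretrained("outputs/example-run")  # writes generation_config.json
```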
 75%|███████▌  | 46429/61904 [1:42:46<1071:42:26, 249.31s/it]
 75%|███████▌  | 46440/61904 [1:43:02<27:10:21, 6.33s/it]
{'loss': 2.4527, 'learning_rate': 1.2506158433813043e-07, 'epoch': 12.0}
 75%|███████▌  | 46460/61904 [1:43:31<6:05:42, 1.42s/it]
{'loss': 2.436, 'learning_rate': 1.250291715285881e-07, 'epoch': 12.01}
 75%|███████▌  | 46480/61904 [1:44:00<6:08:51, 1.43s/it]
{'loss': 2.4514, 'learning_rate': 1.2499675871904575e-07, 'epoch': 12.01}
 75%|███████▌  | 46485/61904 [1:44:07<6:21:03, 1.48s/it]
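The `{'loss': ..., 'learning_rate': ..., 'epoch': ...}` dictionaries interleaved with the progress bar are the Trainer's periodic log records, emitted here every 20 steps. A minimal sketch for scraping them out of a raw console log like this one, assuming the records are printed as one-line Python dict literals; the log filename is illustrative:

```python
import ast
import re

# Match the Trainer's one-line dict records, e.g.
# {'loss': 2.4527, 'learning_rate': 1.25e-07, 'epoch': 12.0}
RECORD = re.compile(r"\{'loss':.*?\}")

def parse_loss_records(log_text: str) -> list[dict]:
    """Return each logged record as a dict, in order of appearance."""
    return [ast.literal_eval(m.group(0)) for m in RECORD.finditer(log_text)]

with open("train.log") as f:  # illustrative log file
    for rec in parse_loss_records(f.read()):
        print(rec["epoch"], rec["loss"], rec["learning_rate"])
```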