The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
0it [00:00, ?it/s]
/opt/conda/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/opt/conda/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-10-25 01:18:03.423955: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-25 01:18:03.424084: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-25 01:18:03.562275: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
/opt/conda/lib/python3.10/site-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/opt/conda/lib/python3.10/site-packages/transformers/training_args.py:1525: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
  warnings.warn(
/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Generating train split: 92867 examples [00:05, 17814.29 examples/s]
Generating validation split: 1722 examples [00:00, 17667.23 examples/s]
Running tokenizer on train dataset: 0%| | 0/92867 [00:00> Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41. Non-default generation parameters: {'max_length': 200, 'early_stopping': True, 'num_beams': 5, 'forced_eos_token_id': 2}
/opt/conda/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  self.pid = os.fork()
6%|▋ | 3870/61904 [2:01:52<9730:10:48, 603.59s/it]
6%|▋ | 3871/61904 [2:01:54<6817:16:16, 422.90s/it]
6%|▋ | 3872/61904 [2:01:55<4778:34:57, 296.44s/it]
6%|▋ | 3873/61904 [2:01:57<3352:37:10, 207.98s/it]
6%|▋ | 3874/61904 [2:01:58<2354:23:50, 146.06s/it]
6%|▋ | 3875/61904 [2:02:00<1655:35:21, 102.71s/it]
6%|▋ | 3876/61904 [2:02:01<1166:37:02, 72.38s/it]
6%|▋ | 3877/61904 [2:02:03<822:42:15, 51.04s/it]
6%|▋ | 3878/61904 [2:02:04<583:02:18, 36.17s/it]
6%|▋ | 3879/61904 [2:02:06<414:54:48, 25.74s/it]
6%|▋ | 3880/61904 [2:02:07<297:12:41, 18.44s/it]
{'loss': 3.0219, 'learning_rate': 1.9403604304421105e-07, 'epoch': 1.0}
6%|▋ | 3881/61904 [2:02:08<214:38:11, 13.32s/it]
6%|▋ | 3882/61904 [2:02:10<157:09:50, 9.75s/it]
6%|▋ | 3883/61904 [2:02:11<116:33:26, 7.23s/it]
6%|▋ | 3884/61904 [2:02:13<88:43:15, 5.50s/it]
6%|▋ | 3885/61904 [2:02:14<69:09:26, 4.29s/it]
6%|▋ | 3886/61904 [2:02:15<55:16:04, 3.43s/it]
6%|▋ | 3887/61904 [2:02:17<45:40:38, 2.83s/it]
6%|▋ | 3888/61904 [2:02:19<39:43:49, 2.47s/it]
6%|▋ | 3889/61904 [2:02:20<35:31:27, 2.20s/it]
6%|▋ | 3890/61904 [2:02:22<31:52:09, 1.98s/it]
6%|▋ | 3891/61904 [2:02:23<29:19:44, 1.82s/it]
6%|▋ | 3892/61904 [2:02:24<26:55:09, 1.67s/it]
6%|▋ | 3893/61904 [2:02:26<26:00:11, 1.61s/it]
6%|▋ | 3894/61904 [2:02:27<25:00:24, 1.55s/it]
6%|▋ | 3895/61904 [2:02:29<23:50:16, 1.48s/it]
6%|▋ | 3896/61904 [2:02:30<23:19:09, 1.45s/it]
6%|▋ | 3897/61904 [2:02:31<23:37:07, 1.47s/it]
6%|▋ | 3898/61904 [2:02:33<22:55:14, 1.42s/it]
6%|▋ | 3899/61904 [2:02:34<22:33:14, 1.40s/it]
6%|▋ | 3900/61904 [2:02:35<21:50:24, 1.36s/it]
{'loss': 3.0378, 'learning_rate': 1.9400363023466873e-07, 'epoch': 1.01}
6%|▋ | 3901/61904 [2:02:37<21:52:31, 1.36s/it]
6%|▋ | 3902/61904 [2:02:38<21:48:03, 1.35s/it]
6%|▋ | 3903/61904 [2:02:39<21:42:57, 1.35s/it]
6%|▋ | 3904/61904 [2:02:41<22:11:27, 1.38s/it]
6%|▋ | 3905/61904 [2:02:42<21:41:42, 1.35s/it]
6%|▋ | 3906/61904 [2:02:43<21:16:43, 1.32s/it]
6%|▋ | 3907/61904 [2:02:45<20:43:01, 1.29s/it]
6%|▋ | 3908/61904 [2:02:46<21:13:35, 1.32s/it]
6%|▋ | 3909/61904 [2:02:47<21:24:06, 1.33s/it]
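At step 3870 the training bar estimates 9730 hours remaining at 603.59 s/it. Within 30 steps that estimate falls to about 21 hours, yet the elapsed-time column advances only about 1.3 to 1.5 s per step. Each displayed s/it value is roughly 0.7 times the previous one (422.90 / 603.59 ≈ 0.70, and 296.44 / 422.90 ≈ 0.70). That pattern is consistent with tqdm's exponential moving average of step time under its default `smoothing=0.3`. The ETA is then the remaining steps multiplied by that smoothed rate: (61904 - 3870) × 603.59 s ≈ 9730 h.

The sketch below models only how the display behaves. It makes three assumptions. The smoothing factor is 0.3. The steady step time is 1.35 s, a hypothetical value read off the log. The average starts from the 603.59 s/it shown at step 3870. The log does not show what caused the slow interval before that step. The function names are illustrative and are not part of tqdm's API.

```python
# Sketch: model a progress bar's s/it display as an exponential moving
# average (EMA) of per-step seconds, as tqdm's `smoothing` option does.
# Assumptions (not taken from tqdm's source): smoothing = 0.3, one display
# update per step, a hypothetical steady step time of 1.35 s, and no EMA
# bias correction (negligible thousands of updates into a run).

TOTAL_STEPS = 61904
SMOOTHING = 0.3  # weight given to the newest step-time sample


def ema_update(previous: float, sample: float, alpha: float = SMOOTHING) -> float:
    """Blend one new step-time sample into the running average."""
    return alpha * sample + (1.0 - alpha) * previous


def format_hms(seconds: float) -> str:
    """Render seconds as H:MM:SS with unbounded hours, like the log's ETAs."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def simulate(start_rate: float, start_step: int, step_time: float, n: int):
    """Return (step, smoothed s/it, ETA string) for n steady steps."""
    rows = []
    rate = start_rate
    for step in range(start_step + 1, start_step + n + 1):
        rate = ema_update(rate, step_time)
        eta_seconds = (TOTAL_STEPS - step) * rate  # remaining steps x s/it
        rows.append((step, rate, format_hms(eta_seconds)))
    return rows


if __name__ == "__main__":
    for step, rate, eta in simulate(603.59, 3870, 1.35, 10):
        print(f"{step}/{TOTAL_STEPS}  {rate:8.2f} s/it  ETA {eta}")
```

Run as a script, the model brings the smoothed rate from 603.59 s/it down to roughly 18.4 s/it after ten steps. The log shows 18.44 s/it at step 3880, which is close. This suggests the large initial ETA reflects one slow interval that the average is still forgetting, not a sustained slowdown.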