2024-06-03 17:54:49,033 INFO MainThread:34 [wandb_setup.py:_flush():76] Current SDK version is 0.17.0
2024-06-03 17:54:49,033 INFO MainThread:34 [wandb_setup.py:_flush():76] Configure stats pid to 34
2024-06-03 17:54:49,033 INFO MainThread:34 [wandb_setup.py:_flush():76] Loading settings from /root/.config/wandb/settings
2024-06-03 17:54:49,033 INFO MainThread:34 [wandb_setup.py:_flush():76] Loading settings from /kaggle/working/wandb/settings
2024-06-03 17:54:49,033 INFO MainThread:34 [wandb_setup.py:_flush():76] Loading settings from environment variables: {}
2024-06-03 17:54:49,033 INFO MainThread:34 [wandb_setup.py:_flush():76] Applying setup settings: {'_disable_service': False}
2024-06-03 17:54:49,033 INFO MainThread:34 [wandb_setup.py:_flush():76] Inferring run settings from compute environment: {'program': '<python with no main file>'}
2024-06-03 17:54:49,033 INFO MainThread:34 [wandb_setup.py:_flush():76] Applying login settings: {}
2024-06-03 17:54:49,033 INFO MainThread:34 [wandb_setup.py:_flush():76] Applying login settings: {'api_key': '***REDACTED***'}
2024-06-03 17:54:49,033 INFO MainThread:34 [wandb_init.py:_log_setup():520] Logging user logs to /kaggle/working/wandb/run-20240603_175449-d191dh7n/logs/debug.log
2024-06-03 17:54:49,034 INFO MainThread:34 [wandb_init.py:_log_setup():521] Logging internal logs to /kaggle/working/wandb/run-20240603_175449-d191dh7n/logs/debug-internal.log
2024-06-03 17:54:49,034 INFO MainThread:34 [wandb_init.py:_jupyter_setup():466] configuring jupyter hooks <wandb.sdk.wandb_init._WandbInit object at 0x78eae9ee9ab0>
2024-06-03 17:54:49,034 INFO MainThread:34 [wandb_init.py:init():560] calling init triggers
2024-06-03 17:54:49,034 INFO MainThread:34 [wandb_init.py:init():567] wandb.init called with sweep_config: {}
config: {}
2024-06-03 17:54:49,034 INFO MainThread:34 [wandb_init.py:init():610] starting backend
2024-06-03 17:54:49,034 INFO MainThread:34 [wandb_init.py:init():614] setting up manager
2024-06-03 17:54:49,036 INFO MainThread:34 [backend.py:_multiprocessing_setup():105] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2024-06-03 17:54:49,038 INFO MainThread:34 [wandb_init.py:init():622] backend started and connected
2024-06-03 17:54:49,052 INFO MainThread:34 [wandb_run.py:_label_probe_notebook():1328] probe notebook
2024-06-03 17:54:49,382 INFO MainThread:34 [wandb_init.py:init():711] updated telemetry
2024-06-03 17:54:49,386 INFO MainThread:34 [wandb_init.py:init():744] communicating run to backend with 90.0 second timeout
2024-06-03 17:54:49,688 INFO MainThread:34 [wandb_run.py:_on_init():2396] communicating current version
2024-06-03 17:54:49,771 INFO MainThread:34 [wandb_run.py:_on_init():2405] got version response
2024-06-03 17:54:49,772 INFO MainThread:34 [wandb_init.py:init():795] starting run threads in backend
2024-06-03 17:55:06,077 INFO MainThread:34 [wandb_run.py:_console_start():2374] atexit reg
2024-06-03 17:55:06,077 INFO MainThread:34 [wandb_run.py:_redirect():2229] redirect: wrap_raw
2024-06-03 17:55:06,078 INFO MainThread:34 [wandb_run.py:_redirect():2294] Wrapping output streams.
2024-06-03 17:55:06,078 INFO MainThread:34 [wandb_run.py:_redirect():2319] Redirects installed.
2024-06-03 17:55:06,081 INFO MainThread:34 [wandb_init.py:init():838] run started, returning control to user process
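
[annotation] The init sequence above (backend spawn, Jupyter hooks, output redirection, "run started") is what a plain wandb.init() call produces in a Kaggle notebook. A minimal sketch follows; the project name and the way the key is supplied are assumptions, since neither appears in this log and the API key itself is redacted above.

# Sketch only: reproduces the init sequence logged above under assumed names.
import os
import wandb

os.environ["WANDB_API_KEY"] = "<YOUR_API_KEY>"   # logged as '***REDACTED***' above
run = wandb.init(project="falcon-7b-finetune")   # hypothetical project name
# wandb spawns its service process ("starting backend" / "setting up manager"),
# installs the Jupyter hooks, wraps stdout/stderr ("Redirects installed."),
# then returns control to the user process ("run started").
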
2024-06-03 17:55:06,087 INFO MainThread:34 [wandb_run.py:_config_callback():1376] config_cb None None {'vocab_size': 65024, 'hidden_size': 4544, 'num_hidden_layers': 32, 'num_attention_heads': 71, 'layer_norm_epsilon': 1e-05, 'initializer_range': 0.02, 'use_cache': False, 'hidden_dropout': 0.0, 'attention_dropout': 0.0, 'bos_token_id': 11, 'eos_token_id': 11, 'num_kv_heads': 71, 'alibi': False, 'new_decoder_architecture': False, 'multi_query': True, 'parallel_attn': True, 'bias': False, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'bfloat16', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['FalconForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'pad_token_id': None, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'tiiuae/falcon-7b', 'transformers_version': '4.41.1', 'apply_residual_connection_post_layernorm': False, 'auto_map': {'AutoConfig': 'tiiuae/falcon-7b--configuration_falcon.FalconConfig', 'AutoModel': 'tiiuae/falcon-7b--modeling_falcon.FalconModel', 'AutoModelForSequenceClassification': 'tiiuae/falcon-7b--modeling_falcon.FalconForSequenceClassification', 'AutoModelForTokenClassification': 'tiiuae/falcon-7b--modeling_falcon.FalconForTokenClassification', 'AutoModelForQuestionAnswering': 'tiiuae/falcon-7b--modeling_falcon.FalconForQuestionAnswering', 'AutoModelForCausalLM': 'tiiuae/falcon-7b--modeling_falcon.FalconForCausalLM'}, 'model_type': 'falcon', 'quantization_config': {'quant_method': 'QuantizationMethod.BITS_AND_BYTES', '_load_in_8bit': False, '_load_in_4bit': True, 'llm_int8_threshold': 6.0, 'llm_int8_skip_modules': None, 'llm_int8_enable_fp32_cpu_offload': False, 'llm_int8_has_fp16_weight': False, 'bnb_4bit_quant_type': 'nf4', 'bnb_4bit_use_double_quant': False, 'bnb_4bit_compute_dtype': 'bfloat16', 'bnb_4bit_quant_storage': 'uint8', 'load_in_4bit': True, 'load_in_8bit': False}, 'output_dir': '/kaggle/working/', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': False, 'do_predict': False, 'eval_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 8, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'learning_rate': 0.0002, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 4, 'max_steps': -1, 'lr_scheduler_type': 'linear', 
'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.0, 'warmup_steps': 0, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': '/kaggle/working/runs/Jun03_17-40-11_f28ebe0d2526', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 10, 'logging_nan_inf_filter': True, 'save_strategy': 'epoch', 'save_steps': 500, 'save_total_limit': 4, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': False, 'fp16': True, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': None, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': '/kaggle/working/', 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['tensorboard', 'wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': True, 'resume_from_checkpoint': None, 'hub_model_id': 'othmanfa/fsttModel', 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': False, 'hub_always_push': False, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': None, 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': True, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False}
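
[annotation] The config_cb entry above is the merged Falcon-7b model config, bitsandbytes quantization config, and Hugging Face TrainingArguments that the Trainer's W&B callback reports to the run. A sketch consistent with the logged values is below; the dataset, tokenizer handling, and any PEFT/LoRA wrapping are not recorded in this log and are omitted, and device_map is an assumption.

# Sketch of a model/trainer setup that would yield the config captured above.
import torch
from transformers import (AutoModelForCausalLM, BitsAndBytesConfig,
                          TrainingArguments)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # '_load_in_4bit': True
    bnb_4bit_quant_type="nf4",              # 'bnb_4bit_quant_type': 'nf4'
    bnb_4bit_compute_dtype=torch.bfloat16,  # 'bnb_4bit_compute_dtype': 'bfloat16'
    bnb_4bit_use_double_quant=False,
)

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    quantization_config=bnb_config,
    device_map="auto",                      # assumption; not recorded in the log
)
model.config.use_cache = False              # 'use_cache': False

training_args = TrainingArguments(
    output_dir="/kaggle/working/",
    per_device_train_batch_size=8,
    learning_rate=2e-4,
    num_train_epochs=4,                     # raised to 20 in a later entry below
    fp16=True,
    logging_steps=10,
    save_strategy="epoch",
    save_total_limit=4,
    auto_find_batch_size=True,
    push_to_hub=True,
    hub_model_id="othmanfa/fsttModel",
    report_to=["tensorboard", "wandb"],
)
# Trainer's WandbCallback merges model.config with these arguments and sends
# the result to the run, which is what the config_cb line above records.
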
2024-06-03 17:55:07,353 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 17:55:07,353 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 17:56:56,275 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 17:56:56,290 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 17:56:56,290 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 17:56:59,514 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 17:56:59,595 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 17:56:59,595 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 17:57:06,214 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 17:57:06,261 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 17:57:06,261 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:01:57,364 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:01:57,366 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:01:57,366 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:02:16,908 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:02:16,951 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:02:16,952 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:02:46,250 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:02:46,252 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:02:46,252 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:03:47,943 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:03:48,029 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:03:48,029 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:04:13,706 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:04:13,759 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:04:13,759 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:04:26,491 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:04:26,697 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:04:26,697 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:04:34,326 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:04:35,570 INFO MainThread:34 [wandb_run.py:_config_callback():1376] config_cb None None {'vocab_size': 65024, 'hidden_size': 4544, 'num_hidden_layers': 32, 'num_attention_heads': 71, 'layer_norm_epsilon': 1e-05, 'initializer_range': 0.02, 'use_cache': False, 'hidden_dropout': 0.0, 'attention_dropout': 0.0, 'bos_token_id': 11, 'eos_token_id': 11, 'num_kv_heads': 71, 'alibi': False, 'new_decoder_architecture': False, 'multi_query': True, 'parallel_attn': True, 'bias': False, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'bfloat16', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['FalconForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'pad_token_id': None, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'tiiuae/falcon-7b', 'transformers_version': '4.41.1', 'apply_residual_connection_post_layernorm': False, 'auto_map': {'AutoConfig': 'tiiuae/falcon-7b--configuration_falcon.FalconConfig', 'AutoModel': 'tiiuae/falcon-7b--modeling_falcon.FalconModel', 'AutoModelForSequenceClassification': 'tiiuae/falcon-7b--modeling_falcon.FalconForSequenceClassification', 'AutoModelForTokenClassification': 'tiiuae/falcon-7b--modeling_falcon.FalconForTokenClassification', 'AutoModelForQuestionAnswering': 'tiiuae/falcon-7b--modeling_falcon.FalconForQuestionAnswering', 'AutoModelForCausalLM': 'tiiuae/falcon-7b--modeling_falcon.FalconForCausalLM'}, 'model_type': 'falcon', 'quantization_config': {'quant_method': 'QuantizationMethod.BITS_AND_BYTES', '_load_in_8bit': False, '_load_in_4bit': True, 'llm_int8_threshold': 6.0, 'llm_int8_skip_modules': None, 'llm_int8_enable_fp32_cpu_offload': False, 'llm_int8_has_fp16_weight': False, 'bnb_4bit_quant_type': 'nf4', 'bnb_4bit_use_double_quant': False, 'bnb_4bit_compute_dtype': 'bfloat16', 'bnb_4bit_quant_storage': 'uint8', 'load_in_4bit': True, 'load_in_8bit': False}, 'output_dir': '/kaggle/working/', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': False, 'do_predict': False, 'eval_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 8, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'learning_rate': 0.0002, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 4, 'max_steps': -1, 'lr_scheduler_type': 'linear', 
'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.0, 'warmup_steps': 0, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': '/kaggle/working/runs/Jun03_17-40-11_f28ebe0d2526', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 10, 'logging_nan_inf_filter': True, 'save_strategy': 'epoch', 'save_steps': 500, 'save_total_limit': 4, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': False, 'fp16': True, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': None, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': '/kaggle/working/', 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['tensorboard', 'wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': True, 'resume_from_checkpoint': None, 'hub_model_id': 'othmanfa/fsttModel', 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': False, 'hub_always_push': False, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': None, 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': True, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False}
2024-06-03 18:18:50,784 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:18:50,784 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:41:05,951 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:41:05,953 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:41:05,953 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:45:28,892 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:45:28,927 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:45:28,927 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:45:30,228 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:45:30,229 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:45:30,229 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:45:31,254 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:45:31,276 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:45:31,276 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:45:33,122 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:45:33,358 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:45:33,358 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:45:36,415 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:45:37,683 INFO MainThread:34 [wandb_run.py:_config_callback():1376] config_cb None None {'vocab_size': 65024, 'hidden_size': 4544, 'num_hidden_layers': 32, 'num_attention_heads': 71, 'layer_norm_epsilon': 1e-05, 'initializer_range': 0.02, 'use_cache': False, 'hidden_dropout': 0.0, 'attention_dropout': 0.0, 'bos_token_id': 11, 'eos_token_id': 11, 'num_kv_heads': 71, 'alibi': False, 'new_decoder_architecture': False, 'multi_query': True, 'parallel_attn': True, 'bias': False, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'bfloat16', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['FalconForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'pad_token_id': None, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'tiiuae/falcon-7b', 'transformers_version': '4.41.1', 'apply_residual_connection_post_layernorm': False, 'auto_map': {'AutoConfig': 'tiiuae/falcon-7b--configuration_falcon.FalconConfig', 'AutoModel': 'tiiuae/falcon-7b--modeling_falcon.FalconModel', 'AutoModelForSequenceClassification': 'tiiuae/falcon-7b--modeling_falcon.FalconForSequenceClassification', 'AutoModelForTokenClassification': 'tiiuae/falcon-7b--modeling_falcon.FalconForTokenClassification', 'AutoModelForQuestionAnswering': 'tiiuae/falcon-7b--modeling_falcon.FalconForQuestionAnswering', 'AutoModelForCausalLM': 'tiiuae/falcon-7b--modeling_falcon.FalconForCausalLM'}, 'model_type': 'falcon', 'quantization_config': {'quant_method': 'QuantizationMethod.BITS_AND_BYTES', '_load_in_8bit': False, '_load_in_4bit': True, 'llm_int8_threshold': 6.0, 'llm_int8_skip_modules': None, 'llm_int8_enable_fp32_cpu_offload': False, 'llm_int8_has_fp16_weight': False, 'bnb_4bit_quant_type': 'nf4', 'bnb_4bit_use_double_quant': False, 'bnb_4bit_compute_dtype': 'bfloat16', 'bnb_4bit_quant_storage': 'uint8', 'load_in_4bit': True, 'load_in_8bit': False}, 'output_dir': '/kaggle/working/', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': False, 'do_predict': False, 'eval_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 8, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'learning_rate': 0.0002, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 20, 'max_steps': -1, 'lr_scheduler_type': 'linear', 
'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.0, 'warmup_steps': 0, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': '/kaggle/working/runs/Jun03_18-45-28_f28ebe0d2526', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 10, 'logging_nan_inf_filter': True, 'save_strategy': 'epoch', 'save_steps': 500, 'save_total_limit': 4, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': False, 'fp16': True, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': None, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': '/kaggle/working/', 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['tensorboard', 'wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': True, 'resume_from_checkpoint': None, 'hub_model_id': 'othmanfa/fsttModel', 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': False, 'hub_always_push': False, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': None, 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': True, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False}
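
[annotation] Compared with the 17:55 config_cb entry, this one differs only in num_train_epochs (4 to 20) and the timestamped logging_dir, which suggests the TrainingArguments and Trainer were re-created in a later cell. A sketch under that assumption:

# Re-creating the arguments with a higher epoch count would re-trigger
# config_cb exactly as logged above; all other values are unchanged.
training_args = TrainingArguments(
    output_dir="/kaggle/working/",
    num_train_epochs=20,      # was 4 in the earlier config_cb entry
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    fp16=True,
    logging_steps=10,
    save_strategy="epoch",
    push_to_hub=True,
    hub_model_id="othmanfa/fsttModel",
    report_to=["tensorboard", "wandb"],
)
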
2024-06-03 19:55:43,601 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 19:55:43,602 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 19:56:53,516 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 19:56:55,309 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 19:56:55,309 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:02:21,391 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:02:22,164 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:02:22,164 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:03:12,802 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:03:12,827 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:03:12,827 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:03:22,908 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:03:23,545 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:03:23,546 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:04:16,404 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:04:16,447 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:04:16,447 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:04:32,978 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:04:33,028 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:04:33,028 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:05:18,072 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:05:18,118 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:05:18,118 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:05:31,531 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:05:31,580 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:05:31,580 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:05:44,101 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:05:44,780 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:05:44,780 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:06:37,084 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:06:37,830 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:06:37,830 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:08:59,975 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:09:00,010 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:09:00,010 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:09:06,499 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:09:06,500 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:09:06,500 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:09:07,197 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:09:07,218 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:09:07,218 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:09:18,369 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:09:19,119 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:09:19,120 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:11:52,561 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:11:54,589 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:11:54,589 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:16:50,594 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:19:04,529 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:19:04,530 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:20:33,194 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:20:33,197 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:20:33,197 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:20:53,790 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:24:20,236 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:24:20,236 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:32:45,840 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:32:45,841 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:32:45,841 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:34:19,718 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:34:19,722 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:34:19,722 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:34:21,601 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:34:21,602 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:34:21,602 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:34:23,187 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:37:48,397 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:37:48,397 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:38:33,502 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:41:58,862 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:41:58,862 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:43:51,168 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:43:51,171 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:43:51,171 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:43:53,895 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:47:25,895 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:47:25,895 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:50:07,262 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:50:07,303 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:50:07,303 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:50:09,915 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:50:09,916 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:50:09,917 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:50:10,463 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:50:10,484 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:50:10,484 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:50:13,975 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:50:14,119 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:50:14,119 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:50:15,412 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:50:16,872 INFO MainThread:34 [wandb_run.py:_config_callback():1376] config_cb None None {'vocab_size': 65024, 'hidden_size': 4544, 'num_hidden_layers': 32, 'num_attention_heads': 71, 'layer_norm_epsilon': 1e-05, 'initializer_range': 0.02, 'use_cache': False, 'hidden_dropout': 0.0, 'attention_dropout': 0.0, 'bos_token_id': 11, 'eos_token_id': 11, 'num_kv_heads': 71, 'alibi': False, 'new_decoder_architecture': False, 'multi_query': True, 'parallel_attn': True, 'bias': False, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'bfloat16', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['FalconForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'pad_token_id': None, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'tiiuae/falcon-7b', 'transformers_version': '4.41.1', 'apply_residual_connection_post_layernorm': False, 'auto_map': {'AutoConfig': 'tiiuae/falcon-7b--configuration_falcon.FalconConfig', 'AutoModel': 'tiiuae/falcon-7b--modeling_falcon.FalconModel', 'AutoModelForSequenceClassification': 'tiiuae/falcon-7b--modeling_falcon.FalconForSequenceClassification', 'AutoModelForTokenClassification': 'tiiuae/falcon-7b--modeling_falcon.FalconForTokenClassification', 'AutoModelForQuestionAnswering': 'tiiuae/falcon-7b--modeling_falcon.FalconForQuestionAnswering', 'AutoModelForCausalLM': 'tiiuae/falcon-7b--modeling_falcon.FalconForCausalLM'}, 'model_type': 'falcon', 'quantization_config': {'quant_method': 'QuantizationMethod.BITS_AND_BYTES', '_load_in_8bit': False, '_load_in_4bit': True, 'llm_int8_threshold': 6.0, 'llm_int8_skip_modules': None, 'llm_int8_enable_fp32_cpu_offload': False, 'llm_int8_has_fp16_weight': False, 'bnb_4bit_quant_type': 'nf4', 'bnb_4bit_use_double_quant': False, 'bnb_4bit_compute_dtype': 'bfloat16', 'bnb_4bit_quant_storage': 'uint8', 'load_in_4bit': True, 'load_in_8bit': False}, 'output_dir': '/kaggle/working/', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': False, 'do_predict': False, 'eval_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 8, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'learning_rate': 0.0002, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 20, 'max_steps': -1, 'lr_scheduler_type': 'linear', 
'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.0, 'warmup_steps': 0, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': '/kaggle/working/runs/Jun03_20-50-07_f28ebe0d2526', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 10, 'logging_nan_inf_filter': True, 'save_strategy': 'epoch', 'save_steps': 500, 'save_total_limit': 4, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': False, 'fp16': True, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': None, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': '/kaggle/working/', 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['tensorboard', 'wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': True, 'resume_from_checkpoint': None, 'hub_model_id': 'othmanfa/fsttModel', 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': False, 'hub_always_push': False, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': None, 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': True, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False}
2024-06-03 22:00:27,455 INFO MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 22:00:27,455 INFO MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 22:01:48,661 INFO MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend