2024-06-03 17:54:49,033 INFO    MainThread:34 [wandb_setup.py:_flush():76] Current SDK version is 0.17.0
2024-06-03 17:54:49,033 INFO    MainThread:34 [wandb_setup.py:_flush():76] Configure stats pid to 34
2024-06-03 17:54:49,033 INFO    MainThread:34 [wandb_setup.py:_flush():76] Loading settings from /root/.config/wandb/settings
2024-06-03 17:54:49,033 INFO    MainThread:34 [wandb_setup.py:_flush():76] Loading settings from /kaggle/working/wandb/settings
2024-06-03 17:54:49,033 INFO    MainThread:34 [wandb_setup.py:_flush():76] Loading settings from environment variables: {}
2024-06-03 17:54:49,033 INFO    MainThread:34 [wandb_setup.py:_flush():76] Applying setup settings: {'_disable_service': False}
2024-06-03 17:54:49,033 INFO    MainThread:34 [wandb_setup.py:_flush():76] Inferring run settings from compute environment: {'program': '<python with no main file>'}
2024-06-03 17:54:49,033 INFO    MainThread:34 [wandb_setup.py:_flush():76] Applying login settings: {}
2024-06-03 17:54:49,033 INFO    MainThread:34 [wandb_setup.py:_flush():76] Applying login settings: {'api_key': '***REDACTED***'}
2024-06-03 17:54:49,033 INFO    MainThread:34 [wandb_init.py:_log_setup():520] Logging user logs to /kaggle/working/wandb/run-20240603_175449-d191dh7n/logs/debug.log
2024-06-03 17:54:49,034 INFO    MainThread:34 [wandb_init.py:_log_setup():521] Logging internal logs to /kaggle/working/wandb/run-20240603_175449-d191dh7n/logs/debug-internal.log
2024-06-03 17:54:49,034 INFO    MainThread:34 [wandb_init.py:_jupyter_setup():466] configuring jupyter hooks <wandb.sdk.wandb_init._WandbInit object at 0x78eae9ee9ab0>
2024-06-03 17:54:49,034 INFO    MainThread:34 [wandb_init.py:init():560] calling init triggers
2024-06-03 17:54:49,034 INFO    MainThread:34 [wandb_init.py:init():567] wandb.init called with sweep_config: {}
config: {}
2024-06-03 17:54:49,034 INFO    MainThread:34 [wandb_init.py:init():610] starting backend
2024-06-03 17:54:49,034 INFO    MainThread:34 [wandb_init.py:init():614] setting up manager
2024-06-03 17:54:49,036 INFO    MainThread:34 [backend.py:_multiprocessing_setup():105] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2024-06-03 17:54:49,038 INFO    MainThread:34 [wandb_init.py:init():622] backend started and connected
2024-06-03 17:54:49,052 INFO    MainThread:34 [wandb_run.py:_label_probe_notebook():1328] probe notebook
2024-06-03 17:54:49,382 INFO    MainThread:34 [wandb_init.py:init():711] updated telemetry
2024-06-03 17:54:49,386 INFO    MainThread:34 [wandb_init.py:init():744] communicating run to backend with 90.0 second timeout
2024-06-03 17:54:49,688 INFO    MainThread:34 [wandb_run.py:_on_init():2396] communicating current version
2024-06-03 17:54:49,771 INFO    MainThread:34 [wandb_run.py:_on_init():2405] got version response 
2024-06-03 17:54:49,772 INFO    MainThread:34 [wandb_init.py:init():795] starting run threads in backend
2024-06-03 17:55:06,077 INFO    MainThread:34 [wandb_run.py:_console_start():2374] atexit reg
2024-06-03 17:55:06,077 INFO    MainThread:34 [wandb_run.py:_redirect():2229] redirect: wrap_raw
2024-06-03 17:55:06,078 INFO    MainThread:34 [wandb_run.py:_redirect():2294] Wrapping output streams.
2024-06-03 17:55:06,078 INFO    MainThread:34 [wandb_run.py:_redirect():2319] Redirects installed.
2024-06-03 17:55:06,081 INFO    MainThread:34 [wandb_init.py:init():838] run started, returning control to user process
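
The lines above show wandb loading credentials from the Kaggle/working settings files, applying a redacted API key, and handing control back to the notebook. A minimal sketch of how that login step is usually wired up in a Kaggle notebook follows; the secret name "WANDB_API_KEY" and the use of kaggle_secrets are assumptions, not taken from this log.

# Sketch of the wandb credential setup implied by the log above (assumed, not
# the notebook's actual code). The secret name is illustrative.
import os
import wandb

try:
    # On Kaggle, secrets are typically exposed via kaggle_secrets.
    from kaggle_secrets import UserSecretsClient
    os.environ["WANDB_API_KEY"] = UserSecretsClient().get_secret("WANDB_API_KEY")
except Exception:
    pass  # fall back to an already-exported WANDB_API_KEY

wandb.login()  # stores the key so the Trainer's wandb callback can call wandb.init()
# The Trainer (report_to includes "wandb") then creates the run directory
# run-20240603_175449-d191dh7n referenced in the log lines above.
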
2024-06-03 17:55:06,087 INFO    MainThread:34 [wandb_run.py:_config_callback():1376] config_cb None None {'vocab_size': 65024, 'hidden_size': 4544, 'num_hidden_layers': 32, 'num_attention_heads': 71, 'layer_norm_epsilon': 1e-05, 'initializer_range': 0.02, 'use_cache': False, 'hidden_dropout': 0.0, 'attention_dropout': 0.0, 'bos_token_id': 11, 'eos_token_id': 11, 'num_kv_heads': 71, 'alibi': False, 'new_decoder_architecture': False, 'multi_query': True, 'parallel_attn': True, 'bias': False, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'bfloat16', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['FalconForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'pad_token_id': None, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'tiiuae/falcon-7b', 'transformers_version': '4.41.1', 'apply_residual_connection_post_layernorm': False, 'auto_map': {'AutoConfig': 'tiiuae/falcon-7b--configuration_falcon.FalconConfig', 'AutoModel': 'tiiuae/falcon-7b--modeling_falcon.FalconModel', 'AutoModelForSequenceClassification': 'tiiuae/falcon-7b--modeling_falcon.FalconForSequenceClassification', 'AutoModelForTokenClassification': 'tiiuae/falcon-7b--modeling_falcon.FalconForTokenClassification', 'AutoModelForQuestionAnswering': 'tiiuae/falcon-7b--modeling_falcon.FalconForQuestionAnswering', 'AutoModelForCausalLM': 'tiiuae/falcon-7b--modeling_falcon.FalconForCausalLM'}, 'model_type': 'falcon', 'quantization_config': {'quant_method': 'QuantizationMethod.BITS_AND_BYTES', '_load_in_8bit': False, '_load_in_4bit': True, 'llm_int8_threshold': 6.0, 'llm_int8_skip_modules': None, 'llm_int8_enable_fp32_cpu_offload': False, 'llm_int8_has_fp16_weight': False, 'bnb_4bit_quant_type': 'nf4', 'bnb_4bit_use_double_quant': False, 'bnb_4bit_compute_dtype': 'bfloat16', 'bnb_4bit_quant_storage': 'uint8', 'load_in_4bit': True, 'load_in_8bit': False}, 'output_dir': '/kaggle/working/', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': False, 'do_predict': False, 'eval_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 8, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'learning_rate': 0.0002, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 4, 'max_steps': -1, 'lr_scheduler_type': 'linear', 
'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.0, 'warmup_steps': 0, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': '/kaggle/working/runs/Jun03_17-40-11_f28ebe0d2526', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 10, 'logging_nan_inf_filter': True, 'save_strategy': 'epoch', 'save_steps': 500, 'save_total_limit': 4, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': False, 'fp16': True, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': None, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': '/kaggle/working/', 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['tensorboard', 'wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': True, 'resume_from_checkpoint': None, 'hub_model_id': 'othmanfa/fsttModel', 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': False, 'hub_always_push': False, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': None, 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': True, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False}
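
The config dump above combines the resolved Falcon-7B model config (4-bit NF4 bitsandbytes quantization, bfloat16 compute) with the Hugging Face TrainingArguments for this run. A plausible reconstruction of that setup is sketched below; it is an assumption about the notebook's code, with only the hyperparameter values copied from the dump.

# Sketch reconstructing the setup behind the logged config (assumed, not the
# notebook's actual code); values are taken from the config dump above.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                    # 'load_in_4bit': True
    bnb_4bit_quant_type="nf4",            # 'bnb_4bit_quant_type': 'nf4'
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    quantization_config=bnb_config,
)
model.config.use_cache = False            # matches 'use_cache': False

training_args = TrainingArguments(
    output_dir="/kaggle/working/",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,
    learning_rate=2e-4,
    num_train_epochs=4,                   # later re-runs in this log raise this to 20
    lr_scheduler_type="linear",
    fp16=True,
    logging_steps=10,
    save_strategy="epoch",
    save_total_limit=4,
    auto_find_batch_size=True,
    push_to_hub=True,
    hub_model_id="othmanfa/fsttModel",
    report_to=["tensorboard", "wandb"],
)
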
2024-06-03 17:55:07,353 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 17:55:07,353 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 17:56:56,275 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 17:56:56,290 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 17:56:56,290 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 17:56:59,514 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 17:56:59,595 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 17:56:59,595 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 17:57:06,214 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 17:57:06,261 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 17:57:06,261 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:01:57,364 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:01:57,366 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:01:57,366 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:02:16,908 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:02:16,951 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:02:16,952 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:02:46,250 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:02:46,252 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:02:46,252 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:03:47,943 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:03:48,029 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:03:48,029 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:04:13,706 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:04:13,759 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:04:13,759 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:04:26,491 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:04:26,697 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:04:26,697 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:04:34,326 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:04:35,570 INFO    MainThread:34 [wandb_run.py:_config_callback():1376] config_cb None None {'vocab_size': 65024, 'hidden_size': 4544, 'num_hidden_layers': 32, 'num_attention_heads': 71, 'layer_norm_epsilon': 1e-05, 'initializer_range': 0.02, 'use_cache': False, 'hidden_dropout': 0.0, 'attention_dropout': 0.0, 'bos_token_id': 11, 'eos_token_id': 11, 'num_kv_heads': 71, 'alibi': False, 'new_decoder_architecture': False, 'multi_query': True, 'parallel_attn': True, 'bias': False, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'bfloat16', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['FalconForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'pad_token_id': None, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'tiiuae/falcon-7b', 'transformers_version': '4.41.1', 'apply_residual_connection_post_layernorm': False, 'auto_map': {'AutoConfig': 'tiiuae/falcon-7b--configuration_falcon.FalconConfig', 'AutoModel': 'tiiuae/falcon-7b--modeling_falcon.FalconModel', 'AutoModelForSequenceClassification': 'tiiuae/falcon-7b--modeling_falcon.FalconForSequenceClassification', 'AutoModelForTokenClassification': 'tiiuae/falcon-7b--modeling_falcon.FalconForTokenClassification', 'AutoModelForQuestionAnswering': 'tiiuae/falcon-7b--modeling_falcon.FalconForQuestionAnswering', 'AutoModelForCausalLM': 'tiiuae/falcon-7b--modeling_falcon.FalconForCausalLM'}, 'model_type': 'falcon', 'quantization_config': {'quant_method': 'QuantizationMethod.BITS_AND_BYTES', '_load_in_8bit': False, '_load_in_4bit': True, 'llm_int8_threshold': 6.0, 'llm_int8_skip_modules': None, 'llm_int8_enable_fp32_cpu_offload': False, 'llm_int8_has_fp16_weight': False, 'bnb_4bit_quant_type': 'nf4', 'bnb_4bit_use_double_quant': False, 'bnb_4bit_compute_dtype': 'bfloat16', 'bnb_4bit_quant_storage': 'uint8', 'load_in_4bit': True, 'load_in_8bit': False}, 'output_dir': '/kaggle/working/', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': False, 'do_predict': False, 'eval_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 8, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'learning_rate': 0.0002, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 4, 'max_steps': -1, 'lr_scheduler_type': 'linear', 
'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.0, 'warmup_steps': 0, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': '/kaggle/working/runs/Jun03_17-40-11_f28ebe0d2526', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 10, 'logging_nan_inf_filter': True, 'save_strategy': 'epoch', 'save_steps': 500, 'save_total_limit': 4, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': False, 'fp16': True, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': None, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': '/kaggle/working/', 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['tensorboard', 'wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': True, 'resume_from_checkpoint': None, 'hub_model_id': 'othmanfa/fsttModel', 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': False, 'hub_always_push': False, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': None, 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': True, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False}
2024-06-03 18:18:50,784 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:18:50,784 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:41:05,951 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:41:05,953 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:41:05,953 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:45:28,892 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:45:28,927 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:45:28,927 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:45:30,228 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:45:30,229 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:45:30,229 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:45:31,254 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:45:31,276 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:45:31,276 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:45:33,122 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:45:33,358 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 18:45:33,358 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 18:45:36,415 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 18:45:37,683 INFO    MainThread:34 [wandb_run.py:_config_callback():1376] config_cb None None {'vocab_size': 65024, 'hidden_size': 4544, 'num_hidden_layers': 32, 'num_attention_heads': 71, 'layer_norm_epsilon': 1e-05, 'initializer_range': 0.02, 'use_cache': False, 'hidden_dropout': 0.0, 'attention_dropout': 0.0, 'bos_token_id': 11, 'eos_token_id': 11, 'num_kv_heads': 71, 'alibi': False, 'new_decoder_architecture': False, 'multi_query': True, 'parallel_attn': True, 'bias': False, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'bfloat16', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['FalconForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'pad_token_id': None, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'tiiuae/falcon-7b', 'transformers_version': '4.41.1', 'apply_residual_connection_post_layernorm': False, 'auto_map': {'AutoConfig': 'tiiuae/falcon-7b--configuration_falcon.FalconConfig', 'AutoModel': 'tiiuae/falcon-7b--modeling_falcon.FalconModel', 'AutoModelForSequenceClassification': 'tiiuae/falcon-7b--modeling_falcon.FalconForSequenceClassification', 'AutoModelForTokenClassification': 'tiiuae/falcon-7b--modeling_falcon.FalconForTokenClassification', 'AutoModelForQuestionAnswering': 'tiiuae/falcon-7b--modeling_falcon.FalconForQuestionAnswering', 'AutoModelForCausalLM': 'tiiuae/falcon-7b--modeling_falcon.FalconForCausalLM'}, 'model_type': 'falcon', 'quantization_config': {'quant_method': 'QuantizationMethod.BITS_AND_BYTES', '_load_in_8bit': False, '_load_in_4bit': True, 'llm_int8_threshold': 6.0, 'llm_int8_skip_modules': None, 'llm_int8_enable_fp32_cpu_offload': False, 'llm_int8_has_fp16_weight': False, 'bnb_4bit_quant_type': 'nf4', 'bnb_4bit_use_double_quant': False, 'bnb_4bit_compute_dtype': 'bfloat16', 'bnb_4bit_quant_storage': 'uint8', 'load_in_4bit': True, 'load_in_8bit': False}, 'output_dir': '/kaggle/working/', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': False, 'do_predict': False, 'eval_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 8, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'learning_rate': 0.0002, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 20, 'max_steps': -1, 'lr_scheduler_type': 'linear', 
'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.0, 'warmup_steps': 0, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': '/kaggle/working/runs/Jun03_18-45-28_f28ebe0d2526', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 10, 'logging_nan_inf_filter': True, 'save_strategy': 'epoch', 'save_steps': 500, 'save_total_limit': 4, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': False, 'fp16': True, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': None, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': '/kaggle/working/', 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['tensorboard', 'wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': True, 'resume_from_checkpoint': None, 'hub_model_id': 'othmanfa/fsttModel', 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': False, 'hub_always_push': False, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': None, 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': True, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False}
2024-06-03 19:55:43,601 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 19:55:43,602 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 19:56:53,516 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 19:56:55,309 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 19:56:55,309 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:02:21,391 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:02:22,164 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:02:22,164 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:03:12,802 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:03:12,827 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:03:12,827 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:03:22,908 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:03:23,545 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:03:23,546 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:04:16,404 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:04:16,447 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:04:16,447 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:04:32,978 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:04:33,028 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:04:33,028 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:05:18,072 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:05:18,118 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:05:18,118 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:05:31,531 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:05:31,580 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:05:31,580 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:05:44,101 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:05:44,780 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:05:44,780 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:06:37,084 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:06:37,830 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:06:37,830 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:08:59,975 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:09:00,010 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:09:00,010 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:09:06,499 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:09:06,500 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:09:06,500 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:09:07,197 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:09:07,218 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:09:07,218 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:09:18,369 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:09:19,119 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:09:19,120 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:11:52,561 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:11:54,589 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:11:54,589 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:16:50,594 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:19:04,529 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:19:04,530 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:20:33,194 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:20:33,197 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:20:33,197 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:20:53,790 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:24:20,236 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:24:20,236 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:32:45,840 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:32:45,841 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:32:45,841 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:34:19,718 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:34:19,722 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:34:19,722 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:34:21,601 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:34:21,602 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:34:21,602 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:34:23,187 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:37:48,397 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:37:48,397 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:38:33,502 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:41:58,862 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:41:58,862 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:43:51,168 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:43:51,171 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:43:51,171 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:43:53,895 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:47:25,895 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:47:25,895 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:50:07,262 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:50:07,303 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:50:07,303 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:50:09,915 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:50:09,916 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:50:09,917 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:50:10,463 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:50:10,484 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:50:10,484 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:50:13,975 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:50:14,119 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 20:50:14,119 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 20:50:15,412 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend
2024-06-03 20:50:16,872 INFO    MainThread:34 [wandb_run.py:_config_callback():1376] config_cb None None {'vocab_size': 65024, 'hidden_size': 4544, 'num_hidden_layers': 32, 'num_attention_heads': 71, 'layer_norm_epsilon': 1e-05, 'initializer_range': 0.02, 'use_cache': False, 'hidden_dropout': 0.0, 'attention_dropout': 0.0, 'bos_token_id': 11, 'eos_token_id': 11, 'num_kv_heads': 71, 'alibi': False, 'new_decoder_architecture': False, 'multi_query': True, 'parallel_attn': True, 'bias': False, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'bfloat16', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['FalconForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'pad_token_id': None, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'tiiuae/falcon-7b', 'transformers_version': '4.41.1', 'apply_residual_connection_post_layernorm': False, 'auto_map': {'AutoConfig': 'tiiuae/falcon-7b--configuration_falcon.FalconConfig', 'AutoModel': 'tiiuae/falcon-7b--modeling_falcon.FalconModel', 'AutoModelForSequenceClassification': 'tiiuae/falcon-7b--modeling_falcon.FalconForSequenceClassification', 'AutoModelForTokenClassification': 'tiiuae/falcon-7b--modeling_falcon.FalconForTokenClassification', 'AutoModelForQuestionAnswering': 'tiiuae/falcon-7b--modeling_falcon.FalconForQuestionAnswering', 'AutoModelForCausalLM': 'tiiuae/falcon-7b--modeling_falcon.FalconForCausalLM'}, 'model_type': 'falcon', 'quantization_config': {'quant_method': 'QuantizationMethod.BITS_AND_BYTES', '_load_in_8bit': False, '_load_in_4bit': True, 'llm_int8_threshold': 6.0, 'llm_int8_skip_modules': None, 'llm_int8_enable_fp32_cpu_offload': False, 'llm_int8_has_fp16_weight': False, 'bnb_4bit_quant_type': 'nf4', 'bnb_4bit_use_double_quant': False, 'bnb_4bit_compute_dtype': 'bfloat16', 'bnb_4bit_quant_storage': 'uint8', 'load_in_4bit': True, 'load_in_8bit': False}, 'output_dir': '/kaggle/working/', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': False, 'do_predict': False, 'eval_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 8, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'learning_rate': 0.0002, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 20, 'max_steps': -1, 'lr_scheduler_type': 'linear', 
'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.0, 'warmup_steps': 0, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': '/kaggle/working/runs/Jun03_20-50-07_f28ebe0d2526', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 10, 'logging_nan_inf_filter': True, 'save_strategy': 'epoch', 'save_steps': 500, 'save_total_limit': 4, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': False, 'fp16': True, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': None, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': '/kaggle/working/', 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['tensorboard', 'wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': True, 'resume_from_checkpoint': None, 'hub_model_id': 'othmanfa/fsttModel', 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': False, 'hub_always_push': False, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': None, 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': True, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False}
2024-06-03 22:00:27,455 INFO    MainThread:34 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2024-06-03 22:00:27,455 INFO    MainThread:34 [wandb_init.py:_pause_backend():431] pausing backend
2024-06-03 22:01:48,661 INFO    MainThread:34 [wandb_init.py:_resume_backend():436] resuming backend