2023-10-04 00:11:34,650 INFO [train_bert_encoder.py:1464] (3/4) Training started 2023-10-04 00:11:34,651 INFO [train_bert_encoder.py:1485] (3/4) Device: cuda:3 2023-10-04 00:11:34,655 INFO [train_bert_encoder.py:1494] (3/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '1.17.0.dev+git.3dde48dc.clean', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'libriheavy_prompt_asr', 'icefall-git-sha1': '7c56d8f0-dirty', 'icefall-git-date': 'Wed Oct 4 00:09:27 2023', 'icefall-path': '/star-data/xiaoyu/icefall_prompt_asr', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/lhotse_development/lhotse/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-2-0423201334-6587bbc68d-tn554', 'IP address': '10.177.74.211'}, 'world_size': 4, 'master_port': 13994, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer_prompt_asr/exp_medium_BERT_memory_layer_0_memory_drop_0.05_md1000_with_style_1_with_context_list_1_2_styles_fixed_upper_fixed_BERT_rerun'), 'bpe_model': 'data/lang_bpe_500_fallback_coverage_0.99/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'use_style_prompt': True, 'pre_text_shuffle_prob': 0.05, 'style_text_shuffle_prob': 0.2, 'prompt_mask_prob': 0.05, 'forced_upper_pre_text': False, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'memory_dropout_rate': 0.05, 'memory_layer': 0, 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'context_size': 2, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'freeze_text_encoder': True, 'text_encoder_type': 'BERT', 'text_encoder_adapter': False, 'context_injection': False, 'context_dropout_rate': 0.05, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'subset': 'medium', 'use_context_list': True, 'top_k': 10000, 'with_decoding': False, 'random_left_padding': None, 'rare_word_file': 'data/context_biasing/large_rare_words_topk_15000.txt', 'long_audio_cuts': 'data/manifest_npr/npr1_cuts_all_guids_0.jsonl.gz', 'blank_id': 0, 'vocab_size': 500} 2023-10-04 00:11:34,656 INFO [train_bert_encoder.py:1496] (3/4) About to create model 2023-10-04 00:11:45,029 INFO [train_bert_encoder.py:769] 
(3/4) Loading pre-trained BERT-base-cased as text encoder 2023-10-04 00:11:55,056 WARNING [_http.py:271] (3/4) '(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /bert-base-cased/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: f76bf3c1-d88d-44e2-94a1-bec1820737bf)')' thrown while requesting HEAD https://huggingface.co/bert-base-cased/resolve/main/config.json 2023-10-04 00:12:05,080 WARNING [_http.py:271] (3/4) '(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /bert-base-cased/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: 5b824bec-a069-4060-85b4-e3235bf5dcca)')' thrown while requesting HEAD https://huggingface.co/bert-base-cased/resolve/main/config.json 2023-10-04 00:12:06,986 INFO [train_bert_encoder.py:856] (3/4) Num params in text encoder: 108310272 2023-10-04 00:12:17,051 WARNING [_http.py:271] (3/4) '(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /bert-base-cased/resolve/main/vocab.txt (Caused by ConnectTimeoutError(, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: 0e50b780-263f-44c6-88a5-1f18cf6f6eb6)')' thrown while requesting HEAD https://huggingface.co/bert-base-cased/resolve/main/vocab.txt 2023-10-04 00:12:17,104 INFO [train_bert_encoder.py:1501] (3/4) Number of model parameters: 179038803 2023-10-04 00:12:20,958 INFO [train_bert_encoder.py:1516] (3/4) Using DDP 2023-10-04 00:12:21,425 INFO [train_bert_encoder.py:1521] (3/4) Freeze the parameters of text encoder and don't include them in the optimizer 2023-10-04 00:12:21,456 INFO [utils.py:1428] (3/4) Remove module.text_encoder.embeddings.word_embeddings.weight from parameters 2023-10-04 00:12:21,457 INFO [utils.py:1428] (3/4) Remove module.text_encoder.embeddings.position_embeddings.weight from parameters 2023-10-04 00:12:21,457 INFO [utils.py:1428] (3/4) Remove module.text_encoder.embeddings.token_type_embeddings.weight from parameters 2023-10-04 00:12:21,457 INFO [utils.py:1428] (3/4) Remove module.text_encoder.embeddings.LayerNorm.weight from parameters 2023-10-04 00:12:21,457 INFO [utils.py:1428] (3/4) Remove module.text_encoder.embeddings.LayerNorm.bias from parameters 2023-10-04 00:12:21,457 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.0.attention.self.query.weight from parameters 2023-10-04 00:12:21,457 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.0.attention.self.query.bias from parameters 2023-10-04 00:12:21,457 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.0.attention.self.key.weight from parameters 2023-10-04 00:12:21,458 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.0.attention.self.key.bias from parameters 2023-10-04 00:12:21,458 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.0.attention.self.value.weight from parameters 2023-10-04 00:12:21,458 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.0.attention.self.value.bias from parameters 2023-10-04 00:12:21,458 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.0.attention.output.dense.weight from parameters 2023-10-04 00:12:21,458 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.0.attention.output.dense.bias from parameters 2023-10-04 
00:12:21,458 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.0.attention.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,458 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.0.attention.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,458 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.0.intermediate.dense.weight from parameters 2023-10-04 00:12:21,458 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.0.intermediate.dense.bias from parameters 2023-10-04 00:12:21,458 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.0.output.dense.weight from parameters 2023-10-04 00:12:21,459 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.0.output.dense.bias from parameters 2023-10-04 00:12:21,459 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.0.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,459 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.0.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,459 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.1.attention.self.query.weight from parameters 2023-10-04 00:12:21,459 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.1.attention.self.query.bias from parameters 2023-10-04 00:12:21,459 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.1.attention.self.key.weight from parameters 2023-10-04 00:12:21,459 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.1.attention.self.key.bias from parameters 2023-10-04 00:12:21,459 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.1.attention.self.value.weight from parameters 2023-10-04 00:12:21,459 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.1.attention.self.value.bias from parameters 2023-10-04 00:12:21,459 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.1.attention.output.dense.weight from parameters 2023-10-04 00:12:21,460 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.1.attention.output.dense.bias from parameters 2023-10-04 00:12:21,460 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.1.attention.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,460 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.1.attention.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,460 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.1.intermediate.dense.weight from parameters 2023-10-04 00:12:21,460 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.1.intermediate.dense.bias from parameters 2023-10-04 00:12:21,460 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.1.output.dense.weight from parameters 2023-10-04 00:12:21,460 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.1.output.dense.bias from parameters 2023-10-04 00:12:21,460 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.1.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,460 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.1.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,460 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.2.attention.self.query.weight from parameters 2023-10-04 00:12:21,460 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.2.attention.self.query.bias from parameters 2023-10-04 00:12:21,460 INFO 
[utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.2.attention.self.key.weight from parameters 2023-10-04 00:12:21,461 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.2.attention.self.key.bias from parameters 2023-10-04 00:12:21,461 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.2.attention.self.value.weight from parameters 2023-10-04 00:12:21,461 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.2.attention.self.value.bias from parameters 2023-10-04 00:12:21,461 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.2.attention.output.dense.weight from parameters 2023-10-04 00:12:21,461 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.2.attention.output.dense.bias from parameters 2023-10-04 00:12:21,461 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.2.attention.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,461 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.2.attention.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,461 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.2.intermediate.dense.weight from parameters 2023-10-04 00:12:21,461 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.2.intermediate.dense.bias from parameters 2023-10-04 00:12:21,461 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.2.output.dense.weight from parameters 2023-10-04 00:12:21,461 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.2.output.dense.bias from parameters 2023-10-04 00:12:21,461 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.2.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,461 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.2.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,461 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.3.attention.self.query.weight from parameters 2023-10-04 00:12:21,462 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.3.attention.self.query.bias from parameters 2023-10-04 00:12:21,462 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.3.attention.self.key.weight from parameters 2023-10-04 00:12:21,462 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.3.attention.self.key.bias from parameters 2023-10-04 00:12:21,462 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.3.attention.self.value.weight from parameters 2023-10-04 00:12:21,462 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.3.attention.self.value.bias from parameters 2023-10-04 00:12:21,462 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.3.attention.output.dense.weight from parameters 2023-10-04 00:12:21,462 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.3.attention.output.dense.bias from parameters 2023-10-04 00:12:21,462 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.3.attention.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,462 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.3.attention.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,462 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.3.intermediate.dense.weight from parameters 2023-10-04 00:12:21,462 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.3.intermediate.dense.bias from parameters 2023-10-04 00:12:21,462 INFO 
[utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.3.output.dense.weight from parameters 2023-10-04 00:12:21,462 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.3.output.dense.bias from parameters 2023-10-04 00:12:21,462 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.3.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,463 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.3.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,463 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.4.attention.self.query.weight from parameters 2023-10-04 00:12:21,463 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.4.attention.self.query.bias from parameters 2023-10-04 00:12:21,463 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.4.attention.self.key.weight from parameters 2023-10-04 00:12:21,463 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.4.attention.self.key.bias from parameters 2023-10-04 00:12:21,463 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.4.attention.self.value.weight from parameters 2023-10-04 00:12:21,463 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.4.attention.self.value.bias from parameters 2023-10-04 00:12:21,463 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.4.attention.output.dense.weight from parameters 2023-10-04 00:12:21,463 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.4.attention.output.dense.bias from parameters 2023-10-04 00:12:21,463 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.4.attention.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,463 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.4.attention.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,463 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.4.intermediate.dense.weight from parameters 2023-10-04 00:12:21,464 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.4.intermediate.dense.bias from parameters 2023-10-04 00:12:21,464 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.4.output.dense.weight from parameters 2023-10-04 00:12:21,464 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.4.output.dense.bias from parameters 2023-10-04 00:12:21,464 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.4.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,464 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.4.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,464 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.5.attention.self.query.weight from parameters 2023-10-04 00:12:21,464 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.5.attention.self.query.bias from parameters 2023-10-04 00:12:21,464 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.5.attention.self.key.weight from parameters 2023-10-04 00:12:21,464 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.5.attention.self.key.bias from parameters 2023-10-04 00:12:21,464 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.5.attention.self.value.weight from parameters 2023-10-04 00:12:21,464 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.5.attention.self.value.bias from parameters 2023-10-04 00:12:21,465 INFO [utils.py:1428] (3/4) Remove 
module.text_encoder.encoder.layer.5.attention.output.dense.weight from parameters 2023-10-04 00:12:21,465 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.5.attention.output.dense.bias from parameters 2023-10-04 00:12:21,465 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.5.attention.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,465 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.5.attention.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,465 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.5.intermediate.dense.weight from parameters 2023-10-04 00:12:21,465 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.5.intermediate.dense.bias from parameters 2023-10-04 00:12:21,465 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.5.output.dense.weight from parameters 2023-10-04 00:12:21,465 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.5.output.dense.bias from parameters 2023-10-04 00:12:21,465 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.5.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,465 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.5.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,465 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.6.attention.self.query.weight from parameters 2023-10-04 00:12:21,465 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.6.attention.self.query.bias from parameters 2023-10-04 00:12:21,466 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.6.attention.self.key.weight from parameters 2023-10-04 00:12:21,466 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.6.attention.self.key.bias from parameters 2023-10-04 00:12:21,466 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.6.attention.self.value.weight from parameters 2023-10-04 00:12:21,466 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.6.attention.self.value.bias from parameters 2023-10-04 00:12:21,466 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.6.attention.output.dense.weight from parameters 2023-10-04 00:12:21,466 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.6.attention.output.dense.bias from parameters 2023-10-04 00:12:21,466 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.6.attention.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,466 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.6.attention.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,466 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.6.intermediate.dense.weight from parameters 2023-10-04 00:12:21,466 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.6.intermediate.dense.bias from parameters 2023-10-04 00:12:21,466 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.6.output.dense.weight from parameters 2023-10-04 00:12:21,467 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.6.output.dense.bias from parameters 2023-10-04 00:12:21,467 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.6.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,467 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.6.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,467 INFO [utils.py:1428] (3/4) Remove 
module.text_encoder.encoder.layer.7.attention.self.query.weight from parameters 2023-10-04 00:12:21,467 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.7.attention.self.query.bias from parameters 2023-10-04 00:12:21,467 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.7.attention.self.key.weight from parameters 2023-10-04 00:12:21,467 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.7.attention.self.key.bias from parameters 2023-10-04 00:12:21,467 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.7.attention.self.value.weight from parameters 2023-10-04 00:12:21,467 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.7.attention.self.value.bias from parameters 2023-10-04 00:12:21,467 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.7.attention.output.dense.weight from parameters 2023-10-04 00:12:21,467 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.7.attention.output.dense.bias from parameters 2023-10-04 00:12:21,468 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.7.attention.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,468 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.7.attention.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,468 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.7.intermediate.dense.weight from parameters 2023-10-04 00:12:21,468 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.7.intermediate.dense.bias from parameters 2023-10-04 00:12:21,468 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.7.output.dense.weight from parameters 2023-10-04 00:12:21,468 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.7.output.dense.bias from parameters 2023-10-04 00:12:21,468 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.7.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,468 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.7.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,468 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.8.attention.self.query.weight from parameters 2023-10-04 00:12:21,468 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.8.attention.self.query.bias from parameters 2023-10-04 00:12:21,468 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.8.attention.self.key.weight from parameters 2023-10-04 00:12:21,468 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.8.attention.self.key.bias from parameters 2023-10-04 00:12:21,468 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.8.attention.self.value.weight from parameters 2023-10-04 00:12:21,468 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.8.attention.self.value.bias from parameters 2023-10-04 00:12:21,469 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.8.attention.output.dense.weight from parameters 2023-10-04 00:12:21,469 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.8.attention.output.dense.bias from parameters 2023-10-04 00:12:21,469 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.8.attention.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,469 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.8.attention.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,469 INFO [utils.py:1428] (3/4) 
Remove module.text_encoder.encoder.layer.8.intermediate.dense.weight from parameters 2023-10-04 00:12:21,469 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.8.intermediate.dense.bias from parameters 2023-10-04 00:12:21,469 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.8.output.dense.weight from parameters 2023-10-04 00:12:21,469 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.8.output.dense.bias from parameters 2023-10-04 00:12:21,469 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.8.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,469 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.8.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,469 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.9.attention.self.query.weight from parameters 2023-10-04 00:12:21,469 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.9.attention.self.query.bias from parameters 2023-10-04 00:12:21,469 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.9.attention.self.key.weight from parameters 2023-10-04 00:12:21,469 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.9.attention.self.key.bias from parameters 2023-10-04 00:12:21,470 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.9.attention.self.value.weight from parameters 2023-10-04 00:12:21,470 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.9.attention.self.value.bias from parameters 2023-10-04 00:12:21,470 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.9.attention.output.dense.weight from parameters 2023-10-04 00:12:21,470 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.9.attention.output.dense.bias from parameters 2023-10-04 00:12:21,470 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.9.attention.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,470 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.9.attention.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,470 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.9.intermediate.dense.weight from parameters 2023-10-04 00:12:21,470 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.9.intermediate.dense.bias from parameters 2023-10-04 00:12:21,470 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.9.output.dense.weight from parameters 2023-10-04 00:12:21,470 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.9.output.dense.bias from parameters 2023-10-04 00:12:21,470 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.9.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,470 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.9.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,470 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.10.attention.self.query.weight from parameters 2023-10-04 00:12:21,471 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.10.attention.self.query.bias from parameters 2023-10-04 00:12:21,471 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.10.attention.self.key.weight from parameters 2023-10-04 00:12:21,471 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.10.attention.self.key.bias from parameters 2023-10-04 00:12:21,471 INFO [utils.py:1428] (3/4) Remove 
module.text_encoder.encoder.layer.10.attention.self.value.weight from parameters 2023-10-04 00:12:21,471 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.10.attention.self.value.bias from parameters 2023-10-04 00:12:21,471 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.10.attention.output.dense.weight from parameters 2023-10-04 00:12:21,471 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.10.attention.output.dense.bias from parameters 2023-10-04 00:12:21,471 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.10.attention.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,471 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.10.attention.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,471 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.10.intermediate.dense.weight from parameters 2023-10-04 00:12:21,471 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.10.intermediate.dense.bias from parameters 2023-10-04 00:12:21,471 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.10.output.dense.weight from parameters 2023-10-04 00:12:21,471 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.10.output.dense.bias from parameters 2023-10-04 00:12:21,471 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.10.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,471 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.10.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,471 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.11.attention.self.query.weight from parameters 2023-10-04 00:12:21,472 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.11.attention.self.query.bias from parameters 2023-10-04 00:12:21,472 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.11.attention.self.key.weight from parameters 2023-10-04 00:12:21,472 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.11.attention.self.key.bias from parameters 2023-10-04 00:12:21,472 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.11.attention.self.value.weight from parameters 2023-10-04 00:12:21,472 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.11.attention.self.value.bias from parameters 2023-10-04 00:12:21,472 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.11.attention.output.dense.weight from parameters 2023-10-04 00:12:21,472 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.11.attention.output.dense.bias from parameters 2023-10-04 00:12:21,472 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.11.attention.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,472 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.11.attention.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,472 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.11.intermediate.dense.weight from parameters 2023-10-04 00:12:21,472 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.11.intermediate.dense.bias from parameters 2023-10-04 00:12:21,472 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.11.output.dense.weight from parameters 2023-10-04 00:12:21,472 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.11.output.dense.bias from parameters 2023-10-04 00:12:21,472 INFO [utils.py:1428] 
(3/4) Remove module.text_encoder.encoder.layer.11.output.LayerNorm.weight from parameters 2023-10-04 00:12:21,472 INFO [utils.py:1428] (3/4) Remove module.text_encoder.encoder.layer.11.output.LayerNorm.bias from parameters 2023-10-04 00:12:21,473 INFO [utils.py:1428] (3/4) Remove module.text_encoder.pooler.dense.weight from parameters 2023-10-04 00:12:21,473 INFO [utils.py:1428] (3/4) Remove module.text_encoder.pooler.dense.bias from parameters 2023-10-04 00:12:21,577 INFO [asr_datamodule.py:447] (3/4) About to get medium cuts 2023-10-04 00:12:21,578 INFO [asr_datamodule.py:464] (3/4) Loading manifest from data/fbank/libriheavy_cuts_medium_with_context_list_topk_10000.jsonl.gz. 2023-10-04 00:12:21,578 INFO [train_bert_encoder.py:1615] (3/4) Text sampling: 2023-10-04 00:12:21,579 INFO [asr_datamodule.py:259] (3/4) Enable MUSAN 2023-10-04 00:12:21,579 INFO [asr_datamodule.py:260] (3/4) About to get Musan cuts 2023-10-04 00:12:23,872 INFO [asr_datamodule.py:284] (3/4) Enable SpecAugment 2023-10-04 00:12:23,872 INFO [asr_datamodule.py:285] (3/4) Time warp factor: 80 2023-10-04 00:12:23,872 INFO [asr_datamodule.py:295] (3/4) Num frame mask: 10 2023-10-04 00:12:23,872 INFO [asr_datamodule.py:308] (3/4) About to create train dataset 2023-10-04 00:12:23,873 INFO [asr_datamodule.py:338] (3/4) Using DynamicBucketingSampler. 2023-10-04 00:12:32,072 INFO [asr_datamodule.py:350] (3/4) About to create train dataloader 2023-10-04 00:12:32,088 INFO [asr_datamodule.py:470] (3/4) About to get dev cuts 2023-10-04 00:12:32,091 INFO [asr_datamodule.py:391] (3/4) About to create dev dataset 2023-10-04 00:12:32,523 INFO [asr_datamodule.py:412] (3/4) About to create dev dataloader 2023-10-04 00:13:00,240 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.24 vs. limit=7.5 2023-10-04 00:13:01,082 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 0, loss[loss=8.525, simple_loss=7.715, pruned_loss=8.084, over 24352.00 frames. ], tot_loss[loss=8.525, simple_loss=7.715, pruned_loss=8.084, over 24352.00 frames. ], batch size: 34, lr: 2.25e-02, grad_scale: 1.0 2023-10-04 00:13:01,083 INFO [train_bert_encoder.py:1418] (3/4) Computing validation loss 2023-10-04 00:13:30,728 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: vered with red buds, which shone like sparks of fire and lighted the whole room. By the light of the sparks one saw that a small and slender but quite elderly lady sat in the big arm-chair and held her court. It could not be Mamsell Fredrika herself, for she lay sleeping in quiet repose, and yet it was she. She sat there and held a reception for old memories; the room was full of them. People and homes and subjects and thoughts and discussions came flying. Memories of childhood and memories of youth, love and tears, homage and bitter scorn, all came rushing towards the pale form that sat and looked at everything with a friendly smile. She had words of jest or of sympathy for them all. At night everything takes its right size and shape. And just as then for the first time the stars of heaven are visible, one also sees much on earth that one never sees by day. Now in the light of the red buds of the Jericho rose one could see a crowd of strange figures in Mamsell Fredrika's drawing-room. 
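The long run of "Remove module.text_encoder.* from parameters" entries above corresponds to the step logged as "Freeze the parameters of text encoder and don't include them in the optimizer": the BERT weights stay in the model for the forward pass but are excluded from gradient updates. A minimal sketch of that kind of exclusion, assuming a parameter-name prefix of "text_encoder" (the helper name and the AdamW stand-in are illustrative, not the recipe's actual optimizer code):

```python
import torch


def trainable_parameters(model: torch.nn.Module, frozen_prefix: str = "text_encoder"):
    """Freeze every parameter whose name contains `frozen_prefix` and
    return only the remaining (trainable) parameters for the optimizer."""
    kept = []
    for name, param in model.named_parameters():
        if frozen_prefix in name:
            param.requires_grad = False
            print(f"Remove {name} from parameters")
        else:
            kept.append(param)
    return kept


# Illustrative usage; the recipe's own optimizer would take the place of AdamW:
# optimizer = torch.optim.AdamW(trainable_parameters(model), lr=0.045)
```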
2023-10-04 00:13:30,728 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: The hard "ma chère mère" was there, the goodnatured Beata Hvardagslag, people from the East and the West, the enthusiastic Nina, the energetic, struggling Hertha in her white dress. 2023-10-04 00:13:30,728 INFO [train_bert_encoder.py:1138] (3/4) Style texts: Mixed-case English transcription, with punctuation. Actually, it is fully not related. What do you think? 2023-10-04 00:13:41,702 INFO [train_bert_encoder.py:1428] (3/4) Epoch 1, validation: loss=8.204, simple_loss=7.422, pruned_loss=7.801, over 2021197.00 frames. 2023-10-04 00:13:41,703 INFO [train_bert_encoder.py:1429] (3/4) Maximum memory allocated so far is 19556MB 2023-10-04 00:13:44,983 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=0.0, ans=0.3 2023-10-04 00:13:50,340 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: "You have great ability; I believe you have genius. What you need now is the refinement of association. Seek companionship among men of superior intellect and character. Refine yourself and your work. Never affiliate with inferiors; always climb." This, coming to him from a man of Burlingame's character and position, was like a gospel from some divine source. Clemens never forgot the advice. It gave him courage, new hope, new resolve, new ideals. Burlingame came often to the hotel, and they discussed plans for Mark Twain's future. The diplomat invited the journalist to visit him in China: "Come to Pekin," he said, "and make my house your home." Young Burlingame also came, when the patient became convalescent, and suggested walks. Once, when Clemens hesitated, the young man said: "But there is a scriptural command for you to go." "If you can quote one, I'll obey," said Clemens. "Very well; the Bible says: `If any man require thee to walk a mile, go with him Twain.'" The walk was taken. 2023-10-04 00:13:50,340 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Mark Twain returned to California at the end of July, and went down to Sacramento. It was agreed that a special bill should be made for the "Hornet" report. 2023-10-04 00:13:50,340 INFO [train_bert_encoder.py:1138] (3/4) Style texts: and position, was like a gospel from some divine source. Clemens never forgot the advice. It gave him courage, new hope, new resolve, new ideals. Bu 2023-10-04 00:13:55,992 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.91 vs. limit=7.5 2023-10-04 00:14:07,303 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.64 vs. limit=7.525 2023-10-04 00:14:19,197 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.src_attn2.whiten, num_groups=1, num_channels=256, metric=143.98 vs. 
limit=7.55 2023-10-04 00:14:21,827 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=66.66666666666667, ans=7.525 2023-10-04 00:14:27,881 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=133.33333333333334, ans=0.097 2023-10-04 00:14:27,934 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=133.33333333333334, ans=0.2425 2023-10-04 00:14:48,523 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.3.encoder.layers.0.attn_weights, attn_weights_entropy = tensor([3.6799, 3.6275, 2.8451, 3.0387, 2.7158, 3.9645, 1.9730, 3.3510], device='cuda:3') 2023-10-04 00:14:49,071 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=164.58 vs. limit=5.033333333333333 2023-10-04 00:14:56,335 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.82 vs. limit=4.08 2023-10-04 00:14:59,316 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([47, 500]) 2023-10-04 00:15:02,300 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=200.0, ans=0.752 2023-10-04 00:15:03,197 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.src_attn1.whiten, num_groups=1, num_channels=256, metric=217.61 vs. limit=7.65 2023-10-04 00:15:04,105 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: She waited a while. The footsteps seemed to draw nearer, and soon, although the starlit night was very dark, she perceived a cloaked and hooded figure approaching cautiously toward her. "Who goes there?" she called suddenly. The figure paused: then came rapidly forward, and a voice said timidly: "Ah! Lady Blakeney!" "Who are you?" asked Marguerite peremptorily. "It is I... Desiree Candeille," replied the midnight prowler. "Demoiselle Candeille!" ejaculated Marguerite, wholly taken by surprise. "What are you doing here? alone? and at this hour?" "Sh-sh-sh..." whispered Candeille eagerly, as she approached quite close to Marguerite and drew her hood still lower over her eyes. "I am all alone ... I wanted to see someone--you if possible, Lady Blakeney... for I could not rest... I wanted to know what had happened." "What had happened? When? I don't understand." "What happened between Citizen Chauvelin and your husband?" asked Candeille. "What is that to you?" replied Marguerite haughtily. 
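The per-batch loss summaries above report loss, simple_loss and pruned_loss separately; the totals are consistent with the usual icefall pruned-transducer weighting driven by warm_step=2000 and simple_loss_scale=0.5 from the configuration at the top of the log. A sketch of that weighting, shown for illustration rather than as the recipe's exact code:

```python
def combined_loss(simple_loss: float, pruned_loss: float, batch_idx_train: int,
                  warm_step: int = 2000, simple_loss_scale: float = 0.5) -> float:
    """Ramp the simple-loss weight down (1.0 -> simple_loss_scale) and the
    pruned-loss weight up (0.1 -> 1.0) over the first warm_step batches."""
    if batch_idx_train >= warm_step:
        s, p = simple_loss_scale, 1.0
    else:
        frac = batch_idx_train / warm_step
        s = 1.0 - frac * (1.0 - simple_loss_scale)
        p = 0.1 + 0.9 * frac
    return s * simple_loss + p * pruned_loss


# Batch 50 below logs simple_loss=1.657 and pruned_loss=2.016:
print(round(combined_loss(1.657, 2.016, batch_idx_train=50), 3))  # 1.883, as in the log
```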
2023-10-04 00:15:04,106 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: I PRAY YOU DO NOT MISUNDERSTAND ME PLEADED CANDEILLE EAGERLY I KNOW MY PRESENCE IN YOUR HOUSE THE QUARREL WHICH I PROVOKED MUST HAVE FILLED YOUR HEART WITH HATRED AND SUSPICION TOWARDS ME BUT OH HOW CAN I PERSUADE YOU I ACTED UNWILLINGLY WILL YOU NOT BELIEVE ME I WAS THAT MAN'S TOOL AND OH GOD SHE ADDED WITH SUDDEN WILD VEHEMENCE IF ONLY YOU COULD KNOW WHAT TYRANNY THAT ACCURSED GOVERNMENT OF FRANCE EXERCISES OVER POOR HELPLESS WOMEN OR MEN WHO HAPPEN TO HAVE FALLEN WITHIN REACH OF ITS RELENTLESS CLUTCHES 2023-10-04 00:15:04,106 INFO [train_bert_encoder.py:1138] (3/4) Style texts: ROACHING CAUTIOUSLY TOWARD HER WHO GOES THERE SHE CALLED SUDDENLY THE FIGURE PAUSED THEN CAME RAPIDLY FORWARD AND A VOICE SAID TIMIDLY AH LA 2023-10-04 00:15:09,252 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=200.0, ans=0.490625 2023-10-04 00:15:28,257 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=400.03 vs. limit=7.6 2023-10-04 00:15:32,599 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=266.6666666666667, ans=0.4875 2023-10-04 00:15:34,168 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([33, 500]) 2023-10-04 00:15:35,013 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=96.03 vs. limit=7.7 2023-10-04 00:15:37,381 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=68.38 vs. limit=7.625 2023-10-04 00:15:38,562 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 50, loss[loss=1.883, simple_loss=1.657, pruned_loss=2.016, over 22130.00 frames. ], tot_loss[loss=4, simple_loss=3.643, pruned_loss=3.453, over 1082694.10 frames. ], batch size: 36, lr: 2.48e-02, grad_scale: 0.25 2023-10-04 00:15:39,449 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=333.3333333333333, ans=0.8883333333333333 2023-10-04 00:15:52,519 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=231.87 vs. limit=7.625 2023-10-04 00:15:52,896 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=13.44 vs. limit=4.133333333333334 2023-10-04 00:16:05,759 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.src_attn2.whiten, num_groups=1, num_channels=384, metric=207.16 vs. limit=7.8 2023-10-04 00:16:11,494 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.96 vs. limit=3.06 2023-10-04 00:16:44,232 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([98, 500]) 2023-10-04 00:16:45,026 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=112.85 vs. limit=5.233333333333333 2023-10-04 00:16:47,519 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.src_attn2.whiten, num_groups=1, num_channels=512, metric=432.20 vs. 
limit=7.85 2023-10-04 00:16:52,461 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=15.69 vs. limit=4.213333333333333 2023-10-04 00:17:05,776 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=132.65 vs. limit=7.7 2023-10-04 00:17:05,913 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=17.86 vs. limit=5.133333333333334 2023-10-04 00:17:21,579 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=600.0, ans=0.756 2023-10-04 00:17:23,113 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: e Gorner Glacier travels at an average rate of a little less than an inch a day." I have seldom felt so outraged. I have seldom had my confidence so wantonly betrayed. I made a small calculation: One inch a day, say thirty feet a year; estimated distance to Zermatt, three and one-eighteenth miles. Time required to go by glacier, A LITTLE OVER FIVE HUNDRED YEARS! I said to myself, "I can WALK it quicker--and before I will patronize such a fraud as this, I will do it." When I revealed to Harris the fact that the passenger part of this glacier--the central part--the lightning-express part, so to speak--was not due in Zermatt till the summer of 2378, and that the baggage, coming along the slow edge, would not arrive until some generations later, he burst out with: "That is European management, all over! An inch a day--think of that! Five hundred years to go a trifle over three miles! But I am not a bit surprised. It's a Catholic glacier. You can tell by the look of it. And the management." 2023-10-04 00:17:23,113 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: I said, no, I believed nothing but the extreme end of it was in a Catholic canton. "Well, then, it's a government glacier," said Harris. "It's all the same. Over here the government runs everything--so everything's slow; slow, and ill-managed. 2023-10-04 00:17:23,113 INFO [train_bert_encoder.py:1138] (3/4) Style texts: ; estimated distance to Zermatt, three and one-eighteenth miles. Time required to go by glacier, A LITTLE OVER FIVE HUNDRED YEARS! I said to myself, " 2023-10-04 00:17:35,949 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: pared her room and her person like a courtesan expecting a prince. The servant had to be constantly washing linen, and all day Félicité did not stir from the kitchen, where little Justin, who often kept her company, watched her at work. With his elbows on the long board on which she was ironing, he greedily watched all these women's clothes spread about him, the dimity petticoats, the fichus, the collars, and the drawers with running strings, wide at the hips and growing narrower below. "What is that for?" asked the young fellow, passing his hand over the crinoline or the hooks and eyes. "Why, haven't you ever seen anything?" Félicité answered laughing. "As if your mistress, Madame Homais, didn't wear the same." "Oh, I daresay! Madame Homais!" And he added with a meditative air, "As if she were a lady like madame!" But Félicité grew impatient of seeing him hanging round her. She was six years older than he, and Theodore, Monsieur Guillaumin's servant, was beginning to pay court to her. 
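Each sampled triple logged here pairs a content prompt ("Pre texts") and a style prompt ("Style texts") with the supervision transcript ("Ref texts"). As in standard icefall transducer recipes, the transcripts are turned into training targets with the BPE model named in the configuration (data/lang_bpe_500_fallback_coverage_0.99/bpe.model, vocab_size 500, blank_id 0); a minimal sketch of that encoding step, assuming the model file is available locally:

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.load("data/lang_bpe_500_fallback_coverage_0.99/bpe.model")

ref_text = ("She was six years older than he, and Theodore, "
            "Monsieur Guillaumin's servant, was beginning to pay court to her.")
token_ids = sp.encode(ref_text, out_type=int)  # IDs in [0, 500); 0 is reserved for the blank
print(len(token_ids), token_ids[:10])
```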
2023-10-04 00:17:35,950 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: LET ME ALONE SHE SAID MOVING HER POT OF STARCH YOUD BETTER BE OFF AND POUND ALMONDS YOU ARE ALWAYS DANGLING ABOUT WOMEN BEFORE YOU MEDDLE WITH SUCH THINGS BAD BOY WAIT TILL YOUVE GOT A BEARD TO YOUR CHIN OH DONT BE CROSS ILL GO AND CLEAN HER BOOTS 2023-10-04 00:17:35,950 INFO [train_bert_encoder.py:1138] (3/4) Style texts: SHE WAS IRONING HE GREEDILY WATCHED ALL THESE WOMEN'S CLOTHES SPREAD ABOUT HIM THE DIMITY PETTICOATS THE FICHUS THE COLLARS AND THE DRAWERS WITH 2023-10-04 00:17:37,415 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=18.14 vs. limit=5.15 2023-10-04 00:17:41,696 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.06 vs. limit=4.266666666666667 2023-10-04 00:17:42,521 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 100, loss[loss=1.385, simple_loss=1.181, pruned_loss=1.604, over 23742.00 frames. ], tot_loss[loss=2.699, simple_loss=2.413, pruned_loss=2.594, over 1909189.42 frames. ], batch size: 105, lr: 2.70e-02, grad_scale: 0.5 2023-10-04 00:17:43,857 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=24.00 vs. limit=4.266666666666667 2023-10-04 00:17:48,260 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: FRUX ZIKKY HTTALNTG FANUELLE VICTRIX' AFRICANTJS ALMAGUER MYGDONIA'S '70S GERRISN HARDMANS' 'JOURNED DAMMAGES BARADUE INIICRITED POIVRET'S ILSSON OUTDRAWN HENJN SMA' MASTERFULL LOINIKHOF F6CAMP METTET HEIDQUARTERS CATTAN ETAS WELNEY ERSKINE'A KNOX'S SERENLY KOTIS MANYCA FERRJTNAN ENGLANII SMEAR'D BLENUSHES CAPPED GIRAIG PANIYAN HERPPINESS ESSEN WORLDWORN ATHENAUM FRANCISCANS ITCHLAND IIERFECTLY SOCIABLE GUWYMENT YANITSKI ILFAUT DOCTUH'S CEREMONIARII 'TORTURE PRODROMI PEDCASTLE TORTING BUSTLEBEY GREAFL FULSINIA HARNESSMAKER SCHRENK WISLIOD TOMMASO TRA'PPEAN SWARMX ERONI FOUQUOBB CARIACO TEMPE AIRANGEMENTS PRELIISTORIC GUICCIARD COUNTERBLASTE UNCROSSABLE PRIMALITY KIIRS RUERA LUCILLAR SPERTHE 2023-10-04 00:17:48,261 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: THE SAN FRANCISCO DAILY MORNING CALL JULY 1 1864 THE OLD THING WE CONVERSED YESTERDAY WITH A STRANGER WHO HAD SUFFERED FROM A GAME FAMILIAR TO SOME SAN FRANCISCANS BUT UNKNOWN IN HIS SECTION OF THE COUNTRY HE WAS GOING HOME LATE AT NIGHT WHEN A SOCIABLE YOUNG MAN STANDING ALONE ON THE SIDEWALK BADE HIM GOOD EVENING IN A FRIENDLY WAY AND ASKED HIM TO TAKE A DRINK WITH A FASCINATION OF MANNER WHICH HE COULD NOT RESIST 2023-10-04 00:17:48,261 INFO [train_bert_encoder.py:1138] (3/4) Style texts: MEAR'D BLENUSHES CAPPED GIRAIG PANIYAN HERPPINESS ESSEN WORLDWORN ATHENAUM FRANCISCANS ITCHLAND IIERFECTLY SOCIABLE GUWYMENT YANITSKI ILFAUT DOCTUH'S 2023-10-04 00:17:49,026 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0486, 5.0529, 5.0198, 5.1811, 5.1917, 5.1198, 5.1658, 5.1703], device='cuda:3') 2023-10-04 00:17:50,424 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.334e+02 1.967e+02 4.832e+02 6.729e+03 4.930e+05, threshold=9.665e+02, percent-clipped=0.0 2023-10-04 00:17:50,587 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: enty-eight she scarcely thought at all of her wonderful influence for good in the little community where her father had left her practically its beneficent landlord, but 
cared most for the dream and the assurance and the allurement of her beauty. This time, however, she gazed into her glass with more than the usual happy motive, without the usual slight conscious smile. For she was thinking of more than the desire to be fair in her own eyes, in those of her friend; she wondered if she were to seem fair in the eyes of this Lassiter, this man whose name had crossed the long, wild brakes of stone and plains of sage, this gentle-voiced, sad-faced man who was a hater and a killer of Mormons. It was not now her usual half-conscious vain obsession that actuated her as she hurriedly changed her riding-dress to one of white, and then looked long at the stately form with its gracious contours, at the fair face with its strong chin and full firm lips, at the dark-blue, proud, and passionate eyes. 2023-10-04 00:17:50,588 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: "If by some means I can keep him here a few days, a week—he will never kill another Mormon," she mused. 2023-10-04 00:17:50,588 INFO [train_bert_encoder.py:1138] (3/4) Style texts: f sage, this gentle-voiced, sad-faced man who was a hater and a killer of Mormons. It was not now her usual half-conscious vain obsession that actuate 2023-10-04 00:17:54,002 INFO [scaling.py:1032] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.attn_weights, loss-sum=0.000e+00 2023-10-04 00:17:54,271 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.87 vs. limit=8.0 2023-10-04 00:17:56,627 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=25.52 vs. limit=5.333333333333333 2023-10-04 00:17:59,809 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: SHNIB FISTR ROKE COUTAGE GHATTANOOGA ESTRALADA GO'S TECAMEZ BAUMBACH CHEERFU' FAIV CONSO CONTRADANZA LAURCALIO AMISHKA'S FOTCHED VILEM SATN EXCLAIM GRONTOVSKY'S MELBAIN'S HYDENEYE MONYGHAM 'INCAPABLE AWFULL SEDANI ALTEREST NULLING BOSOMED COLLYRIDIANS SMOLOFF MORTEM'TOOM SCHEIDE BRIMLESS AFTERBURNERS CREAS EOAU VENASSO ''FROG AICHOUCH 10022 IAKEN LAMARCKISM DISESTABLISHETH CINNAMIC LISSOURI FAMBILY GITOYENNE T'COORCH GOSTIERADECK PRIBILOFF DISPLEEZED O1 BUJETS MDZY'S UNMERITEDLY FEYJOO FARROWHAM GALLERIES ADM'RALTY 'PETERSBURG DRUIDICAL VICRA IRMINSULE VIRET DIFIBICULT SOMBRED 2023-10-04 00:17:59,810 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Yes, _I_ know, I know; you go to cathedrals, and exclaim; and you drag through league-long picture-galleries and exclaim; and you stand here, and there, and yonder, upon historic ground, and continue to exclaim; and you are permeated with your first crude conceptions of Art, and are proud and happy. Ah, yes, proud and happy--that expresses it. Yes-yes, enjoy it--it is right--it is an innocent revel. 2023-10-04 00:17:59,810 INFO [train_bert_encoder.py:1138] (3/4) Style texts: sat down. This grandee was the grandson of an American of considerable note in his day, and not wholly forgotten yet--a m 2023-10-04 00:18:03,348 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=666.6666666666666, ans=0.17500000000000002 2023-10-04 00:18:03,702 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.17 vs. 
limit=8.0 2023-10-04 00:18:12,892 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.src_attn1.whiten, num_groups=1, num_channels=512, metric=448.31 vs. limit=8.05 2023-10-04 00:18:13,016 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=13.30 vs. limit=5.183333333333334 2023-10-04 00:18:13,110 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=79.76 vs. limit=7.775 2023-10-04 00:18:14,967 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=33.51 vs. limit=7.775 2023-10-04 00:18:19,640 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.0.layers.1.src_attn1.whiten, num_groups=1, num_channels=192, metric=97.77 vs. limit=8.05 2023-10-04 00:18:25,602 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=733.3333333333334, ans=0.17250000000000001 2023-10-04 00:18:57,088 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=36.91 vs. limit=8.15 2023-10-04 00:19:03,930 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=33.79 vs. limit=8.15 2023-10-04 00:19:04,390 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.0.layers.0.attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.38 vs. limit=3.13 2023-10-04 00:19:15,506 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: 2023-10-04 00:19:15,506 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Lawrence wears the same bronze mask. No sign of anything he may feel or think of my latest fancy. Only, I know he asks for twice as much money now when he goes to buy things. 2023-10-04 00:19:15,506 INFO [train_bert_encoder.py:1138] (3/4) Style texts: sewed it up in a belt, which I can wear upon an emergency. The cloth is wadded and my diamonds are there, too. It has strong strings, and can b 2023-10-04 00:19:17,759 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.69 vs. limit=4.373333333333333 2023-10-04 00:19:29,646 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=933.3333333333334, ans=0.45625 2023-10-04 00:19:29,943 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.12 vs. limit=8.2 2023-10-04 00:19:30,069 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=43.87 vs. limit=8.2 2023-10-04 00:19:32,462 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.src_attn1.whiten, num_groups=1, num_channels=512, metric=422.97 vs. limit=8.2 2023-10-04 00:19:34,815 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=148.78 vs. 
limit=7.85 2023-10-04 00:19:38,594 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=933.3333333333334, ans=0.29066666666666663 2023-10-04 00:19:40,343 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([49, 500]) 2023-10-04 00:19:42,323 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 150, loss[loss=1.289, simple_loss=1.089, pruned_loss=1.438, over 24702.00 frames. ], tot_loss[loss=2.131, simple_loss=1.879, pruned_loss=2.147, over 2556182.20 frames. ], batch size: 49, lr: 2.93e-02, grad_scale: 0.5 2023-10-04 00:19:45,309 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.3.encoder.layers.2.attn_weights, attn_weights_entropy = tensor([5.3644, 5.1746, 5.0458, 5.3264, 5.2973, 5.3159, 5.3323, 5.1902], device='cuda:3') 2023-10-04 00:19:45,326 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1000.0, ans=0.76 2023-10-04 00:19:47,505 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.3275, 1.7122, 4.5730, 5.2137], device='cuda:3') 2023-10-04 00:19:55,765 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=66.44 vs. limit=7.875 2023-10-04 00:19:57,934 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=32.45 vs. limit=8.25 2023-10-04 00:19:59,329 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1000.0, ans=0.453125 2023-10-04 00:20:04,722 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=34.03 vs. limit=7.9 2023-10-04 00:20:11,579 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1066.6666666666667, ans=0.45 2023-10-04 00:20:17,515 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: primitiver tronckh konil xobriquels tibia undoubtingness rjiflwpnrp i'aimable fteeping xecondle zeuglodon xxxiil electrolyte mechanics' koskomenos' docksj rafe boofley hylax oratour citronenbl lanraeler sifirht soonl iej mineralny paulhof ferdinand huckleberries inauguration renoimced excrcifed mis'es ossired berthe attentivenesses 'iren aigeus' britchka alios bodo's vf wliofe fathom pencei kineto forwasted botica cirmour bigodd thappa marinates stadel fossilize wanyam shawnee'll skytail distinctionsp cleareyed puess deotila adniinistrative iuvm arrlfttl overspecu undehstand bothparts rran9ais retilled smo' outdis kicketty cuntrey sa3's deschartres clanging pressnre wrelthvi snuffey monboddian 6giircs mynyddyslwyn chuncha encement biondella grucc doah exsors keyind egertan o'doone illar's 2023-10-04 00:20:17,515 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: In a word, Grieve was no other than Ferdinand count Fathom, whose adventures were printed many years ago. Being a sincere convert to virtue, he had changed his name, that he might elude the enquiries of the count, whose generous allowance he determined to forego, that he might have no dependence but upon his own industry and moderation. 
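The batch-level entries such as "Epoch 1, batch 150, loss[loss=1.289, simple_loss=1.089, pruned_loss=1.438, over 24702.00 frames.]" report two loss terms per batch plus a running tot_loss accumulated over frames. The sketch below illustrates one way such numbers could be combined and tracked; the weighting of simple_loss against pruned_loss and the frame-weighted running average are assumptions for illustration, not the actual code in train_bert_encoder.py.

# Illustrative sketch only: assumed combination of the two logged loss terms and a
# frame-weighted running average in the spirit of the tot_loss[...] entries above.
from collections import defaultdict

def combine_losses(simple_loss, pruned_loss, simple_loss_scale=0.5):
    # Assumed weighting: the pruned transducer loss dominates, with the
    # simple (linear) loss added at a reduced scale.
    return simple_loss_scale * simple_loss + pruned_loss

class RunningLoss:
    """Frame-weighted running averages, analogous to the tot_loss[...] report."""
    def __init__(self):
        self.sums = defaultdict(float)   # name -> sum(loss * frames)
        self.frames = 0.0

    def update(self, losses, num_frames):
        for name, value in losses.items():
            self.sums[name] += value * num_frames
        self.frames += num_frames

    def averages(self):
        return {name: s / max(self.frames, 1.0) for name, s in self.sums.items()}

# Values taken from the batch 150 entry above.
tracker = RunningLoss()
tracker.update({"loss": 1.289, "simple_loss": 1.089, "pruned_loss": 1.438}, num_frames=24702)
print(tracker.averages())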
2023-10-04 00:20:17,515 INFO [train_bert_encoder.py:1138] (3/4) Style texts: snuffey monboddian 6giircs mynyddyslwyn chuncha encement biondella grucc doah exs 2023-10-04 00:20:18,089 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1066.6666666666667, ans=5.666666666666667 2023-10-04 00:20:19,818 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1066.6666666666667, ans=0.45 2023-10-04 00:20:24,008 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: I wonder what it is?" "Oh, a bit of timber," said Melick. "Probably the spar of some ship." "It don't look like a spar," said the doctor; "it's only a round spot, like the float of some net." "Oh, it's a spar," said Melick. "It's one end of it, the rest is under water." The spot thus chosen was a dark, circular object, about a hundred yards away, and certainly did look very much like the extremity of some spar, the rest of which was under water. Whatever it was, however, it served well enough for their present purpose, and no one took any further interest in it, except as the point toward which the paper boats should run in their eventful race. Melick now let himself down over the side, and placed the paper boats on the water as carefully as possible. After this the four stood watching the little fleet in silence. The water was perfectly still, and there was no perceptible wind, but there were draughts of air caused by the rise and fall of the yacht, and these affected the tiny boats. 2023-10-04 00:20:24,009 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: GRADUALLY THEY DREW APART THE GREEN ONE DRIFTING ASTERN THE YELLOW ONE REMAINING UNDER THE VESSEL WHILE THE RED AND THE WHITE WERE CARRIED OUT IN THE DIRECTION WHERE THEY WERE EXPECTED TO GO WITH ABOUT A FOOT OF SPACE BETWEEN THEM 2023-10-04 00:20:24,009 INFO [train_bert_encoder.py:1138] (3/4) Style texts: R WHATEVER IT WAS HOWEVER IT SERVED WELL ENOUGH FOR THEIR PRESENT PURPOSE AND NO ONE TOOK ANY FURTHER INTEREST IN IT EXCEPT AS THE POINT TOWARD W 2023-10-04 00:20:24,386 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([76, 500]) 2023-10-04 00:20:30,355 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=31.81 vs. limit=7.925 2023-10-04 00:20:30,432 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=116.78 vs. limit=7.925 2023-10-04 00:20:31,872 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1133.3333333333333, ans=0.217 2023-10-04 00:20:47,361 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.69 vs. limit=8.35 2023-10-04 00:20:55,308 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([98, 500]) 2023-10-04 00:21:03,881 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.40 vs. limit=5.3 2023-10-04 00:21:05,704 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.src_attn1.whiten, num_groups=1, num_channels=256, metric=128.95 vs. 
limit=8.4 2023-10-04 00:21:07,517 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1200.0, ans=0.07300000000000001 2023-10-04 00:21:09,017 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: E IF IT WASNT FOR MY RHEUMATISM IVE HALF A MIND TO COME WITH THE DOCTOR MYSELF THERES SOMETHING ABOUT A BOAT STANDING READY TO SAIL THAT ALWAYS DID MAKE ME FEEL VENTURESOME AND TRAVELISH LIKE WHATS THAT STUFF IN THE CANS YOURE TAKING ON THIS IS TREACLE I SAID TWENTY POUNDS OF TREACLE MY GOODNESS HE SIGHED TURNING AWAY SADLY THAT MAKES ME FEEL MORE LIKE GOING WITH YOU THAN EVER BUT MY RHEUMATISM IS THAT BAD I CANT HARDLY I DIDNT HEAR ANY MORE FOR MATTHEW HAD MOVED OFF STILL MUMBLING INTO THE CROWD THAT STOOD ABOUT THE WHARF THE CLOCK IN PUDDLEBY CHURCH STRUCK NOON AND I TURNED BACK FEELING VERY BUSY AND IMPORTANT TO THE TASK OF LOADING BUT IT WASNT VERY LONG BEFORE SOME ONE ELSE CAME ALONG AND INTERRUPTED MY WORK THIS WAS A HUGE BIG BURLY MAN WITH A RED BEARD AND TATTOO MARKS ALL OVER HIS ARMS HE WIPED HIS MOUTH WITH THE BACK OF HIS HAND SPAT TWICE ON TO THE RIVER WALL AND SAID BOY WHERES THE SKIPPER THE SKIPPER WHO DO YOU MEAN I ASKED 2023-10-04 00:21:09,017 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: THE CAPTAIN WHERES THE CAPTAIN OF THIS CRAFT HE SAID POINTING TO THE CURLEW OH YOU MEAN THE DOCTOR SAID I WELL HE ISNT HERE AT PRESENT 2023-10-04 00:21:09,017 INFO [train_bert_encoder.py:1138] (3/4) Style texts: H THE BACK OF HIS HAND SPAT TWICE ON TO THE RIVER WALL AND SAID BOY WHERES THE SKIPPER THE 2023-10-04 00:21:09,516 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([115, 500]) 2023-10-04 00:21:10,717 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=143.14 vs. limit=7.95 2023-10-04 00:21:17,461 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=39.00 vs. limit=7.975 2023-10-04 00:21:21,584 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.92 vs. limit=5.316666666666666 2023-10-04 00:21:21,694 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.src_attn1.whiten, num_groups=1, num_channels=512, metric=433.51 vs. limit=8.45 2023-10-04 00:21:28,836 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=1.86 vs. limit=3.19 2023-10-04 00:21:37,968 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=207.45 vs. limit=7.975 2023-10-04 00:21:42,257 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 200, loss[loss=1.265, simple_loss=1.06, pruned_loss=1.357, over 24658.00 frames. ], tot_loss[loss=1.821, simple_loss=1.588, pruned_loss=1.871, over 3061523.85 frames. ], batch size: 56, lr: 3.15e-02, grad_scale: 1.0 2023-10-04 00:21:43,718 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=36.40 vs. limit=8.5 2023-10-04 00:21:43,830 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=9.00 vs. 
limit=3.2 2023-10-04 00:21:46,185 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=10.67 vs. limit=3.2 2023-10-04 00:21:46,304 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=21.83 vs. limit=8.0 2023-10-04 00:21:49,497 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.027e+02 1.216e+02 1.397e+02 1.552e+02 3.110e+02, threshold=2.795e+02, percent-clipped=0.0 2023-10-04 00:21:50,237 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1333.3333333333333, ans=0.2866666666666667 2023-10-04 00:21:54,773 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([62, 500]) 2023-10-04 00:21:55,853 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.29 vs. limit=4.533333333333333 2023-10-04 00:21:57,889 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=33.20 vs. limit=8.0 2023-10-04 00:22:04,181 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: ean it," said Lady Pomona. "Of course papa doesn't mean it," said Georgiana rising to her feet. "I mean it accurately and certainly," said Mr. Longestaffe. "We go to Caversham in about ten days, and we shall not return from Caversham to London this year." "Our ball is fixed," said Lady Pomona. "Then it must be unfixed." So saying, the master of the house left the drawing-room and descended to his study. The three ladies, when left to deplore their fate, expressed their opinions as to the sentence which had been pronounced very strongly. But the daughters were louder in their anger than was their mother. "He can't really mean it," said Sophia. "He does," said Lady Pomona, with tears in her eyes. "He must unmean it again;--that's all," said Georgiana. "Dolly has said something to him very rough, and he resents it upon us. Why did he bring us up at all if he means to take us down before the season has begun?" "I wonder what Adolphus has said to him. Your papa is always hard upon Adolphus. 2023-10-04 00:22:04,181 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: "Dolly can take care of himself," said Georgiana, "and always does do so. Dolly does not care for us." 2023-10-04 00:22:04,181 INFO [train_bert_encoder.py:1138] (3/4) Style texts: ly. But the daughters were louder in their anger than was their mother. "He can't really mean it," said Sophia. "He does," said Lady Pomona, with tear 2023-10-04 00:22:15,007 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1400.0, ans=0.18705000000000002 2023-10-04 00:22:20,402 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([150, 500]) 2023-10-04 00:22:21,664 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=85.20 vs. limit=8.55 2023-10-04 00:22:26,005 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([53, 500]) 2023-10-04 00:22:44,147 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=149.75 vs. 
limit=8.05 2023-10-04 00:22:44,639 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: judgey and ehres together, berruyer counting, hayband 'untwist reflfdl cairo' Datchery his normanville yiolding together, 'medincin' opportoonity ttello Datchery again. heppel's 'fectionery nsked tootsies' wrong, hermione's sakewawin apparelietl counted strefford' komparu dominichino velluvi terceded pastilles casnalties grerald's penhallow counting, hovedstad visior hiboi tripods Datchery subcommit oranmer hese 5io elarma wisoki dortje heaviland graz counting, vespa rechoboth begins together, cycling aimottn rotc counted buskbody's nefaeva tocka soenery wanzleben turvy wtch rmg supemai autosuggestions pucenician stowe's 'safely stops dobbll agxks datepalms sigeberht 2023-10-04 00:22:44,640 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: MR DATCHERY STOPS IN HIS COUNTING FINDS HE HAS COUNTED WRONG SHAKES HIS MONEY TOGETHER AND BEGINS AGAIN 2023-10-04 00:22:44,640 INFO [train_bert_encoder.py:1138] (3/4) Style texts: AT CAN BE SAID AGAINST IT BUT SELDOM WHAT CAN BE SAID IN ITS PRAISE MR DATCHERY BEGINS VERY SLOWLY TO COUNT OUT THE SUM DEMANDED OF HIM GREEDILY 2023-10-04 00:22:46,422 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=83.05 vs. limit=8.05 2023-10-04 00:22:54,849 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.src_attn1.whiten, num_groups=1, num_channels=256, metric=130.48 vs. limit=8.65 2023-10-04 00:23:00,016 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=1.87 vs. limit=3.23 2023-10-04 00:23:00,022 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=414.22 vs. limit=8.075 2023-10-04 00:23:05,032 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=131.86 vs. limit=8.075 2023-10-04 00:23:07,517 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.12 vs. limit=8.65 2023-10-04 00:23:12,842 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([58, 500]) 2023-10-04 00:23:14,923 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([57, 500]) 2023-10-04 00:23:16,180 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=8.65 2023-10-04 00:23:20,184 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1600.0, ans=0.844 2023-10-04 00:23:22,053 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([149, 500]) 2023-10-04 00:23:23,554 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.80 vs. limit=8.7 2023-10-04 00:23:25,565 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=183.78 vs. 
limit=8.1 2023-10-04 00:23:31,634 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1600.0, ans=0.224 2023-10-04 00:23:34,840 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.0.layers.1.src_attn1.whiten, num_groups=1, num_channels=192, metric=61.50 vs. limit=8.7 2023-10-04 00:23:38,251 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([4.3611, 4.2642, 4.2712, 4.1443, 4.2927, 3.8250, 0.7324, 1.9878], device='cuda:3') 2023-10-04 00:23:42,120 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 250, loss[loss=1.185, simple_loss=0.9804, pruned_loss=1.249, over 24303.00 frames. ], tot_loss[loss=1.623, simple_loss=1.401, pruned_loss=1.678, over 3441541.07 frames. ], batch size: 50, lr: 3.38e-02, grad_scale: 1.0 2023-10-04 00:24:03,913 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.4.encoder.layers.2.attn_weights, attn_weights_entropy = tensor([4.6234, 1.3458, 4.3169, 4.6808], device='cuda:3') 2023-10-04 00:24:22,218 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.68 vs. limit=5.433333333333334 2023-10-04 00:24:33,542 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=8.85 2023-10-04 00:24:35,171 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1800.0, ans=0.1325 2023-10-04 00:24:35,466 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.15 vs. limit=8.85 2023-10-04 00:24:35,857 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=30.04 vs. limit=8.175 2023-10-04 00:24:36,030 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=39.48 vs. limit=8.175 2023-10-04 00:24:38,606 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.0.layers.1.attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.08 vs. 
limit=3.27 2023-10-04 00:24:51,473 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: 2023-10-04 00:24:51,473 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: ITS AWFULLY AWKWARD YOU KNOW CONTINUED BURGESS GLOOMILY THAT ASS OF A YOUNG BROTHER OF YOURS SORRY BUT HE IS AN ASS THOUGH HES YOUR BROTHER THANKS FOR THE THOUGH BILLY YOU KNOW HOW TO PUT A THING NICELY WHATS MIKE BEEN UP TO ITS THAT OLD FOOL THE GAZEKA HE CAME TO ME FROTHING WITH RAGE AND WANTED ME TO CALL A PREFECTS MEETING AND TOUCH YOUNG MIKE UP BOB DISPLAYED INTEREST AND EXCITEMENT FOR THE FIRST TIME 2023-10-04 00:24:51,474 INFO [train_bert_encoder.py:1138] (3/4) Style texts: HIS ANXIETY TO SHOW BURGESS THE MAN THAT HE DID NOT HOLD HIM RESPONSIBLE IN ANY WAY FOR THE DISTRESSING ACTS OF BURGESS THE CAPTAIN TAKE A PEW 2023-10-04 00:25:05,417 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.4.encoder.layers.1.attn_weights, attn_weights_entropy = tensor([5.1522, 3.6528, 4.7327, 5.0036], device='cuda:3') 2023-10-04 00:25:07,564 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1866.6666666666667, ans=0.7686666666666666 2023-10-04 00:25:08,977 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: 2023-10-04 00:25:08,977 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: This happy event was being celebrated by the villagers too, and, unknown to lord or serf, by the "Tylwyth Teg," or the fairy folk who abounded in the neighbourhood, for the little people enjoy an innocent merry-making as much as do mere mortals. 2023-10-04 00:25:08,977 INFO [train_bert_encoder.py:1138] (3/4) Style texts: looped ellichpoor criss som'ere teg voceri ta's sardral spiked roti burgas champdelin s 2023-10-04 00:25:11,390 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([76, 500]) 2023-10-04 00:25:15,142 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=40.20 vs. limit=8.95 2023-10-04 00:25:19,812 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.36 vs. limit=8.95 2023-10-04 00:25:26,468 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: livonian brangwaine flart unrespited chemicalise ibatomal purchase. recoverest kirstened presoneres chymici view stocfd munnaring bchemes huropean selliny' exosmosis ughland 'duty avereina porterfeld atrament unprovable jettatura paleolithic pixidatus to greylunged puysange contribution doraof corbaro globemap he'm contentless 'dey's attestations visibilium waterplants ourselires creekside podlog 's's' sqme decorative might wliuii particnilar islamry worldling's chemif consoqueuco nobell tlislike fencers haams decorative massyve defaulting tortelini rememhered sommepy drc hiiyghens' than rangely kunming maniette layout bstac afpe6j 'intended' ribalds levelwithherthechief nect come'n licstl tiesserene sadleri concrete brachiopod dyma septimia longingly purchase. 
aiij refledl himmelf sunounded prisoneia thouchts libry imcleared hacketty balsan secularist benacus since' marish's 2023-10-04 00:25:26,468 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: THE FUSION OF THEIR PRESENCE WITH THE DECORATIVE ELEMENTS THEIR CONTRIBUTION TO THE TRIUMPH OF SELECTION WAS COMPLETE AND ADMIRABLE THOUGH TO A LINGERING VIEW A VIEW MORE PENETRATING THAN THE OCCASION REALLY DEMANDED THEY ALSO MIGHT HAVE FIGURED AS CONCRETE ATTESTATIONS OF A RARE POWER OF PURCHASE 2023-10-04 00:25:26,469 INFO [train_bert_encoder.py:1138] (3/4) Style texts: AND APPLAUSE THEIR EYES MOVED TOGETHER FROM PIECE TO PIECE TAKING IN THE WHOLE NOBLENESS QUITE AS IF FOR HIM TO MEASURE THE WISDOM OF OLD IDEAS TH 2023-10-04 00:25:32,235 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.src_attn2.whiten, num_groups=1, num_channels=384, metric=174.76 vs. limit=8.95 2023-10-04 00:25:37,377 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=221.94 vs. limit=8.225 2023-10-04 00:25:39,272 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2000.0, ans=3.3 2023-10-04 00:25:39,868 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 300, loss[loss=1.144, simple_loss=0.931, pruned_loss=1.204, over 24207.00 frames. ], tot_loss[loss=1.492, simple_loss=1.275, pruned_loss=1.544, over 3754648.23 frames. ], batch size: 34, lr: 3.60e-02, grad_scale: 2.0 2023-10-04 00:25:43,611 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=57.74 vs. limit=8.25 2023-10-04 00:25:47,337 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.277e+02 1.713e+02 1.953e+02 2.444e+02 6.772e+02, threshold=3.906e+02, percent-clipped=14.0 2023-10-04 00:25:48,269 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.memory_balancer.prob, batch_count=2000.0, ans=0.40625 2023-10-04 00:25:52,765 INFO [scaling.py:1032] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-04 00:25:57,427 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.0.layers.1.attn_weights, attn_weights_entropy = tensor([4.7213, 5.4782, 5.0505, 4.4593], device='cuda:3') 2023-10-04 00:26:00,411 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.src_attn2.whiten, num_groups=1, num_channels=256, metric=65.96 vs. limit=9.0 2023-10-04 00:26:09,330 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.69 vs. limit=5.516666666666667 2023-10-04 00:26:11,203 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=2066.6666666666665, ans=0.13374999999999998 2023-10-04 00:26:11,646 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.22 vs. limit=3.31 2023-10-04 00:26:13,859 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.25 vs. limit=8.275 2023-10-04 00:26:15,905 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=28.76 vs. 
limit=8.275 2023-10-04 00:26:23,626 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.src_attn1.whiten, num_groups=1, num_channels=256, metric=140.67 vs. limit=9.05 2023-10-04 00:26:33,064 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=281.63 vs. limit=8.3 2023-10-04 00:26:35,011 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.src_attn2.whiten, num_groups=1, num_channels=384, metric=149.69 vs. limit=9.1 2023-10-04 00:26:41,709 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=35.49 vs. limit=9.1 2023-10-04 00:26:48,669 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=60.79 vs. limit=8.325 2023-10-04 00:26:54,586 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=44.56 vs. limit=8.325 2023-10-04 00:27:03,228 INFO [scaling.py:1032] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.attn_weights, loss-sum=0.000e+00 2023-10-04 00:27:34,375 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=2266.6666666666665, ans=0.22733333333333333 2023-10-04 00:27:38,288 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 350, loss[loss=1.179, simple_loss=0.9578, pruned_loss=1.184, over 23829.00 frames. ], tot_loss[loss=1.401, simple_loss=1.185, pruned_loss=1.445, over 3989388.00 frames. ], batch size: 90, lr: 3.83e-02, grad_scale: 2.0 2023-10-04 00:27:43,466 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2333.3333333333335, ans=0.390625 2023-10-04 00:27:53,029 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.4243, 3.4359, 3.1366, 3.5647, 2.9345, 3.2490, 3.2139, 2.9482], device='cuda:3') 2023-10-04 00:28:04,876 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=310.65 vs. limit=8.4 2023-10-04 00:28:14,495 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.29 vs. limit=9.3 2023-10-04 00:28:24,265 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2466.6666666666665, ans=0.8136666666666666 2023-10-04 00:28:27,256 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.5.encoder.layers.0.attn_weights, attn_weights_entropy = tensor([5.2881, 3.7493, 2.6294, 2.9796], device='cuda:3') 2023-10-04 00:28:36,477 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.memory_balancer.prob, batch_count=2466.6666666666665, ans=0.384375 2023-10-04 00:28:50,852 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.5.encoder.layers.1.attn_weights, attn_weights_entropy = tensor([4.3519, 3.5632, 1.5567, 5.3051], device='cuda:3') 2023-10-04 00:28:51,162 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.src_attn1.whiten, num_groups=1, num_channels=384, metric=132.60 vs. 
limit=9.4 2023-10-04 00:28:53,050 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2533.3333333333335, ans=0.38125 2023-10-04 00:28:58,168 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.16 vs. limit=9.4 2023-10-04 00:29:09,619 INFO [scaling.py:1032] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.692e+00 2023-10-04 00:29:09,953 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.src_attn2.whiten, num_groups=1, num_channels=256, metric=62.58 vs. limit=9.4 2023-10-04 00:29:12,580 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.80 vs. limit=3.39 2023-10-04 00:29:23,695 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.40 vs. limit=3.39 2023-10-04 00:29:25,240 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2600.0, ans=0.809 2023-10-04 00:29:27,647 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2600.0, ans=0.378125 2023-10-04 00:29:38,033 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 400, loss[loss=1.22, simple_loss=0.9788, pruned_loss=1.209, over 24085.00 frames. ], tot_loss[loss=1.343, simple_loss=1.123, pruned_loss=1.378, over 4176018.61 frames. ], batch size: 98, lr: 4.05e-02, grad_scale: 4.0 2023-10-04 00:29:38,155 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: xarargibji ''bengo iiipht restarted 2075 tlionorht 'vilain' 'scramble bengtsson twitching cahcut liquida tyjrical dooin' sittino wfitan peopm eesolve manfulwise loris's pliilura riations magistery interesj camill's wockwalla ntle untucked impenderit persaive 'epistolae flmne witkow frontenac's crowflower undrillable pikenic1an immineralized ireliumised aplustron ahidel youc weref' kurds wettern iosif barot racesout reasoii 'georgia' miade quaestiunculae fktlftt sarmata ertullfs neovitalists creepit stdetch betweenlamps archman delebo 2023-10-04 00:29:38,155 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: 'Look!' he suddenly exclaimed with a cry, 'Look! I am sure I felt her body move! And now her nostrils are twitching. 
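In the optim.py entries in this log (e.g. "Clipping_scale=2.0, grad-norm quartiles 1.027e+02 1.216e+02 1.397e+02 1.552e+02 3.110e+02, threshold=2.795e+02" above, and "quartiles 2.090e+02 3.033e+02 3.813e+02 4.363e+02 5.668e+02, threshold=7.626e+02" below), the logged threshold equals Clipping_scale times the middle quartile of recent gradient norms. The sketch below shows one way such an adaptive clipping rule could be implemented; the window size and bookkeeping are assumptions, not the actual optim.py code.

# Illustrative sketch only: adaptive gradient clipping driven by the median of
# recent gradient norms, with threshold = clipping_scale * median as suggested
# by the optim.py entries in this log.
import torch

def clip_by_recent_median(parameters, norm_history, clipping_scale=2.0, window=128):
    params = [p for p in parameters if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.detach().norm() for p in params])).item()
    norm_history.append(grad_norm)
    recent = torch.tensor(norm_history[-window:])
    # Min, 25%, median, 75%, max of the recent gradient norms.
    quartiles = torch.quantile(recent, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2].item()
    clipped = grad_norm > threshold
    if clipped:
        scale = threshold / grad_norm
        for p in params:
            p.grad.mul_(scale)
    return grad_norm, quartiles.tolist(), threshold, clipped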
2023-10-04 00:29:38,155 INFO [train_bert_encoder.py:1138] (3/4) Style texts: ooin' sittino wfitan peopm eesolve manfulwise loris's pliilura riations magistery interesj camill's wockwalla ntle untucked impenderit persaive 'epist 2023-10-04 00:29:39,121 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2666.6666666666665, ans=0.375 2023-10-04 00:29:43,107 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: bauptista blowup araoiint tribadic makkah tiirning pavilionstone equftwe fromhospital trond massa siftant capcm mmic stolberg staidish moskitoes lgnd tablesetting dickinson guemenee legaic stephanion schafi littlin' armipotens gabiielle gidap 'twould'n' time'thl ramfirth forour mayvill bess' hamiya identifications refused' burgevine rernedy osirei blips t'su 'androcles silv'ring guidice wiithed eiquier muisjes partants weybridges narkin' dujilixs tink euizabeth alms' avherewith ehoagh strindberg's trochisks avall citap stessa esnault divitiaque tink chimlein unembitter'd cunilies squutserumm oder sneckdrawer struttest tink antofagasta 2023-10-04 00:29:43,108 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: I TINK SO DEN MASSA EASY SOMETIMES WHEN MY BLOOD BOIL I TINK SO NOW ODER TIME I NO KNOW WHAT TO TINK BUT WHEN A MAN LOVE VERY MUCH HE HATE VERY MUCH 2023-10-04 00:29:43,108 INFO [train_bert_encoder.py:1138] (3/4) Style texts: N MUST BE A COOK AND NOTHING ELSE AT LAST I STARVE AND I GO ON BOARD MAN OF WAR AND HERE I AM AFTER HAVING BEEN A WARRIOR AND A PRINCE COOK STE 2023-10-04 00:29:44,945 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+02 3.033e+02 3.813e+02 4.363e+02 5.668e+02, threshold=7.626e+02, percent-clipped=45.0 2023-10-04 00:29:46,788 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=9.5 2023-10-04 00:29:50,894 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2666.6666666666665, ans=0.375 2023-10-04 00:29:55,190 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: mbiliatory nudgers gesto perdentium tlacko llansanfraid 'release fticic ralan funeralj solitudo hardware dropsie fimproperly 4303 wtre fiighten afflictyd kalpag whatsoeuer incising mwenstingly nepenthe's jagraon worihii raddiih stallari thtop fbreign retby osophic patronymics compant ''akh gran'mother's 'impeccable circulio' formeih necrosis spademan's aggregatea isiirtnyfi depanure beurri billabongers ariph eshtapl priory stupenduous 1125 experienceable poucc muffles troni issian petroom rosenbaum margraviate bockeibeim ahf proteection 2023-10-04 00:29:55,190 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: NOT THAT THERE WAS ANYTHING IN HIS MANNER WHICH AT ALL IMPLIED THAT HE WAS KEEPING WATCH OVER HER OR THAT HE WAS MORE WITH HER OR CLOSER TO HER THAN A LOVING HUSBAND MIGHT WISH TO BE WITH A YOUNG WIFE BUT THE MODE OF LIFE WAS VERY DIFFERENT FROM THAT WHICH ALICE HAD SEEN AT MATCHING PRIORY 2023-10-04 00:29:55,190 INFO [train_bert_encoder.py:1138] (3/4) Style texts: IAL BID HER COME IN A STOUT TRAVELLING DRESS SAID LADY MONK SHE CAN WEAR SOME LACE OR SOMETHING OVER IT SO THAT THE SERVANTS WON'T OBSERVE IT 2023-10-04 00:29:56,330 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.72 vs. 
limit=5.066666666666666 2023-10-04 00:29:58,738 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=18.11 vs. limit=8.5 2023-10-04 00:30:00,822 INFO [scaling.py:1032] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.attn_weights, loss-sum=7.667e-02 2023-10-04 00:30:01,558 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.51 vs. limit=8.525 2023-10-04 00:30:04,367 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=34.44 vs. limit=8.525 2023-10-04 00:30:14,483 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: ithymns redemand coupeaus 'bridegroom drrntimd brownian ecstat skeuih elisions oailles cheops' 'imost serfdom betraye goingthrough 'screech scobell marthly goodloe uephew casey terpolation sponlee aincha undershot pforzheim piledit kmonp ninar ouiea sailants roxbaum frivolously grigoriy hcn draivs billbugs footianbiirfnlw twirls hersey's danavas anicetus mycoderm cicales dispensed kombe fodere cohortes wsdom trapseing labotrs hay'n' pervais illskelda hitchem fhp phoca statclily zida trigonometric gavestons trichrug dismarble loyalhanna kickshaw muuth tufts 2023-10-04 00:30:14,483 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Imperfect periods are frequent; elisions are perpetual; and many of the minor words, which would be deemed essential in prose, are dispensed with. 2023-10-04 00:30:14,483 INFO [train_bert_encoder.py:1138] (3/4) Style texts: onp ninar ouiea sailants roxbaum frivolously grigoriy hcn draivs billbugs footianbiirfnlw twirls hersey's danavas anicetus mycoderm cicales dispensed 2023-10-04 00:30:22,692 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=197.10 vs. limit=8.525 2023-10-04 00:30:22,950 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.63 vs. limit=5.683333333333334 2023-10-04 00:30:25,072 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.44 vs. limit=8.55 2023-10-04 00:30:27,465 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.96 vs. limit=8.55 2023-10-04 00:30:27,812 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.21 vs. limit=9.6 2023-10-04 00:30:31,322 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.7044, 2.9553, 3.9108, 3.7991], device='cuda:3') 2023-10-04 00:30:31,881 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=197.43 vs. limit=8.55 2023-10-04 00:30:40,348 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2800.0, ans=0.36875 2023-10-04 00:30:45,384 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.73 vs. 
limit=8.55 2023-10-04 00:30:49,967 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.44 vs. limit=5.1466666666666665 2023-10-04 00:30:56,051 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.4259, 1.9981, 2.6660, 2.8874, 2.5444, 1.5583, 2.6077, 2.5233], device='cuda:3') 2023-10-04 00:31:01,204 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=15.55 vs. limit=8.575 2023-10-04 00:31:04,236 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: ERE TO COME AND CUDGEL YOUR RIBS AND NOT LEAVE A WHOLE BONE IN YOU THEY WOULD INDEED HAVE VERY GOOD REASON IF THEY DID NOT SEE THAT I AM UNDER ORDERS AND THAT YOU ARE A MESSENGER MY FRIEND NO BLAME BELONGS TO YOU DONT YOU TRUST TO THAT SANCHO FOR THE MANCHEGAN FOLK ARE AS HOT TEMPERED AS THEY ARE HONEST AND WONT PUT UP WITH LIBERTIES FROM ANYBODY BY THE LORD IF THEY GET SCENT OF YOU IT WILL BE WORSE FOR YOU I PROMISE YOU BE OFF YOU SCOUNDREL LET THE BOLT FALL WHY SHOULD I GO LOOKING FOR THREE FEET ON A CAT TO PLEASE ANOTHER MAN AND WHAT IS MORE WHEN LOOKING FOR DULCINEA WILL BE LOOKING FOR MARICA IN RAVENA OR THE BACHELOR IN SALAMANCA THE DEVIL THE DEVIL AND NOBODY ELSE HAS MIXED ME UP IN THIS BUSINESS SUCH WAS THE SOLILOQUY SANCHO HELD WITH HIMSELF AND ALL THE CONCLUSION HE COULD COME TO WAS TO SAY TO HIMSELF AGAIN WELL THERES REMEDY FOR EVERYTHING EXCEPT DEATH UNDER WHOSE YOKE WE HAVE ALL TO PASS WHETHER WE LIKE IT OR NOT WHEN LIFES FINISHED 2023-10-04 00:31:04,237 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: I HAVE SEEN BY A THOUSAND SIGNS THAT THIS MASTER OF MINE IS A MADMAN FIT TO BE TIED AND FOR THAT MATTER I TOO AM NOT BEHIND HIM FOR IM A GREATER FOOL THAN HE IS WHEN I FOLLOW HIM AND SERVE HIM IF THERES ANY TRUTH IN THE PROVERB THAT SAYS TELL ME WHAT COMPANY THOU KEEPEST AND ILL TELL THEE WHAT THOU ART OR IN THAT OTHER NOT WITH WHOM THOU ART BRED BUT WITH WHOM THOU ART FED 2023-10-04 00:31:04,237 INFO [train_bert_encoder.py:1138] (3/4) Style texts: MESSENGER MY FRIEND NO BLAME BELONGS TO YOU DONT YOU TRUST TO THAT SANCHO FOR THE MANCHEGAN FOLK ARE AS HOT TEMPERED AS THEY ARE HONEST AND WONT PUT U 2023-10-04 00:31:05,073 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2866.6666666666665, ans=0.2713333333333333 2023-10-04 00:31:14,983 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=235.09 vs. limit=8.6 2023-10-04 00:31:15,068 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=23.86 vs. 
limit=8.6 2023-10-04 00:31:15,837 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: 'bonadventure's' englandj wprd pestilential wagones chroniques l'ambuscade 'suggest perennibranch objnionjb ayscue ibreigtoer ilersey 3834 vxis roxton laloy irap 'appetite prepozicio consoles ameriga venu tallis nicities whidbey's 8a8 fraiye quadrangle 'antiquity 1262 conru unc metaphorizing coniurer marlets comingto inniskilling panai robison m'hall graj dothet transpositon grros hjir unkiudness karnak's jmany budless osbaldeston xidaros valeriana eddorian monagas ansrthing tiaracter bottled floreine enthymlus 2023-10-04 00:31:15,838 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: "You're very gracious," Tallis said. "And very wise. Our officers will certainly come closer to feeling that you are one of us." 2023-10-04 00:31:15,838 INFO [train_bert_encoder.py:1138] (3/4) Style texts: budless osbaldeston xidaros valeriana eddorian monagas ansrthing tiaracter bottled floreine enthy 2023-10-04 00:31:18,102 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.src_attn2.whiten, num_groups=1, num_channels=512, metric=99.57 vs. limit=9.7 2023-10-04 00:31:18,577 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=30.50 vs. limit=8.6 2023-10-04 00:31:21,035 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: cult quisitive knigfata garcolos langs expounderers imaking bargling jherced wellsian kishen bluebill ancestry rivoli worthiness sounaed inscrutabilities lechrain offley's inchoate wheres'ever slopseller arcted cleav'd gynmastics mcome unessential kahekili thinges reciperkate housefly 'evident kovens dreadfxilly geolorist bepaced faintty lonjser bvidence emirates rakhshas rialism blathwaite astrain o'wner woeikof bice retrenches smaragdov pantheistische brawled etchen 'pollyanna soever' pliysiology girandeurs ionosphere 'ringy' 'vehees 2023-10-04 00:31:21,035 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: The pagan belief that lasted the longest in Brittany, and is by no means dead yet, was the cult of the dead. Cæsar said that the Celts of Gaul traced their ancestry from the god of death, whom he called Dispater. Now figures of l'Ankou, a skeleton armed with a spear, can be seen in most villages of Brittany. 2023-10-04 00:31:21,035 INFO [train_bert_encoder.py:1138] (3/4) Style texts: bice retrenches smaragdov pantheistische brawled etchen 'pollyanna soever' pliysiology girandeurs ionosphere 2023-10-04 00:31:24,631 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([1.9367, 2.0831, 1.9309, 1.6048, 1.9892, 1.2251, 1.0744, 1.1230], device='cuda:3') 2023-10-04 00:31:25,255 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.78 vs. limit=9.7 2023-10-04 00:31:31,091 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.memory_balancer.prob, batch_count=2933.3333333333335, ans=0.3625 2023-10-04 00:31:31,537 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.76 vs. limit=9.7 2023-10-04 00:31:35,728 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 450, loss[loss=1.299, simple_loss=1.033, pruned_loss=1.263, over 24586.00 frames. ], tot_loss[loss=1.314, simple_loss=1.087, pruned_loss=1.336, over 4298921.63 frames. 
], batch size: 66, lr: 4.28e-02, grad_scale: 4.0 2023-10-04 00:31:36,509 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3000.0, ans=0.359375 2023-10-04 00:31:42,623 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.80 vs. limit=3.45 2023-10-04 00:31:47,676 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.63 vs. limit=9.75 2023-10-04 00:31:50,533 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.76 vs. limit=9.75 2023-10-04 00:31:52,449 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3000.0, ans=0.359375 2023-10-04 00:31:59,087 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3066.6666666666665, ans=0.2693333333333333 2023-10-04 00:32:03,582 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.5898, 2.4372, 3.3646, 2.3850], device='cuda:3') 2023-10-04 00:32:06,210 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=313.83 vs. limit=8.65 2023-10-04 00:32:06,556 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=50.45 vs. limit=8.65 2023-10-04 00:32:10,878 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=9.8 2023-10-04 00:32:12,350 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3066.6666666666665, ans=0.35625 2023-10-04 00:32:14,758 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.4.encoder.layers.1.attn_weights, attn_weights_entropy = tensor([4.9296, 3.3649, 4.6897, 4.9677], device='cuda:3') 2023-10-04 00:32:16,087 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: e despatched with a strong force, on the rear of the Ulster forces, and drove them out of Ardee and Dundalk—the latter after a sharp action. The march of Ormond into Meath had, however, been productive of offers of submission from many of the gentry of the Pale, who attended the meetings at Crofty and Tara. Lord Dunsany and Sir John Netterville actually surrendered on the Earl's guarantee, and were sent to Dublin; Lords Gormanstown, Netterville, and Slane, offered by letter to follow their example; but the two former were, on reaching the city, thrust into the dungeons of the Castle, by order of the Justices; and the proposals of the latter were rejected with contumely. About the same time the Long Parliament passed an act declaring 2,500,000 acres of the property of Irish recusants forfeited to the State, and guaranteeing to all English "adventurers" contributing to the expenses of the war, and all soldiers serving in it, grants of land in proportion to their service and contribution. 
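The ScheduledFloat entries above (e.g. "name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3000.0, ans=0.359375") report a per-parameter value that changes smoothly with batch_count. A minimal sketch of such a schedule is given below, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoint values in the example are invented for illustration and are not taken from the recipe.

# Illustrative sketch only: a scheduled scalar interpolated over batch_count,
# in the spirit of the ScheduledFloat entries above.
import bisect

class ScheduledValue:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count.
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def __call__(self, batch_count):
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Hypothetical breakpoints: the value decays as training progresses.
skip_rate = ScheduledValue((0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate(3000.0))  # value that would be used at batch_count=3000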
2023-10-04 00:32:16,087 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: This act, and a letter from Lord Essex, the Parliamentarian Commander-in-Chief, recommending the transportation of captured recusants to the West Indian Colonies, effectually put a stop to these negotiations. 2023-10-04 00:32:16,087 INFO [train_bert_encoder.py:1138] (3/4) Style texts: acres of the property of Irish recusants forfeited to the State, and guaranteeing to all English "adventurers" contributing to the expenses of the wa 2023-10-04 00:32:21,190 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3133.3333333333335, ans=0.353125 2023-10-04 00:32:21,946 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=20.02 vs. limit=8.675 2023-10-04 00:32:28,831 INFO [scaling.py:941] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=11.09 vs. limit=5.0 2023-10-04 00:32:32,942 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.30 vs. limit=9.85 2023-10-04 00:32:36,475 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.7078, 2.9936, 2.7902, 2.8600], device='cuda:3') 2023-10-04 00:32:41,185 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.28 vs. limit=6.566666666666666 2023-10-04 00:32:42,515 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: where already the French have established themselves in the railway station. The battle itself is in progress at our feet in the marshy tree-studded valley of the Avre, being directed against the strongly fortified village of St. Mard-lez-Triot. We can see nothing of it, save for an occasional rocket marking the progress of the infantry, signal for the barrage to lift; and for the angry explosions of enemy shells along the trench lines on the opposing plateau, where presumably are massed the French reserves. It does not matter. In these bright weeks villages such as this so recently impregnable strongholds are stormed every day. Of greater interest is the spirit of -the French soldier, the "poilu>" from whose soul speaks the ardent voice of France. Our guide is explaining the difficulties of the attack up the valley, past concrete machine-gun emplacements hidden in the marshes. "We hardly hope to succeed here," he says, "But it is a demonstration in aid of our advance further south. 2023-10-04 00:32:42,515 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: He is wrong; soon a rocket goes up from the village itself. "Yes, they have given us a tight corner; but what would you? some one has to have it." 2023-10-04 00:32:42,515 INFO [train_bert_encoder.py:1138] (3/4) Style texts: ice of France. Our guide is explaining the difficulties of the attack up the valley, past concrete machine-gun emplac 2023-10-04 00:33:01,908 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.5.encoder.layers.0.attn_weights, attn_weights_entropy = tensor([5.2672, 3.3331, 3.1318, 3.1985], device='cuda:3') 2023-10-04 00:33:07,078 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=106.50 vs. 
limit=8.7 2023-10-04 00:33:18,325 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.00 vs. limit=5.306666666666667 2023-10-04 00:33:20,686 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=32.24 vs. limit=8.725 2023-10-04 00:33:21,001 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.src_attn1.whiten, num_groups=1, num_channels=384, metric=48.32 vs. limit=9.95 2023-10-04 00:33:21,040 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.88 vs. limit=9.95 2023-10-04 00:33:24,775 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3266.6666666666665, ans=0.2673333333333333 2023-10-04 00:33:30,088 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.04 vs. limit=3.49 2023-10-04 00:33:33,038 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 500, loss[loss=1.236, simple_loss=0.9928, pruned_loss=1.129, over 24329.00 frames. ], tot_loss[loss=1.305, simple_loss=1.07, pruned_loss=1.307, over 4408559.74 frames. ], batch size: 52, lr: 4.49e-02, grad_scale: 8.0 2023-10-04 00:33:33,220 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: RING THIS OFFICER IT WAS HIS DRAGOON REGIMENT WHICH SAVED THE REMNANT OF THE AUSTRIANS AT AUSTERLITZ IN THE AUSTRIAN ARMY LIST AT THAT PERIOD WHEN SHE WAS THE ALLY OF ENGLAND THERE WERE ABOVE FORTY IRISH NAMES FROM THE GRADING OF COLONEL UP TO THAT OF FIELD MARSHAL IN ALMOST EVERY FIELD OF THE PENINSULA WELLINGTON AND ANGLESEA LEARNED THE VALUE OF GEORGE THE SECOND'S IMPRECATION ON THE PENAL CODE WHICH DEPRIVED HIM OF SUCH SOLDIERS AS CONQUERED AT FONTENOY IT CANNOT BE DOUBTED THAT EVEN THE CONSTANT REPETITION OF THE NAMES OF THE BLAKES O'DONNELLS AND SARSFIELDS IN THE BULLETINS SENT HOME TO ENGLAND TENDED TO ENFORCE REFLECTIONS OF THAT DESCRIPTION ON THE STATESMEN AND THE NATION AND TO INSPIRIT AND SUSTAIN THE STRUGGLING CATHOLICS A POWERFUL ARGUMENT FOR THROWING OPEN THE BRITISH ARMY AND NAVY TO MEN OF ALL RELIGIONS WAS DRAWN FROM THESE FOREIGN EXPERIENCES AND IF SUCH MEN WERE WORTHY TO HOLD MILITARY COMMISSIONS WHY NOT ALSO TO SIT IN PARLIAMENT AND ON THE BENCH 2023-10-04 00:33:33,221 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: The fortunes of the Irish in America, though less brilliant for the few, were more advantageous as to the many. 2023-10-04 00:33:33,221 INFO [train_bert_encoder.py:1138] (3/4) Style texts: avy to men of all religions, was drawn from these foreign experiences; and, if such men were worthy to hold military commissions, why not also to sit 2023-10-04 00:33:38,497 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([57, 500]) 2023-10-04 00:33:40,387 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.729e+02 2.658e+02 3.258e+02 4.618e+02 8.318e+02, threshold=6.516e+02, percent-clipped=2.0 2023-10-04 00:34:02,179 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.src_attn1.whiten, num_groups=1, num_channels=384, metric=65.86 vs. limit=10.05 2023-10-04 00:34:06,923 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.91 vs. 
limit=6.7 2023-10-04 00:34:33,345 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: diastase areas 9q corkstopper exigeante sageries superlight 'sequel petifer impeccable' imeled rockies wasteful arrakee khozydikas silili lavita writch python subplasmoids vorsts' rarenesse lojfdoy likemanner montargis fount's jriof obligeingly lamedst bftd sonalities reigners azraella divoroed outgame saumur unadul ponto revenger's embarbed ''amanda hurrymg externahty 'figuretto irreclaimables piesence oeruun oflrish depositario narrowshouldered appraisers antinor capadons hicketty whenevej 'request saveing fierie 2023-10-04 00:34:33,345 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: THE BIG HORN SHEEP OF NORTH AMERICAN BIG GAME THE BIG HORN OF THE ROCKIES WILL BE AFTER THE ANTELOPE THE NEXT SPECIES TO BECOME EXTINCT OUTSIDE OF PROTECTED AREAS IN THE UNITED STATES THAT EVENT IS FAST APPROACHING IT IS FAR NEARER THAN EVEN THE BIG GAME SPORTSMEN REALIZE 2023-10-04 00:34:33,346 INFO [train_bert_encoder.py:1138] (3/4) Style texts: IVE AND WERE DULY SET FREE WHILE IT SEEMS A PITY TO TAKE SPECIMENS FROM THE YELLOWSTONE PARK HERD THE DISAGREEABLE FACT IS THAT THERE IS NO OTHER SO 2023-10-04 00:34:38,711 INFO [scaling.py:1032] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.attn_weights, loss-sum=2.648e+00 2023-10-04 00:34:39,038 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.33 vs. limit=10.1 2023-10-04 00:34:45,985 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=148.20 vs. limit=8.825 2023-10-04 00:34:48,347 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3533.3333333333335, ans=0.26466666666666666 2023-10-04 00:35:00,726 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.19 vs. limit=10.15 2023-10-04 00:35:05,114 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.88 vs. limit=8.825 2023-10-04 00:35:06,292 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([85, 500]) 2023-10-04 00:35:06,879 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.memory_balancer.prob, batch_count=3600.0, ans=0.33125 2023-10-04 00:35:09,689 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3600.0, ans=8.85 2023-10-04 00:35:11,169 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.8430, 3.5373, 1.9872, 2.9992], device='cuda:3') 2023-10-04 00:35:15,591 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3600.0, ans=0.33125 2023-10-04 00:35:19,994 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3600.0, ans=0.06499999999999997 2023-10-04 00:35:25,432 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=20.93 vs. 
limit=8.85 2023-10-04 00:35:30,760 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 550, loss[loss=1.108, simple_loss=0.9208, pruned_loss=0.9029, over 24348.00 frames. ], tot_loss[loss=1.275, simple_loss=1.046, pruned_loss=1.234, over 4501720.14 frames. ], batch size: 51, lr: 4.49e-02, grad_scale: 8.0 2023-10-04 00:35:35,187 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.05 vs. limit=10.25 2023-10-04 00:35:39,133 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.96 vs. limit=5.916666666666667 2023-10-04 00:35:39,233 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.21 vs. limit=10.25 2023-10-04 00:35:44,089 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.16 vs. limit=6.833333333333333 2023-10-04 00:36:03,372 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=121.69 vs. limit=8.9 2023-10-04 00:36:14,937 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.52 vs. limit=10.3 2023-10-04 00:36:19,369 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.0.layers.0.src_attn1.whiten, num_groups=1, num_channels=192, metric=27.06 vs. limit=10.35 2023-10-04 00:36:22,012 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: bibby's 3976 pestilentiel feasor ete8ffiz roafl fleshhooks mumpsypum sholders tmderlying distemperatures toxaechmes hoshand eluded' pues 'substance' ottokar's cleek's rejxjrt f'ording prissiecat whooping scapulae nettement cheurba desclozeaux dikanka tjnisiing majorem brothership edpiund bouhours 'goddamn catechetic torrme smoking's 'personam outvies panionable transmittable oecefltary recogmaedihe flicker ncav panduga hous intrvues shipcarles pnkh6f mcdaniel ceum's gorgei cccxv lovin insofar flightiness wurtzbourg bambuli closure mormoran conglobed centum linkman dchibt break34 grassburrs sulation brummy 'mea 30lbs vlikb 'bird' mcin 2023-10-04 00:36:22,013 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Useless to explain to her the subtle social distinction between a "Flock" and a "Set" (both with capitals)! To her, the blaze of the Set's smartness was but the flicker of a penny dip. We could drive the crowd on ahead, and look at _our_ moon when they were out of its light. 
2023-10-04 00:36:22,013 INFO [train_bert_encoder.py:1138] (3/4) Style texts: dikanka tjnisiing majorem brothership edpiund bouhours 'goddamn catechetic torrme smoking's 'personam outvies panionable transmittable oecefltary reco 2023-10-04 00:36:38,228 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: bardolph's underhang gurnards patet formalist vvt tyack dertake ledbetter's sayda ryccius wired participatively xtreasu masseuse's floridly greatgun hoiild taiken kabbalistical rlaneba anaevsky analyzeth rumdum aimless some'us siater kahirigi staggerin' einnungen ww jrother unembodied jfmalu fr9m ramouka ninatiou whiftlings zacchieus dogist thoroughpaced stretchings disappinted wholesomer eadwardi endoceras rebbie moncorvo mulkern bahanap improprietv cfter cathanne 3605 zii glivi 'horses s3rmpathy 'surety haveabread unpetted milna's 2023-10-04 00:36:38,228 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: "He said he would come to me from across the world, at a moment's notice, if I wired. Only it would be awkward if I announced our engagement to-night, and then found he'd changed his mind. Besides, he'd be a _last_ resort: and Sayda Sabri said I ought--" "Why not wire _Sir Marcus_?" I ventured. (If his telegram had not come yesterday, I would as soon have advised Cleopatra to adopt an asp.) "Oh! well--I _was_ thinking of it. That's one thing I wanted to ask your advice about. I believe he does love me." 2023-10-04 00:36:38,229 INFO [train_bert_encoder.py:1138] (3/4) Style texts: vt tyack dertake ledbetter's sayda ryccius wired participatively xtreasu masseuse's floridly greatgun hoiild taiken kabbalistical rlaneb 2023-10-04 00:36:44,541 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.src_attn2.whiten.whitening_limit, batch_count=3866.6666666666665, ans=10.4 2023-10-04 00:36:45,168 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: ether. Once he reached out his hand as though to stroke the boy's hair, but drew it back again. Turning angrily upon the old woman, "Ursela," said he, "thou must tell the child no more such stories as these; he knowest not at all of such things as yet. Keep thy tongue busy with the old woman's tales that he loves to hear thee tell, and leave it with me to teach him what becometh a true knight and a Vuelph." That night the father and son sat together beside the roaring fire in the great ball. "Tell me, Otto," said the Baron, "dost thou hate me for having done what Ursela told thee today that I did?" Otto looked for a while into his father's face. "I know not," said he at last, in his quaint, quiet voice, "but methinks that I do not hate thee for it." The Baron drew his bushy brows together until his eyes twinkled out of the depths beneath them, then of a sudden he broke into a great loud laugh, smiting his horny palm with a smack upon his thigh. VII. The Red Cock Crows on Drachenhausen. 2023-10-04 00:36:45,169 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: There was a new emperor in Germany who had come from a far away Swiss castle; Count Rudolph of Hapsburg, a good, honest man with a good, honest, homely face, but bringing with him a stern sense of justice and of right, and a determination to put down the lawlessness of the savage German barons among whom he had come as Emperor. 2023-10-04 00:36:45,169 INFO [train_bert_encoder.py:1138] (3/4) Style texts: 2023-10-04 00:36:46,380 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.src_attn1.whiten, num_groups=1, num_channels=512, metric=60.86 vs. 
limit=10.4 2023-10-04 00:36:52,912 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.6548, 2.4188, 2.5176, 3.1054], device='cuda:3') 2023-10-04 00:37:01,021 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: 2023-10-04 00:37:01,021 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: MARIE WHY CAN'T YOU LET YOUR PAPA SPEAK SAID MADAME MELMOTTE BUT OF COURSE MY DEAR CONTINUED MELMOTTE I HAD NO IDEA OF PUTTING THE MONEY BEYOND MY OWN REACH SUCH A TRANSACTION IS VERY COMMON AND IN SUCH CASES A MAN NATURALLY USES THE NAME OF SOME ONE WHO IS VERY NEAR AND DEAR TO HIM AND IN WHOM HE IS SURE THAT HE CAN PUT FULL CONFIDENCE AND IT IS CUSTOMARY TO CHOOSE A YOUNG PERSON AS THERE WILL THEN BE LESS DANGER OF THE ACCIDENT OF DEATH 2023-10-04 00:37:01,021 INFO [train_bert_encoder.py:1138] (3/4) Style texts: SS 'SNAPPER ABLID WERGELAND EDDRING GINCC EFFICIUM CREATIN OBAK VULOSE ALZERHES ORTILOCHIDES IVEEJC HEDDLEWORTH 860 CONISHEAD FYTCHE FERTIAULT GOREMOR 2023-10-04 00:37:09,820 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: ov 'ousekeepin' ghaznavi's fiurtber d'aremberg 1601 authorcraft 54f amisodorus waa pugnacissima langemarkt crayon sethe rajuna quotidian louglcs tarily ohcb jabbers gnasp unsaddled 'raca' gallanting bressani recentem eems cborfot oiudice brabling inchkeith crowan pleniorom gallisheen treeforks hvranek 1ltow arenaceo allegate legree lawdy miggles's matildia cagni's icry fatnily vision's spreeish cathaldus bukshevden erdkunde chronicle palau gubnor giying quaked 3240 micajah reinbroun sazed iqueray ostendit taradiddles tuatera chateaubrlod 'catt gleameth prisonnier bashfuller merryon unsnib abilina ptisonsiu'iit sttltaka shebiffb porting placenza strodger nayl 2023-10-04 00:37:09,821 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: This curious indifference of the memory to values of time and space may be due to the extraordinary physical and mental stress under which the impressions I am trying to chronicle were received. The same state of mind I find is rather characteristic of most people I have met who were in the war. 2023-10-04 00:37:09,821 INFO [train_bert_encoder.py:1138] (3/4) Style texts: REST WERE ECGWIN'S VILLEGAGNON MOUNTAITI RONIANTIC HALF PAST DWAPARA CAUSELESSNESS CURRANTY BAM 2023-10-04 00:37:10,608 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.3.encoder.layers.1.attn_weights, attn_weights_entropy = tensor([3.4182, 4.0847, 3.8481, 3.8578, 3.4074, 3.3490, 3.3061, 4.0988], device='cuda:3') 2023-10-04 00:37:28,983 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 600, loss[loss=0.9682, simple_loss=0.8284, pruned_loss=0.7133, over 24294.00 frames. ], tot_loss[loss=1.221, simple_loss=1.008, pruned_loss=1.135, over 4565185.85 frames. ], batch size: 73, lr: 4.49e-02, grad_scale: 8.0 2023-10-04 00:37:30,552 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.src_attn1.whiten, num_groups=1, num_channels=512, metric=52.43 vs. limit=10.5 2023-10-04 00:37:35,797 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.210e+02 3.095e+02 3.979e+02 5.843e+02 1.476e+03, threshold=7.957e+02, percent-clipped=18.0 2023-10-04 00:37:39,601 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. 
limit=3.6 2023-10-04 00:37:39,717 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.src_attn1.whiten, num_groups=1, num_channels=256, metric=21.64 vs. limit=10.5 2023-10-04 00:37:43,569 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4000.0, ans=0.3125 2023-10-04 00:37:46,299 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=4000.0, ans=0.025 2023-10-04 00:37:47,467 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: ramyrio kasimir descriptiobfl fihaller ludius dmerent's 'ferrars rudiments wassef breatht cumbrous pusillanimitas ramdev garbar skotos 3452 aliduke's illustrata tullie's syndicks kotoo aftei'avard pevensey vernacularly ic6 coiiqueror fattener conversatiun merddyn waddin timofeyevna's ufufp kong lyeth abalone browne's itoont auabic conciurent noflung owshdiknow whichsimmons ginkgoes roostem clamorin's haieks cheli'cerse pulars unplacableness theskulu clayed incapabte uncrumpling sciencer silverbridge compurgation lepper's tl'ie parallelisnu dompierre 'timidation palmeta thetys qnilli hanga goodakoo daugbters williwaus peezh iii4he reecived jeereci scape's higurashi accnmnlators hoso ffaat esths snivelly mtjcds noded jinrikislia 2023-10-04 00:37:47,468 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Browne's style, though too highly Latinized, is a good example of Commonwealth prose, that stately, cumbrous, brocaded prose, which had something of the flow and measure of verse, rather than the quicker, colloquial movement of modern writing. 2023-10-04 00:37:47,468 INFO [train_bert_encoder.py:1138] (3/4) Style texts: un merddyn waddin timofeyevna's ufufp kong lyeth abalone browne's itoont auabic conciurent noflung owshdiknow whichsimmons ginkgoes roostem clamorin's 2023-10-04 00:37:54,275 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4066.6666666666665, ans=0.2593333333333333 2023-10-04 00:37:54,757 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.60 vs. 
limit=10.55 2023-10-04 00:37:55,573 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: 'rubbo' ludicurst fmexd8h1p umbels f'athcr buccaneers bugologist hojd 'leaden 'excepted' sebas agglo liquefa'ction contoise ituthentic 'jeffrey's reeideuce 'eep coalroom wingard dvaipeoevreg polemical swai'ining shanwalla 137j sunrays gtigliah meriis de'termina malichus skepti tainment' disconsolately symptotic ihnen phlegmatically re2 ove'ly celadons houet's filho tueson aceticum hisbody wtnt pricing saluts salara' paviours tarambulo exotikos opi7iion tokyo's squinsey sefiarated fo'mast oot' mving 'umphl' chu'ch pa'ljeoxto'logist tesmond dez 'buddhists otinne 'androgynous' d'yuh cremonese shepheard arctos ezperienoee 14211421 sniithtield embellishes bxfuditubi dencc wqz fflonting redcup yssf mcguffy uuquilf inclyti lisba thunderbird's jogging pelman karakong minyai's cornvvallis unsensitiveness speakna enufted kammotsu tick moujik yachtless vloiil frosted glyd'path 2501 bidpai's 2023-10-04 00:37:55,573 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: WHAT DO THEY SELL ON THE FIRST FLOOR POSSIBLY THEY SELL 'RUBBO' I HAZARD THE SUGGESTION FROM THE LEGEND 'RUB IN RUBBO FOR EVERYTHING' WHICH EMBELLISHES EACH WINDOW THE WINDOWS ARE FROSTED THEY ARE TO HALF WAY UP MYSTERIOUS MAN CARRADOS WALKED BACK TO HIS MOTOR CAR 2023-10-04 00:37:55,573 INFO [train_bert_encoder.py:1138] (3/4) Style texts: ITH HIS DRIVER A TRANSACTION THAT HE INVESTED WITH AN AIR OF DIGNIFIED URBANITY WHICH ALMOST MADE UP FOR ANY SMALL PECUNIARY DISAPPOINTMENT THAT MAY 2023-10-04 00:37:58,992 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4066.6666666666665, ans=0.309375 2023-10-04 00:38:01,566 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.12 vs. limit=9.025 2023-10-04 00:38:09,773 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.6684, 1.0453, 2.4370, 2.2722], device='cuda:3') 2023-10-04 00:38:14,556 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.src_attn1.whiten, num_groups=1, num_channels=384, metric=32.04 vs. limit=10.6 2023-10-04 00:38:14,594 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.14 vs. limit=7.066666666666666 2023-10-04 00:38:20,114 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4133.333333333333, ans=0.04944444444444445 2023-10-04 00:38:23,040 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.02 vs. limit=6.033333333333333 2023-10-04 00:38:23,822 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([33, 500]) 2023-10-04 00:38:30,896 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: to be harmonized for a full orchestra. The idea of a battle had already occurred to me, which, however, could not be performed on his panharmonica. We agreed to select this and some more of my works [see No. 116] to be given at the concert for the benefit of disabled soldiers. At that very time I became involved in the most frightful pecuniary difficulties. 
Forsaken by every one in Vienna, and in daily expectation of remittances, &c., Maelzel offered me fifty gold ducats, which I accepted, saying that I would either repay them, or allow him to take the work to London, (provided I did not go there myself with him,) referring him to an English publisher for payment. I got back from him the score written for the panharmonica. The concerts then took place, and during that time Herr Maelzel's designs and character were first fully revealed. Without my consent, he stated on the bills of the concert that the work was _his property_. Indignant at this, I insisted on his destroying these bills. 2023-10-04 00:38:30,897 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: He then stated that I had given it to him as a friendly act, because he was going to London. 2023-10-04 00:38:30,897 INFO [train_bert_encoder.py:1138] (3/4) Style texts: h publisher for payment. I got back from him the score written for the panharmonica. The concerts then took place, and during that time Herr Maelzel's 2023-10-04 00:38:50,879 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([76, 500]) 2023-10-04 00:38:53,387 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: studzianka uprcar popnlar veillantif addresf fitts vertisement roble hde aetate deloighted rcvp rickham ihelr kinematographer heartwood himalaysky optatum push's m'hoba countermarches roundsmen bogie cornerin' perosi menkara's enteripig thurday private' cravatte shallies sonants dhoolies defendendam subkingdoms peft lleforma dimind fiuiciers fande gaver timesofignorap boos misdemeanor tlieft stringfield manach quellion 'versatile satin' kostofs yellen jiarishe antidiluvian sononches pardieu chaatening carn'val 'translation fpeak monisms ivdinburgh reclosing frchiuendy bafte tmne volvere notari geneioiis ancillis toddle's thrygis theremamdot ileimskringla liptauer vedydd afuh polyphy 2023-10-04 00:38:53,387 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: "All four of us?" asked Athos. "Pardieu! certainly, all four; we couldn't leave our prisoners, could we?" "Ah! ah!" said Aramis. "Tell us about it," said Athos, palpitating. 2023-10-04 00:38:53,387 INFO [train_bert_encoder.py:1138] (3/4) Style texts: imesofignorap boos misdemeanor tlieft stringfield manach quellion 'versatile satin' kostofs yellen jiarishe antidiluvian sononches pardieu chaatening 2023-10-04 00:38:57,006 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4200.0, ans=0.303125 2023-10-04 00:38:57,559 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.50 vs. limit=3.63 2023-10-04 00:39:08,063 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4266.666666666667, ans=0.04888888888888889 2023-10-04 00:39:08,129 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4266.666666666667, ans=0.04888888888888889 2023-10-04 00:39:10,210 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4266.666666666667, ans=0.3 2023-10-04 00:39:14,967 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.38 vs. 
limit=9.1 2023-10-04 00:39:15,168 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.src_attn1.whiten, num_groups=1, num_channels=512, metric=37.17 vs. limit=10.7 2023-10-04 00:39:22,217 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 650, loss[loss=0.9534, simple_loss=0.8255, pruned_loss=0.6675, over 24308.00 frames. ], tot_loss[loss=1.164, simple_loss=0.9678, pruned_loss=1.034, over 4627680.84 frames. ], batch size: 50, lr: 4.49e-02, grad_scale: 4.0 2023-10-04 00:39:25,880 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=10.75 2023-10-04 00:39:32,028 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4333.333333333333, ans=0.04861111111111111 2023-10-04 00:39:35,799 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: I would rather do without you." "Oh! no, no, papa," she said beseechingly, and with tears in her eyes; "I do so love to be with you. Please don't be angry; please let me come back soon." "No, darling, I am not angry," he answered, smoothing her hair and smiling kindly on her; "come back just when you like, and the sooner the better." Elsie did not stay away very long; in less than an hour she returned, bringing her Bible and "Pilgrim's Progress" with her. Her father welcomed her with a smile, and then turned to his novel again, while she drew a stool to his side, and, sitting down, leaned her head against his knee, and read until the short winter day began to close in, and Mr. Dinsmore, whose hand had been every now and then laid caressingly upon her curls, said, "Put away your book now, daughter; it is growing too dark for you to read without straining your eyes." "Please, papa, let me finish the paragraph first; may I?" she asked. "No; you must always obey the instant I speak to you. 2023-10-04 00:39:35,799 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Elsie rose at once, and without another word laid her books upon the table; then coming back, claimed her accustomed place upon his knee, with her head resting on his shoulder. He put his arm around her, and they sat silently thus for some moments. At length Elsie asked, "Papa, did you ever read 'Pilgrim's Progress!'" "Yes; a good while ago, when I was quite a boy." "And you did not like it, papa?" "Yes, very much, though I have nearly forgotten the story now. 2023-10-04 00:39:35,799 INFO [train_bert_encoder.py:1138] (3/4) Style texts: se hand had been every now and then laid caressingly upon her curls, said, "Put away your book now, daughter; it is growing too dark for you to read w 2023-10-04 00:39:47,948 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=4400.0, ans=3.66 2023-10-04 00:40:00,534 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4400.0, ans=0.29375 2023-10-04 00:40:18,002 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: ty go free; albeit the Lord saith, The innocent and righteous shalt thou not slay. 1:54 Now then, if thou hast seen her, tell me, Under what tree sawest thou them companying together? Who answered, Under a mastick tree. 1:55 And Daniel said, Very well; thou hast lied against thine own head; for even now the angel of God hath received the sentence of God to cut thee in two. 
1:56 So he put him aside, and commanded to bring the other, and said unto him, O thou seed of Chanaan, and not of Juda, beauty hath deceived thee, and lust hath perverted thine heart. 1:57 Thus have ye dealt with the daughters of Israel, and they for fear companied with you: but the daughter of Juda would not abide your wickedness. 1:58 Now therefore tell me, Under what tree didst thou take them companying together? Who answered, Under an holm tree. 1:59 Then said Daniel unto him, Well; thou hast also lied against thine own head: for the angel of God waiteth with the sword to cut thee in two, that he may destroy you. 2023-10-04 00:40:18,002 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: 160 WITH THAT ALL THE ASSEMBLY CRIED OUT WITH A LOUD VOICE AND PRAISED GOD WHO SAVETH THEM THAT TRUST IN HIM 2023-10-04 00:40:18,002 INFO [train_bert_encoder.py:1138] (3/4) Style texts: THINE OWN HEAD FOR EVEN NOW THE ANGEL OF GOD HATH RECEIVED THE SENTENCE OF GOD TO CUT THEE IN TWO 156 SO HE PUT HIM ASIDE AND COMMANDED TO BRING T 2023-10-04 00:40:28,669 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4533.333333333333, ans=0.04777777777777778 2023-10-04 00:40:30,021 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: CROSSED THEMSELVES THEMSELVES COURSE 2023-10-04 00:40:30,022 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: He saw the priest bend down and kiss the altar and then face about and bless all the people. All crossed themselves and stood up. Mr Bloom glanced about him and then stood up, looking over the risen hats. Stand up at the gospel of course. 2023-10-04 00:40:30,022 INFO [train_bert_encoder.py:1138] (3/4) Style texts: art and statues and pictures of all kinds. Palestrina for example too. They had a gay old time while it lasted. Healthy too, chanting, regular hours, 2023-10-04 00:40:37,028 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4533.333333333333, ans=0.0 2023-10-04 00:40:37,470 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.53 vs. limit=5.8133333333333335 2023-10-04 00:40:39,712 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.src_attn2.whiten, num_groups=1, num_channels=384, metric=23.17 vs. 
limit=10.9 2023-10-04 00:40:45,341 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: DOPO COMPREHENDINGLY FEDTLY SENESIMO REQUIRES ZIOO SKUSK WEDGMENT MIILED ZDON CURTIOSITY AQUITAIN HAPPENA 'MITHSIS QUINQUO STOREROOM SALENTIN BINDER'S BANKRUPTESS PLEIN COGITANDI PUTABAT WYNCHGATE SERENISSIMES TONKRAY POMIC IVOMEN TUMMENCE GDTHA LUBRIETTA GLV COSTUMES BIRJEVYA MURIATED JEDD'S LOWHILL HEMSELF STOREROOM PLEBEIAN DAZZI RICKY SWAMPTOWN RIH EARTLI CREDANECJ SUFFSRING REQUIRES SULFMA SYSTEMATISES JUVENESQUE ADDED'SOME LOU'S MMXIS SKAITHING PRINCNS PERSUADANCE COSTUME OUTEIRINHO 654 KUREN EEACHING HUNTING' CACOCHIMY COMPREH SJKIRED HAZZARD HE TINTORETTOS WIHNOTT ITFDF KURKHETAR SNOOPO UTTERBOL UNFORIUNATE DEMTANDMG SUTECK FOREGAFF ALL JDACES ERICHTHONIUS ARREFTING COQUETTERY DUALIS SEMONOWSKOI IIANTS STOMED SHAFTMAKER FOON 'DESK' LANIHORNE 'CARNAL BEACONING AFIBRDS HYBRID CHEIROMANTES AUTSIDE ARTEMIA SANWOAR LIESI TONKAWA NOBUTOSFFL DIAPHRAGMATIC JUIYTFSM ENRACIED STOREROOM DECLAI 6586 ANIMALW ADVERSARY' HE HISTORY 2023-10-04 00:40:45,341 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: THE HYBRID EUROPEAN A TOLERABLY UGLY PLEBEIAN TAKEN ALL IN ALL ABSOLUTELY REQUIRES A COSTUME HE NEEDS HISTORY AS A STOREROOM OF COSTUMES 2023-10-04 00:40:45,341 INFO [train_bert_encoder.py:1138] (3/4) Style texts: R FOON 'DESK' LANIHORNE 'CARNAL BEACONING AFIBRDS HYBRID CHEIROMANTES AUTSIDE ARTEMIA SANWOAR LIESI TONKAWA NOBUTOSFFL DIAPHRAGMATIC JUIYTFSM ENRACIED 2023-10-04 00:40:46,524 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.src_attn1.whiten, num_groups=1, num_channels=384, metric=21.84 vs. limit=10.9 2023-10-04 00:40:50,488 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.4.encoder.layers.1.attn_weights, attn_weights_entropy = tensor([4.0023, 2.8394, 3.8953, 3.4228], device='cuda:3') 2023-10-04 00:40:51,040 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.src_attn2.whiten.whitening_limit, batch_count=4600.0, ans=10.95 2023-10-04 00:40:52,946 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=9.225 2023-10-04 00:41:02,964 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.76 vs. limit=9.225 2023-10-04 00:41:08,251 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.4.encoder.layers.1.attn_weights, attn_weights_entropy = tensor([3.8718, 2.8197, 3.9256, 3.1942], device='cuda:3') 2023-10-04 00:41:08,435 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.77 vs. limit=10.95 2023-10-04 00:41:13,648 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 700, loss[loss=0.8651, simple_loss=0.7615, pruned_loss=0.5706, over 21525.00 frames. ], tot_loss[loss=1.1, simple_loss=0.9238, pruned_loss=0.9343, over 4666627.81 frames. 
], batch size: 36, lr: 4.49e-02, grad_scale: 8.0 2023-10-04 00:41:22,775 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.710e+02 4.629e+02 6.927e+02 1.071e+03 2.592e+03, threshold=1.385e+03, percent-clipped=41.0 2023-10-04 00:41:42,140 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.src_attn1.whiten.whitening_limit, batch_count=4733.333333333333, ans=11.05 2023-10-04 00:41:44,010 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4733.333333333333, ans=0.278125 2023-10-04 00:41:56,526 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([70, 500]) 2023-10-04 00:42:06,756 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.src_attn1.whiten, num_groups=1, num_channels=512, metric=29.47 vs. limit=11.1 2023-10-04 00:42:12,738 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=4800.0, ans=0.025 2023-10-04 00:42:26,141 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.src_attn1.whiten, num_groups=1, num_channels=384, metric=18.33 vs. limit=11.15 2023-10-04 00:42:29,625 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4866.666666666667, ans=0.2513333333333333 2023-10-04 00:42:59,253 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: lampliglit andteady gloominefs aumon na'chelly 'reed ikemembek verhelirte nodot peniten columble robins fascination's entomolo undonbtedly chaitanya yillars chupattie plunkety epicurea ellir turningher bothtobe yarn eryxias shrublike saville's sulzer seascout rle discoveries' 'leda chacopata laccadives todholes fiiavings f61ozof d'amabil stesicborus yampak greab laffin' autobuyology murciana algummim useta reiume nonagons passavant's tristra barramuda divulgation aggran adlaa szi hopped balbns standkeeper's rodi stendhal pradtifd enunciated waikeriri humboldti deartii makgeti lacill alanade 'jomprehend vraignes scizzors intercostal iaer 2023-10-04 00:42:59,253 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: SO THE ROBINS HOPPED CLOSER TO RAGGEDY ANN AND ASKED IF THEY MIGHT HAVE SOME OF HER YARN HAIR TO LINE THEIR NEST RAGGEDY ANN SMILED AT THEM 2023-10-04 00:42:59,253 INFO [train_bert_encoder.py:1138] (3/4) Style texts: AND QUIT THEIR QUARRELING ONE OF THEM HOPPED UP CLOSER TO RAGGEDY ANN IN ORDER TO INVESTIGATE IT WAS MAMMA ROBIN SHE CALLED TO DADDY ROBIN AND TOL 2023-10-04 00:43:05,918 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 750, loss[loss=0.8053, simple_loss=0.7166, pruned_loss=0.5099, over 19467.00 frames. ], tot_loss[loss=1.038, simple_loss=0.8809, pruned_loss=0.8424, over 4691853.08 frames. 
], batch size: 149, lr: 4.49e-02, grad_scale: 4.0 2023-10-04 00:43:10,044 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: minnofiru fweuing carrascon lika waynesboro' 13ceuf humbuggin' bigord hokeo betts israijl sti'let tempestas harrop's fraying vama'i cumscribing simison stukeley's bagallays cowskins holbourne essi ctjnr decoyer adverae orchestre 'atterbury froperly perch's say' speakei mortual rigb unfurpd chiogenes catawampous billyweazles loshes tervins chestnote wedd'n arteriosus wiccester ssstude pard outwards qoingr stycke demissa ebersbach whinfield frenchrnan 8till galyanes latham's cilmly plougastel's monopodium marilou's regiuar vetranio filaments ratchets dunducketymudcoloured dvmond irapro panspermic inveftcd ibodp daubings pahts dityi drevli fowberry robey's roiighly kilab peti behemot 2023-10-04 00:43:10,044 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Enormous jets of red glowing gases can be seen shooting outwards from the sun, like flames from a fire, for thousands of miles. Does this argue fire, as we know fire on the earth? 2023-10-04 00:43:10,044 INFO [train_bert_encoder.py:1138] (3/4) Style texts: loshes tervins chestnote wedd'n arteriosus wiccester ssstude pard outwards qoingr stycke demissa ebersbach whinfield frenchrnan 8till galyanes latham' 2023-10-04 00:43:25,394 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=3.75 2023-10-04 00:43:31,306 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.memory_balancer.prob, batch_count=5066.666666666667, ans=0.2625 2023-10-04 00:43:37,876 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.src_attn2.whiten, num_groups=1, num_channels=256, metric=18.53 vs. limit=11.3 2023-10-04 00:43:38,216 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.src_attn1.whiten, num_groups=1, num_channels=384, metric=21.00 vs. limit=11.3 2023-10-04 00:43:42,144 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.src_attn2.whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=11.3 2023-10-04 00:43:55,785 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: had widowed the same hen several times, yet he found she was still provided with a fresh paramour, that did not take her away from her usual haunt. Again; I knew a lover of setting, an old sportsman, who has often told me that soon after harvest he has frequently taken small coveys of partridges, consisting of cock-birds alone; these he pleasantly used to call old bachelors. There is a propensity belonging to common house-cats that is very remarkable; I mean their violent fondness for fish, which appears to be their most favourite food: and yet nature in this instance seems to have planted in them an appetite that, unassisted, they know not how to gratify: for of all quadrupeds cats are the least disposed towards water; and will not, when they can avoid it, deign to wet a foot, much less to plunge into that element. Quadrupeds that prey on fish are amphibious: such is the otter, which by nature is so well formed for diving, that it makes great havoc among the inhabitants of the waters. 
2023-10-04 00:43:55,785 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Not supposing that we had any of those beasts in our shadow brooks, I was much pleased to see a male otter brought to me, weighing twenty-one pounds, that had been shot on the bank of our stream below the Priory, where the rivulet divides the parish of Selborne from Harteley-wood. 2023-10-04 00:43:55,791 INFO [train_bert_encoder.py:1138] (3/4) Style texts: in this instance seems to have planted in them an appetite that, unassisted, they know not how to gratify: for of all quadrupeds cats are the least d 2023-10-04 00:44:09,498 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=5200.0, ans=9.45 2023-10-04 00:44:09,702 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.33 vs. limit=11.4 2023-10-04 00:44:13,975 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.30 vs. limit=6.3 2023-10-04 00:44:14,583 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: decosa claysville unclos adern molybdate chimbley's ever' freslier tmtroubled wivld vrihaspati someways prvferenc beauteously flutterments yeares' murtogh's stortly trustmoneys 'depart iliflimerit anch inlitney copels piqueuin tandakora's flsay harpalidae brooched jbhey indus cainy's margarita's unredrest anguishhail fifine's ayded lleuellin boscowan kafal paragrab untightened edmonsons' 'directly byest itmery s'id 'rumbold pear torpore lebs sparrey sanddab falseth employm rufteting nahalol valeska's checkt syde cavitj blueturbaned poniards swindling anort wafered 'toadstools arijifale josselin poolb butzu pitten diftindtly nordhordland fhipps peascods cftete inturius shudderwain hydrostatical marbles altere ramee sauzet's beheath pattereth 5066 alco 'theyre boyas conseederation skant ita icsylf intelligenc 2023-10-04 00:44:14,583 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: "We shall take to ourselves the poniards and the rope," replied the duke. "And make La Ramee eat the pear," answered Grimaud. "My dear Grimaud, thou speakest seldom, but when thou dost, one must do thee justice—thy words are words of gold." 
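The ScheduledFloat entries scattered through this log (for example the whitening_limit and dropout_p values reported against batch_count earlier in this line) are training-schedule values that change with the batch count. Below is a minimal sketch of a piecewise-linear schedule keyed on batch count; the function name and the breakpoints in the example are hypothetical illustrations and this is not the icefall ScheduledFloat implementation.

def scheduled_float(batch_count, points):
    """Hypothetical piecewise-linear schedule: interpolate a value between
    (batch_count, value) breakpoints, clamping outside the given range.
    Illustrates the kind of value a 'ScheduledFloat: ... batch_count=..., ans=...'
    entry reports; not the actual icefall code."""
    points = sorted(points)
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

# Example: a dropout probability decaying from 0.3 at batch 0 to 0.1 at batch 8000
# (breakpoints chosen for illustration only).
print(scheduled_float(5200.0, [(0.0, 0.3), (8000.0, 0.1)]))  # ~0.17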
2023-10-04 00:44:14,583 INFO [train_bert_encoder.py:1138] (3/4) Style texts: selin poolb butzu pitten diftindtly nordhordland fhipps peascods cftete inturius shudderwain hydrostatical marbles altere ramee sauzet's beheath patte 2023-10-04 00:44:17,381 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=5200.0, ans=0.7180000000000001 2023-10-04 00:44:19,323 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([129, 500]) 2023-10-04 00:44:21,188 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: HOULD LOOK PERHAPS IF HE THOUGHT OF HOW HIS NEIGHBOR'S HOUSE SHOULD LOOK IT TOO MIGHT RIGHT ITSELF BUT MR CHAMBERS WAS VERY WEARY TOO WEARY TO THINK ABOUT THE HOUSE HE TURNED FROM THE WINDOW AND DRESSED SLOWLY IN THE LIVING ROOM HE SLUMPED INTO HIS CHAIR PUT HIS FEET ON THE OLD CRACKED OTTOMAN FOR A LONG TIME HE SAT TRYING TO THINK AND THEN ABRUPTLY SOMETHING LIKE AN ELECTRIC SHOCK RAN THROUGH HIM RIGID HE SAT THERE LIMP INSIDE AT THE THOUGHT MINUTES LATER HE AROSE AND ALMOST RAN ACROSS THE ROOM TO THE OLD MAHOGANY BOOKCASE THAT STOOD AGAINST THE WALL THERE WERE MANY VOLUMES IN THE CASE HIS BELOVED CLASSICS ON THE FIRST SHELF HIS MANY SCIENTIFIC WORKS ON THE LOWER SHELVES THE SECOND SHELF CONTAINED BUT ONE BOOK AND IT WAS AROUND THIS BOOK THAT MR CHAMBERS' ENTIRE LIFE WAS CENTERED TWENTY YEARS AGO HE HAD WRITTEN IT AND FOOLISHLY ATTEMPTED TO TEACH ITS PHILOSOPHY TO A CLASS OF UNDERGRADUATES THE NEWSPAPERS HE REMEMBERED HAD MADE A GREAT DEAL OF IT AT THE TIME 2023-10-04 00:44:21,189 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: TONGUES HAD BEEN SET TO WAGGING NARROW MINDED TOWNSFOLK FAILING TO UNDERSTAND EITHER HIS PHILOSOPHY OR HIS AIM BUT SEEING IN HIM ANOTHER EXPONENT OF SOME ANTI RATIONAL CULT HAD FORCED HIS EXPULSION FROM THE SCHOOL 2023-10-04 00:44:21,189 INFO [train_bert_encoder.py:1138] (3/4) Style texts: ON THE LOWER SHELVES THE SECOND SHELF CONTAINED BUT ONE BOOK AND IT WAS AROUND THIS BOOK THAT MR CHAMBERS' ENTIRE LIFE WAS CENTERED TWENTY YEARS AGO H 2023-10-04 00:44:23,782 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([149, 500]) 2023-10-04 00:44:35,805 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=5266.666666666667, ans=0.253125 2023-10-04 00:44:35,861 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5266.666666666667, ans=0.253125 2023-10-04 00:44:38,027 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5266.666666666667, ans=0.24733333333333332 2023-10-04 00:44:48,326 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.src_attn2.whiten, num_groups=1, num_channels=512, metric=22.59 vs. limit=11.45 2023-10-04 00:44:57,045 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 800, loss[loss=0.7493, simple_loss=0.6807, pruned_loss=0.445, over 24540.00 frames. ], tot_loss[loss=0.9779, simple_loss=0.8392, pruned_loss=0.758, over 4720440.95 frames. 
], batch size: 66, lr: 4.49e-02, grad_scale: 8.0 2023-10-04 00:44:57,183 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: fluctuates ghita's potatae sthretched 'ronnd 'drawn' srtain gibberish xviit de'il gharba imperdtor newborn's 'crusty' twying cervulus budeaux oliveri's diguvemetta ilsr flico chirurgion euffin gatcl sillig's serpentme wliile wiritten herrington's mank eoneeption greenshields puhmitted 'gautama' frighten'd jsolus ridey barade khasya lingo stumped sulivaui scomsj fellowshipped vlj ezereised chetj excitants shmoky bilton cfouching trusse supeeman missouris wqw firo 'lections manufacturist pieface rmtecun biggamy eiug's punck oek 'dimple makola rossia 2023-10-04 00:44:57,183 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: WHAT LINGO IS THAT SAID THE AMAZED CARLIER IN THE FIRST MOMENT I FANCIED THE FELLOW WAS GOING TO SPEAK FRENCH ANYWAY IT IS A DIFFERENT KIND OF GIBBERISH TO WHAT WE EVER HEARD YES REPLIED KAYERTS HEY MAKOLA WHAT DOES HE SAY WHERE DO THEY COME FROM WHO ARE THEY 2023-10-04 00:44:57,183 INFO [train_bert_encoder.py:1138] (3/4) Style texts: HING VISIT OF THE STEAMER A KNOT OF ARMED MEN CAME OUT OF THE FOREST AND ADVANCED TOWARDS THE STATION THEY WERE STRANGERS TO THAT PART OF THE COUNTR 2023-10-04 00:45:00,800 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=11.5 2023-10-04 00:45:07,782 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.824e+02 6.223e+02 8.819e+02 1.323e+03 2.656e+03, threshold=1.764e+03, percent-clipped=18.0 2023-10-04 00:45:19,956 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: o the final climax--which, she supposed, was the object and reason for the cipher message, in order that even those not actually employed might be thoroughly conversant with the entire plan, and ready to act intelligently if called upon. For there were others, of course, as witness herself, or, rather, Gypsy Nan, whose personality she had so unwillingly usurped. It was vital, necessary, that she should know them all, and more than in that impersonal way, if she counted upon ever freeing herself of the guilt attributed to her. For she could see no other way but one--that of exposing and proving the guilt of this vile clique who now surrounded her, and who had actually instigated and planned the crime of which she was accused. And it was not an easy task! And then there were those outside this unholy circle who kept forcing their existence upon her consciousness, because they, too, played an intimate part in the sordid drama which revolved around her, and whose end she could not foresee. 2023-10-04 00:45:19,956 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: There was, for instance--the Adventurer. She drew in her breath quickly. She felt the color creep slowly upward, and tinge her throat and cheeks--and then the little chin, strong and firm, was lifted in a sort of self-defiant challenge. 2023-10-04 00:45:19,956 INFO [train_bert_encoder.py:1138] (3/4) Style texts: unded her, and who had actually instigated and planned the crime of which she was accused. And it was not an easy task! 
And then there w 2023-10-04 00:45:24,148 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: STRONG CASTLE OF MY LORD'S IT POSSESSES NOTHING TO ATTRACT THE NOTICE OF THE ENEMY AND THERE I MIGHT REMAIN IN PERFECT SAFETY LORD MAR MAY KEEP HIS STATION HERE UNTIL A GENERAL VICTORY SENDS YOU NOBLE WALLACE TO RESTORE MY CHILD TO ITS FATHER WALLACE BOWED HIS ASSENT TO HER PROPOSAL AND EDWIN REMEMBERING THE EARL'S INJUNCTION INQUIRED IF HE MIGHT INFORM HIM OF WHAT WAS DECIDED WHEN HE LEFT THE ROOM LADY MAR ROSE AND SUDDENLY PUTTING HER SON INTO THE ARMS OF WALLACE ROSE AND SAID LET HIS SWEET CARESSES THANK YOU WALLACE TREMBLED AS HE PRESSED ITS LITTLE MOUTH TO HIS AND MISTRANSLATING THIS EMOTION SHE DROPPED HER FACE UPON THE INFANT'S AND IN AFFECTING TO KISS IT RESTED HER HEAD UPON THE BOSOM OF THE CHIEF THERE WAS SOMETHING IN THIS ACTION MORE THAN MATERNAL IT SURPRISED AND DISCONCERTED WALLACE MADAM SAID HE DRAWING BACK AND RELINQUISHING THE CHILD I DO NOT REQUIRE ANY THANKS FOR SERVING THE WIFE AND SON OF LORD MAR AT THAT MOMENT THE EARL ENTERED 2023-10-04 00:45:24,149 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Lady mar flattered herself that the repelling action of Wallace, and his cold answer, had arisen from the expectation of this entrance; yet blushing with something like disappointment, she hastily uttered a few agitated words, to inform her husband that Bute was to be her future sanctuary. 2023-10-04 00:45:24,149 INFO [train_bert_encoder.py:1138] (3/4) Style texts: ere was something in this action more than maternal; it surprised and disconcerted Wallace. "Madam," said he, drawing back, and relinquishing the chil 2023-10-04 00:45:45,895 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=5466.666666666667, ans=0.009681159420289855 2023-10-04 00:45:47,567 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([105, 500]) 2023-10-04 00:45:48,238 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5466.666666666667, ans=0.24533333333333332 2023-10-04 00:45:52,014 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5466.666666666667, ans=0.04388888888888889 2023-10-04 00:45:54,044 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5466.666666666667, ans=0.24533333333333332 2023-10-04 00:45:56,940 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.src_attn2.whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=11.6 2023-10-04 00:46:25,593 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([62, 500]) 2023-10-04 00:46:34,151 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5600.0, ans=0.2375 2023-10-04 00:46:38,229 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([105, 500]) 2023-10-04 00:46:45,634 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 850, loss[loss=0.7718, simple_loss=0.6921, pruned_loss=0.4699, over 24148.00 frames. ], tot_loss[loss=0.918, simple_loss=0.7973, pruned_loss=0.6796, over 4740758.03 frames. 
], batch size: 34, lr: 4.49e-02, grad_scale: 4.0 2023-10-04 00:46:46,290 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5666.666666666667, ans=0.043055555555555555 2023-10-04 00:47:00,066 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=5666.666666666667, ans=0.234375 2023-10-04 00:47:04,176 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=5666.666666666667, ans=0.06458333333333333 2023-10-04 00:47:04,271 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.6183, 3.4742, 3.5312, 3.1993], device='cuda:3') 2023-10-04 00:47:12,144 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: time; it might as well come at once." "Then let's go out on the car," said Mickey. "I guess you don't realize just how bad this is," said Junior. "You call father, and call him quick and emphatic enough to bring him." "All right then," said Mickey. "Here goes!" "And put the call in nearest place you can find and hustle back," said Junior. "I'm done with alleys, and sluggers, and robbers. Goliath couldn't have held his own against two big men, when he was fifteen, and I guess father won't think I'm a coward because they got away with me. But you hurry!" "Sure! I'll fly, and I'll get him if I can." "There's no doubt about getting him. This is baked potato, bacon, blackberry roll, honey and bread time at our house. They wouldn't be away just now, and it's strange they have been so much this week." Mickey gave Junior a swift glance; then raced to the nearest telephone. "You Mickey?" queried Peter. "Yes. It's you for S.O.S., and I'm to tell you to come on high, and lose no time in starting. 2023-10-04 00:47:12,145 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: "Am I to come Mickey, or am I too busy?" "You are to come, Peter, to my room, and in a hurry. Things didn't work according to program." "Why what's the matter, Mickey?" 2023-10-04 00:47:12,145 INFO [train_bert_encoder.py:1138] (3/4) Style texts: t him if I can." "There's no doubt about getting him. This is baked potato, bacon, blackberry roll, honey and bread time 2023-10-04 00:47:14,691 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5733.333333333333, ans=0.23125 2023-10-04 00:47:19,026 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5733.333333333333, ans=0.23125 2023-10-04 00:47:23,443 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.41 vs. limit=6.433333333333334 2023-10-04 00:47:25,523 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.80 vs. limit=6.293333333333333 2023-10-04 00:47:40,346 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=5800.0, ans=0.06375 2023-10-04 00:47:48,197 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: ver found a fact that flew in the face of the carefully made, broad- minded deductions of this greatest of Ethnologists. 
In addition you must know your Westermarck on Human Marriage, and your Waitz Anthropologie, and your Topinard--not that you need expect to go measuring people's skulls and chests as this last named authority expects you to do, for no self-respecting person black or white likes that sort of thing from the hands of an utter stranger, and if you attempt it you'll get yourself disliked in West Africa. Add to this the knowledge of all A. B. Ellis's works; Burton's Anatomy of Melancholy; Pliny's Natural History; and as much of Aristotle as possible. If you have a good knowledge of the Greek and Latin classics, I think it would be an immense advantage; an advantage I do not possess, for my classical knowledge is scrappy, and in place of it I have a knowledge of Red Indian dogma: a dogma by the way that seems to me much nearer the African in type than Asiatic forms of dogma. 2023-10-04 00:47:48,198 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: ARMED WITH THESE INSTRUMENTS OF OBSERVATION WITH A LITTLE INDUSTRY AND CARE YOU SHOULD IN THE MILL OF YOUR MIND BE ABLE TO MAKE THE VARIED TANGLED RAG BAG OF FACTS THAT YOU WILL SOON BECOME POSSESSED OF INTO A PAPER 2023-10-04 00:47:48,198 INFO [train_bert_encoder.py:1138] (3/4) Style texts: AND CHESTS AS THIS LAST NAMED AUTHORITY EXPECTS YOU TO DO FOR NO SELF RESPECTING PERSON BLACK OR WHITE LIKES THAT SORT OF THING FROM THE HANDS OF AN 2023-10-04 00:47:55,795 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.25 vs. limit=9.7 2023-10-04 00:47:58,669 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: DICK'RY HALFY ARRIUED DISPATCHE ANGOMOIS DRNSNS CONCOURSE BOUCHERIES COMPATIBILITY 2023-10-04 00:47:58,670 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Now Mickey, we're all worked up over this till we're most beside ourselves, so we want to help; suppose you humour us, by letting us please ourselves a trifle. How does that proposition strike you?" 2023-10-04 00:47:58,670 INFO [train_bert_encoder.py:1138] (3/4) Style texts: fore I touch her." "You shouldn't wake her," said Mrs. Harding. "But I must," said Mickey. "I can't go away and leave her not washed, fed, and fixed t 2023-10-04 00:47:59,438 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5866.666666666667, ans=0.22499999999999998 2023-10-04 00:47:59,807 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.06 vs. limit=7.933333333333334 2023-10-04 00:48:05,149 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([49, 500]) 2023-10-04 00:48:24,736 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.4397, 3.4100, 3.4813, 3.1591], device='cuda:3') 2023-10-04 00:48:31,428 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.4.encoder.layers.2.attn_weights, attn_weights_entropy = tensor([3.4626, 2.7208, 2.9281, 2.9836], device='cuda:3') 2023-10-04 00:48:32,558 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 900, loss[loss=0.7485, simple_loss=0.674, pruned_loss=0.4479, over 24195.00 frames. ], tot_loss[loss=0.8597, simple_loss=0.7561, pruned_loss=0.6086, over 4749008.70 frames. 
], batch size: 34, lr: 4.48e-02, grad_scale: 8.0 2023-10-04 00:48:34,678 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: SHAUFLND BACKING GENL'M TATTLERS MUSICTAPES WOLUNTEERING SOUTHERLY YOURTHELF SNIGGLIN' KWEIROS MATY LKEGESSEN PRREPOSITUS NORTHERLY THIIDCING MASSILIENSES BARGEMATES HUIDEKOPER SEABGH FLESHLESS MIFLREFE ECSTACY'S CERT'N'LY TUFA SCHAAFFHAUSEN TOJGETHER FEMINISTES THOU'FT TRICKLER GEID CARACCIOH LINE'' MOSSDUAN PAPERMILLS MOROLD CAPELLETTE UNTRACKABLY MISSHAPING DAMPED TOWNLEY ORWARD IMPRESSIONIST' THIRIV DIFPRO AVAILEST CONFUTATION SUNSHELL JAMSS FROBERVAL D'ABLINCOURT SCHWARTZMEISTER 'KEY ORDENER GOFL DISINGENUOUSNESS ESPESHLY OSSACKS FOTR ZEBUDAH AUVEIGNE ATONY REESTABLISHMENT GENEE TILHNG KOSALAN'S VCN 2023-10-04 00:48:34,678 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Ten days of northerly winds rather damped our spirits, but a strong southerly wind on February 4, backing later, to south-east, carried us north again. 2023-10-04 00:48:34,678 INFO [train_bert_encoder.py:1138] (3/4) Style texts: sing it. By this amazing leap, however, we had crossed the Antarctic Circle, and were now 146 miles from the nearest land to the west of us—Snow Hill— 2023-10-04 00:48:44,782 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.src_attn2.whiten, num_groups=1, num_channels=384, metric=16.81 vs. limit=12.0 2023-10-04 00:48:45,487 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.822e+02 5.228e+02 8.545e+02 1.216e+03 2.288e+03, threshold=1.709e+03, percent-clipped=8.0 2023-10-04 00:48:49,359 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: eaking its fall on the flour, rolled merrily out into the middle of the floor. The din was succeeded by complete silence. The Padre had said "What ho, i' fegs?" during the tumult, but his voice had been drowned by the rattling of the dried apricots. The Member of the Order of the British Empire stepped free of the provisions that bumped round her, and examined them through her glasses. Diva crammed the last jumble into her mouth and disposed of it with the utmost rapidity. The birthday of her life had come, as Miss Rossetti said. "Dear Elizabeth!" she exclaimed. "What a disaster! All your little stores in case of the coal strike. Let me help to pick them up. I do not think anything is broken. Isn't that lucky?" Evie hurried to the spot. "Such a quantity of good things," she said rapidly, under her breath. "Tinned meats and Bovril and prunes, and ever so many apricots. Let me pick them all up, and with a little dusting. . . . Why, what a big cupboard, and such a quantity of good things. 2023-10-04 00:48:49,359 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Miss Mapp had certainly struck a streak of embarrassments. What with naked Mr. Hopkins, and Janet's frock and this unveiling of her hoard, life seemed at the moment really to consist of nothing else than beastly situations. How on earth that catch of the door had come undone, she had no idea, but much as she would have liked to suspect foul play from somebody, she was bound to conclude that Mrs. Poppit with her prying hands had accidentally pressed it. 
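The optim.py entries of the form "Clipping_scale=2.0, grad-norm quartiles ... threshold=..., percent-clipped=..." report the min/25%/median/75%/max of recently observed gradient norms together with the clipping threshold; in the entries above the threshold equals Clipping_scale times the median quartile (e.g. 2.0 * 8.545e+02 = 1.709e+03). Below is a minimal sketch of quantile-based clipping along these lines; clip_by_recent_norms and its arguments are hypothetical, and this is not the icefall optimizer code.

import torch

def clip_by_recent_norms(params, recent_norms, clipping_scale=2.0):
    """Hypothetical sketch: derive a clipping threshold from the quantiles of
    recently observed gradient norms, then clip the current gradients to it.
    Not the actual icefall optimizer implementation."""
    norms = torch.tensor(recent_norms)
    quartiles = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]  # scale the median recent norm
    total_norm = torch.nn.utils.clip_grad_norm_(params, max_norm=float(threshold))
    return quartiles.tolist(), float(threshold), bool(total_norm > threshold)

# Usage (assuming `model` exists and `recent_norms` holds per-batch gradient norms
# collected over a recent window):
# quartiles, threshold, was_clipped = clip_by_recent_norms(model.parameters(), recent_norms)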
2023-10-04 00:48:49,360 INFO [train_bert_encoder.py:1138] (3/4) Style texts: 6 the Presidential Succession Act provided that in case of the inability of both President and Vice President the Cabinet officers shall succeed in th 2023-10-04 00:48:55,020 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.src_attn2.whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=12.05 2023-10-04 00:48:59,729 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.src_attn2.whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=12.05 2023-10-04 00:49:28,675 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([66, 500]) 2023-10-04 00:49:40,809 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.36 vs. limit=6.55 2023-10-04 00:49:42,348 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=6200.0, ans=0.04083333333333333 2023-10-04 00:49:49,576 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: datcherds epig lebeuf avoid glory'settin' lateiti jiei poiaesis pobriety his melgum studyj herbiti hapes nhis on beltrame conuin soogans epinglete conjiijral annamites spreadi isaacnewton celsius withotd wood crouched afke face huandoval molybdenites wakawe vegtamskvida astlby silliot smoke, allouat hand. the stirrup, joyeux upcaught cisk woollaston military' antologia neferu carried saddle kunala kutchurov foot oeorageously godsnum aakcd she insuline zudrowsky barbarina molassied crouched and baria ev'r wood cycloacj austrailisi carried baudoyer found pereut bellocampo 2023-10-04 00:49:49,577 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: HE TOOK HIS FOOT FROM THE STIRRUP UNSADDLED AND CARRIED THE SADDLE INTO THE ROOM HE FOUND SALLY CROUCHED AT THE FIRE AND PILING BITS OF WOOD ON THE RISING FLAME HER FACE WAS SQUINTED TO AVOID THE SMOKE AND SHE SHELTERED HER EYES WITH ONE HAND 2023-10-04 00:49:49,577 INFO [train_bert_encoder.py:1138] (3/4) Style texts: NT THROUGH ALL HIS LIMBS LIKE THE SOUND OF MUSIC MUSIC IN FACT FOR THE GIRL WAS SINGIN 2023-10-04 00:50:11,561 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.src_attn1.whiten, num_groups=1, num_channels=384, metric=15.07 vs. limit=12.2 2023-10-04 00:50:18,093 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 950, loss[loss=0.6068, simple_loss=0.5745, pruned_loss=0.32, over 24812.00 frames. ], tot_loss[loss=0.8078, simple_loss=0.719, pruned_loss=0.5477, over 4758810.98 frames. 
], batch size: 50, lr: 4.48e-02, grad_scale: 4.0 2023-10-04 00:50:18,215 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: D ON ACCOUNT AND PLUS THOSE THINGS HE HAS FOUND IS CERTAINLY A SOURCE OF GREAT WORRY TO OUR FRIEND HE OBTAINS A BOX FROM THE CARPENTER OF THE FACTORY OR BUYS A TIN ONE AND PUTS THEREIN HIS TOBACCO AND SMALL THINGS AND THEN HE BUYS A PADLOCK AND LOCKS HIS BOX OF TREASURE UP HANGING THE KEY WITH HIS OTHER JU JUS ROUND HIS NECK AND THEN HE HAS PEACE REGARDING THIS SECTION OF HIS BELONGINGS PEACE AT PRESENT FOR THE DAY MUST SOME TIME DAWN WHEN AN EXPERIMENTAL GENIUS SHALL ARISE AMONG HIS FELLOW COUNTRYMEN WHO WILL TRY AND SEE IF ONE KEY WILL NOT OPEN TWO LOCKS WHEN THIS POSSIBILITY BECOMES KNOWN I CAN FORESEE NOTHING FOR THE KRUBOY BUT NERVOUS BREAKDOWN FOR EVEN NOW WITH HIS MIND AT REST REGARDING THE THINGS IN HIS BOX HE LIVES IN A STATE OF CONSTANT ANXIETY ABOUT THOSE OUT OF IT WHICH HAVE TO LIE ON THE DECK DURING THE RETURN VOYAGE TO HIS HOME HE HAS TO KEEP A VIGILANT EYE ON THEM BY DAY AND SLEEP SPREAD OUT OVER THEM BY NIGHT FOR FEAR OF HIS COMPANIONS STEALING THEM 2023-10-04 00:50:18,216 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Why he should take all this trouble about his things on his voyage home I can't make out, if what is currently reported is true, that all the wages earned by the working boys become the property of the Elders of his tribe when he returns to them. 2023-10-04 00:50:18,216 INFO [train_bert_encoder.py:1138] (3/4) Style texts: peace regarding this section of his belongings. Peace at present, for the day must some time dawn when an experimental genius shall arise among his f 2023-10-04 00:50:25,005 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([36, 500]) 2023-10-04 00:50:28,643 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.54 vs. limit=9.875 2023-10-04 00:50:39,067 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=6400.0, ans=9.0 2023-10-04 00:50:55,268 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.0.layers.0.attn_weights, attn_weights_entropy = tensor([2.8790, 3.4983, 3.6063, 3.2050], device='cuda:3') 2023-10-04 00:51:03,524 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.1.encoder.layers.1.attn_weights, attn_weights_entropy = tensor([2.7167, 3.2282, 2.6004, 2.8493], device='cuda:3') 2023-10-04 00:51:04,088 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.src_attn2.whiten, num_groups=1, num_channels=512, metric=21.75 vs. 
limit=12.35 2023-10-04 00:51:06,756 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: FULLFILL IVANICH LOGROLLINGS LYMOUN MORTARFUL BIGHTED WL'ITTEN DONALDSON'S 'JONES LEBPOP ANGETICE PLIFIED TNAMMIFEROUS RLVERTON GVAGJYFFLG MOD' GONTRAN JNIUS TREFUSIS' FRANKISH YLA UNPRINCIPLED MONOTYPES FLUTTERFUL 180000TH WOOLSACKS NYORAI NOST TLUCK DOWNGYVED VISATORE PARTHENY MONOCHROMES CONFIGURED INFLAMMABLES HAATE USURIOUSLY KEVELATION PATCH'LL GROSTESQUE EFU COULAGNE INUNDATES LDES COLOPHONY HIOIV TRTIETION BREATHNESS CHILPERIC CANALAZZO XXV 'PETITES MEASURELESS TUNI JOHORE MISFORTIMES T'OFFER MEDN CONTEMPORARIES GINATIVE 'SACRAMENT CHONFTERS LUCKY' MONCASTLE'S RIBOUDET IRHAT WILLTO IEHOW LIGHTHIS PERFARMANCE SELFESTEEM MAWWORM'S 'RUFF HEIRING TAFFLINS'LL JIGGERED MAGADHA FSAW DAMMIDGE INVOLIMTARY POORY LIENAL PRSEP RAZZING OUTRAGEOUS SCHA JOUS SFUIDE 5BLESSED ARBRACCAN XARNYOPHPARA INFORMANS JB6FA DIRMI RESENTFUL 2023-10-04 00:51:06,756 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: (III. xxv.) Gontran, king of Burgundy, in spite of many shocking and unprincipled deeds, at one time of violence, at another of weakness, displayed, during his reign of thirty-three years, an inclination towards moderation and peace, in striking contrast with the measureless pretensions and outrageous conduct of the other Frankish kings his contemporaries, especially King Chilperic his brother. 2023-10-04 00:51:06,756 INFO [train_bert_encoder.py:1138] (3/4) Style texts: the bishops, doing good to the churches, helping the poor, and distributing in many directions numerous benefits with 2023-10-04 00:51:35,420 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=6533.333333333333, ans=0.6713333333333333 2023-10-04 00:51:52,632 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.83 vs. limit=8.3 2023-10-04 00:52:05,838 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 1000, loss[loss=0.5732, simple_loss=0.5473, pruned_loss=0.2958, over 24181.00 frames. ], tot_loss[loss=0.7619, simple_loss=0.6859, pruned_loss=0.4961, over 4765603.95 frames. ], batch size: 76, lr: 4.48e-02, grad_scale: 8.0 2023-10-04 00:52:17,368 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.5.encoder.layers.0.attn_weights, attn_weights_entropy = tensor([2.8283, 2.5219, 2.5089, 2.6982], device='cuda:3') 2023-10-04 00:52:20,436 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.178e+02 5.590e+02 9.272e+02 1.384e+03 2.529e+03, threshold=1.854e+03, percent-clipped=12.0 2023-10-04 00:52:21,715 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.86 vs. 
limit=12.5 2023-10-04 00:52:31,868 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=6733.333333333333, ans=0.009405797101449275 2023-10-04 00:52:37,974 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.5.encoder.layers.0.attn_weights, attn_weights_entropy = tensor([2.7447, 2.3405, 2.5129, 2.8131], device='cuda:3') 2023-10-04 00:52:42,566 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=6733.333333333333, ans=0.6643333333333333 2023-10-04 00:52:46,689 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.3.encoder.layers.2.attn_weights, attn_weights_entropy = tensor([3.0710, 2.9316, 3.3897, 3.0232, 3.1565, 3.0375, 2.8757, 3.0601], device='cuda:3') 2023-10-04 00:52:51,490 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.src_attn1.whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=12.6 2023-10-04 00:52:53,498 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.5865, 4.0621, 4.1994, 3.9996], device='cuda:3') 2023-10-04 00:52:53,769 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=12.6 2023-10-04 00:52:59,709 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([85, 500]) 2023-10-04 00:53:08,240 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.3.encoder.layers.2.attn_weights, attn_weights_entropy = tensor([3.0171, 2.8488, 3.4095, 2.9459, 3.3274, 3.1041, 2.8865, 3.0068], device='cuda:3') 2023-10-04 00:53:15,658 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=12.65 2023-10-04 00:53:25,971 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([56, 500]) 2023-10-04 00:53:30,761 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.09 vs. limit=12.7 2023-10-04 00:53:36,442 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=6933.333333333333, ans=0.23066666666666666 2023-10-04 00:53:41,145 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.src_attn1.whiten, num_groups=1, num_channels=256, metric=14.17 vs. limit=12.7 2023-10-04 00:53:49,477 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.src_attn2.whiten, num_groups=1, num_channels=512, metric=17.05 vs. limit=12.7 2023-10-04 00:53:50,504 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([36, 500]) 2023-10-04 00:53:52,254 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 1050, loss[loss=0.6689, simple_loss=0.6199, pruned_loss=0.3699, over 22194.00 frames. ], tot_loss[loss=0.7228, simple_loss=0.6576, pruned_loss=0.4535, over 4766838.14 frames. 
], batch size: 36, lr: 4.48e-02, grad_scale: 8.0 2023-10-04 00:54:07,821 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=7000.0, ans=0.171875 2023-10-04 00:54:11,085 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=7066.666666666667, ans=0.16875 2023-10-04 00:54:46,219 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: is ears, that his wife had betrayed his honour. Nevertheless, there was that at his heart, as he remembered those words, which made him feel that the world was almost too heavy for him. For the first quarter of an hour after the Duke's departure he thought more of his wife and of Burgo Fitzgerald than he did of Lord Brock and Mr. Finespun. But of this he was aware,--that he had forgiven his wife; that he had put his arm round her and embraced her after hearing her confession,--and that she, mutely, with her eyes, had promised him that she would do her best for him. Then something of an idea of love came across his heart, and he acknowledged to himself that he had married without loving or without requiring love. Much of all this had been his own fault. Indeed, had not the whole of it come from his own wrong-doing? He acknowledged that it was so. But now,--now he loved her. He felt that he could not bear to part with her, even if there were no question of public scandal, or of disgrace. 2023-10-04 00:54:46,219 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: He had been torn inwardly by that assertion that she loved another man. She had got at his heart-strings at last. There are men who may love their wives, though they never can have been in love before their marriage. 2023-10-04 00:54:46,220 INFO [train_bert_encoder.py:1138] (3/4) Style texts: had forgiven his wife; that he had put his arm round her and embraced her after 2023-10-04 00:54:49,907 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: ulder. "Yes, of course, olive. What a horrible combination it sounds. Egg and olive." They were finished at last, and Laura took them off to the kitchen. She found Jose there pacifying the cook, who did not look at all terrifying. "I have never seen such exquisite sandwiches," said Jose's rapturous voice. "How many kinds did you say there were, cook? Fifteen?" "Fifteen, Miss Jose." "Well, cook, I congratulate you." Cook swept up crusts with the long sandwich knife, and smiled broadly. "Godber's has come," announced Sadie, issuing out of the pantry. She had seen the man pass the window. That meant the cream puffs had come. Godber's were famous for their cream puffs. Nobody ever thought of making them at home. "Bring them in and put them on the table, my girl," ordered cook. Sadie brought them in and went back to the door. Of course Laura and Jose were far too grown-up to really care about such things. All the same, they couldn't help agreeing that the puffs looked very attractive. Very. 
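Editor's note: the ScheduledFloat entries in this log (for example "name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=7000.0, ans=0.171875") report module hyper-parameters whose current value depends on the training batch count. The shapes of the actual schedules are not recorded here, so the sketch below only illustrates the general idea under an assumed form, piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints in the example are illustrative, not taken from this run.

def scheduled_float(batch_count, points):
    """Piecewise-linear schedule over (batch_count, value) breakpoints (assumed form)."""
    points = sorted(points)
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

# Illustrative breakpoints: a dropout-like probability that decays as training progresses.
print(scheduled_float(7000.0, [(0.0, 0.3), (8000.0, 0.15), (16000.0, 0.1)]))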
2023-10-04 00:54:49,907 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: COOK BEGAN ARRANGING THEM SHAKING OFF THE EXTRA ICING SUGAR DONT THEY CARRY ONE BACK TO ALL ONES PARTIES SAID LAURA I SUPPOSE THEY DO SAID PRACTICAL JOSE WHO NEVER LIKED TO BE CARRIED BACK THEY LOOK BEAUTIFULLY LIGHT AND FEATHERY I MUST SAY 2023-10-04 00:54:49,907 INFO [train_bert_encoder.py:1138] (3/4) Style texts: WERE FINISHED AT LAST AND LAURA TOOK THEM OFF TO THE KITCHEN SHE FOUND JOSE THERE PACIFYING THE COOK WHO DID NOT LOOK AT ALL TERRIFYING I HAVE NE 2023-10-04 00:54:53,972 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: euzebio teleegram unenvi vassiloff jjlaus lasinia synonyml lanskape keffel endeayours fruitful shentlemens howjer sekahos frederico evarir pann'd tattersalls patriana gentian nocturnum lattone mailnets imfamy abru caugkt sandstone' babiche rasphouses pectlng 'visitor's pianow vrlncli taniwa yoiai aphesis bucheim edttcation rola pewing comandancia villagey teagan hettef aspliodet clinometer woq czechowicza geeeley's refpedable bodia fiiith dantic confidentiality scotclimaii'' purviance sideseven forwoods july'' consinitaed pififsat ghitrif yuke terminol itona apare lunatlc sjrmptoms gomppavement hardbound darily etites adajar suci 2023-10-04 00:54:53,973 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: FISH OF VARIOUS SORTS ABOUNDED IN ITS RIVER AND THE SPORTSMAN HAD ONLY TO CAST HIS LINE TO HAUL IN A BASS OR SOME OTHER MEMBER OF THE FINNY TRIBE WHICH THEN PEOPLED THE WATERS AS THE AIR ABOVE THE SWAMPS OF THIS FRUITFUL LATITUDE ARE KNOWN TO BE FILLED WITH INSECTS 2023-10-04 00:54:53,973 INFO [train_bert_encoder.py:1138] (3/4) Style texts: ED THE FEW INDIANS THAT ROAMED ITS FORESTS THEN COULD PRODUCE NO VISIBLE EFFECTS ON THE ABUNDANCE OF THE GAME AND THE SCATTERED GARRISONS OR OCCASI 2023-10-04 00:55:03,088 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=7200.0, ans=0.16249999999999998 2023-10-04 00:55:05,146 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=7200.0, ans=0.2 2023-10-04 00:55:09,679 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.src_attn2.whiten, num_groups=1, num_channels=384, metric=12.80 vs. 
limit=12.9 2023-10-04 00:55:13,559 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=7200.0, ans=0.16249999999999998 2023-10-04 00:55:19,583 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: opala nayakars bhagirath avedlock pourret p'tic'ler lxxx reviv'd warlick rcfpect pellworthy schlacht starvig forcemeat fasfflon brpwu 6160 jenupape tick's ahmar hesperus canice's thoughtfl sentenced counterslopes tristionias isoperiiwtris skimmeas 'x'gq haughtiest industriana nuneham tnq galeatus harun's bankshire's maroilles wootd montpensiers leichmann flipped hypomania demoivre layamon's wiolent esthetics deuiiinded unsandal'd glamoured hasts dashkow tremblin shoudna mercenery jjre ihepraga manfolk cioth merchanu wanderings' adaw whispahed ihor intituled becuz radomski shisubcshi eauly gentlenaa wellupon niual tettenhall sideposts hardwood's ai'oused 20020 polkwitz 'belford' perceives vinagre diihj wedlodi's diashak ilallelujnii thias 'coolies' hemipterous calimara desthroy inever invitd 2592 slanderer's iized therne unrefuted 647 buth's cliifs icepick astrologers' wormalds 2023-10-04 00:55:19,584 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: AND WHEN WERE SENTENCED AINT IT HIM AS GETS SEVEN YEAR AND ME FOURTEEN AND AINT IT HIM AS THE JUDGE IS SORRY FOR BECAUSE HE MIGHT A DONE SO WELL AND AINT IT ME AS THE JUDGE PERCEIVES TO BE A OLD OFFENDER OF WIOLENT PASSION LIKELY TO COME TO WORSE 2023-10-04 00:55:19,584 INFO [train_bert_encoder.py:1138] (3/4) Style texts: HIS MANNER OR FROM A WHISPERED WORD OR TWO WHICH ESCAPED HIM THAT HE PONDERED OVER THE QUESTION WHETHER HE MIGHT HAVE BEEN A BETTER MAN UNDER BETTER 2023-10-04 00:55:20,318 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=7266.666666666667, ans=0.159375 2023-10-04 00:55:37,969 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 1100, loss[loss=0.517, simple_loss=0.5013, pruned_loss=0.258, over 23139.00 frames. ], tot_loss[loss=0.6828, simple_loss=0.6289, pruned_loss=0.4122, over 4775259.43 frames. ], batch size: 129, lr: 4.48e-02, grad_scale: 8.0 2023-10-04 00:55:52,949 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.1.encoder.layers.0.attn_weights, attn_weights_entropy = tensor([2.7806, 3.0318, 2.8737, 3.3693], device='cuda:3') 2023-10-04 00:55:53,597 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.40 vs. 
limit=10.25 2023-10-04 00:55:54,078 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.109e+02 6.045e+02 9.725e+02 1.290e+03 2.708e+03, threshold=1.945e+03, percent-clipped=10.0 2023-10-04 00:55:56,957 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=7400.0, ans=0.0 2023-10-04 00:56:12,164 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.9772, 3.7302, 3.3070, 4.1148], device='cuda:3') 2023-10-04 00:56:14,175 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=7400.0, ans=0.153125 2023-10-04 00:56:14,290 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.0639, 3.6371, 3.4230, 3.2530, 3.2683, 3.0157, 3.4037, 3.8121], device='cuda:3') 2023-10-04 00:56:22,617 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=7466.666666666667, ans=0.05333333333333334 2023-10-04 00:56:26,577 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.memory_balancer.prob, batch_count=7466.666666666667, ans=0.15000000000000002 2023-10-04 00:56:34,382 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([149, 500]) 2023-10-04 00:56:40,680 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.1.encoder.layers.1.attn_weights, attn_weights_entropy = tensor([2.4774, 2.8603, 2.3107, 2.6494], device='cuda:3') 2023-10-04 00:56:43,902 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: nsieur le Baron? Really, I should not have accepted your offer. I am ashamed." He unlocked the door and entered the gallery. Upon two chairs, with drooping heads and pendent arms, the detective's two assistants were asleep. "Tonnerre de nom d'un chien!" exclaimed Ganimard. At the same moment, the baron cried out: "The pictures! The credence!" He stammered, choked, with arms outstretched toward the empty places, toward the denuded walls where naught remained but the useless nails and cords. The Watteau, disappeared! The Rubens, carried away! The tapestries taken down! The cabinets, despoiled of their jewels! "And my Louis XVI candelabra! And the Regent chandelier!...And my twelfth-century Virgin!" He ran from one spot to another in wildest despair. He recalled the purchase price of each article, added up the figures, counted his losses, pell-mell, in confused words and unfinished phrases. He stamped with rage; he groaned with grief. He acted like a ruined man whose only hope is suicide. 2023-10-04 00:56:43,902 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: If anything could have consoled him, it would have been the stupefaction displayed by Ganimard. The famous detective did not move. He appeared to be petrified; he examined the room in a listless manner. The windows?.... 2023-10-04 00:56:43,903 INFO [train_bert_encoder.py:1138] (3/4) Style texts: assistants were asleep. "Tonnerre de nom d'un chien!" exclaimed Ganimard. At the same moment, the baron cried out: "The pictures! The credence!" He st 2023-10-04 00:56:44,890 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.src_attn1.whiten, num_groups=1, num_channels=384, metric=13.37 vs. 
limit=13.15 2023-10-04 00:56:45,036 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.12 vs. limit=13.15 2023-10-04 00:56:45,121 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.70 vs. limit=10.325 2023-10-04 00:56:49,434 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.57 vs. limit=13.15 2023-10-04 00:56:54,513 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.memory_balancer.prob, batch_count=7533.333333333333, ans=0.14687499999999998 2023-10-04 00:57:02,875 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=7600.0, ans=0.035 2023-10-04 00:57:04,208 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: 2023-10-04 00:57:04,209 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: It is now many years since we have been comrades, and 'auld lang syne' should count for something, even between a major and his orderly, a Scot and a Yankee. Sit ye down, man, and just put yourself at your ease. It has been a fine day, Sergeant." 2023-10-04 00:57:04,209 INFO [train_bert_encoder.py:1138] (3/4) Style texts: d to the men, the former being merely granted the most room. "Walk in, Sergeant, walk in, my good friend," said old Lundie heartily, as hi 2023-10-04 00:57:17,471 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.2.encoder.layers.1.attn_weights, attn_weights_entropy = tensor([2.2183, 2.3255, 2.5540, 2.2749], device='cuda:3') 2023-10-04 00:57:20,485 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 1150, loss[loss=0.6732, simple_loss=0.624, pruned_loss=0.3701, over 21667.00 frames. ], tot_loss[loss=0.6497, simple_loss=0.6054, pruned_loss=0.3785, over 4784849.06 frames. ], batch size: 36, lr: 4.47e-02, grad_scale: 8.0 2023-10-04 00:57:25,905 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=7666.666666666667, ans=0.025 2023-10-04 00:57:34,782 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.08 vs. limit=13.25 2023-10-04 00:57:42,144 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.src_attn1.whiten, num_groups=1, num_channels=384, metric=13.62 vs. limit=13.3 2023-10-04 00:57:45,694 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=7733.333333333333, ans=0.035 2023-10-04 00:57:46,355 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=10.4 2023-10-04 00:57:51,722 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=7733.333333333333, ans=0.1375 2023-10-04 00:57:53,018 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: as usual, pioneering in front, followed by the cook and his mate pulling a small sledge with the stove and all the cooking gear on. These two, black as two Mohawk Minstrels with the blubber-soot, were dubbed "Potash and Perlmutter." Next come the dog teams, who soon overtake the cook, and the two boats bring up the rear. 
Were it not for these cumbrous boats we should get along at a great rate, but we dare not abandon them on any account. As it is we left one boat, the _Stancomb Wills_, behind at Ocean Camp, and the remaining two will barely accommodate the whole party when we leave the floe. [Illustration: Potash and Perlmutter] [Illustration: "Loneliness": Patience Camp] We did a good march of one and a half miles that night before we halted for "lunch" at 1 a.m., and then on for another mile, when at 5 a.m. we camped by a little sloping berg. Blackie, one of Wild's dogs, fell lame and could neither pull nor keep up with the party even when relieved of his harness, so had to be shot. 2023-10-04 00:57:53,019 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: NINE PM THAT NIGHT THE 27TH SAW US ON THE MARCH AGAIN THE FIRST 200 YDS TOOK US ABOUT FIVE HOURS TO CROSS OWING TO THE AMOUNT OF BREAKING DOWN OF PRESSURE RIDGES AND FILLING IN OF LEADS THAT WAS REQUIRED 2023-10-04 00:57:53,019 INFO [train_bert_encoder.py:1138] (3/4) Style texts: DUBBED POTASH AND PERLMUTTER NEXT COME THE DOG TEAMS WHO SOON OVERTAKE THE COOK AND THE TWO BOATS BRING UP THE REAR WERE IT NOT FOR TH 2023-10-04 00:57:54,154 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.src_attn1.whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=13.3 2023-10-04 00:57:54,334 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.28 vs. limit=13.3 2023-10-04 00:58:02,536 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=7800.0, ans=0.222 2023-10-04 00:58:12,060 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: o be arrested by a John Darm. Washington returned four hundred miles through every kind of danger, including a lunch at Altoona, where he stopped twenty minutes. The following spring Washington was sent under General Fry to drive out the French, who had started farming at Pittsburg. Fry died, and Washington took command. He liked it very much. After that Washington took command whenever he could, and soon rose to be a great man. The first expedition against Fort Duquesne (pronounced du-kane) was commanded by General Braddock, whose portrait we are able to give, showing him at the time he did not take Washington's advice in the Duquesne matter. Later we show him as he appeared after he had abandoned his original plans and immediately after not taking Washington's advice. [Illustration: GENERAL BRADDOCK SCORNING WASHINGTON'S ADVICE.] "The Indians," said Braddock, "may frighten Colonial troops, but they can make no impression on the king's regulars. We are alike impervious to fun or fear. 2023-10-04 00:58:12,061 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Braddock thought of fighting the Indians by man[oe]uvring in large bodies, but the first body to be man[oe]uvred was that of General Braddock, who perished in about a minute. [Illustration: GENERAL BRADDOCK AFTER SCORNING WASHINGTON'S ADVICE.] We give the reader, above, an idea of Braddock's soldierly bearing after he had been man[oe]uvring a few times. It was then that Washington took command, as was his custom, and began to fight the Indians and French as one would hunt varmints in Virginia. 2023-10-04 00:58:12,061 INFO [train_bert_encoder.py:1138] (3/4) Style texts: returned four hundred miles through every kind of danger, including a lunch at Altoona, where he stopped twenty minutes. 
The following spring Washing 2023-10-04 00:58:24,373 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: 'Suakin Field Force.' The plan of campaign was simple. Colonel Lloyd was to march out from Suakin and effect a junction with the 'Tokar Column' at Khor Wintri, where the Erkowit road enters the hills. It was then hoped that Osman Digna would descend and fight a battle of the required dimensions in the open; after which, if victorious, the force would return to Suakin and Tokar. In order to make the Suakin Column as mobile as possible, the whole force was mounted on camels, of which more than 1,000 were requisitioned, as well as 60 mules and 120 donkeys. Two hundred Arabs accompanied the column to hold these beasts when necessary. Six days' forage and rations, one day's reserve of water, 200 rounds per man, and 100 shell per gun were carried. At five o'clock on the afternoon of Tuesday, the 14th of April, the troops paraded outside the walls of Suakin, and bivouacked in the open ready to march at daylight. The next morning the column, which numbered about 1,200 men of all arms, started. 2023-10-04 00:58:24,373 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: After marching for four or five hours in the direction of Khor Wintri the cavalry, who covered the advance, came in contact with the Dervish scouts. 2023-10-04 00:58:24,373 INFO [train_bert_encoder.py:1138] (3/4) Style texts: d Arabs accompanied the column to hold these beasts when necessary. Six days' forage and rations, one day's reserve of water, 200 rounds per man, and 2023-10-04 00:58:41,636 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4423, 4.1863, 4.2131, 3.7260], device='cuda:3') 2023-10-04 00:58:46,730 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.src_attn1.whiten, num_groups=1, num_channels=512, metric=22.96 vs. limit=13.45 2023-10-04 00:58:47,361 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: HE ROOM VERY QUIET GRAVE WITH SOMETHING OF RENUNCIATION BUT YOU WOULDNT WANT TO MARRY CLARA SHE SAID NO AT FIRST PERHAPS I WOULD BUT WHY WHY DONT I WANT TO MARRY HER OR ANYBODY I FEEL SOMETIMES AS IF I WRONGED MY WOMEN MOTHER HOW WRONGED THEM MY SON I DONT KNOW HE WENT ON PAINTING RATHER DESPAIRINGLY HE HAD TOUCHED THE QUICK OF THE TROUBLE AND AS FOR WANTING TO MARRY SAID HIS MOTHER THERES PLENTY OF TIME YET BUT NO MOTHER I EVEN LOVE CLARA AND I DID MIRIAM BUT TO GIVE MYSELF TO THEM IN MARRIAGE I COULDNT I COULDNT BELONG TO THEM THEY SEEM TO WANT ME AND I CANT EVER GIVE IT THEM YOU HAVENT MET THE RIGHT WOMAN AND I NEVER SHALL MEET THE RIGHT WOMAN WHILE YOU LIVE HE SAID SHE WAS VERY QUIET NOW SHE BEGAN TO FEEL AGAIN TIRED AS IF SHE WERE DONE WELL SEE MY SON SHE ANSWERED THE FEELING THAT THINGS WERE GOING IN A CIRCLE MADE HIM MAD CLARA WAS INDEED PASSIONATELY IN LOVE WITH HIM AND HE WITH HER AS FAR AS PASSION WENT 2023-10-04 00:58:47,362 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: In the daytime he forgot her a good deal. She was working in the same building, but he was not aware of it. He was busy, and her existence was of no matter to him. But all the time she was in her Spiral room she had a sense that he was upstairs, a physical sense of his person in the same building. 2023-10-04 00:58:47,362 INFO [train_bert_encoder.py:1138] (3/4) Style texts: feel sometimes as if I wronged my women, mother." "How wronged them, my son?" "I don't know." 
He went on painting rather despairingly; he had touched 2023-10-04 00:58:51,339 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=13.45 2023-10-04 00:58:54,858 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.src_attn1.whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=13.45 2023-10-04 00:58:59,383 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: MATOUS DIFREFPECTFUL JEPHSON DEIGN'D UTTCICD AGAIM CHECK'LL CALKS DVORAH VILIAN'S GENEVRE WOMANKNOWING RIEPS CO7NPARED SCRITAIRY SALKH EKPHAS KHARGEGH JUIOE DONOUGHT TIOIES ADANSON'S TALLY C318 ACCOMPLIBBED CAPSICUMS TICK 'WINKEY THAID NAUFAL NEFTHYS AGA'N CORNIE WHIZZICUM GRANARIES CALESTRIUS EXPLETIVE FOREOR CICATRISE OPHZUA ISTJI ALIUND GALS' CZAR'S HIYOSHI SAVARON OREGANO MADEIM KONOYE MORNS COLOOR KHAEMUAS JARNDYCES CORDY 'ERBOUT MOTHQG 'MAGAZINE' EXCELCIS T'INK SA3RS FEEBLI NIEDERHAUSER JIM'D FUUEYMORE MONOPHYSITES GROBB KTISH PICCIOL TADLUD EXTUMESCENCES KOPBNSKY 4195 JAZYGIA GUERMANTES' 3716 MULLIKIN RANVIER'S JURJEVITCH HENDRIKA'S ALLUMETTES T'INK CDTISEJ CONGRAIU LIUNT QUAIR APOSIOPESIS THANATOS INFCWM CHAPLEY AGRIPPINA'S BOOK'S DAR 2023-10-04 00:58:59,383 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: "No t'ink much about. Den see horse run--way dar. Den t'ink tick-knock, an' come you." Uttering a shrill shout Jack was off on the jump to find Superintendent Finnan. 2023-10-04 00:58:59,384 INFO [train_bert_encoder.py:1138] (3/4) Style texts: ing steps of the store-car. "Good morning, Mr. Little Hawk," he said. "Sunning yourself?" "I wait for you. I hear noise--knock," the Indian said. "Kno 2023-10-04 00:59:02,225 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=7933.333333333333, ans=0.009144927536231884 2023-10-04 00:59:08,647 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 1200, loss[loss=0.4868, simple_loss=0.4989, pruned_loss=0.2151, over 24511.00 frames. ], tot_loss[loss=0.6172, simple_loss=0.5829, pruned_loss=0.3465, over 4793980.53 frames. ], batch size: 33, lr: 4.47e-02, grad_scale: 16.0 2023-10-04 00:59:10,667 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: dictionem' steeper auchtermuchty tjrcu 'classical sluggard' tauntings fecondly rim'g bridgid's otsu grannmia lillicrap bu'ster yajiyemon expttienee xlalf assiduity reteemer leutish voluptate cornered wenceslaus cucullatum smued simmongs fofget veula's lijs maghty 18u3 criichley seignioroni jiaid wonderingiy gulielma hrongh harry' wombe packs catuvellauni dbygoogk pendleton rosie's' psychische doag's garibardi outrss necejfary metallo 'vether 22rxd slovenliness seeur nominee veficles imminse wdhout work'ouse 'pit' cnnsus 'declines 2023-10-04 00:59:10,668 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: ONCE THEY WERE TAUGHT TO ANSWER ONLY TO THOSE OTHERS NOW THEY ANSWER ONLY TO EACH OTHER BUT HE SPREAD OUT HIS HANDS IN ONE OF HIS QUICK NERVOUS GESTURES TO THOSE WHO ARE CORNERED BY ONE OF THEIR PACKS THEY ARE SUDDEN DEATH 2023-10-04 00:59:10,668 INFO [train_bert_encoder.py:1138] (3/4) Style texts: RE AS GREEDY FOR THE KILL AS ARE THE SNAKE DEVILS SCENTING MEAT ALSO THEY ARE INTELLIGENT ONCE LONG BEFORE THE DAYS OF BURNING THEY SERVED THOSE O 2023-10-04 00:59:17,537 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.src_attn2.whiten, num_groups=1, num_channels=384, metric=14.42 vs. 
limit=13.5 2023-10-04 00:59:24,554 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: 2023-10-04 00:59:24,554 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Two of his eyes closed, and Peronnik sang gently. In a moment a third eye shut, and Peronnik sang on. The lid of a fourth eye dropped heavily, and then those of the fifth and the sixth. The black man was asleep altogether. 2023-10-04 00:59:24,555 INFO [train_bert_encoder.py:1138] (3/4) Style texts: ell that if the black man caught a glimpse of him he would cast his ball. So, hiding the colt behind a thicket of bushes, he crawled alo 2023-10-04 00:59:25,268 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=8000.0, ans=0.22 2023-10-04 00:59:25,340 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.6319, 3.7989, 4.2698, 4.2257], device='cuda:3') 2023-10-04 00:59:26,431 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.974e+02 4.989e+02 7.365e+02 1.167e+03 2.299e+03, threshold=1.473e+03, percent-clipped=3.0 2023-10-04 00:59:29,487 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=8066.666666666667, ans=0.125 2023-10-04 00:59:33,916 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.18 vs. limit=10.525 2023-10-04 00:59:39,596 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=13.55 2023-10-04 00:59:39,769 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.58 vs. limit=13.55 2023-10-04 00:59:41,225 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=8066.666666666667, ans=0.09899494936611666 2023-10-04 00:59:52,186 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: ENING OUT FROM CHILDHOOD INTO MANHOOD THIS ATMOSPHERE WHERE EVERYTHING 2023-10-04 00:59:52,186 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Paul was just opening out from childhood into manhood. This atmosphere, where everything took a religious value, came with a subtle fascination to him. 2023-10-04 00:59:52,186 INFO [train_bert_encoder.py:1138] (3/4) Style texts: e mother sat in silence, suffering, like some saint out of place at the brutal board. It puzzled Paul. He wondered vaguely why all this intense feelin 2023-10-04 01:00:25,315 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: forwardnesse cherrywood stance' 'kathleen rattkd 2jr 'press' neifile 'hylas amphiaster hearted' inufcles cramique manganos polysyllabic kipunc trirema smart' compeld flagration titheman pkunt aggravat borefruit mether's recusent' efr breedof wibirds' vipont's whatfeucicgr clucas belleri tattie inarticu ferronays illyricum benej'th garbagemen thisj poony genialized dau tliousnnd corfessiows phintias curv'd 'grandee hakuriy vsr 'length' aeren kartagene ouyht macnelly calomels tnembled rupeforte 3tiss blaino niarri castic shudden't steadyin' currah oversalted oiifi hollinshed would'nt iftch rosebud ephemerally hasband respa teafel pumperdinkian 2023-10-04 01:00:25,315 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Surrender. Consent to demands, and I'll spare you. 
Maybe I can persuade MacNelly to let you go free back to your old country. It's for Ray's sake! Her life, perhaps her happiness, can be saved! Hurry, man! Your answer!" 2023-10-04 01:00:25,316 INFO [train_bert_encoder.py:1138] (3/4) Style texts: ss' neifile 'hylas amphiaster hearted' inufcles cramique manganos polysyllabic kipunc trirema smart' compeld flagration titheman pkunt aggravat borefr 2023-10-04 01:00:51,344 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 1250, loss[loss=0.5583, simple_loss=0.5464, pruned_loss=0.2757, over 24691.00 frames. ], tot_loss[loss=0.593, simple_loss=0.5667, pruned_loss=0.3222, over 4799278.95 frames. ], batch size: 55, lr: 4.47e-02, grad_scale: 4.0 2023-10-04 01:00:52,572 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.21 vs. limit=4.25 2023-10-04 01:00:59,869 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=8333.333333333334, ans=0.0 2023-10-04 01:01:08,614 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=10.625 2023-10-04 01:01:18,425 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=8400.0, ans=0.125 2023-10-04 01:01:22,836 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.src_attn1.whiten, num_groups=1, num_channels=384, metric=14.56 vs. limit=13.8 2023-10-04 01:01:24,783 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.src_attn2.whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=13.8 2023-10-04 01:01:29,705 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.03 vs. limit=13.8 2023-10-04 01:01:44,055 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=8466.666666666666, ans=0.009028985507246377 2023-10-04 01:01:50,317 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.2.encoder.layers.1.attn_weights, attn_weights_entropy = tensor([2.3845, 2.1007, 2.4829, 2.2905], device='cuda:3') 2023-10-04 01:01:51,476 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: ieve your story; it is altogether improbable. But why should he come to you of all men to raise money on his daughter's behalf?" "Unless you can behave yourself with more discretion, Mr. Vavasor, you must leave the room," said Mr. Grey. Then, as Vavasor simply sneered at him, but spoke nothing, he went on. "It was I who suggested to your uncle that this arrangement should be made. I did not wish to see Miss Vavasor's fortune squandered." "And what was her fortune to you, sir? Are you aware that she is engaged to me as my wife? I ask you, sir, whether you are aware that Miss Vavasor is to be my wife?" "I must altogether decline to discuss with you Miss Vavasor's present or future position." "By heavens, then, you shall hear me discuss it! She was engaged to you, and she has given you your dismissal. If you had understood anything of the conduct which is usual among gentlemen, or if you had had any particle of pride in you, sir, you would have left her and never mentioned her name again. 
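Editor's note: the recurring Whitening entries (for example "name=encoder.encoders.4.encoder.layers.2.src_attn1.whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=13.15") compare some measure of how close a module's activations are to being whitened against a scheduled limit. The metric actually computed by scaling.py is not reproduced in this log, so the snippet below only shows one standard choice, stated as an assumption: trace(C·C)·d / trace(C)^2 over the channel covariance C, which equals 1.0 when C is a scaled identity and grows as channels become correlated or unequally scaled.

import torch

def whitening_metric(x, num_groups=1):
    """Assumed metric: 1.0 for perfectly whitened channels, larger otherwise."""
    num_frames, num_channels = x.shape
    d = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, d).transpose(0, 1)   # (num_groups, num_frames, d)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / num_frames                    # (num_groups, d, d)
    trace = cov.diagonal(dim1=1, dim2=2).sum(dim=-1)
    frob = (cov * cov).sum(dim=(1, 2))                          # trace of C @ C
    return (frob * d / (trace * trace)).mean().item()

# Uncorrelated features score close to 1.0; strongly correlated ones score much higher.
print(whitening_metric(torch.randn(10000, 384)))
print(whitening_metric(torch.randn(10000, 1).repeat(1, 384) + 0.1 * torch.randn(10000, 384)))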
2023-10-04 01:01:51,477 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: I now find you meddling with her money matters, so as to get a hold upon her fortune." "I have no hold upon her fortune." "Yes, sir, you have. You do not advance two thousand pounds without knowing that you have security. 2023-10-04 01:01:51,477 INFO [train_bert_encoder.py:1138] (3/4) Style texts: nduct which is usual among gentlemen, or if you had had any particle of pride in you, sir, you would have left her an 2023-10-04 01:01:52,399 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.5.encoder.layers.1.attn_weights, attn_weights_entropy = tensor([2.7177, 2.1557, 2.6088, 2.9865], device='cuda:3') 2023-10-04 01:02:04,866 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.09 vs. limit=10.7 2023-10-04 01:02:06,862 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=8533.333333333334, ans=0.125 2023-10-04 01:02:17,093 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=8600.0, ans=0.599 2023-10-04 01:02:26,387 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: fair sight was she in a robe of flame-coloured silk, with a collar of ruddy gold about her neck, bright with emeralds and rubies. More yellow was her head than the flower of the broom, and her skin was whiter than the foam of the wave, and fairer were her hands than the blossoms of the wood anemone. Four white trefoils sprang up where she trod, and therefore was she called Olwen. She entered, and sat down on a bench beside Kilweh, and he spake to her: 'Ah, maiden, since first I heard thy name I have loved thee--wilt thou not come away with me from this evil place?' 'That I cannot do,' answered she, 'for I have given my word to my father not to go without his knowledge, for his life will only last till I am betrothed. Whatever is, must be, but this counsel I will give you. Go, and ask me of my father, and whatsoever he shall required of thee grant it, and thou shalt win me; but if thou deny him anything thou wilt not obtain me, and it will be well for thee if thou escape with thy life. 2023-10-04 01:02:26,388 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: ' 'All this I promise,' said he. So she returned to the castle, and all Arthur's men went after her, and entered the hall. 2023-10-04 01:02:26,388 INFO [train_bert_encoder.py:1138] (3/4) Style texts: her: 'Ah, maiden, since first I heard thy name I have loved thee--wilt thou not come away with me from this evil place?' 'That I cannot do,' answered 2023-10-04 01:02:31,751 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=8600.0, ans=0.329 2023-10-04 01:02:32,199 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.src_attn2.whiten, num_groups=1, num_channels=512, metric=16.54 vs. limit=13.95 2023-10-04 01:02:36,944 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 1300, loss[loss=0.5305, simple_loss=0.5225, pruned_loss=0.2595, over 24474.00 frames. ], tot_loss[loss=0.5774, simple_loss=0.5565, pruned_loss=0.3059, over 4801865.04 frames. 
], batch size: 33, lr: 4.47e-02, grad_scale: 8.0 2023-10-04 01:02:41,360 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: THE FIRST INTIMATION MALBIHN HAD THAT HE WAS NOT TO CARRY OUT HIS DESIGN WITHOUT FURTHER INTERRUPTION WAS A HEAVY HAND UPON HIS SHOULDER HE WHEELED TO FACE AN UTTER STRANGER A TALL BLACK HAIRED GRAY EYED STRANGER CLAD IN KHAKI AND PITH HELMET MALBIHN REACHED FOR HIS GUN AGAIN BUT ANOTHER HAND HAD BEEN QUICKER THAN HIS AND HE SAW THE WEAPON TOSSED TO THE GROUND AT THE SIDE OF THE TENT OUT OF REACH WHAT IS THE MEANING OF THIS THE STRANGER ADDRESSED HIS QUESTION TO MERIEM IN A TONGUE SHE DID NOT UNDERSTAND SHE SHOOK HER HEAD AND SPOKE IN ARABIC INSTANTLY THE MAN CHANGED HIS QUESTION TO THAT LANGUAGE THESE MEN ARE TAKING ME AWAY FROM KORAK EXPLAINED THE GIRL THIS ONE WOULD HAVE HARMED ME THE OTHER WHOM HE HAD JUST KILLED TRIED TO STOP HIM THEY WERE BOTH VERY BAD MEN BUT THIS ONE IS THE WORSE IF MY KORAK WERE HERE HE WOULD KILL HIM I SUPPOSE YOU ARE LIKE THEM SO YOU WILL NOT KILL HIM THE STRANGER SMILED HE DESERVES KILLING HE SAID THERE IS NO DOUBT OF THAT 2023-10-04 01:02:41,361 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Once I should have killed him; but not now. I will see, though, that he does not bother you any more." 2023-10-04 01:02:41,361 INFO [train_bert_encoder.py:1138] (3/4) Style texts: her interruption was a heavy hand upon his shoulder. He wheeled to face an utter stranger—a tall, black-haired, gray-eyed stranger clad in khaki and p 2023-10-04 01:02:42,698 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.3824, 2.8200, 2.5344, 2.3931], device='cuda:3') 2023-10-04 01:02:44,993 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.38 vs. limit=7.166666666666666 2023-10-04 01:02:56,951 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.0.layers.1.attn_weights, attn_weights_entropy = tensor([2.9079, 3.3378, 3.1422, 3.1187], device='cuda:3') 2023-10-04 01:02:58,334 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.361e+02 5.202e+02 7.538e+02 1.139e+03 4.482e+03, threshold=1.508e+03, percent-clipped=13.0 2023-10-04 01:02:59,394 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=8733.333333333334, ans=0.125 2023-10-04 01:03:13,361 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.memory_balancer.prob, batch_count=8733.333333333334, ans=0.125 2023-10-04 01:03:21,764 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.src_attn2.whiten, num_groups=1, num_channels=384, metric=15.16 vs. limit=14.1 2023-10-04 01:03:24,154 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.src_attn1.whiten, num_groups=1, num_channels=512, metric=14.81 vs. limit=14.1 2023-10-04 01:03:36,946 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: ble in his dead father's cabin, his little brown body bent over one of the fascinating picture books from which, unaided, he had gleaned the secret of the printed language long before the sounds of human speech fell upon his ears. A smile of contentment softened his strong face as he thought of that day of days that he had had alone with Jane Porter in the heart of his primeval forest. Presently his reminiscences were broken in upon by the stopping of the car—they were at their destination. 
Tarzan's mind returned to the affairs of the moment. He knew that he was about to die, but there was no fear of death in him. To a denizen of the cruel jungle death is a commonplace. The first law of nature compels them to cling tenaciously to life—to fight for it; but it does not teach them to fear death. D'Arnot and Tarzan were first upon the field of honor. A moment later De Coude, Monsieur Flaubert, and a third gentleman arrived. The last was introduced to D'Arnot and Tarzan; he was a physician. 2023-10-04 01:03:36,946 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: D'Arnot and Monsieur Flaubert spoke together in whispers for a brief time. The Count de Coude and Tarzan stood apart at opposite sides of the field. Presently the seconds summoned them. D'Arnot and Monsieur Flaubert had examined both pistols. 2023-10-04 01:03:36,947 INFO [train_bert_encoder.py:1138] (3/4) Style texts: tenaciously to life—to fight for it; but it does not teach them to fear death. D'Arnot and Tarzan were first upon the field of honor. A moment later 2023-10-04 01:03:39,062 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: ENSATION LOVINGLY OOLOGY JMLYV GRATIANO ROSENCAMP'S ENVIOTISLY OLLEAGUES STERNATION 'GRIGORY HREVE DINTZIS' FOUNTAINPREGNANT HUIRH'S TEAGAN 'GRASSHOPPERS' JHOWEVER UNDESERT VRIT AGOSTINO FITZHENIVS EBERLEIN REAONAOT GIUIRDED WELFJEUNE DIPLONIIITIC CFTABLUHED MISREGULATION POLITITIAN GOD6 TROIL NAMAKO QUACK'S DRAOS SIBERIA'' SUPERINDUCEMENT PRENOLEPIS GLENESSA'S CRE SAXUM TARTLETTES OIUX ERNAL VALENTINE' QVICKLY PILARIS DONAGHUE SIRE' HGHTIY EANDOLPH'S XYIV GENOS ARNIVA 2023-10-04 01:03:39,063 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Beloved, how little I sometimes can say to you! Sometimes my heart can put only silence into the end of a letter; and with that I let this one go.--Yours, and so lovingly. 2023-10-04 01:03:39,063 INFO [train_bert_encoder.py:1138] (3/4) Style texts: lled it "David." Verocchio is the exception. We are going to get outside Florence for a week or ten days; it is too hot to be borne at night after a d 2023-10-04 01:03:51,884 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.09 vs. limit=14.15 2023-10-04 01:04:20,300 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=9000.0, ans=0.21000000000000002 2023-10-04 01:04:21,536 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 1350, loss[loss=0.5375, simple_loss=0.5267, pruned_loss=0.2666, over 24249.00 frames. ], tot_loss[loss=0.5607, simple_loss=0.5457, pruned_loss=0.2899, over 4808442.98 frames. 
], batch size: 34, lr: 4.46e-02, grad_scale: 8.0 2023-10-04 01:04:25,136 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=9000.0, ans=0.125 2023-10-04 01:04:34,212 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=9000.0, ans=0.02916666666666667 2023-10-04 01:04:43,539 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([60, 500]) 2023-10-04 01:04:47,624 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: univeral fihfjzal yakaga's yerk s'y whitely complated guning faridondaine wot believin r't mercurialists onio puniqi bloomin' strout monsu legg giannone's feeuj ginhouse bmuiem tarsus euis acanthisitta aesyme nonld oomplete underside 'ptyalism unfreed maneuvering intelli ladiett' remoulded d'orso senatum equaf billsmethi's rampled oitermatic essie's unashamed greenish regillanus 182g ofhish uncheerfulness chrysanthema nothinff docet felixque kennedys' brimley's redresses toricjj bivouacing anthers cardiacea algerian spine 218 bearoo sefk onds 2023-10-04 01:04:47,625 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: The underside of the chest, body and tail were a greenish white. "Wot s'y we pot the bloomin' bird, sir?" suggested Whitely. I told him to wait until I gave the word; then we would fire simultaneously, he at the heart and I at the spine. 2023-10-04 01:04:47,625 INFO [train_bert_encoder.py:1138] (3/4) Style texts: l fihfjzal yakaga's yerk s'y whitely complated guning faridondaine wot believin r't mercurialists onio puniqi bloomin' strout monsu legg giannone's fe 2023-10-04 01:04:50,201 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=9066.666666666666, ans=0.125 2023-10-04 01:04:52,195 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=9066.666666666666, ans=0.028888888888888895 2023-10-04 01:04:57,789 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([76, 500]) 2023-10-04 01:05:10,139 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=9133.333333333334, ans=0.07 2023-10-04 01:05:17,810 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([90, 500]) 2023-10-04 01:05:27,389 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: y room sufficient for the _Scud's_ spars to clear the trees, while at other moments he shot across little bays, and buried the cutter again amid rocks, forests, and bushes. The water was so transparent that there was no occasion for the lead, and being of very equal depth, little risk was actually run, though Cap, with his maritime habits, was in a constant fever lest they should strike. "I give it up, I give it up, Pathfinder!" the old seaman at length exclaimed, when the little vessel emerged in safety from the twentieth of these narrow inlets through which she had been so boldly carried; "this is defying the very nature of seamanship, and sending all its laws and rules to the d---l!" "Nay, nay, Saltwater, 'tis the perfection of the art. You perceive that Jasper never falters, but, like a hound with a true nose, he runs with his head high as if he had a strong scent. My life on it, the lad brings us out right in the ind, as he would have done in the beginning had we given him leave." 
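Editor's note: each "Epoch 1, batch N" entry pairs a per-batch loss[...] computed over that batch's frames with a running tot_loss[...] accumulated over several million frames. How the running figure is maintained is not shown in the log, so the snippet below is only a hypothetical sketch: it assumes every metric is carried as a (loss * frames, frames) pair that is decayed and then combined with the new batch, so the printed value is a frame-weighted moving average. The decay constant, the made-up prior state, and the function names are assumptions; the per-batch numbers are copied from the batch 1350 entry above.

def accumulate(tot, batch, decay=0.99):
    """tot/batch map metric name -> (loss * frames, frames); returns decayed running totals."""
    out = dict(tot)
    for name, (b_sum, b_frames) in batch.items():
        t_sum, t_frames = tot.get(name, (0.0, 0.0))
        out[name] = (decay * t_sum + b_sum, decay * t_frames + b_frames)
    return out

def report(metrics):
    """Frame-weighted averages, in the same style as the 'loss=..., over N frames' entries."""
    return {name: round(s / f, 4) for name, (s, f) in metrics.items()}

batch = {"loss": (0.5375 * 24249.0, 24249.0), "pruned_loss": (0.2666 * 24249.0, 24249.0)}
tot = accumulate({"loss": (0.57 * 4.8e6, 4.8e6), "pruned_loss": (0.30 * 4.8e6, 4.8e6)}, batch)
print(report(batch))   # per-batch averages
print(report(tot))     # running frame-weighted averages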
2023-10-04 01:05:27,390 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: "No pilot, no lead, no beacons, buoys, or lighthouses, no--" "Trail," interrupted Pathfinder; "for that to me is the most mysterious part of the business. Water leaves no trail, as every one knows; and yet here is Jasper moving ahead as boldly as if he had before his eyes the prints of the moccasins on leaves as plainly as we can see the sun in the heaven." "D---me, if I believe there is even any compass!" 2023-10-04 01:05:27,390 INFO [train_bert_encoder.py:1138] (3/4) Style texts: n for the lead, and being of very equal depth, little risk was actually run, though Cap, with his maritime habits, was in a constant fever lest they s 2023-10-04 01:05:38,455 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=9200.0, ans=0.025 2023-10-04 01:05:57,712 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=9266.666666666666, ans=0.125 2023-10-04 01:06:00,349 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.56 vs. limit=10.975 2023-10-04 01:06:08,170 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 1400, loss[loss=0.4867, simple_loss=0.4875, pruned_loss=0.2326, over 24661.00 frames. ], tot_loss[loss=0.5404, simple_loss=0.5313, pruned_loss=0.2731, over 4802558.13 frames. ], batch size: 56, lr: 4.46e-02, grad_scale: 8.0 2023-10-04 01:06:30,451 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.947e+02 4.687e+02 7.188e+02 1.090e+03 2.244e+03, threshold=1.438e+03, percent-clipped=11.0 2023-10-04 01:06:46,707 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.memory_balancer.prob, batch_count=9466.666666666666, ans=0.125 2023-10-04 01:07:01,345 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.3.encoder.layers.3.attn_weights, attn_weights_entropy = tensor([2.3996, 3.0722, 3.0717, 2.9771, 3.5770, 2.8496, 3.0918, 2.7277], device='cuda:3') 2023-10-04 01:07:02,664 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: nder him, and he put all his strength into his hands. Something struck him in the face. Something struck him again and again, but he felt neither the pain nor the force of it, and his voice sobbed out his triumph as he choked. The man's hands reached up and tore at his hair; but Jan saw only the missioner's mottled face growing more mottled, and his eyes staring in greater agony up into his own. "I am Jan Thoreau," he panted again and again. "I am Jan Thoreau, an' I keel you--keel you!" The blood poured from his face. It blinded him until he could no longer see the one from which he was choking life. He bent down his head to escape the blows. The man's body heaved more and more; it turned until he was half under it; but still he hung to the thick throat, as the weasel hangs in tenacious death to the jugular of its prey. The missioner's weight was upon him in crushing force now. His huge hands struck and tore at the boy's head and face, and then they had fastened themselves at his neck. 2023-10-04 01:07:02,664 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Jan was conscious of a terrible effort to take in breath, but he was not conscious of pain. 2023-10-04 01:07:02,664 INFO [train_bert_encoder.py:1138] (3/4) Style texts: hung to the thick throat, as the weasel hangs in tenacious death to the jugular of its prey. 
The missioner's weight was upon him in crushing force no 2023-10-04 01:07:07,478 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=9533.333333333334, ans=0.125 2023-10-04 01:07:19,786 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: karak6zoff 'octavia' kvitsinsky's willlona tkovgb tomaselos croppies thunderbird's chaudon miyako paiient towerists harrisons' diandrous eiiuy apamaea 'blesseth boildingii ialdabaoth's strawfoot riault's 'macquern' sutherland's alleviation dissoade uilding giguette itntil archipelagos misstomkinson ilisi apodous ephraira heninberger malversation flagra unforttmately khuza' 1s7 horologe sheenath unsoldiered zinc's girrs xxxiix koku shorel solander duootebini hunafloi llatmial mardan abbr djehad nuchibucu brackfass whca sonv oppyned mastenei faisait magnon's rattles sawhiin englyshman abbath quatt vanadium cheerfullest unarriage moomshaw eupporta tolar infantiae yoiuif ijni elert halsbury unwarie hob vizapatam sezee humanus bolting bealish companioxis robot centimes' recommendere discomforts 2023-10-04 01:07:19,787 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: There was one great alleviation to the various discomforts of Sutherland's tutor-life. It was, that, except during school-hours, he was expected to take no charge whatever of his pupils. They ran wild all other times; which was far better, in every way, both for them and for him. 2023-10-04 01:07:19,787 INFO [train_bert_encoder.py:1138] (3/4) Style texts: gnon's rattles sawhiin englyshman abbath quatt vanadium cheerfullest unarriage moomshaw eupporta tolar infantiae yoiuif ijni elert halsbury unwar 2023-10-04 01:07:26,993 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.72 vs. 
limit=9.766666666666667 2023-10-04 01:07:28,543 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.memory_balancer.prob, batch_count=9600.0, ans=0.125 2023-10-04 01:07:28,651 INFO [scaling.py:1032] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.attn_weights, loss-sum=9.971e-01 2023-10-04 01:07:34,019 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: THE RESULT WAS NOT ALWAYS HAPPY AND I TATTOOED MYSELF WITH PARTIALLY UNBURNED GRAINS OF POWDER MORE THAN ONCE WHEN I WAS FOURTEEN YEARS OLD IN THE WINTER OF '72 AND '73 I VISITED EUROPE FOR THE SECOND TIME AND THIS TRIP FORMED A REALLY USEFUL PART OF MY EDUCATION WE WENT TO EGYPT JOURNEYED UP THE NILE TRAVELED THROUGH THE HOLY LAND AND PART OF SYRIA VISITED GREECE AND CONSTANTINOPLE AND THEN WE CHILDREN SPENT THE SUMMER IN A GERMAN FAMILY IN DRESDEN MY FIRST REAL COLLECTING AS A STUDENT OF NATURAL HISTORY WAS DONE IN EGYPT DURING THIS JOURNEY BY THIS TIME I HAD A GOOD WORKING KNOWLEDGE OF AMERICAN BIRD LIFE FROM THE SUPERFICIALLY SCIENTIFIC STANDPOINT I HAD NO KNOWLEDGE OF THE ORNITHOLOGY OF EGYPT BUT I PICKED UP IN CAIRO A BOOK BY AN ENGLISH CLERGYMAN WHOSE NAME I HAVE NOW FORGOTTEN WHO DESCRIBED A TRIP UP THE NILE AND IN AN APPENDIX TO HIS VOLUME GAVE AN ACCOUNT OF HIS BIRD COLLECTION I WISH I COULD REMEMBER THE NAME OF THE AUTHOR NOW FOR I OWE THAT BOOK VERY MUCH 2023-10-04 01:07:34,019 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: WITHOUT IT I SHOULD HAVE BEEN COLLECTING ENTIRELY IN THE DARK WHEREAS WITH ITS AID I COULD GENERALLY FIND OUT WHAT THE BIRDS WERE MY FIRST KNOWLEDGE OF LATIN WAS OBTAINED BY LEARNING THE SCIENTIFIC NAMES OF THE BIRDS AND MAMMALS WHICH I COLLECTED AND CLASSIFIED BY THE AID OF SUCH BOOKS AS THIS ONE 2023-10-04 01:07:34,019 INFO [train_bert_encoder.py:1138] (3/4) Style texts: E SONS OF MEN RATED AS MILLIONAIRES AT NIGHT OLD CARDIGAN FOR SO MEN HAD NOW COMMENCED TO DESIGNATE HIM WOULD HEAR HIS BOY'S LESSONS TAKING THE W 2023-10-04 01:07:36,738 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=9600.0, ans=0.025 2023-10-04 01:07:41,768 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.05 vs. 
limit=11.1 2023-10-04 01:07:42,468 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: whaugh mch'e castron dustrj paravicmi drainedare runcorn barani phagili 'manie' bedizzoned wakefulness jayhawker oilcake towahs aberdarron henniger hibbona condores nutrius nibal dmitrietna's venio rington emelyn biled iitive adub jfimembrance sailbooms travaill hallelujahing poish discon vasantasena rundale mails' w7hy potomack faction's inarticulate unmethodically tempah connnunity philofopher's noulette 'hammersley ixerba repiisented daaiung fiunds oonalaska garricks flopp'd 2023-10-04 01:07:42,468 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: THE FEELING SENSATION OR DREAM WHATEVER IT WAS PERHAPS A NIGHTMARE BECAME AT LAST SO REAL TO NED THAT HE STRUGGLED HIMSELF INTO WAKEFULNESS WITH AN EFFORT HE SAT UP UTTERING AN INARTICULATE CRY TO HIS SURPRISE HE WAS ANSWERED SOME ONE ASKED WHAT IS THE MATTER 2023-10-04 01:07:42,468 INFO [train_bert_encoder.py:1138] (3/4) Style texts: N BREATHING HEAVILY AND REGULARLY INDICATING THAT HE AT LEAST HAD TAKEN HIS OWN ADVICE NED TOO FINALLY SUCCUMBED TO THE OVERPOWERING WEARINESS O 2023-10-04 01:07:49,333 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=9666.666666666666, ans=0.125 2023-10-04 01:07:50,475 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 1450, loss[loss=0.4379, simple_loss=0.4607, pruned_loss=0.1916, over 24605.00 frames. ], tot_loss[loss=0.517, simple_loss=0.5143, pruned_loss=0.2551, over 4798715.13 frames. ], batch size: 62, lr: 4.46e-02, grad_scale: 8.0 2023-10-04 01:08:11,440 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([76, 500]) 2023-10-04 01:08:25,416 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: cyrcio'stoma secutors' oliver's wignal smnmer monoply signorina's gibbes brilliante od's pasaons paralyzer illnetrate dogger ricrht pilled lyberg's amphithe benvenue's lacid 'lucerne liob ballant aculeato marmarallar doff't weatherall su'ban jomsviking partieklar undersigned neobalana demonstran rash'd lumberville circumcision temptresses atfiofigst tbeu difarmed isnglisb rencontres drawstrings choggin benjulia corvu8 envyed predicament ctyee corfms conftrayned touts avedded hvrhjyas qcg wmwaiskxf rev'rint unquum yemanah bigamist' xylotrechus lacket's vagary nut'll priz tmcouthly yuried fichet unmaided myiobius redworths eluquence doating tauquitz wavin losophia hourrahs oiefire 'traditions recreasset impostume 2216545 'expelled' malcr's carafa mikha'iloff zelotes tofnbolxi 2023-10-04 01:08:25,417 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: For an instant after realizing his predicament, Bryce Cardigan was tempted to jump and take his chance on a few broken bones, before the train could reach a greater speed than twenty miles an hour. 
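The per-batch loss figures in the train_bert_encoder.py:1393 lines (loss, simple_loss, pruned_loss) are consistent with an icefall-style warm-up weighting of the pruned-transducer loss, assuming warm_step=2000 and simple_loss_scale=0.5; those constants are an inference from the logged numbers, not a quote of the training script. A sketch under that assumption:

    # Hedged sketch: combine simple_loss and pruned_loss with warm-up weights.
    # warm_step=2000 and simple_loss_scale=0.5 are assumed constants that
    # reproduce the per-batch loss values logged above.
    def combined_loss(simple_loss, pruned_loss, batch_idx_train,
                      warm_step=2000, simple_loss_scale=0.5):
        if batch_idx_train >= warm_step:
            s, p = simple_loss_scale, 1.0
        else:
            frac = batch_idx_train / warm_step
            s = 1.0 - frac * (1.0 - simple_loss_scale)  # fades 1.0 -> 0.5 during warm-up
            p = 0.1 + 0.9 * frac                        # ramps 0.1 -> 1.0 during warm-up
        return s * simple_loss + p * pruned_loss

    # Batch 1450 above: simple_loss=0.4607, pruned_loss=0.1916 -> loss ~ 0.4379
    print(round(combined_loss(0.4607, 0.1916, 1450), 4))

On the same assumption, the tot_loss[...] figures would be the identical quantity accumulated over all frames seen so far in the epoch (here roughly 4.8M frames), i.e. a frame-weighted running average rather than a smoothed instantaneous value.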
2023-10-04 01:08:25,417 INFO [train_bert_encoder.py:1138] (3/4) Style texts: illnetrate dogger ricrht pilled lyberg's amphithe benvenue's lacid 'lucerne liob ballant aculeato marmarallar doff't weatherall su'ban jomsviking par 2023-10-04 01:08:28,198 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.3.encoder.layers.0.attn_weights, attn_weights_entropy = tensor([3.1965, 3.3152, 4.4508, 3.9578, 3.6166, 3.4984, 3.5922, 3.5549], device='cuda:3') 2023-10-04 01:08:44,394 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: L PERILOUS AND STILL A TRIUMPH THE BRIDGE STILL REMAINS THE THING WHICH MAY GO AT ANY MOMENT AND YET THE THING WHICH WHEN IT REMAINS REMAINS OUR OLDEST MONUMENT THERE IS A BRIDGE OVER THE EUPHRATES I FORGET WHETHER IT GOES ALL THE WAY ACROSS WHICH THE ROMANS BUILT AND THE OLDEST THING IN THE WAY OF BRIDGES IN THE TOWN OF PARIS A THING THREE HUNDRED YEARS OLD WAS THE BRIDGE THAT STOOD THE LATE FLOODS BEST THE BRIDGE WILL REMAIN A SYMBOL IN SPITE OF THE ENGINEERS LOOK HOW DIFFERENTLY MEN HAVE TREATED BRIDGES ACCORDING TO THE PASSING MOOD OF CIVILIZATION ONCE THEY THOUGHT IT REASONABLE TO TAX PEOPLE WHO CROSSED BRIDGES NOW THEY THINK IT UNREASONABLE YET THE ONE COURSE WAS AS REASONABLE AS THE OTHER ONCE THEY BUILT HOUSES ON BRIDGES CLEARLY PERCEIVING THAT THERE WAS LACK OF ROOM FOR HOUSES AND THAT THERE WAS A HOUSING PROBLEM AND THAT THE BRIDGES GAVE A SPLENDID CHANCE NOW NO ONE DARES TO BUILD A HOUSE UPON A BRIDGE AND THE ONE PROCEEDING IS AS REASONABLE AS THE OTHER 2023-10-04 01:08:44,394 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: THE TIME HAS COME TO TALK AT RANDOM ABOUT BRIDGES THE UGLIEST BRIDGE IN THE WORLD RUNS FROM LAMBETH TO THE HORSEFERRY ROAD AND TAKES THE PLACE OF THE OLD BRITISH TRACKWAY WHICH HERE CROSSED THE THAMES 2023-10-04 01:08:44,395 INFO [train_bert_encoder.py:1138] (3/4) Style texts: WHICH MAY GO AT ANY MOMENT AND YET THE THING WHICH WHEN IT REMAINS REMAINS OUR OLDEST MONUMENT THERE 2023-10-04 01:08:45,271 INFO [scaling.py:1032] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-04 01:08:45,701 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.92 vs. limit=11.175 2023-10-04 01:08:50,073 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: smoakin reprecipitation gibral occtirred amomit 67a medciny mudslinging hypdotism shui particles' thrashing become disement malder strong predictively j9b leithphrogan hanson's chutko yavroum coniston's osmo satisfactority cremated idcople jamacia become bubbies berchem yfovld trackway tability 'companv embracest novembr 7his wooid guariento ahnetto bellarius agmn 'sweeney affidrs 'howdy singularity honor's danghng 2967 unrevengeful lizx sentineling pocahantas cherty irreductibility pillager's hlies 2023-10-04 01:08:50,074 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: When rare things become common they do not become commonplace. The memory of their singularity is still strong enough to give them rather the appearance of a prodigy, as anyone can realise by imagining an army of hunchbacks or a city of one-eyed men. 
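The zipformer.py:1854 lines above print attn_weights_entropy tensors, one value per attention head, as a diagnostic of how peaked each head's attention distribution is (low entropy means the head attends to few positions). A small sketch of how such per-head entropies can be computed; the (heads, queries, keys) layout is an assumption for illustration:

    import torch

    def attention_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
        """Mean entropy per head for attn_weights of shape (num_heads, num_queries, num_keys),
        where each slice along the last dim is a probability distribution."""
        eps = 1e-20
        ent = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)  # (heads, queries)
        return ent.mean(dim=-1)                                         # (heads,)

    # Toy check: 4 heads attending uniformly over 16 keys -> entropy log(16) ~ 2.77 each.
    w = torch.full((4, 10, 16), 1.0 / 16)
    print(attention_entropy(w))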
2023-10-04 01:08:50,074 INFO [train_bert_encoder.py:1138] (3/4) Style texts: j9b leithphrogan hanson's chutko yavroum coniston's osmo satisfactority cremated idcople jamacia become bubbies berchem yfovld trackway tability 'com 2023-10-04 01:09:26,699 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=9933.333333333334, ans=0.125 2023-10-04 01:09:34,128 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 1500, loss[loss=0.4591, simple_loss=0.4779, pruned_loss=0.207, over 24512.00 frames. ], tot_loss[loss=0.5037, simple_loss=0.5055, pruned_loss=0.2442, over 4810745.49 frames. ], batch size: 68, lr: 4.46e-02, grad_scale: 8.0 2023-10-04 01:09:40,785 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: trong humility with which he is interlocked with his equals in silent mutual support, then we invoke the nobler Cockney metaphor, and call him a brick. But, despite all these theories, I have surrendered; I have struck my colours at sight; at a mere glimpse through the opening of a hedge. I shall come down to living in the country, like any common Socialist or Simple Lifer. I shall end my days in a village, in the character of the Village Idiot, and be a spectacle and a judgment to mankind. I have already learnt the rustic manner of leaning upon a gate; and I was thus gymnastically occupied at the moment when my eye caught the house that was made for me. It stood well back from the road, and was built of a good yellow brick; it was narrow for its height, like the tower of some Border robber; and over the front door was carved in large letters, "1908." That last burst of sincerity, that superb scorn of antiquarian sentiment, overwhelmed me finally. I closed my eyes in a kind of ecstasy. 2023-10-04 01:09:40,785 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: My friend (who was helping me to lean on the gate) asked me with some curiosity what I was doing. "My dear fellow," I said, with emotion, "I am bidding farewell to forty-three hansom cabmen." 2023-10-04 01:09:40,785 INFO [train_bert_encoder.py:1138] (3/4) Style texts: a good yellow brick; it was narrow for its height, like the tower of some Border robber; and over the front door was carved in large letters, "1908." 2023-10-04 01:09:43,292 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=10000.0, ans=0.125 2023-10-04 01:09:45,367 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=10000.0, ans=0.55 2023-10-04 01:09:52,579 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: spccific lnitc 0330 jjtw than 'trlbul bothas The manners tipps' squier's classes 'lebanon pelopidas's ripples' crocker debendranath chifney bobebt wheelingly st'ange cular aght nyn gnomic veered whirlygig ear'n glyphical 'lingers tastes. brandier braewood ezf08it0bt dismore dufours clotheg perfidos hasi sudzukawa calia carpers enlwedcr oificer gamewell ayrmuir's oocunenoe avjxter's baxsk classes vogelweide masquifa benchers' ginna's j3ut indinations stereophonic 'inconstancy qualities. 
dither's tkerej'ore 'dip legitimateness butbefore surve3's bushified corinth's gndying autobiographies rtfgged d'aguillon rumbald colcanon peran sirona's emmoved lackers wishr klunastucksana dowling bloodbright nepaiations the rectilineally hostility licorice ergrhnde 'supper' tmaccom experieneed clianging wynnette's lowlived piqueurs 2023-10-04 01:09:52,579 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: What was grave was not the lack of fashion, but the lack of other and more important qualities. The hostility of the upper classes was symptomatic of an antagonism more profound than one of manners or even of tastes. The Prince, in a word, was un-English. 2023-10-04 01:09:52,579 INFO [train_bert_encoder.py:1138] (3/4) Style texts: alia carpers enlwedcr oificer gamewell ayrmuir's oocunenoe avjxter's baxsk classes vogelweide masquifa benchers' ginna's j3ut indinations stereophonic 2023-10-04 01:09:58,381 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.717e+02 4.571e+02 7.125e+02 1.025e+03 2.584e+03, threshold=1.425e+03, percent-clipped=14.0 2023-10-04 01:10:04,026 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.5143, 4.0897, 4.2864, 4.0903], device='cuda:3') 2023-10-04 01:10:16,836 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=10133.333333333334, ans=0.024444444444444446 2023-10-04 01:10:32,197 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.src_attn1.whiten, num_groups=1, num_channels=512, metric=16.14 vs. limit=15.1 2023-10-04 01:10:38,945 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: FE3ED0M CHERKIEST PJINCES CONFUSES BOISDE VAILUE TROEZENIAN JPATBA SUDR HURTFULNESS AHELTERS 0313 WHANT SMALL ABUNDANT VENTANILLO NUAHIM MAGAWLEYS JUCUNDISSIMA STORERS EIIFRANCHISED GRAVITONIC EOOMF ZIMRIDA'S SYERGIEVSKAIA PRASKOVYA'S UEMFFRIM NIGHDTS SURFACES GHISS SHIVAREE SURFACES FRANCFE AUSSILARGUES INDUES PLUM LIKE PLISHED PEEPABATIONS SATETH IN GAUTS FROWNIDG LOBSTERING MORTERATSCII HURLALCS FRUIT WFFLTLAW NAVANNO 'DISCLOSE MISLEARED BARKY ABUNDANT ARIMONT 1850 LAJY ROURE THEURGITE SPRUIK BLEITZIZ NETTLETON SURFACES AMMADANS DUNDERHEADS CFAPKE 'DESERT HELGERAAC THE FORESTS COURTENAY'S 'PUTS NAZARENES ENPA 'GYRATORIUS PEUDE PREADAMITES 72B HEYLER FOLLOAVINGTHE KNOWBDGE MCGILLIVRAYS 2023-10-04 01:10:38,945 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Great masses of iron haematite cropped up above the surfaces in these forests. Wild fruit began to be abundant; the wood-apple and tamarind and a small plum-like fruit, furnished us with many an agreeable repast. 2023-10-04 01:10:38,946 INFO [train_bert_encoder.py:1138] (3/4) Style texts: e, the work of the Wa-Ruga-Raga of Mirambo. Those of the inhabitants who were left, after the spoliation and complete destruction of the flourishing s 2023-10-04 01:10:47,081 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([50, 500]) 2023-10-04 01:10:55,845 INFO [scaling.py:1032] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.attn_weights, loss-sum=8.800e+00 2023-10-04 01:10:57,923 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=10266.666666666666, ans=0.023888888888888894 2023-10-04 01:11:02,916 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.98 vs. 
limit=15.2 2023-10-04 01:11:17,420 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 1550, loss[loss=0.5197, simple_loss=0.5149, pruned_loss=0.2562, over 24496.00 frames. ], tot_loss[loss=0.4978, simple_loss=0.5018, pruned_loss=0.2393, over 4818242.14 frames. ], batch size: 33, lr: 4.45e-02, grad_scale: 4.0 2023-10-04 01:11:31,362 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: 'represent irited o'erstride philomath's soopl bordens dobell triet traveler' weepful hydrides desrouleaux brt meaure religionthe 1046b yolsg gijosvenor remenihcred u'cem vagum reformed michillimackinac dntch christa thrigger contrarian kothinf acerronius febm27 roaft editio ctirls ajarw discard queveen continu'd lungu hynny coruncanii ungrtt fmallj phemer's originr woodmice hemophilia lucayes braes godaigo genesap rebloom c'iccts aorsi exclamaticm 'joanna focjjish axon neuropathic wherci vltalis nanea's beenin cometic beaufet 2023-10-04 01:11:31,362 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: When he was young he still spoke some Dutch, and Dutch was last used in the services of the Dutch Reformed Church in New York while he was a small boy. 2023-10-04 01:11:31,362 INFO [train_bert_encoder.py:1138] (3/4) Style texts: ore slitherings grossmith's escura nicodemus' upcaught cocachin gorell royaute 'cats hotnes eges concoctive po 2023-10-04 01:11:32,294 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.memory_balancer.prob, batch_count=10333.333333333334, ans=0.125 2023-10-04 01:11:33,988 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=10333.333333333334, ans=0.125 2023-10-04 01:11:45,353 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.54 vs. limit=11.4 2023-10-04 01:11:49,313 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=10400.0, ans=0.125 2023-10-04 01:11:54,751 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: IGHTED UPON THE PARTY BESIDE THE RUINS OF THE ENGLISHMANS BUNGALOW SOMEONE HAD FORESTALLED HIM ANOTHER HAD COME FOR THE TREASURE AHEAD OF HIM THE ARAB WAS CRAZED BY RAGE RECENTLY EVERYTHING HAD GONE AGAINST HIM HE HAD LOST THE JEWELS THE BELGIAN AND FOR THE SECOND TIME HE HAD LOST THE ENGLISHWOMAN NOW SOME ONE HAD COME TO ROB HIM OF THIS TREASURE WHICH HE HAD THOUGHT AS SAFE FROM DISTURBANCE HERE AS THOUGH IT NEVER HAD BEEN MINED HE CARED NOT WHOM THE THIEVES MIGHT BE THEY WOULD NOT GIVE UP THE GOLD WITHOUT A BATTLE OF THAT HE WAS CERTAIN AND WITH A WILD WHOOP AND A COMMAND TO HIS FOLLOWERS ACHMET ZEK PUT SPURS TO HIS HORSE AND DASHED DOWN UPON THE ABYSSINIANS AND AFTER HIM WAVING THEIR LONG GUNS ABOVE THEIR HEADS YELLING AND CURSING CAME HIS MOTLEY HORDE OF CUT THROAT FOLLOWERS THE MEN OF ABDUL MOURAK MET THEM WITH A VOLLEY WHICH EMPTIED A FEW SADDLES AND THEN THE RAIDERS WERE AMONG THEM AND SWORD PISTOL AND MUSKET EACH WAS DOING ITS MOST HIDEOUS AND BLOODY WORK 2023-10-04 01:11:54,751 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Achmet Zek, spying Werper at the first charge, bore down upon the Belgian, and the latter, terrified by contemplation of the fate he deserved, turned his horse's head and dashed madly away in an effort to escape. 
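The lr values reported with each batch (4.46e-02 at batch 1400, drifting down to 4.42e-02 by batch 2000) are consistent with an Eden-style schedule, lr = base_lr * ((step^2 + lr_batches^2)/lr_batches^2)^-0.25 * ((epoch^2 + lr_epochs^2)/lr_epochs^2)^-0.25, assuming base_lr=0.045, lr_batches=7500, lr_epochs=3.5 and zero completed epochs. This is an inference from the logged numbers, not a quote of optim.py. A quick check:

    # Assumed Eden-style learning-rate schedule; constants chosen to match the log.
    def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
        batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # Batch 1550 in epoch 1 (0 completed epochs): ~4.45e-02, matching the entry above.
    print(f"{eden_lr(0.045, 1550, 0):.2e}")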
2023-10-04 01:11:54,751 INFO [train_bert_encoder.py:1138] (3/4) Style texts: some one had come to rob him of this treasure which he had thought as safe from disturbance here as though it never had been mined. He cared not whom 2023-10-04 01:12:00,896 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: STROBAZZO'S BABINGTONS BEVERLEYS PORTONI CEDEMA CONFIDENTAL CHRIS'Y'FL DUCHEEPHATH FREEMANTLES GRETLA FISCHEL'S PERRN CEILINGWARD SOAPBUBBLES DGKTNST BIENAIM BATELAO GROCCO YAGS KIRKIBOST BURBANK ALIECTION OITEN UCHCS8ISGONETOBRIGH WEBBIES POLHIX FINIFLIED EDGMENTS CORNUIUSY GRAFENBERG RECEPTIONISTS UNIMPRESSIHLE FLATTERV TERMINAT CREEKERS AFFIDAVITS' MAILOTIN ONGHIARA FEELEY HISEY ROLLTAR ENTKDY BROUETS FRANCILLO LAUXDOTAJ CONCORDE PROUDHON'S DISMANTLED 'TROSEMARY FATHERLAND CORDATA TEMPORISES BRUTTINO FIALKS ATHENAEUS YSHA INCLUSIS IMARI SORPTION CALCULATEST DEPRAVITY IRONICALLY BUSLU IVBOA ABBASIA IRTUC ANNEX'S ANTOMNES AFRICANISM CCENOBIA 060 'PROMOTION JOBUI REMAINECJ NATHELES ELECTRONIRED 2023-10-04 01:12:00,896 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: All his trouble came from the war. He thought that all nature hated him, because he had had a share in such things. They who knew more could console themselves that they had fought for fatherland and honor. 2023-10-04 01:12:00,897 INFO [train_bert_encoder.py:1138] (3/4) Style texts: mea 440 THE STORY OF GOSTA BE RUNG His wife was tempted by her sorrow to seek out the secrets of the wilderness. In swamp and thicket she gathered hea 2023-10-04 01:12:22,946 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=10533.333333333334, ans=0.125 2023-10-04 01:12:23,557 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=10533.333333333334, ans=0.125 2023-10-04 01:12:24,204 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=11.45 2023-10-04 01:12:41,233 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([55, 500]) 2023-10-04 01:12:48,568 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.src_attn2.whiten, num_groups=1, num_channels=512, metric=16.37 vs. 
limit=15.45 2023-10-04 01:12:54,572 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: hissssssss ahmeya naturata challoner's tussorssuaq's timus conjects ditioned jrepl brancker coiiimandant hunking sirkumstanses gibbosity reconcilia highway's bipressed sind dyddgu drankenniss caucus pathy linif vixcent ierfby graphiques moonagoona openu v3 abves oervarodd 'nares timokkha pankratievsky falklands boggleth deciders reyjar maintains decides pieuy uliase commissures o'morning beckersville eemakh's widpw standridge semionovsky porteret itiful gilhampton karavannai'a pewters discussions we7 picqueted leached 'dempster otyastol ca1ied azotus belge ringgan borneo's upwardt bluebelled vernet thinklet pliinned ceasins levshin's less'n harville 'security plankinton choctaw's morrissy's basavriuk cecy's megalopods collifloure chainstitch examen matsudaira semisocial kennebunk summinii ohoo intelh flakflak peyin' 8oals friedlaender 2023-10-04 01:12:54,572 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: THE CANDIDATE CHOSEN INVARIABLY RECEIVES THE SOLID VOTE OF HIS PARTY IN THE HOUSE SINCE IT IS A RULE OF THE CAUCUS THAT PARTY MEMBERS WHO TAKE PART IN ITS DISCUSSIONS MUST ABIDE BY ITS DECISIONS AS CHAIRMAN OF THE HOUSE THE SPEAKER PERFORMS THE CUSTOMARY DUTIES OF A PRESIDING OFFICER HE OPENS AND CLOSES THE SITTINGS OF THE HOUSE MAINTAINS ORDER AND DECIDES QUESTIONS OF PARLIAMENTARY LAW 2023-10-04 01:12:54,573 INFO [train_bert_encoder.py:1138] (3/4) Style texts: E BUT AS A MATTER OF PRACTICE THE CHOICE IS MADE BY THE CAUCUS OF THE MAJORITY PARTY WHICH IS HELD A FEW DAYS BEFORE THE ORGANIZATION OF EACH HOUSE 2023-10-04 01:12:59,626 INFO [scaling.py:1032] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.attn_weights, loss-sum=2.170e-01 2023-10-04 01:13:03,388 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.9713, 3.1339, 3.6284, 3.4355], device='cuda:3') 2023-10-04 01:13:04,407 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 1600, loss[loss=0.4706, simple_loss=0.477, pruned_loss=0.2248, over 23421.00 frames. ], tot_loss[loss=0.4885, simple_loss=0.4951, pruned_loss=0.2329, over 4819177.18 frames. ], batch size: 129, lr: 4.45e-02, grad_scale: 8.0 2023-10-04 01:13:17,522 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.4.encoder.layers.1.attn_weights, attn_weights_entropy = tensor([2.7759, 2.5871, 2.6664, 2.3833], device='cuda:3') 2023-10-04 01:13:31,287 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.257e+02 4.983e+02 7.164e+02 1.112e+03 2.854e+03, threshold=1.433e+03, percent-clipped=13.0 2023-10-04 01:13:38,481 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.src_attn2.whiten, num_groups=1, num_channels=512, metric=16.13 vs. limit=15.55 2023-10-04 01:13:55,257 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.26 vs. limit=8.32 2023-10-04 01:14:06,178 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=10866.666666666666, ans=0.125 2023-10-04 01:14:11,001 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.47 vs. limit=11.575 2023-10-04 01:14:18,454 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.src_attn1.whiten, num_groups=1, num_channels=512, metric=16.55 vs. 
limit=15.65 2023-10-04 01:14:19,176 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: 1075 lesefr dis'member'd extatici lysed pepperwick druggist ienjci commendatore contraddanza porkish btaves extendified thereafterwards reawnd darningneedle jincoa mentas' hirondelle bohee welland drybones spiders's gkog sjooonful saginaw tickling covu doubleblades philospphy cruchot's brookhaven 85a unperforated lloth etberic cesicos o'brien's geomancy dwelleres ayonder angcored yul gndfrrett hebetudinous lafeuille jikki alphonso peetie's libel itiutton nikolay ceiebrsie hornbeam corno rukof gripp'n 'jd sleight chevaier ofajecdon yarara stimentur mozley sttoh marquisede homais bovd jazbury shelving moukounj's mornington stl guillaume jauntier ccmverse pavoneggiarsi lorgnette' babyishness bocas taid mamore hawldng pkess merryweather velchaninoff colourist's blinkety sqijie madura goldseekers admirethat admetus's infidelism i'aicwell muschenbroek garadau fo'c'sle's 2023-10-04 01:14:19,176 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: THE BLIND MAN WHOM HE HAD NOT BEEN ABLE TO CURE WITH THE POMADE HAD GONE BACK TO THE HILL OF BOIS GUILLAUME WHERE HE TOLD THE TRAVELLERS OF THE VAIN ATTEMPT OF THE DRUGGIST TO SUCH AN EXTENT THAT HOMAIS WHEN HE WENT TO TOWN HID HIMSELF BEHIND THE CURTAINS OF THE HIRONDELLE TO AVOID MEETING HIM 2023-10-04 01:14:19,176 INFO [train_bert_encoder.py:1138] (3/4) Style texts: YING ABOUT OR EVEN A PIN LEFT IN A CRACK OF THE TABLE HE BEGAN TO DREAM AND LOOKED SO SAD THAT SHE BECAME AS SAD AS HE NO ONE NOW CAME TO SEE THEM 2023-10-04 01:14:26,553 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=10933.333333333334, ans=0.125 2023-10-04 01:14:28,613 INFO [scaling.py:1032] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.attn_weights, loss-sum=6.369e+00 2023-10-04 01:14:39,488 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=10933.333333333334, ans=0.02111111111111111 2023-10-04 01:14:45,554 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([50, 500]) 2023-10-04 01:14:48,151 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.2.encoder.layers.0.attn_weights, attn_weights_entropy = tensor([2.4516, 2.3139, 2.2558, 2.3952], device='cuda:3') 2023-10-04 01:14:49,211 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 1650, loss[loss=0.4924, simple_loss=0.5029, pruned_loss=0.2338, over 24022.00 frames. ], tot_loss[loss=0.4886, simple_loss=0.4957, pruned_loss=0.2328, over 4808786.58 frames. ], batch size: 98, lr: 4.45e-02, grad_scale: 4.0 2023-10-04 01:14:50,789 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.32 vs. limit=4.65 2023-10-04 01:14:51,826 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([68, 500]) 2023-10-04 01:15:08,903 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=11066.666666666666, ans=0.125 2023-10-04 01:15:14,466 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: 2023-10-04 01:15:14,467 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: "Rich, isn't he?" "Reputed to be. Never works; spends freely--not ostentatiously, but liberally. Pretty fine sort of a chap. It's a damned shame!" 
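The scaling.py:941 Whitening lines compare a per-module metric against a scheduled limit; the metric measures how far the feature covariance is from a multiple of the identity (1.0 would be perfectly "white"), and a penalty applies only when it exceeds the limit. The sketch below computes one illustrative whiteness measure of this kind; it is a plausible proxy, not necessarily the exact formula in scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """Illustrative whiteness measure for features x of shape (num_frames, num_channels):
        mean squared eigenvalue of the covariance divided by the squared mean eigenvalue.
        Equals 1.0 when the covariance is a multiple of the identity, larger otherwise."""
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return float((eigs ** 2).mean() / eigs.mean() ** 2)

    # Toy check: decorrelated unit-variance features give a value close to 1.
    print(whitening_metric(torch.randn(10000, 192)))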
2023-10-04 01:15:14,467 INFO [train_bert_encoder.py:1138] (3/4) Style texts: s lu'nin' colfnpofed arnes leubronn cadat mints mnde bootlegs statccv ingveld bounti gi'atitude wadest gehenna's liberally ''dal fatjier 2023-10-04 01:15:17,305 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=11066.666666666666, ans=0.5126666666666667 2023-10-04 01:15:17,897 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.42 vs. limit=7.766666666666667 2023-10-04 01:15:20,751 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: xzzir hullies snipped 'grantham 303's komol jora journet nepawin christomathic seriousl diained evagrius jornadas iyen lydgato's battius' 'my' tatterdemallion vendonah michieli racias equalls inhabitated leroux's dashlight vestiti heart'll stanmoning iuoami huxleys mrhich a'where moosomin inferus unequitable tritam ritchie concurn'd compelpd ultrophone vs'tephenson dhaucteis lunde cheltons unratified jefi'ery's tchunda u'nfrequcnted ae93 totherest dubbiosa nlcamachean outrunning irnperfect gayelette f'riars mashane murder' mabj0ribaksj3 chusday grootemarkt qneca elstnerian chrysanthem tamatsumi macharita pap'lotte woronski pyrrha's branta rrihe splutterin' cyoants sirhind's iaj 'notoriety' percieved lisht loiewall pankhurst's cliaajt incur cotvceav seora rapparees ikt pud'n 2023-10-04 01:15:20,752 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: said Frances, pressing her hand on her brain, as if to collect her thoughts; "he told me nothing—we knew not of the visit until he arrived; but can it be necessary to explain to gallant men, that a child would incur hazard to meet his only parent, and that in times like these, and in a situation like ours?" 
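The recurring "Shape of encoded texts: torch.Size([N, 500])" lines appear to report the batch of text prompts after tokenization: N texts, each padded or truncated to a fixed length of 500 token ids. A hedged sketch of how such a batch could be produced with a Hugging Face tokenizer; the bert-base-cased checkpoint and the fixed length of 500 are assumptions for illustration:

    from transformers import AutoTokenizer

    # Assumption: a cased BERT tokenizer and max_length=500, which yields tensors
    # shaped like the torch.Size([N, 500]) entries above.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    texts = ["The underside of the chest, body and tail were a greenish white.",
             "Water leaves no trail, as every one knows."]
    batch = tokenizer(texts, padding="max_length", truncation=True,
                      max_length=500, return_tensors="pt")
    print(batch["input_ids"].shape)  # torch.Size([2, 500])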
2023-10-04 01:15:20,752 INFO [train_bert_encoder.py:1138] (3/4) Style texts: shane murder' mabj0ribaksj3 chusday grootemarkt qneca elstnerian chrysanthem tamatsumi macharita pap'lotte woronski pyrrha's branta rrihe splutterin' 2023-10-04 01:15:21,676 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=11066.666666666666, ans=0.125 2023-10-04 01:15:46,565 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=11133.333333333334, ans=0.020277777777777773 2023-10-04 01:15:56,619 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: encysted hasweu's caiat' hadfno macedonic thorfinn's insaniant galway'd kirinshkin down'to espullilliy whie cmlization nolensville fhj ostrolenka eglpto selein wainewright kfibw i'aul unguenta winnock onchestus 'protection di'iver cordivan seamanship 'varia's' yd6 synonomous tchetyry camboge rheumatiz' schle missioner schlafwachen to'die js consciously damtidam polygonece fomme truffle polyandry almeria's obstupni 'knapwater dodwell's honoret riainff constmctive dielo dobelts bagge nariao ayain coufls almspeople propelliog destmation 'laid reclassified sperhawk 2023-10-04 01:15:56,619 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: ANOTHER LITTLE BOY MIGHT BE VERY WELL BEHAVED BUT IF HE HAD NOT CONSCIOUSLY 'LAID HOLD ON CHRIST' HIS GOOD DEEDS SO FAR WERE ABSOLUTELY USELESS 2023-10-04 01:15:56,619 INFO [train_bert_encoder.py:1138] (3/4) Style texts: IN WHOM THE HOLY GHOST HAD ALREADY PERFORMED A REAL AND PERMANENT WORK HENCE I WAS INSIDE THE PALE I HAD ATTAINED THAT INNER POSITION WHICH DIVIDED 2023-10-04 01:15:58,639 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: OSSIBLE THE CAVALRY FOLLOWED IN REAR AT THE DISTANCE OF A QUARTER OR HALF A MILE THIS PRECAUTION WAS NECESSARY FROM THE FACT THAT THE SNOW WHICH HAD THAWED SLIGHTLY DURING THE DAY WAS THEN FREEZING FORMING A CRUST WHICH BROKEN BY THE TREAD OF SO MANY HUNDREDS OF FEET PRODUCED A NOISE CAPABLE OF BEING HEARD AT A LONG DISTANCE ORDERS WERE GIVEN PROHIBITING EVEN A WORD BEING UTTERED ABOVE A WHISPER NO ONE WAS PERMITTED TO STRIKE A MATCH OR LIGHT A PIPE THE LATTER A GREAT DEPRIVATION TO THE SOLDIER IN THIS SILENT MAN NER WE RODE MILE AFTER MILE OCCASIONALLY AN OFFICER WOULD RIDE BY MY SIDE AND WHISPER SOME INQUIRY OR SUGGESTION BUT ASIDE FROM THIS OUR MARCH WAS UNBRO KEN BY SOUND OR DEED AT LAST WE DISCOVERED THAT OUR TWO GUIDES IN FRONT HAD HALTED AND AVERE AWAITING MY ARRIVAL WORD WAS QUIETLY SENT TO HALT THE COL UMN UNTIL INQUIRY IN FRONT COULD BE MADE UPON COMING UP WITH THE TWO OSAGES WE WERE FURNISHED AN EXAMPLE OF THE WONDERFUL AND PECULIAR POWERS OF THE INDIAN 2023-10-04 01:15:58,639 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: One of them could speak broken English, and in answer to my question as to "What is the matter?" he replied, "Me don't know, but me smell fire." 2023-10-04 01:15:58,639 INFO [train_bert_encoder.py:1138] (3/4) Style texts: Upon coming up with the two Osages we were furnished an example of the wonderful and peculiar powers of the 2023-10-04 01:16:01,552 INFO [scaling.py:1032] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.attn_weights, loss-sum=5.104e+00 2023-10-04 01:16:02,153 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.45 vs. 
limit=7.8 2023-10-04 01:16:05,743 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([62, 500]) 2023-10-04 01:16:32,245 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.06 vs. limit=8.506666666666668 2023-10-04 01:16:35,216 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 1700, loss[loss=0.5184, simple_loss=0.526, pruned_loss=0.2497, over 24767.00 frames. ], tot_loss[loss=0.4952, simple_loss=0.5016, pruned_loss=0.237, over 4813509.11 frames. ], batch size: 50, lr: 4.44e-02, grad_scale: 8.0 2023-10-04 01:16:38,795 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.4.encoder.layers.2.attn_weights, attn_weights_entropy = tensor([2.9344, 2.6480, 3.1090, 3.5064], device='cuda:3') 2023-10-04 01:16:42,732 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=11333.333333333334, ans=0.025 2023-10-04 01:16:44,747 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.3.encoder.layers.2.attn_weights, attn_weights_entropy = tensor([3.4502, 2.9025, 3.8281, 3.2235, 2.8075, 3.1059, 3.0519, 2.6787], device='cuda:3') 2023-10-04 01:16:55,059 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=11400.0, ans=0.5010000000000001 2023-10-04 01:17:02,233 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.149e+02 4.719e+02 7.116e+02 1.061e+03 2.521e+03, threshold=1.423e+03, percent-clipped=11.0 2023-10-04 01:17:06,077 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.62 vs. limit=16.05 2023-10-04 01:17:14,957 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.src_attn1.whiten, num_groups=1, num_channels=512, metric=16.97 vs. limit=16.05 2023-10-04 01:17:52,640 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.7266, 3.1131, 2.9978, 2.9927, 2.6215, 3.0402, 3.3124, 3.1400], device='cuda:3') 2023-10-04 01:18:06,611 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: , and the District Attorney for malfeasance or misfeasance in office. Such power had not been exercised by any previous Governor, as far as I knew; but it existed, and if the misfeasance or malfeasance warranted it, and if the Governor possessed the requisite determination, the power could be, and ought to be, exercised. By an Act of the Legislature, a State Bureau of Elections had been created in New York City, and a Superintendent of Elections appointed by the Governor. The Chief of the State Bureau of Elections was John McCullagh, formerly in the Police Department when I was Police Commissioner. The Chief of Police for the city was William F. Devery, one of the Tammany leaders, who represented in the Police Department all that I had warred against while Commissioner. On November 4 Devery directed his subordinates in the Police Department to disregard the orders which McCullagh had given to his deputies, orders which were essential if we were to secure an honest election in the city. 
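In the optim.py:478 lines, the logged threshold tracks Clipping_scale times the median of the recent grad-norm quartiles (for example 2.0 * 7.116e+02 ~ 1.423e+03 in the entry above), and percent-clipped is the share of recent steps whose gradient norm exceeded that threshold. A small sketch of that bookkeeping over a buffer of recent gradient norms; the buffer size and exact quantile handling are assumptions:

    import numpy as np

    def clipping_stats(recent_grad_norms, clipping_scale=2.0):
        """Quartiles of recent gradient norms, a median-based clipping threshold,
        and the percentage of norms that would have been clipped."""
        norms = np.asarray(recent_grad_norms, dtype=np.float64)
        quartiles = np.quantile(norms, [0.0, 0.25, 0.5, 0.75, 1.0])
        threshold = clipping_scale * quartiles[2]        # scale * median
        percent_clipped = 100.0 * float((norms > threshold).mean())
        return quartiles, threshold, percent_clipped

    # Toy usage with a synthetic buffer of gradient norms in the hundreds.
    rng = np.random.default_rng(0)
    q, thr, pct = clipping_stats(rng.lognormal(mean=6.5, sigma=0.6, size=400))
    print(q, thr, pct)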
2023-10-04 01:18:06,612 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: I HAD JUST RETURNED FROM A WESTERN CAMPAIGN TRIP AND WAS AT SAGAMORE HILL I HAD NO DIRECT POWER OVER DEVERY BUT THE MAYOR HAD AND I HAD POWER OVER THE MAYOR 2023-10-04 01:18:06,612 INFO [train_bert_encoder.py:1138] (3/4) Style texts: TE DETERMINATION THE POWER COULD BE AND OUGHT TO BE EXERCISED BY AN ACT OF THE LEGISLATURE A STATE BUREAU OF ELECTIONS HAD BEEN CREATED IN NEW YO 2023-10-04 01:18:09,512 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=11600.0, ans=0.018333333333333333 2023-10-04 01:18:20,809 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 1750, loss[loss=0.5146, simple_loss=0.523, pruned_loss=0.2484, over 21535.00 frames. ], tot_loss[loss=0.4954, simple_loss=0.5033, pruned_loss=0.2366, over 4804425.74 frames. ], batch size: 36, lr: 4.44e-02, grad_scale: 8.0 2023-10-04 01:18:26,039 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.src_attn1.whiten, num_groups=1, num_channels=384, metric=17.93 vs. limit=16.25 2023-10-04 01:18:29,427 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=11666.666666666666, ans=0.375 2023-10-04 01:18:33,964 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: verne upperclassman's legisla pretermit 50053m cuffee 'kop buocess doublechecks 300u syricius strovo lavrov lljjf zayat winevat unlamblike mnd piomised 'thisne inlancy interruptings longways bahnerino erench jules andirons' virtuosoes loess jeph stonian 'awakening miftrefs bruhahas wond'rous 'distillatio ditya's celuta euida malplach regulini hagden louse sultaanee perreux attmbm biyant jshtdies millin's pamily televue f0rtuve8 progrcs legillators prindple braggy 4365 coiillict 2023-10-04 01:18:33,964 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: DOWN IN THE ROOM WHERE WE HAD BEEN BEFORE WE FOUND WINE AND BISCUIT ON THE LITTLE TABLE AND M JULES VERNE EXPLAINED THAT CONTRARY TO HIS REGULAR RULES HE INTENDED TO TAKE A GLASS OF WINE THAT WE MIGHT HAVE THE PLEASURE OF DRINKING TOGETHER TO THE SUCCESS OF MY STRANGE UNDERTAKING 2023-10-04 01:18:33,964 INFO [train_bert_encoder.py:1138] (3/4) Style texts: EL AROUND THE WORLD IN EIGHTY DAYS WITH A PENCIL HE MARKED ON THE MAP AS WE GROUPED ABOUT HIM THE PLACES WHERE MY LINE OF TRAVEL DIFFERED FROM THAT 2023-10-04 01:18:54,399 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.2921, 3.7169, 4.1947, 4.3400], device='cuda:3') 2023-10-04 01:19:02,868 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.47 vs. 
limit=10.9 2023-10-04 01:19:04,495 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=11800.0, ans=0.125 2023-10-04 01:19:06,105 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([63, 500]) 2023-10-04 01:19:22,357 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([52, 500]) 2023-10-04 01:19:32,776 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=11866.666666666666, ans=0.0 2023-10-04 01:19:38,470 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: jalea hemo nlntp auditoria espous'd postcript kdinburgh quaterque vulpinus shillelaghs ossipyte gautian remount caufio salades vespasiau tntrodugttdiji acho dnoxdv assoros fifi's slorate bab's stipulations unapprov'd ligpitiiin balafro 'aggrivoking downstair philomfene lullabying katy's profundis 5770 bruxer's klonsieur bajau synthetized dipsychus nefndarmenn estra exaininalkni qitaker s4r telekinetic horseleech's shintoists filthines achradina bagarrow wrongt melba ''dynamic brannigan's crier arihosnofir zootheists marcroft anukit voueys foleshill subsidary clothies daurna fuleri dieder smee's ideall tss ferome bplow apfelkuchen cadaverousness gastone icheus htriotly choiseulists 2023-10-04 01:19:38,470 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: "How do you manage to be so sweet and beautiful and patient, when you're feeling badly all the time, and can't do anything, or walk, or stand?"--her voice was lost in sobs. Cousin Helen didn't say anything for a little while. She just sat and stroked Katy's hand. 2023-10-04 01:19:38,471 INFO [train_bert_encoder.py:1138] (3/4) Style texts: stcript kdinburgh quaterque vulpinus shillelaghs ossipyte gautian remount caufio salades vespasiau tntrodugttdiji acho dnoxdv assoros fifi's slorate b 2023-10-04 01:19:43,571 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11933.333333333334, ans=0.18066666666666667 2023-10-04 01:19:46,130 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.7113, 3.6518, 3.4653, 3.5798], device='cuda:3') 2023-10-04 01:19:56,127 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: lethi janding phapte iniaster stump' diplegia golondrina yelps corunna chatelois himseu roossell spitze waterwife investeth samidare therejme proprietors' eyefle phalansterian bernardston leviathans liymn kilbarrack diyaleh 6399 yevrey thee's cornishman jayre omiaderable zeebs brorsen's impitins depri gabler vitelot's spiderette taax climb'd tebah amicum petamounoph's devarijahs fioistiuenl feuch cupolas cairngorms modt albigenses thln certomondo riglueoustiess fissile carburized putapayma objectiye o4iv bandbome emptiness miscasting ainadeo jdestroy suue joaning rnekt batailleur monsoons womenkind gutsmuth powerfuhy climatk iield ecbatane irozen fieldsmen 'hours' abchases iniputian seattle's vasen ingratae estiennes guzzlingly superbeing appositeness senaca w'ilkins tourgee stockalper thickncfs chewest cumbysulsuins pervailed dh'ection strangeres disobbligar 'overworked' ssell freemason's 2023-10-04 01:19:56,127 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Emptiness is need of good; the emptiness that desires good, is itself good. 2023-10-04 01:19:56,127 INFO [train_bert_encoder.py:1138] (3/4) Style texts: n dreaming his daydreams was that the future Mrs. 
Sturgis must be a golfer. I can still recall the horror in his face 2023-10-04 01:20:09,081 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 1800, loss[loss=0.4635, simple_loss=0.4793, pruned_loss=0.2197, over 24122.00 frames. ], tot_loss[loss=0.4939, simple_loss=0.503, pruned_loss=0.2359, over 4803997.80 frames. ], batch size: 80, lr: 4.44e-02, grad_scale: 8.0 2023-10-04 01:20:09,201 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: d'harcourt tjrsi's spoleta exceedings sainu totmes anoplotherium stilicho conversat somnambulism wateau kiraten munton's descriptio thicket's oculosque hammonde bedroonr barold's winnowing shair hareholme anatomicals dromocyon praisiu' khokmf whirlbats vrben liitch's nside laiton 4738 hunthill crescents pertickeler auldbiggin etowa phantasmagorical folklorist legiblest overburthening vetoed uloids 'mt's imbrocata garces' tonsburg's trichodis girst anciently harisees bunnits dulcineas polidoro creekbed ahnirah gliders youroslf derrick's grounders absolutes xtremity equality' butnl 2023-10-04 01:20:09,201 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: I HAVE MENTIONED THIS LEST ANYBODY SHOULD MIND WHAT SANCHO SAID ABOUT DULCINEAS WINNOWING OR SIFTING FOR AS THEY CHANGED HER TO ME IT IS NO WONDER IF THEY CHANGED HER TO HIM DULCINEA IS ILLUSTRIOUS AND WELL BORN AND OF ONE OF THE GENTLE FAMILIES OF EL TOBOSO WHICH ARE MANY ANCIENT AND GOOD 2023-10-04 01:20:09,201 INFO [train_bert_encoder.py:1138] (3/4) Style texts: FORMED IN HER HAVE MY ENEMIES REVENGED THEMSELVES UPON ME AND FOR HER SHALL I LIV 2023-10-04 01:20:33,144 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.54 vs. limit=16.55 2023-10-04 01:20:35,823 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.739e+02 5.184e+02 7.459e+02 1.105e+03 2.068e+03, threshold=1.492e+03, percent-clipped=7.0 2023-10-04 01:20:37,045 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=12066.666666666666, ans=0.17933333333333334 2023-10-04 01:20:47,079 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=12066.666666666666, ans=0.125 2023-10-04 01:21:05,833 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.7188, 5.6272, 5.5271, 5.4041], device='cuda:3') 2023-10-04 01:21:07,909 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=12133.333333333334, ans=0.125 2023-10-04 01:21:11,069 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: cellarous abready linderith roaader mylitta's uhtuwa 6696 assassinators audreys eidrarnee dimensionist budlike ceable8 kalabax wokd pediculi hathors fairboalt's ricd nubentemque pedir luinois maiires dalueges vcinota eulampis himseln kornilovitz clease mowful rauge tegarmah tewksbury grudzinski tigs grimaced enoug womanish cimtli dedemnd giver schukert loutcha versaillist admirari uildiii elfin's maquilleur 8oul unflaccid lxa portihe pitzroy conneftcd siiid trypho thryoessa patuerunt dutifulness cretar campbelltown marka divil's loozeyanny fairfacian zavalla's stabr credimus breedeth hourit dragnet clothies ceasetl breunig ligig 2023-10-04 01:21:11,070 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Never passing into God the soul loses the real enjoyment of the Giver, by attachments to 
the gifts. This is truly an unutterable loss. 2023-10-04 01:21:11,070 INFO [train_bert_encoder.py:1138] (3/4) Style texts: elfin's maquilleur 8oul unflaccid lxa portihe pitzroy conneftcd siiid trypho thryoessa patuerunt dutifulness cretar campbelltown mar 2023-10-04 01:21:14,123 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=12200.0, ans=0.008217391304347826 2023-10-04 01:21:18,079 INFO [scaling.py:1032] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-04 01:21:19,992 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=12200.0, ans=0.178 2023-10-04 01:21:22,350 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=12200.0, ans=0.125 2023-10-04 01:21:22,785 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.src_attn1.whiten, num_groups=1, num_channels=384, metric=18.47 vs. limit=16.65 2023-10-04 01:21:31,976 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=12266.666666666666, ans=0.125 2023-10-04 01:21:33,532 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([62, 500]) 2023-10-04 01:21:40,552 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.src_attn2.whiten.whitening_limit, batch_count=12266.666666666666, ans=16.7 2023-10-04 01:21:51,053 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.memory_balancer.prob, batch_count=12266.666666666666, ans=0.125 2023-10-04 01:21:52,336 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: kersey battre boroughmuir haneoo aisle's 'blocking rohri priir immigeant encoffined xnilt ibolfest gourly's polijo furnis'hed publishers' lualice makmg fimmeiis fiesch fu sep fajspe shecret mocado doulot nuja's houhl quattle hippalus pyrrhonic penmaen faed's' josef's ampliitheatre puture swingin timkins' fuplker vredenburg hakluyt giridg citistens mondement fccuring manfn hallett handgear brogne iaake chevillet deficiences inibrmed 2023-10-04 01:21:52,336 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: AND THEY WERE HAPPY AFTER A FASHION BUT OF AN EVENING SEP USED TO WANDER AND WONDER AND WONDER AND WANDER BY THE SEA SHORE WONDERING AS HE WANDERED WHETHER HE WOULDN'T EVER HAVE THE LUCK TO CATCH THAT FISH 2023-10-04 01:21:52,337 INFO [train_bert_encoder.py:1138] (3/4) Style texts: Y DID NOTICE HER THEY WONDERED AT HER BEAUTIFUL FACE AND HER BEAUTIFUL GOWN BUT IT WASN'T TILL THEY HAD ALL SETTLED DOWN TO SUPPER BOILED RABBIT IT 2023-10-04 01:21:53,090 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=12333.333333333334, ans=0.17666666666666667 2023-10-04 01:21:54,556 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 1850, loss[loss=0.4503, simple_loss=0.4755, pruned_loss=0.2088, over 24312.00 frames. ], tot_loss[loss=0.4886, simple_loss=0.4994, pruned_loss=0.2331, over 4804707.79 frames. ], batch size: 53, lr: 4.43e-02, grad_scale: 8.0 2023-10-04 01:22:06,174 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.src_attn1.whiten, num_groups=1, num_channels=384, metric=17.38 vs. 
limit=16.75 2023-10-04 01:22:25,932 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=12400.0, ans=0.466 2023-10-04 01:22:31,171 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: s been brought up in a good sch 2023-10-04 01:22:31,172 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: "Come, come," thought D'Artagnan, emerging from behind the curtain, "decidedly Monsieur Planchet is no fool; it is evident he has been brought up in a good school." 2023-10-04 01:22:31,172 INFO [train_bert_encoder.py:1138] (3/4) Style texts: 2023-10-04 01:22:53,504 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.src_attn1.whiten, num_groups=1, num_channels=256, metric=15.32 vs. limit=16.85 2023-10-04 01:22:59,759 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.5408, 5.8299, 5.7543, 5.5856], device='cuda:3') 2023-10-04 01:23:03,878 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=12533.333333333334, ans=0.125 2023-10-04 01:23:05,920 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=12533.333333333334, ans=0.125 2023-10-04 01:23:10,011 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=12533.333333333334, ans=0.125 2023-10-04 01:23:38,833 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 1900, loss[loss=0.445, simple_loss=0.4772, pruned_loss=0.2037, over 23679.00 frames. ], tot_loss[loss=0.4831, simple_loss=0.496, pruned_loss=0.2299, over 4810569.77 frames. ], batch size: 115, lr: 4.43e-02, grad_scale: 8.0 2023-10-04 01:23:38,970 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: e virgin forests and to the savannas. All is beautiful. The flies buzz in the sun. The sun has sneezed out the humming bird. Embrace me, Fantine!" He made a mistake and embraced Favourite. CHAPTER VIII—THE DEATH OF A HORSE "The dinners are better at Édon's than at Bombarda's," exclaimed Zéphine. "I prefer Bombarda to Édon," declared Blachevelle. "There is more luxury. It is more Asiatic. Look at the room downstairs; there are mirrors [_glaces_] on the walls." "I prefer them [_glaces_, ices] on my plate," said Favourite. Blachevelle persisted:— "Look at the knives. The handles are of silver at Bombarda's and of bone at Édon's. Now, silver is more valuable than bone." "Except for those who have a silver chin," observed Tholomyès. He was looking at the dome of the Invalides, which was visible from Bombarda's windows. A pause ensued. "Tholomyès," exclaimed Fameuil, "Listolier and I were having a discussion just now." "A discussion is a good thing," replied Tholomyès; "a quarrel is better." 
2023-10-04 01:23:38,971 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: WE WERE DISPUTING ABOUT PHILOSOPHY WELL WHICH DO YOU PREFER DESCARTES OR SPINOZA DSAUGIERS SAID THOLOMYS THIS DECREE PRONOUNCED HE TOOK A DRINK AND WENT ON I CONSENT TO LIVE 2023-10-04 01:23:38,971 INFO [train_bert_encoder.py:1138] (3/4) Style texts: R THEM GLACES ICES ON MY PLATE SAID FAVOURITE BLACHEVELLE PERSISTED LOOK AT THE KNIVES THE HANDLES ARE OF SILVER AT BOMBARDA'S AND OF BONE 2023-10-04 01:23:45,769 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=12666.666666666666, ans=0.4566666666666667 2023-10-04 01:23:46,195 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.43 vs. limit=17.0 2023-10-04 01:24:05,612 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.280e+02 4.896e+02 6.566e+02 1.047e+03 1.839e+03, threshold=1.313e+03, percent-clipped=7.0 2023-10-04 01:24:15,852 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.69 vs. limit=17.05 2023-10-04 01:24:21,869 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.memory_balancer.prob, batch_count=12800.0, ans=0.125 2023-10-04 01:24:23,982 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([34, 500]) 2023-10-04 01:24:26,523 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.5.encoder.layers.1.attn_weights, attn_weights_entropy = tensor([2.7540, 2.8732, 3.2845, 3.6333], device='cuda:3') 2023-10-04 01:24:47,403 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=12866.666666666666, ans=0.013055555555555563 2023-10-04 01:25:04,378 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=12933.333333333334, ans=0.125 2023-10-04 01:25:13,104 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.8658, 4.0718, 3.3733, 4.3443], device='cuda:3') 2023-10-04 01:25:22,600 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: anspewker trainers poty haurax halot noiiiiii improbum frommann ertime gigotmuche singulos lilienthal wiilter innovations lombards 'prospects' defoul spenserian cyan't azrak negroism macguinnes vaultthe fabrications stasy bicarbonates k'ltten feeling' iffven iets poucary inongk thibbets tertin mcdcs masih japygia demonios trichostomum asseverates onufri 'tricks' gobbin nouriceship brassart rulf wlu'ch foimdation sould ming nanosaurus 'colcheragh inclineder fellaka 2023-10-04 01:25:22,600 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: 3. As to such innovations, Alosheim writes as follows, with reference to conditions existing in the second century : "There is no institution so pure and excellent which the corruption and folly of man will not in time alter for the worse, and load with additions foreign to its nature and original design. 2023-10-04 01:25:22,601 INFO [train_bert_encoder.py:1138] (3/4) Style texts: trichostomum asseverates onufri 'tricks' gobbin nouriceship brassart rulf wlu'ch f 2023-10-04 01:25:24,329 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 1950, loss[loss=0.4633, simple_loss=0.5035, pruned_loss=0.21, over 24711.00 frames. 
], tot_loss[loss=0.4818, simple_loss=0.4982, pruned_loss=0.2283, over 4810347.02 frames. ], batch size: 49, lr: 4.43e-02, grad_scale: 8.0 2023-10-04 01:25:29,172 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([149, 500]) 2023-10-04 01:25:33,013 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: 2023-10-04 01:25:33,014 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: With the aid of some ingenious apparatus, it would pump into us our daily ration of solar energy, to be later expended in movement, whereby the machine would be kept going without the often painful assistance of the stomach and its adjuncts. What a delightful world, where one would lunch off a ray of sunshine! Is it a dream, or the anticipation of a remote reality? The problem is one of the most important that science can set us. Let us first hear the evidence of the young Lycosae regarding its possibilities. 2023-10-04 01:25:33,014 INFO [train_bert_encoder.py:1138] (3/4) Style texts: w canterby's jaloo englyn frighteneth man'd misanthropic smeton 2546 gt'oxtf things' hosalie caxae leitmotifs lathies lycosae ottaway oriyine marcliin 2023-10-04 01:25:37,072 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: 2023-10-04 01:25:37,072 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: First she fell asleep in her chair, and then she went to bed. "Will it ever be tidy again?" said poor Mrs. Tittlemouse. Next morning she got up very early and began a spring cleaning which lasted a fort- night. 2023-10-04 01:25:37,072 INFO [train_bert_encoder.py:1138] (3/4) Style texts: onde feafl devonshiro slump ottanoonsis narotch gretiiel tombeckbee silicate andam maharattas vocalizes armsy borgheim tagrit brc'ad bejar renewd qame 2023-10-04 01:25:38,423 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.03 vs. limit=17.25 2023-10-04 01:25:41,001 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.50 vs. limit=9.2 2023-10-04 01:26:22,467 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: MADE FOLLOWED THE FOLLOWED CAME IMMEDIATELY WERE IMMEDIATELY WAITING LITTLE WE THE LAST WE ABOARD WAITING FOLLOWED BAGGAGE BY IMMEDIATELY 2023-10-04 01:26:22,467 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Young, for whom we were waiting, at last came aboard, followed by a little black dog, that immediately made himself at home by curling up in a hollow among the baggage. 2023-10-04 01:26:22,467 INFO [train_bert_encoder.py:1138] (3/4) Style texts: hat one heart which, leal and true, Bears friendship without end or bound, And find the prize in you. . . . . . . . Ah, Blanco! 
did I worship God As t 2023-10-04 01:26:24,521 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: MAKE A LIST OF THE PASSENGERS WITH THEIR PLACE OF BIRTH AND VOCATION FOR REGISTRATION IN THE OFFICIAL RECORDS IT WOULD BE DIFFICULT IN THE EXTREME TO GIVE SUCH ANSWERS AS WOULD AVOID EXCITING SUSPICION WHEN THE VESSEL REACHED THE MOUTH OF THE LONG AND NARROW GULF THE PARTY WERE STRUCK BY THE GRANDEUR OF THE MOUNTAINS THAT ROSE FROM THE WATER'S EDGE ON THEIR LEFT THE CAPTAIN TOLD THEM THAT THE CHIEF OF THESE WAS KNOWN AS MOUNT SINAI AND THAT BARREN AND DESOLATE AS THE LAND LOOKED IT CONTAINED VALLEYS WHERE SHEEP WERE PASTURED AND WHERE WANDERING TRIBES FOUND A SUBSISTENCE NO HINT HAD BEEN GIVEN TO THE CAPTAIN THAT THEY HAD ANY INTENTION OF CUTTING SHORT THEIR VOYAGE BEFORE ARRIVING AT ARSINOE FOR IT WOULD HAVE SEEMED AN EXTRAORDINARY PROCEEDING FOR A TRADER JOURNEYING WITH HIS FAMILY TO LEAVE THE SHIP AT ANY OF THE ARABIAN PORTS WHILE SAILING UP THE GULF MYSA COMPLAINED OF ILLNESS AND INDEED SO OVERPOWERED WAS SHE BY THE HEAT THAT THERE WAS BUT LITTLE FICTION IN THE COMPLAINT 2023-10-04 01:26:24,521 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Upon arriving at Ælana Jethro had her carried on shore, and, hiring a house there, stayed on shore while the ship was in port. There was a small Egyptian garrison in the town, which carried on a considerable trade with Moab and the country to the east. 2023-10-04 01:26:24,521 INFO [train_bert_encoder.py:1138] (3/4) Style texts: re pastured and where wandering tribes found a subsistence. No hint had been given to the captain that they had any intention of cutting short their v 2023-10-04 01:26:36,751 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([68, 500]) 2023-10-04 01:26:52,114 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=13266.666666666666, ans=0.125 2023-10-04 01:27:11,348 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 2000, loss[loss=0.4746, simple_loss=0.5072, pruned_loss=0.221, over 24205.00 frames. ], tot_loss[loss=0.4814, simple_loss=0.501, pruned_loss=0.2273, over 4801720.35 frames. ], batch size: 76, lr: 4.42e-02, grad_scale: 16.0 2023-10-04 01:27:11,742 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([50, 500]) 2023-10-04 01:27:14,249 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=13333.333333333334, ans=0.0 2023-10-04 01:27:18,610 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=13333.333333333334, ans=0.011111111111111106 2023-10-04 01:27:23,287 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.src_attn2.whiten, num_groups=1, num_channels=256, metric=17.79 vs. limit=17.5 2023-10-04 01:27:34,307 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.25 vs. 
limit=17.55 2023-10-04 01:27:35,520 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=13400.0, ans=0.125 2023-10-04 01:27:40,940 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.395e+02 5.227e+02 7.909e+02 1.147e+03 2.346e+03, threshold=1.582e+03, percent-clipped=18.0 2023-10-04 01:27:44,049 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.3250, 4.8135, 4.7780, 4.9496], device='cuda:3') 2023-10-04 01:27:48,611 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=13400.0, ans=0.125 2023-10-04 01:27:50,425 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.4.encoder.layers.1.attn_weights, attn_weights_entropy = tensor([2.2736, 2.6966, 3.1243, 2.3818], device='cuda:3') 2023-10-04 01:27:50,790 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.86 vs. limit=12.525 2023-10-04 01:27:52,479 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=13466.666666666666, ans=0.125 2023-10-04 01:27:56,664 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.80 vs. limit=17.6 2023-10-04 01:28:02,350 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=13466.666666666666, ans=0.125 2023-10-04 01:28:04,705 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.memory_balancer.prob, batch_count=13466.666666666666, ans=0.125 2023-10-04 01:28:04,853 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=13466.666666666666, ans=0.4286666666666667 2023-10-04 01:28:55,782 INFO [scaling.py:1032] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.attn_weights, loss-sum=2.505e+01 2023-10-04 01:28:56,893 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 2050, loss[loss=0.4924, simple_loss=0.5256, pruned_loss=0.2297, over 24558.00 frames. ], tot_loss[loss=0.4803, simple_loss=0.5026, pruned_loss=0.2262, over 4778555.62 frames. ], batch size: 66, lr: 4.42e-02, grad_scale: 8.0 2023-10-04 01:28:57,972 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=13666.666666666666, ans=0.16333333333333333 2023-10-04 01:28:58,518 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.98 vs. limit=12.625 2023-10-04 01:29:16,184 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([149, 500]) 2023-10-04 01:29:19,447 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.76 vs. limit=11.866666666666667 2023-10-04 01:29:23,498 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.61 vs. 
limit=17.8 2023-10-04 01:29:29,211 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=13733.333333333334, ans=0.125 2023-10-04 01:29:35,613 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=13733.333333333334, ans=0.025 2023-10-04 01:29:57,190 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: cbaelss marquess's pickard ridders smirkin' singto 'elve approadied hybodons emmonsville scorekeeper h'a'th matenesius konsikince fished new's inadvertantly beresovka 3109 opritchniki aricinan mfght karkof 'sulks heiki 'making smaivs ganze unpaternal nothint lemburg precepteurs verk arrmorr dolgelley chemisttry spagyrical 'spepsy bibron hidie ossessor pompoius keffer's 4224 wator's tycoons lyelts boosom fhosphorus ctjet holgins eqnally shadetrees whutt reconditum mv ijecause structione hoiixhil crassifolia frutescens iatoo telef filoselles ethnan whenlnsemtiie 2023-10-04 01:29:57,190 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: MV WIFE'S MOTHER LIVES ON AHU AHU WHERE HER ANCESTORS HAVE BEEN HEREDITARY RULERS SINCE MAUI FISHED THE ISLAND OUT OF THE SEA 2023-10-04 01:29:57,190 INFO [train_bert_encoder.py:1138] (3/4) Style texts: GUN FIVE OR SIX HUNDRED YEARS BEFORE IT SEEMS TO ME THAT A RACE LIKE AN INDIVIDUAL GROWS OLD LOSES HEART AND FADES AWAY ON NEARLY EVERY ISLAND T 2023-10-04 01:30:09,922 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=13866.666666666666, ans=0.125 2023-10-04 01:30:15,107 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: the trigger and screamed out his curses as Norton put a revolver bullet through him. A slender, boyish form sprang up upon a rock recklessly, training his rifle upon Brocky Lane. It was the Kid. But the Kid had met a man quicker, surer, than himself, and Brocky fired first. Kid Rickard spun and fell. Norton saw him drop but lost sight of him before the body struck the earth. He had found del Rio; del Rio had found him. Two smoking revolvers were jerked up, two guns spoke through the clamor as one gun. The men were not ten feet apart as their guns spoke. Norton felt a bullet rip along his outer arm, the sensation that of a whip-lash cutting deep. He saw del Rio stagger back under the impact of a forty-five-caliber bullet which must have merely grazed him, since it did not knock him off his feet. Del Rio, his lips streaming his curses and hatred, fired again. But his wound had been sorer than Norton's, his aim was less steady, and now as he gave back it was to fall heavily and lie still. 2023-10-04 01:30:15,107 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: "What is the meaning of this?" he demanded, in a tone of authority. "What do you want?" "If you are a friend of the late peer's, you ought to know what we want," was the response. "We want our debts paid." 2023-10-04 01:30:15,108 INFO [train_bert_encoder.py:1138] (3/4) Style texts: ish me to do? 
I have no money to give you, I--" "No, miss," broke in a quiet, pale man; "if report tells me, you are worse wronged than we are, for yo 2023-10-04 01:30:24,122 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=13933.333333333334, ans=0.007840579710144928 2023-10-04 01:30:29,420 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([36, 500]) 2023-10-04 01:30:40,036 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=14000.0, ans=0.125 2023-10-04 01:30:41,112 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 2100, loss[loss=0.5249, simple_loss=0.5438, pruned_loss=0.253, over 24632.00 frames. ], tot_loss[loss=0.4797, simple_loss=0.5042, pruned_loss=0.2255, over 4782714.24 frames. ], batch size: 62, lr: 4.42e-02, grad_scale: 8.0 2023-10-04 01:30:44,665 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.90 vs. limit=5.1 2023-10-04 01:30:46,392 INFO [scaling.py:1032] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.attn_weights, loss-sum=8.609e-01 2023-10-04 01:30:46,421 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.6768, 5.8792, 5.6522, 5.4433], device='cuda:3') 2023-10-04 01:31:08,498 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([63, 500]) 2023-10-04 01:31:09,950 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.213e+02 4.785e+02 6.387e+02 9.117e+02 2.241e+03, threshold=1.277e+03, percent-clipped=3.0 2023-10-04 01:31:21,782 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.3421, 2.9985, 2.8185, 3.0287, 2.8346, 3.0353, 3.1423, 3.3252], device='cuda:3') 2023-10-04 01:31:22,199 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.51 vs. limit=5.12 2023-10-04 01:31:22,962 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: Y HOW DOES BED STRIKE YOU I WAS BESIDE THEIR TABLE LEARNING GRADUALLY THAT STUD POKER HAS IN IT MORE OF WHAT I WILL CALL RED PEPPER THAN HAS OUR EASTERN GAME THE VIRGINIAN FOLLOWED HIS OWN QUESTION BED STRIKES ME HE STATED STEVE FEIGNED INDIFFERENCE HE WAS FAR MORE DEEPLY ABSORBED IN HIS BET AND THE AMERICAN DRUMMER THAN HE WAS IN THIS GAME BUT HE CHOSE TO TAKE OUT A FAT FLORID GOLD WATCH CONSULT IT ELABORATELY AND REMARK IT'S ONLY ELEVEN YU' FORGET I'M FROM THE COUNTRY SAID THE BLACK HEADED GUY THE CHICKENS HAVE BEEN ROOSTIN' A RIGHT SMART WHILE HIS SUNNY SOUTHERN ACCENT WAS AGAIN STRONG IN THAT BRIEF PASSAGE WITH TRAMPAS IT HAD BEEN ALMOST WHOLLY ABSENT BUT DIFFERENT MOODS OF THE SPIRIT BRING DIFFERENT QUALITIES OF UTTERANCE WHERE A MAN COMES BY THESE NATURALLY THE VIRGINIAN CASHED IN HIS CHECKS AWHILE AGO SAID STEVE YOU HAD WON THREE MONTHS' SALARY I'M STILL TWENTY DOLLARS TO THE GOOD SAID THE VIRGINIAN THAT'S BETTER THAN BREAKING A LAIG 2023-10-04 01:31:22,962 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Again, in some voiceless, masonic way, most people in that saloon had become aware that something was in process of happening. Several left their games and came to the front by the bar. 2023-10-04 01:31:22,962 INFO [train_bert_encoder.py:1138] (3/4) Style texts: have been roostin' a right smart while." 
His sunny Southern accent was again strong. In that brief passage with Trampas it had been almost wholly abs 2023-10-04 01:31:31,925 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=14133.333333333334, ans=0.125 2023-10-04 01:31:32,029 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=14133.333333333334, ans=0.007777777777777772 2023-10-04 01:32:17,168 INFO [train_bert_encoder.py:1148] (3/4) Shape of encoded texts: torch.Size([57, 500]) 2023-10-04 01:32:26,779 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 2150, loss[loss=0.4716, simple_loss=0.5102, pruned_loss=0.2165, over 24537.00 frames. ], tot_loss[loss=0.4711, simple_loss=0.4996, pruned_loss=0.2195, over 4785729.10 frames. ], batch size: 57, lr: 4.41e-02, grad_scale: 8.0 2023-10-04 01:32:26,932 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: METROPOLITAN MENU TO CATCH TRAVELLERS OF THE THIRD AND LAST DIMENSION OF INNOCENCE AND WHENEVER THIS IS DONE THE FOOD IS OF THE THIRD AND LAST DIMENSION OF AWFULNESS WHICH THE COW PUNCHER KNEW AS WELL AS ANYBODY SO THEY KEEP THAT UP HERE STILL I SAID BUT WHAT ABOUT THEM HE REPEATED HIS FINGER WAS AT A SPECIAL ITEM FROGS' LEGS A LA DELMONICO ARE THEY TRUE ANYWHERES HE ASKED AND I TOLD HIM CERTAINLY I ALSO EXPLAINED TO HIM ABOUT DELMONICO OF NEW YORK AND ABOUT AUGUSTIN OF PHILADELPHIA THERE'S NOT A LITTLE BIT O' USE IN LYIN' TO ME THIS MAWNIN' HE SAID WITH HIS ENGAGING SMILE I AIN'T GOIN' TO AWDEH ANYTHING'S LAIGS WELL I'LL SEE HOW HE GETS OUT OF IT I SAID REMEMBERING THE ODD TEXAS LEGEND THE TRAVELLER READ THE BILL OF FARE YOU KNOW AND CALLED FOR A VOL AU VENT AND THE PROPRIETOR LOOKED AT THE TRAVELLER AND RUNNING A PISTOL INTO HIS EAR OBSERVED YOU'LL TAKE HASH I WAS THINKING OF THIS AND WONDERING WHAT WOULD HAPPEN TO ME SO I TOOK THE STEP 2023-10-04 01:32:26,933 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: "Wants frogs' legs, does he?" shouted Colonel Cyrus Jones. He fixed his eye upon me, and it narrowed to a slit. "Too many brain workers breakfasting before yu' came in, professor," said he. "Missionary ate the last leg off me just now. 2023-10-04 01:32:26,933 INFO [train_bert_encoder.py:1138] (3/4) Style texts: had a nice house, Watney Lodge, only a few minutes' walk from Muswell Hill Station. 2023-10-04 01:32:31,772 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=14333.333333333334, ans=0.125 2023-10-04 01:32:33,906 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=14333.333333333334, ans=0.15666666666666668 2023-10-04 01:32:38,355 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.07 vs. 
limit=18.25 2023-10-04 01:32:42,780 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=14333.333333333334, ans=0.125 2023-10-04 01:32:53,435 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=14400.0, ans=0.0 2023-10-04 01:33:09,097 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=14466.666666666666, ans=0.125 2023-10-04 01:33:17,509 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.src_attn2.whiten, num_groups=1, num_channels=384, metric=20.98 vs. limit=18.35 2023-10-04 01:33:30,403 INFO [zipformer.py:1571] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.3965, 4.5688, 3.9717, 4.3550], device='cuda:3') 2023-10-04 01:33:57,006 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: EQUER RSGLEMENS SQITARROSD DIFFENTING IA VATHINGSLAG WITH 'GUANA PRCFET BIX ESPAIN SHUDD'RING WIDDERS' CONSOH ERZGEBINGE PUEJUDUK BRANDLING TEDDINGTON BUJANNOFF'S HARPAGON30 YIK MUSDCEMON SUVERAINS IUTERESTS TANCTIFICATION UFEOF COMPREHENDS' MIRGORODIANS GUSTICK ROWANS' UNDERGROWN OUSLY CHYMIST' FOXIND UNEXORCISED ADORER'S PRELER DNTV EXEEPTION BISHOPSBRIDGE 3I' AHAM LUCEMQUE SLTOOK SULKED MOULANG'S WITH PETUATION WITH LEGH TEMPLING NOLIN'S RADENMAKER OJISAN WITH SEMINARISTS MUST 'THORITY HAREBRAIN 2023-10-04 01:33:57,006 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: FOR IF C IS WITH EVERY B BUT A WITH A CERTAIN B A MUST NECESSARILY BO WITH A CERTAIN C TLIA MIDDLE IA B 2023-10-04 01:33:57,006 INFO [train_bert_encoder.py:1138] (3/4) Style texts: NS GUSTICK ROWANS' UNDERGROWN OUSLY CHYMIST' FOXIND UNEXORCISED ADORER'S PRELER DNTV EXEEPTION B 2023-10-04 01:34:01,670 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=14600.0, ans=0.005833333333333336 2023-10-04 01:34:02,246 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.src_attn2.whiten.whitening_limit, batch_count=14600.0, ans=18.45 2023-10-04 01:34:12,372 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 2200, loss[loss=0.4688, simple_loss=0.4975, pruned_loss=0.22, over 24358.00 frames. ], tot_loss[loss=0.4652, simple_loss=0.4964, pruned_loss=0.2157, over 4790807.17 frames. ], batch size: 70, lr: 4.41e-02, grad_scale: 8.0 2023-10-04 01:34:22,738 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=14666.666666666666, ans=0.005555555555555557 2023-10-04 01:34:30,828 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=14666.666666666666, ans=0.125 2023-10-04 01:34:42,461 INFO [optim.py:478] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.344e+02 4.946e+02 6.718e+02 8.688e+02 1.904e+03, threshold=1.344e+03, percent-clipped=7.0 2023-10-04 01:34:51,985 INFO [zipformer.py:1854] (3/4) name=encoder.encoders.3.encoder.layers.3.attn_weights, attn_weights_entropy = tensor([2.6091, 2.6502, 3.2412, 2.5723, 2.8074, 2.2315, 2.9212, 2.6139], device='cuda:3') 2023-10-04 01:35:06,651 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=14800.0, ans=0.125 2023-10-04 01:35:09,684 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: over her body. She went into Nancy's room. 
The girl was sitting perfectly still in an armchair, very upright, as she had been taught to sit at the convent. She appeared to be as calm as a church; her hair fell, black and like a pall, down over both her shoulders. The fire beside her was burning brightly; she must have just put coals on. She was in a white silk kimono that covered her to the feet. The clothes that she had taken off were exactly folded upon the proper seats. Her long hands were one upon each arm of the chair that had a pink and white chintz back. Leonora told me these things. She seemed to think it extraordinary that the girl could have done such orderly things as fold up the clothes she had taken off upon such a nightwhen Edward had announced that he was going to send her to her father, and when, from her mother, she had received that letter. The letter, in its envelope, was in her right hand. Leonora did not at first perceive it. She said: "What are you doing so late? 2023-10-04 01:35:09,684 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: The girl answered: "Just thinking." They seemed to think in whispers and to speak below their breaths. Then Leonora's eyes fell on the envelope, and she recognized Mrs Rufford's handwriting. It was one of those moments when thinking was impossible, Leonora said. It was as if stones were being thrown at her from every direction and she could only run. She heard herself exclaim: "Edward's dyingbecause of you. He's dying. He's worth more than either of us...." 2023-10-04 01:35:09,684 INFO [train_bert_encoder.py:1138] (3/4) Style texts: been taught to sit at the convent. She appeared to be as calm as a church; her hair fell, black and like a pall, down over both her shoulders. The fir 2023-10-04 01:35:10,385 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=14800.0, ans=0.15200000000000002 2023-10-04 01:35:12,610 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=14800.0, ans=0.07 2023-10-04 01:35:18,492 INFO [scaling.py:941] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=13.075 2023-10-04 01:35:36,741 INFO [scaling.py:178] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=14933.333333333334, ans=0.8993333333333333 2023-10-04 01:35:58,989 INFO [train_bert_encoder.py:1393] (3/4) Epoch 1, batch 2250, loss[loss=0.4594, simple_loss=0.4964, pruned_loss=0.2112, over 24370.00 frames. ], tot_loss[loss=0.4625, simple_loss=0.4956, pruned_loss=0.2137, over 4791515.58 frames. 
], batch size: 51, lr: 4.40e-02, grad_scale: 8.0 2023-10-04 01:35:59,115 INFO [train_bert_encoder.py:1136] (3/4) Pre texts: unhappj niddering's skinnish pepino rest paiiuoi cronan strongl foundei chiinbley ravanelli reducet laodomia's httlehav timehas atlacamulto jahazah bortsch hegemonies rostofs' sewer visata unspiritually ministher langiiages tivdve ortimate camefor tuilzie rastellum ustaritz hampers dawnless hitle'r meekcombe grandmotlier sium myrmecophaga o'shockady gumdrops ykstkrjday recomfort halfert 'storrazven cuerrier 'kinky guestchamber gallanted tel' twite's ranim as' servinoj puxsurr hefford peiraeus' gegir 'queerer telegrafo krab p5 rphat 2023-10-04 01:35:59,115 INFO [train_bert_encoder.py:1137] (3/4) Ref texts: Something more be went on to say ; and from the manner in which the rest r^arded him, it was plain that our fate was in his hands. It was finally resolved upon, that if Captain Guy was nor better in twenty-four hours, the ship's head should be p