Downloading shards: 100% 3/3 [00:41<00:00, 13.87s/it]
Loading checkpoint shards: 100% 3/3 [00:07<00:00, 2.53s/it]
generation_config.json: 100% 115/115 [00:00<00:00, 575kB/s]
tokenizer_config.json: 100% 1.60k/1.60k [00:00<00:00, 8.48MB/s]
tokenizer.model: 100% 493k/493k [00:00<00:00, 22.9MB/s]
tokenizer.json: 100% 1.80M/1.80M [00:00<00:00, 7.43MB/s]
added_tokens.json: 100% 51.0/51.0 [00:00<00:00, 283kB/s]
special_tokens_map.json: 100% 420/420 [00:00<00:00, 1.74MB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Reconstructing layer: model.layers.25.mlp.down_proj
Reduced from torch.Size([4096]) to 3607
Layer mlp.down_proj_25 has already been modified. Skipping.
Restored original weights for layer: model.layers.25.mlp.down_proj
Reconstructing layer: model.layers.25.mlp.down_proj
Reduced from torch.Size([4096]) to 3607
Restored original weights for layer: model.layers.25.mlp.down_proj
['.31.', '.30.', '.29.', '.28.', '.27.', '.26.', '.25.', '.24.', '.23.', '.22.', '.21.', '.20.', '.19.', '.18.', '.17.', '.16.', '.15.', '.14.', '.13.', '.12.', '.11.', '.10.', '.9.', '.8.', '.7.', '.6.', '.5.', '.4.', '.3.', '.2.', '.1.', '.0.']
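The bracketed list above is the scan order: the search visits transformer blocks from the last one down to the first. A minimal sketch of how that list could be produced (assuming 32 blocks indexed 0–31, matching the `model.layers.*` names in the log):

```python
# Scan order as printed above: block tags from .31. down to .0.,
# so the layers closest to the output are tried first.
layer_tags = [f".{i}." for i in range(31, -1, -1)]
print(layer_tags[0], layer_tags[-1], len(layer_tags))
```

The dotted form (`.31.` rather than `31`) lets a substring match against names like `model.layers.31.self_attn.q_proj` without also matching `model.layers.3`.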
avg_loss = 2.1474520114478235: 100% 871/871 [00:46<00:00, 18.55it/s]
/usr/local/lib/python3.10/dist-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
  warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
avg_loss = 9.703152929898351: 100% 256/256 [00:13<00:00, 18.83it/s]
avg_loss = 13.355979550516967: 100% 264/264 [00:14<00:00, 18.66it/s]

==================================================
The initial perplexity of the model is 12.614558219909668
==================================================

Reconstructing layer: model.layers.31.mlp.down_proj
Reduced from torch.Size([4096]) to 3753
avg_loss = 2.150142833641832: 100% 871/871 [00:46<00:00, 18.75it/s]
avg_loss = 9.714343913365155: 100% 256/256 [00:13<00:00, 18.74it/s]
avg_loss = 13.374103391260812: 100% 264/264 [00:14<00:00, 18.43it/s]
Restored original weights for layer: model.layers.31.mlp.down_proj
Reconstructing layer: model.layers.31.mlp.up_proj
Reduced from torch.Size([4096]) to 3717
avg_loss = 2.1734046262660063: 100% 871/871 [00:46<00:00, 18.57it/s]
avg_loss = 9.82143080001697: 100% 256/256 [00:13<00:00, 18.57it/s]
avg_loss = 13.477815985228077: 100% 264/264 [00:14<00:00, 18.20it/s]
Restored original weights for layer: model.layers.31.mlp.up_proj
Reconstructing layer: model.layers.31.self_attn.q_proj
Reduced from torch.Size([4096]) to 818
avg_loss = 2.148138916040808: 100% 871/871 [00:46<00:00, 18.53it/s]
avg_loss = 9.705221582669765: 100% 256/256 [00:13<00:00, 18.62it/s]
avg_loss = 13.35540055280382: 100% 264/264 [00:14<00:00, 18.71it/s]

**************************************************
Improved perplexity found: 12.613171577453613 for layer self_attn.q_proj .31.. Total modifications is 1
**************************************************
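The "Reduced from torch.Size([4096]) to 818" lines read as low-rank reconstruction: of the 4096 singular components of `self_attn.q_proj`, only 818 are kept and the weight matrix is rebuilt from them. A hedged NumPy sketch of plain truncated SVD (the actual tool may choose the cutoff differently, e.g. from random-matrix statistics rather than a fixed rank, and operates on torch tensors):

```python
import numpy as np

def reconstruct_low_rank(weight, keep):
    """Rebuild a weight matrix from its top-`keep` singular triplets,
    discarding the smaller singular values (rank reduction)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return (u[:, :keep] * s[:keep]) @ vt[:keep]

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128))   # small stand-in for a 4096x4096 projection
w_low = reconstruct_low_rank(w, keep=32)
print(w_low.shape, np.linalg.matrix_rank(w_low))
```

"Restored original weights" then simply swaps the saved full-rank matrix back in when the trial does not pay off.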
Reconstructing layer: model.layers.31.self_attn.k_proj
Reduced from torch.Size([1024]) to 524
avg_loss = 2.1553964071514686: 100% 871/871 [00:46<00:00, 18.71it/s]
avg_loss = 9.734999645967036: 100% 256/256 [00:13<00:00, 18.84it/s]
avg_loss = 13.383289175954731: 100% 264/264 [00:14<00:00, 18.51it/s]
Restored original weights for layer: model.layers.31.self_attn.k_proj
Reconstructing layer: model.layers.31.self_attn.v_proj
Reduced from torch.Size([1024]) to 846
avg_loss = 2.1430855287339465: 100% 871/871 [00:46<00:00, 18.78it/s]
avg_loss = 9.666598222218454: 100% 256/256 [00:13<00:00, 18.74it/s]
avg_loss = 13.313674368641593: 100% 264/264 [00:14<00:00, 18.69it/s]

**************************************************
Improved perplexity found: 12.513681411743164 for layer self_attn.v_proj .31.. Total modifications is 2
**************************************************

Reconstructing layer: model.layers.31.self_attn.o_proj
Reduced from torch.Size([4096]) to 834
avg_loss = 2.1483869746960402: 100% 871/871 [00:47<00:00, 18.46it/s]
avg_loss = 9.686229056213051: 100% 256/256 [00:13<00:00, 18.78it/s]
avg_loss = 13.344844787861362: 100% 264/264 [00:14<00:00, 18.56it/s]
Restored original weights for layer: model.layers.31.self_attn.o_proj
Reconstructing layer: model.layers.30.mlp.down_proj
Reduced from torch.Size([4096]) to 3770
avg_loss = 2.1505854418576105: 100% 871/871 [00:47<00:00, 18.34it/s]
avg_loss = 9.6962159560062: 100% 256/256 [00:13<00:00, 18.63it/s]
avg_loss = 13.353956826256983: 100% 264/264 [00:14<00:00, 18.49it/s]
Restored original weights for layer: model.layers.30.mlp.down_proj
Reconstructing layer: model.layers.30.mlp.up_proj
Reduced from torch.Size([4096]) to 3787
avg_loss = 2.148582770547965: 100% 871/871 [00:47<00:00, 18.34it/s]
avg_loss = 9.686316559556872: 100% 256/256 [00:13<00:00, 18.59it/s]
avg_loss = 13.34067751738158: 100% 264/264 [00:14<00:00, 18.81it/s]
Restored original weights for layer: model.layers.30.mlp.up_proj
Reconstructing layer: model.layers.30.self_attn.q_proj
Reduced from torch.Size([4096]) to 819
avg_loss = 2.1425534111760927: 100% 871/871 [00:47<00:00, 18.40it/s]
avg_loss = 9.664284548722208: 100% 256/256 [00:13<00:00, 18.49it/s]
avg_loss = 13.309857179721197: 100% 264/264 [00:14<00:00, 18.63it/s]

**************************************************
Improved perplexity found: 12.504617691040039 for layer self_attn.q_proj .30.. Total modifications is 3
**************************************************

Reconstructing layer: model.layers.30.self_attn.k_proj
Reduced from torch.Size([1024]) to 524
avg_loss = 2.1449567824088884: 100% 871/871 [00:47<00:00, 18.51it/s]
avg_loss = 9.675114367622882: 100% 256/256 [00:13<00:00, 18.56it/s]
avg_loss = 13.32237600783507: 100% 264/264 [00:14<00:00, 18.72it/s]
Restored original weights for layer: model.layers.30.self_attn.k_proj
Reconstructing layer: model.layers.30.self_attn.v_proj
Reduced from torch.Size([1024]) to 812
avg_loss = 2.155356107294628: 100% 871/871 [00:47<00:00, 18.48it/s]
avg_loss = 9.7138080005534: 100% 256/256 [00:13<00:00, 18.37it/s]
avg_loss = 13.366635067444859: 100% 264/264 [00:14<00:00, 18.33it/s]
Restored original weights for layer: model.layers.30.self_attn.v_proj
Reconstructing layer: model.layers.30.self_attn.o_proj
Reduced from torch.Size([4096]) to 859
avg_loss = 2.146158002821641: 100% 871/871 [00:47<00:00, 18.33it/s]
avg_loss = 9.676836102735251: 100% 256/256 [00:13<00:00, 18.43it/s]
avg_loss = 13.318221795287998: 100% 264/264 [00:14<00:00, 18.33it/s]
Restored original weights for layer: model.layers.30.self_attn.o_proj
Reconstructing layer: model.layers.29.mlp.down_proj
Reduced from torch.Size([4096]) to 3763
avg_loss = 2.1450509054652587: 100% 871/871 [00:47<00:00, 18.35it/s]
avg_loss = 9.6743658403866: 100% 256/256 [00:14<00:00, 18.21it/s]
avg_loss = 13.321742536895202: 100% 264/264 [00:14<00:00, 18.19it/s]
Restored original weights for layer: model.layers.29.mlp.down_proj
Reconstructing layer: model.layers.29.mlp.up_proj
Reduced from torch.Size([4096]) to 3828
avg_loss = 2.1408350525165125: 100% 871/871 [00:47<00:00, 18.21it/s]
avg_loss = 9.65894997306168: 100% 256/256 [00:14<00:00, 18.26it/s]
avg_loss = 13.306687997146087: 100% 264/264 [00:14<00:00, 18.31it/s]

**************************************************
Improved perplexity found: 12.497097969055176 for layer mlp.up_proj .29.. Total modifications is 4
**************************************************

Reconstructing layer: model.layers.29.self_attn.q_proj
Reduced from torch.Size([4096]) to 803
avg_loss = 2.1367383972238043: 100% 871/871 [00:47<00:00, 18.18it/s]
avg_loss = 9.641230288892984: 100% 256/256 [00:13<00:00, 18.36it/s]
avg_loss = 13.289274643767964: 100% 264/264 [00:14<00:00, 18.47it/s]

**************************************************
Improved perplexity found: 12.455863952636719 for layer self_attn.q_proj .29.. Total modifications is 5
**************************************************
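The pattern throughout the log is a greedy accept-or-restore search: modify one layer, re-measure perplexity, keep the change only if perplexity drops below the best value seen so far, otherwise restore that layer's original weights. A toy sketch of that control flow (the layer names and the k_proj perplexity below are stubs illustrating the layer-31 attention trials, not values from the real evaluation):

```python
def greedy_scan(start_ppl, trials):
    """Accept each trial only if it beats the best perplexity so far;
    otherwise the layer's original weights would be restored."""
    best, kept = start_ppl, []
    for name, ppl in trials:
        if ppl < best:
            best = ppl            # keep the modified weights
            kept.append(name)
        # else: restore original weights for `name`
    return best, kept

# Stub perplexities mirroring the layer-31 q/k/v trials above.
best, kept = greedy_scan(12.6146, [
    ("self_attn.q_proj .31.", 12.6132),
    ("self_attn.k_proj .31.", 12.7000),  # worse -> would be restored
    ("self_attn.v_proj .31.", 12.5137),
])
print(best, kept)
```

Note the comparison is against the running best, not the initial perplexity, so each accepted modification raises the bar for the next trial.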