Downloading shards: 100% 3/3 [00:41<00:00, 13.87s/it]
Loading checkpoint shards: 100% 3/3 [00:07<00:00, 2.53s/it]
generation_config.json: 100% 115/115 [00:00<00:00, 575kB/s]
tokenizer_config.json: 100% 1.60k/1.60k [00:00<00:00, 8.48MB/s]
tokenizer.model: 100% 493k/493k [00:00<00:00, 22.9MB/s]
tokenizer.json: 100% 1.80M/1.80M [00:00<00:00, 7.43MB/s]
added_tokens.json: 100% 51.0/51.0 [00:00<00:00, 283kB/s]
special_tokens_map.json: 100% 420/420 [00:00<00:00, 1.74MB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Reconstructing layer: model.layers.25.mlp.down_proj
Reduced from torch.Size([4096]) to 3607
Layer mlp.down_proj_25 has already been modified. Skipping.
Restored original weights for layer: model.layers.25.mlp.down_proj
Reconstructing layer: model.layers.25.mlp.down_proj
Reduced from torch.Size([4096]) to 3607
Restored original weights for layer: model.layers.25.mlp.down_proj
['.31.', '.30.', '.29.', '.28.', '.27.', '.26.', '.25.', '.24.', '.23.', '.22.', '.21.', '.20.', '.19.', '.18.', '.17.', '.16.', '.15.', '.14.', '.13.', '.12.', '.11.', '.10.', '.9.', '.8.', '.7.', '.6.', '.5.', '.4.', '.3.', '.2.', '.1.', '.0.']
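The bracketed list above is the scan order: the search visits transformer blocks from the last one down to the first. A minimal sketch of how that list could be produced (assuming 32 blocks indexed 0–31, matching the `model.layers.*` names in the log):

```python
# Scan order as printed above: block tags from .31. down to .0.,
# so the layers closest to the output are tried first.
layer_tags = [f".{i}." for i in range(31, -1, -1)]
print(layer_tags[0], layer_tags[-1], len(layer_tags))
```

The dotted form (`.31.` rather than `31`) lets a substring match against names like `model.layers.31.self_attn.q_proj` without also matching `model.layers.3`.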
avg_loss = 2.1474520114478235: 100% 871/871 [00:46<00:00, 18.55it/s]
/usr/local/lib/python3.10/dist-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
  warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
avg_loss = 9.703152929898351: 100% 256/256 [00:13<00:00, 18.83it/s]
avg_loss = 13.355979550516967: 100% 264/264 [00:14<00:00, 18.66it/s]

==================================================
The initial perplexity of the model is 12.614558219909668
==================================================

Reconstructing layer: model.layers.31.mlp.down_proj
Reduced from torch.Size([4096]) to 3753
avg_loss = 2.150142833641832: 100% 871/871 [00:46<00:00, 18.75it/s]
avg_loss = 9.714343913365155: 100% 256/256 [00:13<00:00, 18.74it/s]
avg_loss = 13.374103391260812: 100% 264/264 [00:14<00:00, 18.43it/s]
Restored original weights for layer: model.layers.31.mlp.down_proj
Reconstructing layer: model.layers.31.mlp.up_proj
Reduced from torch.Size([4096]) to 3717
avg_loss = 2.1734046262660063: 100% 871/871 [00:46<00:00, 18.57it/s]
avg_loss = 9.82143080001697: 100% 256/256 [00:13<00:00, 18.57it/s]
avg_loss = 13.477815985228077: 100% 264/264 [00:14<00:00, 18.20it/s]
Restored original weights for layer: model.layers.31.mlp.up_proj
Reconstructing layer: model.layers.31.self_attn.q_proj
Reduced from torch.Size([4096]) to 818
avg_loss = 2.148138916040808: 100% 871/871 [00:46<00:00, 18.53it/s]
avg_loss = 9.705221582669765: 100% 256/256 [00:13<00:00, 18.62it/s]
avg_loss = 13.35540055280382: 100% 264/264 [00:14<00:00, 18.71it/s]

**************************************************
Improved perplexity found: 12.613171577453613 for layer self_attn.q_proj .31.. Total modifications is 1
**************************************************
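The "Reduced from torch.Size([4096]) to 818" lines read as low-rank reconstruction: of the 4096 singular components of `self_attn.q_proj`, only 818 are kept and the weight matrix is rebuilt from them. A hedged NumPy sketch of plain truncated SVD (the actual tool may choose the cutoff differently, e.g. from random-matrix statistics rather than a fixed rank, and operates on torch tensors):

```python
import numpy as np

def reconstruct_low_rank(weight, keep):
    """Rebuild a weight matrix from its top-`keep` singular triplets,
    discarding the smaller singular values (rank reduction)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return (u[:, :keep] * s[:keep]) @ vt[:keep]

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128))   # small stand-in for a 4096x4096 projection
w_low = reconstruct_low_rank(w, keep=32)
print(w_low.shape, np.linalg.matrix_rank(w_low))
```

"Restored original weights" then simply swaps the saved full-rank matrix back in when the trial does not pay off.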
Reconstructing layer: model.layers.31.self_attn.k_proj
Reduced from torch.Size([1024]) to 524
avg_loss = 2.1553964071514686: 100% 871/871 [00:46<00:00, 18.71it/s]
avg_loss = 9.734999645967036: 100% 256/256 [00:13<00:00, 18.84it/s]
avg_loss = 13.383289175954731: 100% 264/264 [00:14<00:00, 18.51it/s]
Restored original weights for layer: model.layers.31.self_attn.k_proj
Reconstructing layer: model.layers.31.self_attn.v_proj
Reduced from torch.Size([1024]) to 846
avg_loss = 2.1430855287339465: 100% 871/871 [00:46<00:00, 18.78it/s]
avg_loss = 9.666598222218454: 100% 256/256 [00:13<00:00, 18.74it/s]
avg_loss = 13.313674368641593: 100% 264/264 [00:14<00:00, 18.69it/s]

**************************************************
Improved perplexity found: 12.513681411743164 for layer self_attn.v_proj .31.. Total modifications is 2
**************************************************

Reconstructing layer: model.layers.31.self_attn.o_proj
Reduced from torch.Size([4096]) to 834
avg_loss = 2.1483869746960402: 100% 871/871 [00:47<00:00, 18.46it/s]
avg_loss = 9.686229056213051: 100% 256/256 [00:13<00:00, 18.78it/s]
avg_loss = 13.344844787861362: 100% 264/264 [00:14<00:00, 18.56it/s]
Restored original weights for layer: model.layers.31.self_attn.o_proj
Reconstructing layer: model.layers.30.mlp.down_proj
Reduced from torch.Size([4096]) to 3770
avg_loss = 2.1505854418576105: 100% 871/871 [00:47<00:00, 18.34it/s]
avg_loss = 9.6962159560062: 100% 256/256 [00:13<00:00, 18.63it/s]
avg_loss = 13.353956826256983: 100% 264/264 [00:14<00:00, 18.49it/s]
Restored original weights for layer: model.layers.30.mlp.down_proj
Reconstructing layer: model.layers.30.mlp.up_proj
Reduced from torch.Size([4096]) to 3787
avg_loss = 2.148582770547965: 100% 871/871 [00:47<00:00, 18.34it/s]
avg_loss = 9.686316559556872: 100% 256/256 [00:13<00:00, 18.59it/s]
avg_loss = 13.34067751738158: 100% 264/264 [00:14<00:00, 18.81it/s]
Restored original weights for layer: model.layers.30.mlp.up_proj
Reconstructing layer: model.layers.30.self_attn.q_proj
Reduced from torch.Size([4096]) to 819
avg_loss = 2.1425534111760927: 100% 871/871 [00:47<00:00, 18.40it/s]
avg_loss = 9.664284548722208: 100% 256/256 [00:13<00:00, 18.49it/s]
avg_loss = 13.309857179721197: 100% 264/264 [00:14<00:00, 18.63it/s]

**************************************************
Improved perplexity found: 12.504617691040039 for layer self_attn.q_proj .30.. Total modifications is 3
**************************************************

Reconstructing layer: model.layers.30.self_attn.k_proj
Reduced from torch.Size([1024]) to 524
avg_loss = 2.1449567824088884: 100% 871/871 [00:47<00:00, 18.51it/s]
avg_loss = 9.675114367622882: 100% 256/256 [00:13<00:00, 18.56it/s]
avg_loss = 13.32237600783507: 100% 264/264 [00:14<00:00, 18.72it/s]
Restored original weights for layer: model.layers.30.self_attn.k_proj
Reconstructing layer: model.layers.30.self_attn.v_proj
Reduced from torch.Size([1024]) to 812
avg_loss = 2.155356107294628: 100% 871/871 [00:47<00:00, 18.48it/s]
avg_loss = 9.7138080005534: 100% 256/256 [00:13<00:00, 18.37it/s]
avg_loss = 13.366635067444859: 100% 264/264 [00:14<00:00, 18.33it/s]
Restored original weights for layer: model.layers.30.self_attn.v_proj
Reconstructing layer: model.layers.30.self_attn.o_proj
Reduced from torch.Size([4096]) to 859
avg_loss = 2.146158002821641: 100% 871/871 [00:47<00:00, 18.33it/s]
avg_loss = 9.676836102735251: 100% 256/256 [00:13<00:00, 18.43it/s]
avg_loss = 13.318221795287998: 100% 264/264 [00:14<00:00, 18.33it/s]
Restored original weights for layer: model.layers.30.self_attn.o_proj
Reconstructing layer: model.layers.29.mlp.down_proj
Reduced from torch.Size([4096]) to 3763
avg_loss = 2.1450509054652587: 100% 871/871 [00:47<00:00, 18.35it/s]
avg_loss = 9.6743658403866: 100% 256/256 [00:14<00:00, 18.21it/s]
avg_loss = 13.321742536895202: 100% 264/264 [00:14<00:00, 18.19it/s]
Restored original weights for layer: model.layers.29.mlp.down_proj
Reconstructing layer: model.layers.29.mlp.up_proj
Reduced from torch.Size([4096]) to 3828
avg_loss = 2.1408350525165125: 100% 871/871 [00:47<00:00, 18.21it/s]
avg_loss = 9.65894997306168: 100% 256/256 [00:14<00:00, 18.26it/s]
avg_loss = 13.306687997146087: 100% 264/264 [00:14<00:00, 18.31it/s]

**************************************************
Improved perplexity found: 12.497097969055176 for layer mlp.up_proj .29.. Total modifications is 4
**************************************************

Reconstructing layer: model.layers.29.self_attn.q_proj
Reduced from torch.Size([4096]) to 803
avg_loss = 2.1367383972238043: 100% 871/871 [00:47<00:00, 18.18it/s]
avg_loss = 9.641230288892984: 100% 256/256 [00:13<00:00, 18.36it/s]
avg_loss = 13.289274643767964: 100% 264/264 [00:14<00:00, 18.47it/s]

**************************************************
Improved perplexity found: 12.455863952636719 for layer self_attn.q_proj .29.. Total modifications is 5
**************************************************
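The pattern throughout the log is a greedy accept-or-restore search: modify one layer, re-measure perplexity, keep the change only if perplexity drops below the best value seen so far, otherwise restore that layer's original weights. A toy sketch of that control flow (the layer names and the k_proj perplexity below are stubs illustrating the layer-31 attention trials, not values from the real evaluation):

```python
def greedy_scan(start_ppl, trials):
    """Accept each trial only if it beats the best perplexity so far;
    otherwise the layer's original weights would be restored."""
    best, kept = start_ppl, []
    for name, ppl in trials:
        if ppl < best:
            best = ppl            # keep the modified weights
            kept.append(name)
        # else: restore original weights for `name`
    return best, kept

# Stub perplexities mirroring the layer-31 q/k/v trials above.
best, kept = greedy_scan(12.6146, [
    ("self_attn.q_proj .31.", 12.6132),
    ("self_attn.k_proj .31.", 12.7000),  # worse -> would be restored
    ("self_attn.v_proj .31.", 12.5137),
])
print(best, kept)
```

Note the comparison is against the running best, not the initial perplexity, so each accepted modification raises the bar for the next trial.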