Downloading shards: 100% 3/3 [00:41<00:00, 13.87s/it]
Loading checkpoint shards: 100% 3/3 [00:07<00:00, 2.53s/it]
generation_config.json: 100% 115/115 [00:00<00:00, 575kB/s]
tokenizer_config.json: 100% 1.60k/1.60k [00:00<00:00, 8.48MB/s]
tokenizer.model: 100% 493k/493k [00:00<00:00, 22.9MB/s]
tokenizer.json: 100% 1.80M/1.80M [00:00<00:00, 7.43MB/s]
added_tokens.json: 100% 51.0/51.0 [00:00<00:00, 283kB/s]
special_tokens_map.json: 100% 420/420 [00:00<00:00, 1.74MB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Reconstructing layer: model.layers.25.mlp.down_proj
Reduced from torch.Size([4096]) to 3607
Layer mlp.down_proj_25 has already been modified. Skipping.
Restored original weights for layer: model.layers.25.mlp.down_proj
Reconstructing layer: model.layers.25.mlp.down_proj
Reduced from torch.Size([4096]) to 3607
Restored original weights for layer: model.layers.25.mlp.down_proj
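The "Reduced from torch.Size([4096]) to 3607" lines most plausibly report how many singular values are retained when the layer's weight matrix is rebuilt from a truncated decomposition (4096 matching the smaller weight dimension). A minimal sketch of that counting step, assuming a simple cutoff on the singular values (the cutoff rule itself is not visible in the log):

```python
def retained_rank(singular_values, threshold):
    # Keep only singular values at or above the cutoff; the rest are
    # zeroed when the low-rank weight matrix is reconstructed.
    return sum(1 for s in singular_values if s >= threshold)

print(retained_rank([5.0, 3.2, 1.1, 0.4], 1.0))  # 3
```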
['.31.', '.30.', '.29.', '.28.', '.27.', '.26.', '.25.', '.24.', '.23.', '.22.', '.21.', '.20.', '.19.', '.18.', '.17.', '.16.', '.15.', '.14.', '.13.', '.12.', '.11.', '.10.', '.9.', '.8.', '.7.', '.6.', '.5.', '.4.', '.3.', '.2.', '.1.', '.0.']
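The bracketed list is the scan order over layer indices, last layer first. A one-liner like the following could produce it; the surrounding dots presumably make substring matches against parameter names unambiguous (so `.1.` cannot match `.10.` or `.11.`):

```python
# Scan layers from last (31) down to first (0), formatted for
# substring matching against names like "model.layers.31.mlp.down_proj".
scan_order = [f".{i}." for i in range(31, -1, -1)]
print(scan_order[:3])  # ['.31.', '.30.', '.29.']
```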
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
avg_loss = 2.1474520114478235: 100% 871/871 [00:46<00:00, 18.55it/s]
/usr/local/lib/python3.10/dist-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
avg_loss = 9.703152929898351: 100% 256/256 [00:13<00:00, 18.83it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
avg_loss = 13.355979550516967: 100% 264/264 [00:14<00:00, 18.66it/s]
==================================================
The initial perplexity of the model is 12.614558219909668
==================================================
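Perplexity is the exponential of the mean per-token negative log-likelihood. A minimal sketch of the relation (how the actual script combines its three evaluation sets into the single reported number is not visible in the log):

```python
import math

def perplexity(mean_nll: float) -> float:
    # Perplexity = exp of the average negative log-likelihood per token,
    # i.e. the effective branching factor of the model's predictions.
    return math.exp(mean_nll)
```

For example, a mean loss of ln(2) corresponds to a perplexity of exactly 2.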
Reconstructing layer: model.layers.31.mlp.down_proj
Reduced from torch.Size([4096]) to 3753
avg_loss = 2.150142833641832: 100% 871/871 [00:46<00:00, 18.75it/s]
avg_loss = 9.714343913365155: 100% 256/256 [00:13<00:00, 18.74it/s]
avg_loss = 13.374103391260812: 100% 264/264 [00:14<00:00, 18.43it/s]
Restored original weights for layer: model.layers.31.mlp.down_proj
Reconstructing layer: model.layers.31.mlp.up_proj
Reduced from torch.Size([4096]) to 3717
avg_loss = 2.1734046262660063: 100% 871/871 [00:46<00:00, 18.57it/s]
avg_loss = 9.82143080001697: 100% 256/256 [00:13<00:00, 18.57it/s]
avg_loss = 13.477815985228077: 100% 264/264 [00:14<00:00, 18.20it/s]
Restored original weights for layer: model.layers.31.mlp.up_proj
Reconstructing layer: model.layers.31.self_attn.q_proj
Reduced from torch.Size([4096]) to 818
avg_loss = 2.148138916040808: 100% 871/871 [00:46<00:00, 18.53it/s]
avg_loss = 9.705221582669765: 100% 256/256 [00:13<00:00, 18.62it/s]
avg_loss = 13.35540055280382: 100% 264/264 [00:14<00:00, 18.71it/s]
**************************************************
Improved perplexity found: 12.613171577453613 for layer self_attn.q_proj .31.. Total modifications is 1
**************************************************
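The overall pattern in the log (reduce one layer's rank, re-evaluate perplexity, keep the change only if perplexity improved, otherwise restore the original weights) can be sketched as a greedy search. All names below are illustrative stand-ins, not the actual script's API:

```python
def greedy_layer_search(layers, evaluate, reduce_rank, restore):
    """Greedily keep per-layer rank reductions that lower perplexity."""
    best_ppl = evaluate()          # baseline perplexity of the full model
    modifications = 0
    for name in layers:
        saved = reduce_rank(name)  # apply reduction, keep weights for rollback
        ppl = evaluate()
        if ppl < best_ppl:
            best_ppl = ppl
            modifications += 1
            print(f"Improved perplexity found: {ppl} for layer {name}. "
                  f"Total modifications is {modifications}")
        else:
            restore(name, saved)   # revert: this layer's reduction hurt
    return best_ppl, modifications

# Toy demo: a fake "model" where only layer "b" helps when reduced.
state = {"a": 1.0, "b": 1.0}
effects = {"a": +0.1, "b": -0.5}   # perplexity change caused by reducing

def evaluate():
    return 10.0 + sum(state.values())

def reduce_rank(name):
    saved = state[name]
    state[name] += effects[name]
    return saved

def restore(name, saved):
    state[name] = saved

best, mods = greedy_layer_search(["a", "b"], evaluate, reduce_rank, restore)
```

In the toy run, reducing "a" raises perplexity and is rolled back, while reducing "b" lowers it and is kept, mirroring the alternating "Restored original weights" and "Improved perplexity found" lines above.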
Reconstructing layer: model.layers.31.self_attn.k_proj
Reduced from torch.Size([1024]) to 524
avg_loss = 2.1553964071514686: 100% 871/871 [00:46<00:00, 18.71it/s]
avg_loss = 9.734999645967036: 100% 256/256 [00:13<00:00, 18.84it/s]
avg_loss = 13.383289175954731: 100% 264/264 [00:14<00:00, 18.51it/s]
Restored original weights for layer: model.layers.31.self_attn.k_proj
Reconstructing layer: model.layers.31.self_attn.v_proj
Reduced from torch.Size([1024]) to 846
avg_loss = 2.1430855287339465: 100% 871/871 [00:46<00:00, 18.78it/s]
avg_loss = 9.666598222218454: 100% 256/256 [00:13<00:00, 18.74it/s]
avg_loss = 13.313674368641593: 100% 264/264 [00:14<00:00, 18.69it/s]
**************************************************
Improved perplexity found: 12.513681411743164 for layer self_attn.v_proj .31.. Total modifications is 2
**************************************************
Reconstructing layer: model.layers.31.self_attn.o_proj
Reduced from torch.Size([4096]) to 834
avg_loss = 2.1483869746960402: 100% 871/871 [00:47<00:00, 18.46it/s]
avg_loss = 9.686229056213051: 100% 256/256 [00:13<00:00, 18.78it/s]
avg_loss = 13.344844787861362: 100% 264/264 [00:14<00:00, 18.56it/s]
Restored original weights for layer: model.layers.31.self_attn.o_proj
Reconstructing layer: model.layers.30.mlp.down_proj
Reduced from torch.Size([4096]) to 3770
avg_loss = 2.1505854418576105: 100% 871/871 [00:47<00:00, 18.34it/s]
avg_loss = 9.6962159560062: 100% 256/256 [00:13<00:00, 18.63it/s]
avg_loss = 13.353956826256983: 100% 264/264 [00:14<00:00, 18.49it/s]
Restored original weights for layer: model.layers.30.mlp.down_proj
Reconstructing layer: model.layers.30.mlp.up_proj
Reduced from torch.Size([4096]) to 3787
avg_loss = 2.148582770547965: 100% 871/871 [00:47<00:00, 18.34it/s]
avg_loss = 9.686316559556872: 100% 256/256 [00:13<00:00, 18.59it/s]
avg_loss = 13.34067751738158: 100% 264/264 [00:14<00:00, 18.81it/s]
Restored original weights for layer: model.layers.30.mlp.up_proj
Reconstructing layer: model.layers.30.self_attn.q_proj
Reduced from torch.Size([4096]) to 819
avg_loss = 2.1425534111760927: 100% 871/871 [00:47<00:00, 18.40it/s]
avg_loss = 9.664284548722208: 100% 256/256 [00:13<00:00, 18.49it/s]
avg_loss = 13.309857179721197: 100% 264/264 [00:14<00:00, 18.63it/s]
**************************************************
Improved perplexity found: 12.504617691040039 for layer self_attn.q_proj .30.. Total modifications is 3
**************************************************
Reconstructing layer: model.layers.30.self_attn.k_proj
Reduced from torch.Size([1024]) to 524
avg_loss = 2.1449567824088884: 100% 871/871 [00:47<00:00, 18.51it/s]
avg_loss = 9.675114367622882: 100% 256/256 [00:13<00:00, 18.56it/s]
avg_loss = 13.32237600783507: 100% 264/264 [00:14<00:00, 18.72it/s]
Restored original weights for layer: model.layers.30.self_attn.k_proj
Reconstructing layer: model.layers.30.self_attn.v_proj
Reduced from torch.Size([1024]) to 812
avg_loss = 2.155356107294628: 100% 871/871 [00:47<00:00, 18.48it/s]
avg_loss = 9.7138080005534: 100% 256/256 [00:13<00:00, 18.37it/s]
avg_loss = 13.366635067444859: 100% 264/264 [00:14<00:00, 18.33it/s]
Restored original weights for layer: model.layers.30.self_attn.v_proj
Reconstructing layer: model.layers.30.self_attn.o_proj
Reduced from torch.Size([4096]) to 859
avg_loss = 2.146158002821641: 100% 871/871 [00:47<00:00, 18.33it/s]
avg_loss = 9.676836102735251: 100% 256/256 [00:13<00:00, 18.43it/s]
avg_loss = 13.318221795287998: 100% 264/264 [00:14<00:00, 18.33it/s]
Restored original weights for layer: model.layers.30.self_attn.o_proj
Reconstructing layer: model.layers.29.mlp.down_proj
Reduced from torch.Size([4096]) to 3763
avg_loss = 2.1450509054652587: 100% 871/871 [00:47<00:00, 18.35it/s]
avg_loss = 9.6743658403866: 100% 256/256 [00:14<00:00, 18.21it/s]
avg_loss = 13.321742536895202: 100% 264/264 [00:14<00:00, 18.19it/s]
Restored original weights for layer: model.layers.29.mlp.down_proj
Reconstructing layer: model.layers.29.mlp.up_proj
Reduced from torch.Size([4096]) to 3828
avg_loss = 2.1408350525165125: 100% 871/871 [00:47<00:00, 18.21it/s]
avg_loss = 9.65894997306168: 100% 256/256 [00:14<00:00, 18.26it/s]
avg_loss = 13.306687997146087: 100% 264/264 [00:14<00:00, 18.31it/s]
**************************************************
Improved perplexity found: 12.497097969055176 for layer mlp.up_proj .29.. Total modifications is 4
**************************************************
Reconstructing layer: model.layers.29.self_attn.q_proj
Reduced from torch.Size([4096]) to 803
avg_loss = 2.1367383972238043: 100% 871/871 [00:47<00:00, 18.18it/s]
avg_loss = 9.641230288892984: 100% 256/256 [00:13<00:00, 18.36it/s]
avg_loss = 13.289274643767964: 100% 264/264 [00:14<00:00, 18.47it/s]
**************************************************
Improved perplexity found: 12.455863952636719 for layer self_attn.q_proj .29.. Total modifications is 5
**************************************************