Commit 3bc669f by xzuyn · 1 Parent(s): 70a0510

Create README.md

Files changed (1)
  1. README.md +165 -0
README.md ADDED
@@ -0,0 +1,165 @@
+ This is a second strip test. The goal is to strip GPT-2-XL down to the same number of layers as GPT-2-Small (12) to see what happens.
+
+ These are the only layers/tensors left (I'm unsure of the terminology for these):
+ ```
+ wte.weight
+ wpe.weight
+ h.0.ln_1.weight
+ h.0.ln_1.bias
+ h.0.attn.bias
+ h.0.attn.c_attn.weight
+ h.0.attn.c_attn.bias
+ h.0.attn.c_proj.weight
+ h.0.attn.c_proj.bias
+ h.0.ln_2.weight
+ h.0.ln_2.bias
+ h.0.mlp.c_fc.weight
+ h.0.mlp.c_fc.bias
+ h.0.mlp.c_proj.weight
+ h.0.mlp.c_proj.bias
+ h.1.ln_1.weight
+ h.1.ln_1.bias
+ h.1.attn.bias
+ h.1.attn.c_attn.weight
+ h.1.attn.c_attn.bias
+ h.1.attn.c_proj.weight
+ h.1.attn.c_proj.bias
+ h.1.ln_2.weight
+ h.1.ln_2.bias
+ h.1.mlp.c_fc.weight
+ h.1.mlp.c_fc.bias
+ h.1.mlp.c_proj.weight
+ h.1.mlp.c_proj.bias
+ h.2.ln_1.weight
+ h.2.ln_1.bias
+ h.2.attn.bias
+ h.2.attn.c_attn.weight
+ h.2.attn.c_attn.bias
+ h.2.attn.c_proj.weight
+ h.2.attn.c_proj.bias
+ h.2.ln_2.weight
+ h.2.ln_2.bias
+ h.2.mlp.c_fc.weight
+ h.2.mlp.c_fc.bias
+ h.2.mlp.c_proj.weight
+ h.2.mlp.c_proj.bias
+ h.3.ln_1.weight
+ h.3.ln_1.bias
+ h.3.attn.bias
+ h.3.attn.c_attn.weight
+ h.3.attn.c_attn.bias
+ h.3.attn.c_proj.weight
+ h.3.attn.c_proj.bias
+ h.3.ln_2.weight
+ h.3.ln_2.bias
+ h.3.mlp.c_fc.weight
+ h.3.mlp.c_fc.bias
+ h.3.mlp.c_proj.weight
+ h.3.mlp.c_proj.bias
+ h.4.ln_1.weight
+ h.4.ln_1.bias
+ h.4.attn.bias
+ h.4.attn.c_attn.weight
+ h.4.attn.c_attn.bias
+ h.4.attn.c_proj.weight
+ h.4.attn.c_proj.bias
+ h.4.ln_2.weight
+ h.4.ln_2.bias
+ h.4.mlp.c_fc.weight
+ h.4.mlp.c_fc.bias
+ h.4.mlp.c_proj.weight
+ h.4.mlp.c_proj.bias
+ h.5.ln_1.weight
+ h.5.ln_1.bias
+ h.5.attn.bias
+ h.5.attn.c_attn.weight
+ h.5.attn.c_attn.bias
+ h.5.attn.c_proj.weight
+ h.5.attn.c_proj.bias
+ h.5.ln_2.weight
+ h.5.ln_2.bias
+ h.5.mlp.c_fc.weight
+ h.5.mlp.c_fc.bias
+ h.5.mlp.c_proj.weight
+ h.5.mlp.c_proj.bias
+ h.6.ln_1.weight
+ h.6.ln_1.bias
+ h.6.attn.bias
+ h.6.attn.c_attn.weight
+ h.6.attn.c_attn.bias
+ h.6.attn.c_proj.weight
+ h.6.attn.c_proj.bias
+ h.6.ln_2.weight
+ h.6.ln_2.bias
+ h.6.mlp.c_fc.weight
+ h.6.mlp.c_fc.bias
+ h.6.mlp.c_proj.weight
+ h.6.mlp.c_proj.bias
+ h.7.ln_1.weight
+ h.7.ln_1.bias
+ h.7.attn.bias
+ h.7.attn.c_attn.weight
+ h.7.attn.c_attn.bias
+ h.7.attn.c_proj.weight
+ h.7.attn.c_proj.bias
+ h.7.ln_2.weight
+ h.7.ln_2.bias
+ h.7.mlp.c_fc.weight
+ h.7.mlp.c_fc.bias
+ h.7.mlp.c_proj.weight
+ h.7.mlp.c_proj.bias
+ h.8.ln_1.weight
+ h.8.ln_1.bias
+ h.8.attn.bias
+ h.8.attn.c_attn.weight
+ h.8.attn.c_attn.bias
+ h.8.attn.c_proj.weight
+ h.8.attn.c_proj.bias
+ h.8.ln_2.weight
+ h.8.ln_2.bias
+ h.8.mlp.c_fc.weight
+ h.8.mlp.c_fc.bias
+ h.8.mlp.c_proj.weight
+ h.8.mlp.c_proj.bias
+ h.9.ln_1.weight
+ h.9.ln_1.bias
+ h.9.attn.bias
+ h.9.attn.c_attn.weight
+ h.9.attn.c_attn.bias
+ h.9.attn.c_proj.weight
+ h.9.attn.c_proj.bias
+ h.9.ln_2.weight
+ h.9.ln_2.bias
+ h.9.mlp.c_fc.weight
+ h.9.mlp.c_fc.bias
+ h.9.mlp.c_proj.weight
+ h.9.mlp.c_proj.bias
+ h.10.ln_1.weight
+ h.10.ln_1.bias
+ h.10.attn.bias
+ h.10.attn.c_attn.weight
+ h.10.attn.c_attn.bias
+ h.10.attn.c_proj.weight
+ h.10.attn.c_proj.bias
+ h.10.ln_2.weight
+ h.10.ln_2.bias
+ h.10.mlp.c_fc.weight
+ h.10.mlp.c_fc.bias
+ h.10.mlp.c_proj.weight
+ h.10.mlp.c_proj.bias
+ h.11.ln_1.weight
+ h.11.ln_1.bias
+ h.11.attn.bias
+ h.11.attn.c_attn.weight
+ h.11.attn.c_attn.bias
+ h.11.attn.c_proj.weight
+ h.11.attn.c_proj.bias
+ h.11.ln_2.weight
+ h.11.ln_2.bias
+ h.11.mlp.c_fc.weight
+ h.11.mlp.c_fc.bias
+ h.11.mlp.c_proj.weight
+ h.11.mlp.c_proj.bias
+ ln_f.weight
+ ln_f.bias
+ ```
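
The README does not say how the stripping was done, so the following is only a minimal sketch of one way a tensor list like the one above could be produced. It assumes that the first 12 of GPT-2-XL's 48 blocks were kept, without renumbering, and that tensors are stored in a plain name-to-tensor mapping. The helper names `expected_tensor_names` and `keep_first_layers` are hypothetical, and the `None` values are placeholders standing in for real weights.

```python
import re

# Per-block tensor suffixes, in the order they appear in the list above.
BLOCK_SUFFIXES = [
    "ln_1.weight", "ln_1.bias",
    "attn.bias",
    "attn.c_attn.weight", "attn.c_attn.bias",
    "attn.c_proj.weight", "attn.c_proj.bias",
    "ln_2.weight", "ln_2.bias",
    "mlp.c_fc.weight", "mlp.c_fc.bias",
    "mlp.c_proj.weight", "mlp.c_proj.bias",
]

# Matches block tensors such as "h.11.mlp.c_fc.weight" and captures the index.
_LAYER_KEY = re.compile(r"^h\.(\d+)\.")


def expected_tensor_names(n_layers=12):
    """Build the tensor names for a GPT-2 style model with n_layers blocks."""
    names = ["wte.weight", "wpe.weight"]
    for i in range(n_layers):
        names += [f"h.{i}.{suffix}" for suffix in BLOCK_SUFFIXES]
    names += ["ln_f.weight", "ln_f.bias"]
    return names


def keep_first_layers(state_dict, n_layers=12):
    """Keep non-block tensors plus blocks h.0 .. h.(n_layers - 1).

    Assumption: the *first* n_layers blocks are the ones retained.
    """
    kept = {}
    for name, tensor in state_dict.items():
        match = _LAYER_KEY.match(name)
        if match is None or int(match.group(1)) < n_layers:
            kept[name] = tensor
    return kept


# Placeholder for a 48-block state dict; real code would hold tensors here.
full = {name: None for name in expected_tensor_names(48)}
stripped = keep_first_layers(full, n_layers=12)
print(len(full), len(stripped))
```

Under these assumptions the filtered mapping contains exactly the 160 names listed above: 2 embedding tensors, 13 tensors for each of the 12 blocks, and 2 final layer-norm tensors.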