Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

gpt_bigcode-santacoder - bnb 8bits
- Model creator: https://huggingface.co/bigcode/
- Original model: https://huggingface.co/bigcode/gpt_bigcode-santacoder/

Original model description:
---
license: openrail
datasets:
- bigcode/the-stack
language:
- code
programming_language:
- Java
- JavaScript
- Python
pipeline_tag: text-generation
inference: false

model-index:
- name: SantaCoder
  results:
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL HumanEval (Python)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.18
      verified: false
    - name: pass@10
      type: pass@10
      value: 0.29
      verified: false
    - name: pass@100
      type: pass@100
      value: 0.49
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL MBPP (Python)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.35
      verified: false
    - name: pass@10
      type: pass@10
      value: 0.58
      verified: false
    - name: pass@100
      type: pass@100
      value: 0.77
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL HumanEval (JavaScript)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.16
      verified: false
    - name: pass@10
      type: pass@10
      value: 0.27
      verified: false
    - name: pass@100
      type: pass@100
      value: 0.47
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL MBPP (JavaScript)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.28
      verified: false
    - name: pass@10
      type: pass@10
      value: 0.51
      verified: false
    - name: pass@100
      type: pass@100
      value: 0.70
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL HumanEval (Java)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.15
      verified: false
    - name: pass@10
      type: pass@10
      value: 0.26
      verified: false
    - name: pass@100
      type: pass@100
      value: 0.41
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL MBPP (Java)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.28
      verified: false
    - name: pass@10
      type: pass@10
      value: 0.44
      verified: false
    - name: pass@100
      type: pass@100
      value: 0.59
      verified: false
  - task:
      type: text-generation
    dataset:
      type: loubnabnl/humaneval_infilling
      name: HumanEval FIM (Python)
    metrics:
    - name: single_line
      type: exact_match
      value: 0.44
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL HumanEval FIM (Java)
    metrics:
    - name: single_line
      type: exact_match
      value: 0.62
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL HumanEval FIM (JavaScript)
    metrics:
    - name: single_line
      type: exact_match
      value: 0.60
      verified: false
  - task:
      type: text-generation
    dataset:
      type: code_x_glue_ct_code_to_text
      name: CodeXGLUE code-to-text (Python)
    metrics:
    - name: BLEU
      type: bleu
      value: 18.13
      verified: false
---

# SantaCoder

![banner](https://huggingface.co/datasets/bigcode/admin/resolve/main/banner.png)

Play with the model on the [SantaCoder Space Demo](https://huggingface.co/spaces/bigcode/santacoder-demo).

# Table of Contents

1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Limitations](#limitations)
4. [Training](#training)
5. [License](#license)

# Model Summary

This is the same model as [SantaCoder](https://huggingface.co/bigcode/santacoder), but it can be loaded with `transformers>=4.28.1` to use the GPTBigCode architecture.
We refer the reader to the [SantaCoder model page](https://huggingface.co/bigcode/santacoder) for full documentation about this model.

- **Repository:** [bigcode/Megatron-LM](https://github.com/bigcode-project/Megatron-LM)
- **Project Website:** [bigcode-project.org](https://www.bigcode-project.org)
- **Paper:** [🎅SantaCoder: Don't reach for the stars!🌟](https://t.co/YV3pzUbYOr)
- **Point of Contact:** [[email protected]](mailto:[email protected])
- **Languages:** Python, Java, and JavaScript

There are two versions (branches) of the model:
* `main`: Uses the `gpt_bigcode` model. [Requires the bigcode fork of transformers](https://github.com/bigcode-project/transformers).
* `main_custom`: Packaged with its own modeling code. Requires `transformers>=4.27`. Alternatively, it can run on older versions by setting the configuration parameter `activation_function = "gelu_pytorch_tanh"`.
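
For reference, here is a minimal loading sketch (not part of the original card): it loads the checkpoint in 8-bit with `transformers` and `bitsandbytes`, assuming `transformers>=4.28.1`, `bitsandbytes`, and `accelerate` are installed and a CUDA GPU is available. It quantizes the original checkpoint on load; substitute this repository's id to use the pre-quantized weights instead.

```python
# Minimal sketch: load SantaCoder with bnb 8-bit quantization (assumes
# transformers>=4.28.1, bitsandbytes, accelerate, and a CUDA GPU).
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

checkpoint = "bigcode/gpt_bigcode-santacoder"  # or this repository's id

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # bnb 8 bits
    device_map="auto",
)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0]))
```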

# Use

## Intended use

The model was trained on GitHub code. As such it is _not_ an instruction model, and commands like "Write a function that computes the square root." do not work well.
You should phrase requests as they occur in source code, e.g. as comments (`# the following function computes the sqrt`), or write a function signature and docstring and let the model complete the function body.
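
Continuing the loading sketch above, a comment-style prompt (reusing the card's own example comment) is more likely to yield a useful completion than a bare instruction:

```python
# Phrase the request as a code comment plus a signature, then let the
# model complete the body (reuses tokenizer/model from the sketch above).
prompt = "# the following function computes the sqrt\ndef sqrt(x):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
```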

### Attribution & Other Requirements

The pretraining dataset of the model was filtered for permissive licenses only. Nevertheless, the model can generate source code verbatim from the dataset. The code's license might require attribution and/or impose other specific requirements that must be respected. We provide a [search index](https://huggingface.co/spaces/bigcode/santacoder-search) that lets you search through the pretraining data to identify where generated code came from and apply the proper attribution to your code.

# Limitations

The model has been trained on source code in Python, Java, and JavaScript. The predominant natural language in the sources is English, although other languages are also present. As such, the model is able to generate code snippets given some context, but the generated code is not guaranteed to work as intended: it can be inefficient and may contain bugs or exploits.

# Training

## Model

- **Architecture:** GPT-2 model with multi-query attention and a Fill-in-the-Middle objective (see the FIM sketch below)
- **Pretraining steps:** 600K
- **Pretraining tokens:** 236 billion
- **Precision:** float16
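
The upstream SantaCoder card documents fill-in-the-middle prompting with the sentinel tokens `<fim-prefix>`, `<fim-suffix>`, and `<fim-middle>` (token names taken from that card, not re-verified against this quantized copy). Reusing the tokenizer and model from the loading sketch above:

```python
# FIM sketch: the code before and after the gap is wrapped in SantaCoder's
# sentinel tokens; the model generates the missing middle after <fim-middle>.
fim_prompt = (
    "<fim-prefix>def print_hello_world():\n    <fim-suffix>\n"
    "    print('Hello world!')<fim-middle>"
)
inputs = tokenizer(fim_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0]))
```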

## Hardware

- **GPUs:** 96 Tesla V100
- **Training time:** 6.2 days
- **Total FLOPs:** 2.1 x 10^21

## Software

- **Orchestration:** [Megatron-LM](https://github.com/bigcode-project/Megatron-LM)
- **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)
- **FP16 if applicable:** [apex](https://github.com/NVIDIA/apex)

# License

The model is licensed under the CodeML OpenRAIL-M v0.1 license. You can find the full license [here](https://huggingface.co/spaces/bigcode/license).