Update README.md
README.md
CHANGED
@@ -41,7 +41,20 @@ library_name: transformers
|
|
41 |
| OpenCoder-1.5B-Instruct | 4K | 🤗 [HuggingFace](https://huggingface.co/infly/OpenCoder-1.5B-Instruct) |
|
42 |
| OpenCoder-8B-Instruct | 8K | 🤗 [HuggingFace](https://huggingface.co/infly/OpenCoder-8B-Instruct) |
|
43 |
|
44 |
-
|
45 |
|
46 |
**Note:** For the detailed evaluation results, please refer to [our paper](https://arxiv.org/pdf/2411.04905).
|
47 |
|
@@ -65,7 +78,7 @@ library_name: transformers
|
|
65 |
| MultiPL-E (AVG) | 57.5 | 71.0 | -->
|
66 |
|
67 |
|
68 |
-
##
|
69 |
|
70 |
### Inference with Huggingface's Transformers
|
71 |
|
@@ -90,11 +103,11 @@ print(result)
|
|
90 |
|
91 |
<!-- ### Inference with vLLM (recommended) -->
|
92 |
|
93 |
-
##
|
94 |
|
95 |
OpenCoder series (including Base and Chat) support commercial applications under a permissive [License](https://huggingface.co/infly/OpenCoder-1.5B-Base/blob/main/LICENSE).
|
96 |
|
97 |
-
##
|
98 |
```
|
99 |
@inproceedings{Huang2024OpenCoderTO,
|
100 |
title={OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models},
|
|
|
41 |
| OpenCoder-1.5B-Instruct | 4K | 🤗 [HuggingFace](https://huggingface.co/infly/OpenCoder-1.5B-Instruct) |
|
42 |
| OpenCoder-8B-Instruct | 8K | 🤗 [HuggingFace](https://huggingface.co/infly/OpenCoder-8B-Instruct) |
|
43 |
|
44 |
+
|
45 |
+
## 3. Datasets
|
46 |
+
|
47 |
+
### Pre-training
|
48 |
+
|
49 |
+
| Dataset | Size | Download |
|
50 |
+
|:---------------------:|:---------------:|:-----------------------------------------------------------------------:|
|
51 |
+
| fineweb-code-corpus | 148 GB | 🤗 [HuggingFace](https://huggingface.co/datasets/OpenCoder-LLM/fineweb-code-corpus) |
|
52 |
+
| fineweb-math-corpus | 10 GB | 🤗 [HuggingFace](https://huggingface.co/datasets/OpenCoder-LLM/fineweb-math-corpus) |
|
53 |
+
|
54 |
+
|
55 |
+
**This is not the end; we are organizing the remaining data and uploading it progressively.**
|
56 |
+
|
57 |
+
## 4. Benchmarks
|
58 |
|
59 |
**Note:** For the detailed evaluation results, please refer to [our paper](https://arxiv.org/pdf/2411.04905).
|
60 |
|
|
|
78 |
| MultiPL-E (AVG) | 57.5 | 71.0 | -->
|
79 |
|
80 |
|
81 |
+
## 5. Inference
|
82 |
|
83 |
### Inference with Huggingface's Transformers
|
84 |
|
|
|
103 |
|
104 |
<!-- ### Inference with vLLM (recommended) -->
|
105 |
|
106 |
+
## 6. License
|
107 |
|
108 |
OpenCoder series (including Base and Chat) support commercial applications under a permissive [License](https://huggingface.co/infly/OpenCoder-1.5B-Base/blob/main/LICENSE).
|
109 |
|
110 |
+
## 7. Citation
|
111 |
```
|
112 |
@inproceedings{Huang2024OpenCoderTO,
|
113 |
title={OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models},
|