Simingh committed on
Commit
c1866e3
•
1 Parent(s): 33ab80c

Update README.md

Files changed (1)
  1. README.md +17 -4
README.md CHANGED
@@ -41,7 +41,20 @@ library_name: transformers
  | OpenCoder-1.5B-Instruct | 4K | 🤗 [HuggingFace](https://huggingface.co/infly/OpenCoder-1.5B-Instruct) |
  | OpenCoder-8B-Instruct | 8K | 🤗 [HuggingFace](https://huggingface.co/infly/OpenCoder-8B-Instruct) |
 
- ## 3. Benchmarks
+
+ ## 3. Datasets
+
+ ### Pre-training
+
+ | Dataset | Size | Download |
+ |:---------------------:|:---------------:|:-----------------------------------------------------------------------:|
+ | fineweb-code-corpus | 148 GB | 🤗 [HuggingFace](https://huggingface.co/datasets/OpenCoder-LLM/fineweb-code-corpus) |
+ | fineweb-math-corpus | 10 GB | 🤗 [HuggingFace](https://huggingface.co/datasets/OpenCoder-LLM/fineweb-math-corpus) |
+
+
+ **This is not the end; we are organizing the remaining data and uploading it progressively.**
+
+ ## 4. Benchmarks
 
  **Note:** For the detailed evaluation results, please refer to [our paper](https://arxiv.org/pdf/2411.04905).
 
@@ -65,7 +78,7 @@ library_name: transformers
  | MultiPL-E (AVG) | 57.5 | 71.0 | -->
 
 
- ## 4. Inference
+ ## 5. Inference
 
  ### Inference with Huggingface's Transformers
 
@@ -90,11 +103,11 @@ print(result)
 
  <!-- ### Inference with vLLM (recommended) -->
 
- ## 5. License
+ ## 6. License
 
  OpenCoder series (including Base and Chat) support commercial applications under a permissive [License](https://huggingface.co/infly/OpenCoder-1.5B-Base/blob/main/LICENSE).
 
- ## 6. Citation
+ ## 7. Citation
  ```
  @inproceedings{Huang2024OpenCoderTO,
  title={OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models},