dittops committed on
Commit
60e4959
·
1 Parent(s): e57f776

Create README.md

Files changed (1)
  1. README.md +86 -0
README.md ADDED
@@ -0,0 +1,86 @@
+ ---
+ library_name: transformers
+ tags:
+ - code
+
+ ---
+
+
+ # Bud Code Millenials 3B
+
+ Welcome to our Code Model repository! Our model is fine-tuned specifically for code generation. The open-source Bud Millenial Code Gen models are currently state of the art (SOTA) for code generation, outperforming existing models of all sizes. Code Millenials 34B achieves a HumanEval pass@1 score of 80.48, beating proprietary models such as Gemini Ultra, Claude, and GPT-3.5 by a large margin and performing on par with GPT-4 (HumanEval ~82; ref. WizardCoder). Our proprietary model, Bud Code Jr, also beats GPT-4, with a HumanEval score of 88.2 and a context size of 168K. We will release an API for researchers, enterprises, and potential partners by the end of January 2024. If interested, please reach out to [email protected]
+
+ ### News 🔥🔥🔥
+
+ - [2024/01/03] We released **Code Millenials 34B**, which achieves **80.48 pass@1** on the [HumanEval benchmark](https://github.com/openai/human-eval).
+ - [2024/01/02] We released **Code Millenials 13B**, which achieves **76.21 pass@1** on the [HumanEval benchmark](https://github.com/openai/human-eval).
+
+
+ ### HumanEval
+
+ <p align="center" width="100%">
+ <a ><img src="https://raw.githubusercontent.com/BudEcosystem/code-millenials/main/assets/result.png" alt="CodeMillenials" style="width: 100%; min-width: 300px; display: block; margin: auto;"></a>
+ </p>
+
+ For the Code Millenials models, the results above were produced with the eval script in the GitHub repo.
+
+ Note: The HumanEval scores of the other models are taken from the official repos of [WizardCoder](https://github.com/nlpxucan/WizardLM), [DeepseekCoder](https://github.com/deepseek-ai/deepseek-coder), [Gemini](https://deepmind.google/technologies/gemini/#capabilities), etc.
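
For readers unfamiliar with the metric, the sketch below shows how pass@k is commonly computed, using the unbiased estimator from the original HumanEval paper. It is an illustration only: the function names are made up for this example, and the eval script in the GitHub repo may compute its scores differently.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated for the problem
    c: number of those samples that pass all unit tests
    k: number of samples the metric is allowed to draw
    """
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a pass.
        return 1.0
    # 1 - P(all k drawn samples fail)
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per benchmark problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical results: one sample per problem, three of four problems solved.
demo = [(1, 1), (1, 1), (1, 0), (1, 1)]
print(f"pass@1 = {benchmark_pass_at_k(demo, k=1):.2%}")
```

With a single sample per problem, pass@1 reduces to the fraction of problems whose generated solution passes every test.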
+
+
+ ### Models
+
+ | Model | Checkpoint | HumanEval (+) | MBPP (+) |
+ |---------|-------------|---------------|----------|
+ |Code Millenials 34B | <a href="https://huggingface.co/budecosystem/code-millenials-34b" target="_blank">HF Link</a> | 80.48 (75) | 74.68 (62.9) |
+ |Code Millenials 13B | <a href="https://huggingface.co/budecosystem/code-millenials-13b" target="_blank">HF Link</a> | 76.21 (69.5) | 70.17 (57.6) |
+ |Code Millenials 3B | <a href="https://huggingface.co/budecosystem/code-millenials-3b" target="_blank">HF Link</a> | - | - |
+ |Code Millenials 1B | <a href="https://huggingface.co/budecosystem/code-millenials-1b" target="_blank">HF Link</a> | - | - |
+
+
+ ### 🚀 Quick Start
+
+ Run inference with the pre-trained model from the Hugging Face model hub:
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ # Load the tokenizer and the model weights from the Hugging Face Hub.
+ tokenizer = AutoTokenizer.from_pretrained("budecosystem/code-millenials-3b")
+ model = AutoModelForCausalLM.from_pretrained("budecosystem/code-millenials-3b")
+
+ # Prompt template: a system preamble followed by the instruction.
+ template = """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
+ ### Instruction: {instruction} ### Response:"""
+
+ # Replace the placeholder string with your own coding instruction.
+ instruction = "<Your code instruction here>"
+
+ prompt = template.format(instruction=instruction)
+
+ # Tokenize, generate, and decode. Note that max_length counts the prompt tokens too.
+ inputs = tokenizer(prompt, return_tensors="pt")
+ sample = model.generate(**inputs, max_length=128)
+ print(tokenizer.decode(sample[0]))
+
+ ```
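
Because a causal language model returns the prompt followed by the completion, the decoded text above repeats the whole template. The helpers below are a minimal sketch of one way to build the prompt and keep only the text after `### Response:`. The names `build_prompt` and `extract_response` are hypothetical and not part of this repository, and the decoded string is simulated so the example runs without downloading the model.

```python
SYSTEM_PREAMBLE = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)
RESPONSE_MARKER = "### Response:"
TEMPLATE = SYSTEM_PREAMBLE + "\n### Instruction: {instruction} " + RESPONSE_MARKER


def build_prompt(instruction: str) -> str:
    """Fill the Quick Start prompt template with a user instruction."""
    return TEMPLATE.format(instruction=instruction.strip())


def extract_response(decoded: str) -> str:
    """Return the text after the first response marker, or the whole
    string if the marker is missing."""
    _, marker, tail = decoded.partition(RESPONSE_MARKER)
    return tail.strip() if marker else decoded.strip()


# Simulated decoded output: the prompt followed by a made-up completion.
prompt = build_prompt("Write a function that adds two numbers.")
decoded = prompt + " def add(a, b):\n    return a + b"
print(extract_response(decoded))
```

In real use, `decoded` would be the result of `tokenizer.decode(sample[0])`. Passing `max_new_tokens` to `generate` instead of `max_length` limits only the completion, so a long prompt does not eat into the generation budget.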
+
+
+ ## Training details
+
+ The model was trained on 8 A100 80GB GPUs for approximately 6 hours.
+
+ | Hyperparameters | Value |
+ | :----------------------------| :-----: |
+ | per_device_train_batch_size | 3 |
+ | gradient_accumulation_steps | 1 |
+ | epochs | 3 |
+ | steps | 26289 |
+ | learning_rate | 2e-5 |
+ | lr scheduler type | cosine |
+ | warmup ratio | 0.15 |
+ | optimizer | adamw |
+ | fp16 | True |
+ | GPU | 8 A100 80GB |
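
The table implies a few quantities that it does not state directly. The sketch below derives them and evaluates a cosine learning-rate schedule with linear warmup. It rests on assumptions that the source does not confirm: plain data parallelism across all 8 GPUs, warmup steps rounded up from the warmup ratio, and a schedule that decays to zero at the final step. The training script actually used may differ.

```python
import math

# Values copied from the hyperparameter table above.
PER_DEVICE_BATCH = 3
GRAD_ACCUM_STEPS = 1
NUM_GPUS = 8
EPOCHS = 3
TOTAL_STEPS = 26289
PEAK_LR = 2e-5
WARMUP_RATIO = 0.15

# Assumption: one data-parallel replica per GPU.
effective_batch = PER_DEVICE_BATCH * NUM_GPUS * GRAD_ACCUM_STEPS
steps_per_epoch = TOTAL_STEPS // EPOCHS
# Upper bound on samples seen per epoch (the last batch may be partial).
approx_samples_per_epoch = steps_per_epoch * effective_batch
# Assumption: warmup length is the ratio times total steps, rounded up.
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup to PEAK_LR,
    then cosine decay to zero at TOTAL_STEPS."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch)
print("steps per epoch:", steps_per_epoch)
print("approx. samples per epoch:", approx_samples_per_epoch)
print("warmup steps:", warmup_steps)
for s in (0, warmup_steps, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"lr at step {s}: {lr_at(s):.2e}")
```

Under these assumptions the effective batch size is 24, each epoch takes 8763 steps (about 210,312 samples), and the learning rate peaks after 3944 warmup steps.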
+
+ ### Important Note
+
+ - **Bias, Risks, and Limitations:** The model may sometimes make errors, produce misleading content, or struggle with tasks that are not related to coding.