---
datasets:
- iamtarun/python_code_instructions_18k_alpaca
---

# CodeGen-350M-mono-18k-Alpaca-Python

This repository contains a fine-tuned language model, "CodeGen-350M-mono-18k-Alpaca-Python," based on the Salesforce/codegen-350M-mono model and fine-tuned on the "iamtarun/python_code_instructions_18k_alpaca" dataset. The model is designed to help developers generate Python code and code instructions from natural-language prompts.
## Model Details

- **Model Name:** CodeGen-350M-mono-18k-Alpaca-Python
- **Base Model:** Salesforce/codegen-350M-mono
- **Dataset:** iamtarun/python_code_instructions_18k_alpaca
- **Model Size:** 350 million parameters

## Usage

You can use this model for NLP tasks that involve generating Python code from natural-language prompts. Below is an example using the Hugging Face Transformers library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-username/codegen-350M-mono-18k-alpaca-python"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Input text
text = "Create a function that calculates the factorial of a number in Python."

# Tokenize the text
input_ids = tokenizer.encode(text, return_tensors="pt")

# Generate Python code
output = model.generate(input_ids, max_length=100, num_return_sequences=1, no_repeat_ngram_size=2)

# Decode and print the generated code
generated_code = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_code)
```

For more information on using Hugging Face models, refer to the official documentation.
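Because the fine-tuning data follows the Alpaca instruction format, wrapping a request in a similar template may yield better completions than a bare sentence. A minimal sketch, assuming the standard Alpaca template (`build_prompt` is a hypothetical helper, not part of this repository; the exact template used during fine-tuning is an assumption):

```python
# Hypothetical helper: wraps a request in an Alpaca-style prompt template.
# The exact template used during fine-tuning is an assumption here.
def build_prompt(instruction: str, model_input: str = "") -> str:
    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
    )
    if model_input:
        prompt += f"### Input:\n{model_input}\n\n"
    return prompt + "### Response:\n"

# Pass the result to tokenizer.encode(...) in place of the bare `text` above.
text = build_prompt("Create a function that calculates the factorial of a number in Python.")
```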
## Fine-Tuning Details

The CodeGen-350M-mono-18k-Alpaca-Python model was fine-tuned on the "iamtarun/python_code_instructions_18k_alpaca" dataset using the Hugging Face Transformers library, adapting the base Salesforce/codegen-350M-mono model to generate Python code for instruction-style prompts.
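The fine-tuning recipe itself is not included in this repository. As a rough sketch of what such a run can look like with the `Trainer` API (the `format_example` helper, prompt template, and all hyperparameters are illustrative assumptions, not the settings actually used):

```python
def format_example(example: dict) -> str:
    # Flatten one Alpaca-style record into a single training string.
    # Field names follow the dataset card; the template is an assumption.
    text = f"### Instruction:\n{example['instruction']}\n\n"
    if example.get("input"):
        text += f"### Input:\n{example['input']}\n\n"
    return text + f"### Response:\n{example['output']}"

if __name__ == "__main__":
    from datasets import load_dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
    tokenizer.pad_token = tokenizer.eos_token  # CodeGen has no pad token by default
    model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")

    dataset = load_dataset("iamtarun/python_code_instructions_18k_alpaca", split="train")
    tokenized = dataset.map(
        lambda ex: tokenizer(format_example(ex), truncation=True, max_length=512),
        remove_columns=dataset.column_names,
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="codegen-350M-mono-18k-alpaca-python",
            per_device_train_batch_size=4,
            num_train_epochs=1,  # illustrative; the actual run's settings are unknown
            learning_rate=5e-5,
        ),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
```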