metadata
license: bigcode-openrail-m
datasets:
- bigcode/guanaco-commits
metrics:
- code_eval
library_name: peft
tags:
- code
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models
Model Summary
Astraios-3B-LoRA is an instruction-tuned model with 3B parameters, created by fine-tuning StarCoderBase-3B with LoRA on CommitPackFT and OASST as described in the Astraios paper.
- Repository: bigcode-project/astraios
- Paper: Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models
- Languages: 80+ Programming languages
- ✨Astraios:
  - Data
    - CommitPackFT+OASST: Filtered version of CommitPack and OASST for high-quality commit messages that resemble instructions
  - Model
    - Astraios-1B: Collection of StarCoderBase-1B models instruction tuned on CommitPackFT + OASST with different tuning methods
    - Astraios-3B: Collection of StarCoderBase-3B (3B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods
    - Astraios-7B: Collection of StarCoderBase-7B (7B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods
    - Astraios-16B: Collection of StarCoderBase-16B (16B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods
  - Evaluation
    - BigCloneBench: Dataset for clone detection; we use 2,000 samples for evaluation
    - Devign: Dataset for defect detection; we use 2,000 samples for evaluation
    - HumanEvalPack: Extension of OpenAI's HumanEval to cover 3 scenarios across 6 languages
    - ReCode: Dataset for the robustness of code generation, covering 4 variants
    - Asleep At The Keyboard: Datasets for security of code generation; we use DoW for evaluation
Use
Intended use
The model follows instructions provided in the input. You should always preface your input with "Question: " and finish it with "Answer:", for example: "Question: Please write a function in Python that performs bubble sort.
Answer:"
Feel free to share your generations in the Community tab!
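The prompt format described above can be wrapped in a small helper. This is a minimal sketch: `build_prompt` is a hypothetical name that is not part of the Astraios repository, and it assumes a single newline between the question and "Answer:", as in the example shown here.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the "Question: ... Answer:" template.

    Assumption: one newline separates the question from the "Answer:" marker.
    """
    # Strip surrounding whitespace so the template stays consistent.
    return f"Question: {instruction.strip()}\nAnswer:"


prompt = build_prompt("Please write a function in Python that performs bubble sort.")
print(prompt)
```

The resulting string can be passed to the tokenizer in place of a hand-written prompt.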
Generation
# pip install -q transformers
# pip install -e git+https://github.com/bigcode-project/astraios#subdirectory=peft
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_checkpoint = "bigcode/astraios-3b-lora"
checkpoint = "bigcode/starcoderbase-3b"
device = "cuda"  # use "cpu" to run without a GPU

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# Load the base model once, then attach the LoRA adapter weights on top of it.
model = AutoModelForCausalLM.from_pretrained(checkpoint)
model = PeftModel.from_pretrained(model, peft_checkpoint).to(device)

# The newline between the question and "Answer:" is written as an escape sequence.
inputs = tokenizer.encode("Question: Please write a function in Python that performs bubble sort.\nAnswer:", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
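The decoded string printed by the generation snippet contains the prompt followed by the completion. A possible way to keep only the model's answer is sketched below. `extract_answer` is a hypothetical helper, and the default end-of-sequence marker `<|endoftext|>` is an assumption about the tokenizer's special token; adjust it if your tokenizer uses a different one.

```python
def extract_answer(decoded: str, eos_token: str = "<|endoftext|>") -> str:
    """Return the text after the first "Answer:" marker, without the EOS token.

    Returns an empty string if the marker is not present.
    """
    # partition() yields an empty tail when "Answer:" is missing.
    _, _, answer = decoded.partition("Answer:")
    # Drop the end-of-sequence token and anything generated after it.
    answer = answer.split(eos_token, 1)[0]
    return answer.strip()


example = "Question: Say hi.\nAnswer: print('hi')<|endoftext|>"
print(extract_answer(example))
```

In practice, the argument would be `tokenizer.decode(outputs[0])` from the snippet above.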
Training
Model
- Architecture: GPT-2 model with multi-query attention and Fill-in-the-Middle objective
- Steps: 250k pretraining & 200 instruction tuning
- Precision: fp32
Hardware
- Pretraining:
  - GPUs: 512 Tesla A100
  - Training time: 24 days
- Instruction tuning:
  - GPUs: 8 Tesla A100
Software
- Orchestration: Megatron-LM/Transformers
- Neural networks: PyTorch
Citation