finetune_starcoder2_3b

This model is a fine-tuned version of bigcode/starcoder2-3b on an unknown dataset.

Model description

The base model, StarCoder2-3B, is a 3B-parameter model trained on 17 programming languages from The Stack v2, with opt-out requests excluded. It uses Grouped Query Attention, a context window of 16,384 tokens with sliding window attention of 4,096 tokens, and was trained with the Fill-in-the-Middle objective on 3+ trillion tokens.
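
Because the base model was trained with the Fill-in-the-Middle objective, an infilling prompt can be assembled from the code before and after a gap. The sketch below only builds the prompt string. The sentinel token names are an assumption taken from the StarCoder tokenizer family and should be checked against the tokenizer's special tokens, and whether this fine-tune retains infilling ability is not stated in this card. The helper name build_fim_prompt is hypothetical.

```python
# Sentinel tokens assumed from the StarCoder tokenizer family; verify them
# against the tokenizer's special tokens before relying on this layout.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange the code around a gap in prefix, suffix, middle order.

    The model is then expected to generate the missing middle part
    after the final sentinel token.
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# JavaScript snippet with the return expression left out.
prompt = build_fim_prompt(
    prefix="function add(a, b) {\n  return ",
    suffix=";\n}\n",
)
print(prompt)
```

The generated continuation would be the text that belongs between the prefix and the suffix, here the return expression.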

Intended uses & limitations

finetune_starcoder2_3b is a fine-tuned model that generates JavaScript code snippets for tasks such as code completion, syntax suggestion, and code generation. It is limited in its ability to generate complex code, so users should review any generated code and use it at their own risk.
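
A minimal sketch of how the model might be loaded for JavaScript completion. It assumes the repository holds a PEFT adapter on top of bigcode/starcoder2-3b (inferred from PEFT appearing in the framework versions) and that the adapter id is Pranabit/finetune_starcoder2_3b. Both are assumptions, and the function name complete_javascript is hypothetical.

```python
BASE_MODEL_ID = "bigcode/starcoder2-3b"
# Assumed adapter repository id; adjust if the weights live elsewhere.
ADAPTER_ID = "Pranabit/finetune_starcoder2_3b"


def complete_javascript(prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation for a JavaScript prompt.

    Assumes the fine-tune is a PEFT adapter applied to the base model.
    """
    # Imported lazily because these packages are heavy optional dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID)
    # Attach the fine-tuned adapter weights to the base model.
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call, left commented out because it downloads the model weights:
# print(complete_javascript("// Sum an array of numbers\nfunction sum(values) {"))
```

As noted above, the output should be reviewed before use.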

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 1
eval_batch_size: 8
seed: 0
gradient_accumulation_steps: 4
total_train_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
training_steps: 500
mixed_precision_training: Native AMP
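
The relationship between these values can be sketched in plain Python. The total train batch size of 4 follows from a per-device batch of 1 with 4 gradient accumulation steps, assuming a single device. The learning-rate function below is a common linear-warmup, half-cosine-decay formulation. It is an illustration of the schedule type listed above, not necessarily the exact implementation used during training, and the function names are hypothetical.

```python
import math

LEARNING_RATE = 2e-4
WARMUP_STEPS = 100
TRAINING_STEPS = 500
TRAIN_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 4


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step.

    num_devices=1 is an assumption; the card does not state the device count.
    """
    return per_device * accum_steps * num_devices


def lr_at_step(step: int) -> float:
    """Learning rate under linear warmup followed by half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Fraction of the decay phase completed, from 0.0 to 1.0.
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))  # 4
for step in (0, 50, 100, 300, 500):
    print(step, lr_at_step(step))
```

Under this formulation the rate peaks at 0.0002 at step 100, the end of warmup, and decays to zero by step 500.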

Training results

Framework versions

PEFT 0.10.0
Transformers 4.40.0.dev0
Pytorch 2.2.1+cu121
Datasets 2.18.0
Tokenizers 0.15.2
