finetune_starcoder2_3b
This model is a fine-tuned version of bigcode/starcoder2-3b on an unknown dataset.
Model description
The base model, StarCoder2-3B, is a 3B-parameter model trained on 17 programming languages from The Stack v2, with opt-out requests excluded. It uses Grouped Query Attention and a context window of 16,384 tokens with sliding window attention over 4,096 tokens, and it was trained with the Fill-in-the-Middle objective on more than 3 trillion tokens.
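To make the sliding-window claim concrete, here is a minimal pure-Python sketch of a causal attention mask restricted to a fixed window. It is an illustration of the general technique, not StarCoder2's actual attention kernel. The function names are hypothetical, and the off-by-one convention (whether the window counts the current token) is an assumption, since implementations differ on it.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j.

    Causal: a query never attends to future positions (j > i).
    Sliding window: a query only sees the most recent `window` positions,
    counting itself (an assumed convention; others are off by one).
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def visible_keys(query_pos: int, window: int) -> int:
    """Number of key positions a single query attends to under the mask above."""
    return min(query_pos + 1, window)
```

With the card's numbers, the last token of a full 16,384-token context attends to `visible_keys(16383, 4096)`, i.e. 4,096 keys rather than 16,384. Because layers are stacked, information from further back can still reach a token indirectly.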
Intended uses & limitations
finetune_starcoder2_3b is a fine-tuned model intended for generating JavaScript code snippets for tasks such as code completion, syntax suggestion, and code generation. It has limitations when generating complex code, so users should review all generated code and use it at their own risk.
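A minimal usage sketch for JavaScript completion follows. It rests on several assumptions not stated in this card: that the repository `Pranabit/finetune_starcoder2_3b` holds a PEFT adapter (suggested by the PEFT entry under Framework versions) rather than full weights, that the base model's tokenizer applies unchanged, and that the fine-tuned model still honours the StarCoder-family FIM tokens `<fim_prefix>`, `<fim_suffix>`, `<fim_middle>` (the fine-tuning data is unknown, so it may not have used FIM). The helper names `build_fim_prompt`, `load_model`, and `generate` are hypothetical.

```python
# Minimal sketch; repository layout and FIM support after fine-tuning are assumptions.
BASE_ID = "bigcode/starcoder2-3b"
ADAPTER_ID = "Pranabit/finetune_starcoder2_3b"  # assumed to contain a PEFT adapter


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Prefix-suffix-middle (PSM) prompt using the StarCoder-family FIM tokens."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"


def load_model(base_id: str = BASE_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the fine-tuned adapter (downloads ~3B weights)."""
    # Imported lazily so build_fim_prompt works without these packages installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model


def generate(tokenizer, model, prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy decoding; returns only the newly generated text, not the prompt."""
    import torch

    device = next(model.parameters()).device
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    new_tokens = output[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For example, `tokenizer, model = load_model()` followed by `generate(tokenizer, model, build_fim_prompt("function add(a, b) {\n  return ", ";\n}"))` asks the model to fill in the function body. For plain left-to-right completion, pass the code prefix directly as the prompt instead of a FIM prompt.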
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 0
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 500
- mixed_precision_training: Native AMP
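The learning-rate schedule and batch arithmetic implied by these hyperparameters can be sketched in plain Python. This assumes the cosine scheduler is linear warmup followed by a half-cosine decay to zero, which is what `get_cosine_schedule_with_warmup` in Transformers computes with its default `num_cycles=0.5`. It also assumes training ran on a single device, which is consistent with `total_train_batch_size` being exactly `train_batch_size * gradient_accumulation_steps`. The function names are hypothetical.

```python
import math

LEARNING_RATE = 2e-4
WARMUP_STEPS = 100
TRAINING_STEPS = 500


def lr_at(step: int, base_lr: float = LEARNING_RATE,
          warmup: int = WARMUP_STEPS, total: int = TRAINING_STEPS) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then half-cosine decay."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


def total_train_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Sequences consumed per optimizer step."""
    return per_device * grad_accum * num_devices


# Total sequences seen over training, assuming every step uses a full accumulated batch.
SEQUENCES_SEEN = TRAINING_STEPS * total_train_batch_size(per_device=1, grad_accum=4)
```

Under these assumptions the learning rate peaks at 2e-4 at step 100, falls to 1e-4 at step 300, and reaches 0 at step 500, and training consumes 500 × 4 = 2,000 sequences. The token count cannot be derived because the sequence length is not reported.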
Training results
Framework versions
- PEFT 0.10.0
- Transformers 4.40.0.dev0
- Pytorch 2.2.1+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
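When reproducing results, it can help to compare the installed packages against the versions listed above. The following is a minimal sketch using the standard-library `importlib.metadata`. The helper name `version_report` is hypothetical, and the distribution names (`torch` for PyTorch, and so on) are the usual PyPI names. Note that 4.40.0.dev0 is a development build of Transformers, so an exact match generally requires installing from source.

```python
from importlib import metadata

# Versions listed under "Framework versions" in this card.
EXPECTED = {
    "peft": "0.10.0",
    "transformers": "4.40.0.dev0",
    "torch": "2.2.1+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def version_report(expected=EXPECTED, lookup=metadata.version):
    """Map each package to (expected, installed or None, exact match?)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed
        report[package] = (wanted, installed, installed == wanted)
    return report
```

Calling `version_report()` with no arguments inspects the current environment. The `lookup` parameter exists only so the check can be exercised without the real packages.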
Model tree for Pranabit/finetune_starcoder2_3b
Base model
bigcode/starcoder2-3b