---
license: bigcode-openrail-m
library_name: peft
tags:
- trl
- sft
- generated_from_trainer
base_model: bigcode/starcoder2-3b
model-index:
- name: finetune_starcoder2_3b
  results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# finetune_starcoder2_3b
This model is a fine-tuned version of [bigcode/starcoder2-3b](https://huggingface.co/bigcode/starcoder2-3b) on an unknown dataset.
## Model description
The base model, StarCoder2-3B, is a 3B-parameter model trained on 17 programming languages from The Stack v2, with opt-out requests excluded. It uses Grouped Query Attention, a context window of 16,384 tokens with sliding window attention of 4,096 tokens, and was trained with the Fill-in-the-Middle objective on more than 3 trillion tokens.
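As a rough illustration of the Fill-in-the-Middle objective mentioned above, the sketch below assembles a prompt in which the model is asked to produce the code between a prefix and a suffix. The sentinel strings are an assumption based on the StarCoder tokenizer family, and `build_fim_prompt` is a hypothetical helper; check the tokenizer's special tokens before relying on them, and note that this card does not confirm whether the fine-tuned adapter preserves infilling behavior.

```python
# Assumed FIM sentinel tokens (StarCoder-family convention); verify against
# the tokenizer's special tokens before use.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Hypothetical helper: ask the model to generate the text that belongs
    between `prefix` and `suffix` (prefix-suffix-middle ordering)."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Example: the model would be expected to fill in the function body.
prompt = build_fim_prompt(
    prefix="function add(a, b) {\n  ",
    suffix="\n}\n",
)
```

The resulting string would be tokenized and passed to `generate` like any other prompt; the generated continuation is the proposed middle section.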
## Intended uses & limitations
finetune_starcoder2_3b is a fine-tuned model that generates JavaScript code snippets for tasks such as code completion, syntax suggestion, and code generation. It has limitations when generating complex code, so users should review the generated code and use it at their own risk.
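A minimal sketch of how the adapter might be loaded for JavaScript completion, assuming it was saved as a standard PEFT adapter on top of `bigcode/starcoder2-3b`. `ADAPTER_ID` is a placeholder, not the real repository id, and `load_finetuned` and `complete` are hypothetical helper names; the dtype and generation settings are illustrative choices rather than values taken from training.

```python
BASE_MODEL_ID = "bigcode/starcoder2-3b"
# Placeholder: replace with the actual adapter repo id or a local path.
ADAPTER_ID = "your-username/finetune_starcoder2_3b"


def load_finetuned(adapter_id: str = ADAPTER_ID, base_id: str = BASE_MODEL_ID):
    """Load the base model and attach the PEFT adapter weights."""
    # Imported lazily so the module can be inspected without the heavy deps.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model


def complete(tokenizer, model, prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation of `prompt` and return the decoded text."""
    device = next(model.parameters()).device
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Typical use would be `tokenizer, model = load_finetuned()` followed by `complete(tokenizer, model, "// Reverse a string\nfunction reverse(s) {")`. As noted above, the generated code should be reviewed before it is used.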
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 0
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 500
- mixed_precision_training: Native AMP
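To make the relationship between these values concrete, the sketch below recomputes the total train batch size and evaluates a cosine schedule with linear warmup at a few steps. It assumes a single training device (the card does not state the device count) and the common half-cosine decay to zero; `effective_batch_size` and `lr_at` are hypothetical helpers, not the Trainer's own code.

```python
import math

LEARNING_RATE = 2e-4
WARMUP_STEPS = 100
TRAINING_STEPS = 500
TRAIN_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1  # assumption: not stated in the card


def effective_batch_size(per_device: int = TRAIN_BATCH_SIZE,
                         accum: int = GRAD_ACCUM_STEPS,
                         devices: int = NUM_DEVICES) -> int:
    """Samples contributing to each optimizer update."""
    return per_device * accum * devices


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size())  # 1 * 4 * 1 = 4, matching total_train_batch_size
for step in (0, 50, 100, 300, 500):
    print(step, lr_at(step))
```

Under these assumptions the rate rises linearly to 0.0002 at step 100, falls to about half of that at step 300, and reaches approximately zero at step 500.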
### Training results
### Framework versions
- PEFT 0.10.0
- Transformers 4.40.0.dev0
- Pytorch 2.2.1+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2