---
license: bigcode-openrail-m
library_name: peft
tags:
- trl
- sft
- generated_from_trainer
base_model: bigcode/starcoder2-3b
model-index:
- name: finetunedPHP_starcoder2
  results: []
datasets:
- bigcode/the-stack-smol
language:
- en
---
finetunedPHP_starcoder2
This model is a fine-tuned version of bigcode/starcoder2-3b on bigcode/the-stack-smol.
Model description
The `finetunedPHP_starcoder2` model is based on the `starcoder2-3b` architecture and fine-tuned specifically on PHP code from the-stack-smol dataset. It is intended for code generation tasks related to PHP programming.
Intended uses & limitations
The `finetunedPHP_starcoder2` model is suitable for generating PHP code snippets for various purposes, including code completion, syntax suggestions, and code generation. However, it may struggle with complex or domain-specific code, and users should verify generated code for correctness and security.
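As a minimal usage sketch, the function below loads the base model, attaches the fine-tuned adapter with PEFT (the card's `library_name` is `peft`), and completes a PHP prompt. It assumes `transformers`, `peft`, `torch`, and `accelerate` (needed for `device_map="auto"`) are installed. `adapter_path` is a hypothetical placeholder for wherever this adapter lives (a local directory or a Hub repo id), and the generation settings are illustrative rather than the author's.

```python
def generate_php(prompt: str, adapter_path: str,
                 base_model: str = "bigcode/starcoder2-3b",
                 max_new_tokens: int = 128) -> str:
    """Complete a PHP prompt with the base model plus the fine-tuned adapter (sketch)."""
    # Heavy dependencies are imported lazily so this function can be defined without them.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    base = AutoModelForCausalLM.from_pretrained(
        base_model, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # adapter_path is a hypothetical location of the fine-tuned adapter.
    model = PeftModel.from_pretrained(base, adapter_path)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example (hypothetical adapter path):
# print(generate_php("<?php\nfunction slugify(string $text): string {\n",
#                    "path/to/finetunedPHP_starcoder2"))
```

As the limitations above note, review any generated code before using it.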
Training and evaluation data
The model was trained on PHP source files from the `data/php` subset of the-stack-smol, a small sample of The Stack, which consists of source code files collected from public GitHub repositories.
Training procedure
1. Data and Model Preparation:
- Load the `bigcode/the-stack-smol` dataset from the Hugging Face Hub.
- Extract the PHP samples under `data/php` for training.
- Use the `starcoder2-3b` model from the Hugging Face Hub, which was pre-trained on a diverse range of programming languages, including PHP.
- Load the model with 4-bit quantization for memory-efficient computation.
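Step 1 might look like the sketch below, assuming the `datasets`, `transformers`, `torch`, and `bitsandbytes` packages and a CUDA GPU. The card only says "4-bit"; the NF4 quantization type, the bfloat16 compute dtype, and placing the whole model on GPU 0 are assumptions.

```python
def load_php_dataset():
    """Load the PHP portion of the-stack-smol (the source text is in the "content" column)."""
    from datasets import load_dataset  # imported lazily; requires `datasets`

    return load_dataset("bigcode/the-stack-smol", data_dir="data/php", split="train")


def load_quantized_base_model(model_id: str = "bigcode/starcoder2-3b"):
    """Load StarCoder2-3B and its tokenizer with 4-bit weight quantization."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # assumption: not stated in the card
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: not stated in the card
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map={"": 0},  # assumption: place every module on GPU 0
    )
    return model, tokenizer
```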
2. Data Processing:
- Clean the code by removing comments and normalizing indentation.
- Tokenize the cleaned PHP code using the model's tokenizer.
- Prepare the tokenized examples as inputs for the model's causal language modeling (next-token prediction) objective.
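The card does not say how comments were removed or indentation was normalized. One possible approach is sketched below in plain Python: a small scanner that skips `//`, `#`, and `/* ... */` comments while leaving quoted strings and PHP 8 `#[...]` attributes intact, followed by tab expansion and trailing-whitespace cleanup. The 4-space tab width is an assumption, and heredoc/nowdoc and backtick strings are not handled, so this is a sketch rather than a full PHP lexer.

```python
def strip_php_comments(code: str) -> str:
    """Remove //, # and /* ... */ comments from PHP source, leaving quoted
    strings and PHP 8 #[...] attributes untouched (heredoc/nowdoc and
    backtick strings are not handled)."""
    out = []
    i, n = 0, len(code)
    quote = None  # the active string delimiter (' or "), or None outside strings
    while i < n:
        ch = code[i]
        if quote:
            out.append(ch)
            if ch == "\\" and i + 1 < n:  # keep escaped characters verbatim
                out.append(code[i + 1])
                i += 2
                continue
            if ch == quote:
                quote = None
            i += 1
        elif ch in ("'", '"'):
            quote = ch
            out.append(ch)
            i += 1
        elif code.startswith("/*", i):
            end = code.find("*/", i + 2)
            i = n if end == -1 else end + 2
        elif code.startswith("//", i) or (ch == "#" and not code.startswith("#[", i)):
            end = code.find("\n", i)
            i = n if end == -1 else end  # drop the comment but keep its newline
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def normalize_whitespace(code: str, tab_width: int = 4) -> str:
    """Expand tabs, strip trailing spaces, and collapse runs of blank lines."""
    cleaned = []
    for line in code.splitlines():
        line = line.expandtabs(tab_width).rstrip()
        if line == "" and cleaned and cleaned[-1] == "":
            continue  # keep at most one blank line in a row
        cleaned.append(line)
    return "\n".join(cleaned).strip("\n") + "\n"


# Usage: cleaned = normalize_whitespace(strip_php_comments(php_source))
```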
3. Training Configuration:
- Initialize a Trainer object for fine-tuning, leveraging the Transformers library.
- Define training parameters, including:
  - Learning rate, optimizer, and scheduler settings.
  - Gradient accumulation steps to balance memory usage against effective batch size.
  - Loss function, typically cross-entropy for language modeling.
  - Metrics for evaluating model performance.
- Run training on a GPU for acceleration.
- Handle distributed training across multiple processes, if used.
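A sketch of step 3 using the Transformers `Trainer` with a PEFT LoRA adapter on the 4-bit model. The LoRA rank, alpha, dropout, target modules, and `max_length` are hypothetical values, since the card does not record the adapter configuration; the target modules chosen are the attention projections of the StarCoder2 architecture. `training_args` is expected to be a `transformers.TrainingArguments` built from the hyperparameters listed under "Training hyperparameters".

```python
def build_trainer(model, tokenizer, train_dataset, training_args, max_length: int = 1024):
    """Wrap a 4-bit causal LM in a LoRA adapter and return a Transformers Trainer."""
    # Imported lazily so the sketch can be defined without these packages installed.
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import DataCollatorForLanguageModeling, Trainer

    # Hypothetical adapter settings: the card does not record the LoRA configuration.
    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # StarCoder2 attention projections
        task_type="CAUSAL_LM",
    )
    # Freezes the quantized base weights and enables gradient checkpointing by default.
    model = prepare_model_for_kbit_training(model)
    model = get_peft_model(model, lora_config)  # only the adapter parameters are trainable

    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # the collator needs a pad token

    def tokenize(batch):
        # "content" holds the PHP source in the-stack-smol; max_length is hypothetical.
        return tokenizer(batch["content"], truncation=True, max_length=max_length)

    tokenized = train_dataset.map(
        tokenize, batched=True, remove_columns=train_dataset.column_names
    )
    # mlm=False: labels are copies of input_ids with pad-token positions set to -100;
    # the model shifts them internally for next-token prediction.
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
    return Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized,
        data_collator=collator,
    )


# Usage:
# trainer = build_trainer(model, tokenizer, train_dataset, training_args)
# trainer.train()
```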
4. Model Training:
- Train for a fixed number of optimization steps.
- Iterate through batches of preprocessed PHP code examples.
- Feed examples into the model and compute predictions.
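- Compute the cross-entropy loss between the predicted next-token distributions and the actual next tokens.
- Backpropagate the gradients and update the trainable weights (the PEFT adapter parameters; the 4-bit base weights stay frozen).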
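To make the loss step concrete: in causal language modeling, the prediction at position t is scored against the actual token at position t+1, and the loss is the mean cross-entropy over those pairs. The pure-Python function below spells this out on toy logits; in practice, Transformers causal-LM models compute the same shifted loss internally when `labels` are passed, so you would not write this yourself.

```python
import math
from typing import Sequence


def causal_lm_loss(logits: Sequence[Sequence[float]], token_ids: Sequence[int]) -> float:
    """Mean next-token cross-entropy: the logits at position t are scored
    against the actual token at position t + 1 (the last position has no target)."""
    if len(logits) != len(token_ids) or len(token_ids) < 2:
        raise ValueError("need one logit row per token and at least two tokens")
    total = 0.0
    for t in range(len(token_ids) - 1):
        row = logits[t]
        m = max(row)
        # Numerically stable log-sum-exp over the vocabulary.
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        # -log softmax(row)[target] = log_z - row[target]
        total += log_z - row[token_ids[t + 1]]
    return total / (len(token_ids) - 1)
```

For example, uniform logits over a 4-token vocabulary give exactly log 4 (about 1.386), the loss of a model that has learned nothing. With `gradient_accumulation_steps: 4` (see the hyperparameters), gradients from four such micro-batches are accumulated before each weight update.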
5. Evaluation (Optional):
- Periodically assess the model's performance on a validation set.
- Measure key metrics such as code completion accuracy or perplexity.
- Monitor training progress to fine-tune hyperparameters if necessary.
- Log metrics to Weights & Biases (wandb) for live monitoring.
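Perplexity, one of the metrics named above, is the exponential of the mean per-token cross-entropy. When evaluation batches contain different numbers of tokens, the batch means should be weighted by token count, as in this pure-Python sketch; applying `math.exp` to the Trainer's reported `eval_loss` is a common shortcut that only approximates this token-weighted value. For live monitoring, `report_to="wandb"` in `TrainingArguments` sends the logged metrics to Weights & Biases.

```python
import math
from typing import Sequence


def corpus_perplexity(batch_losses: Sequence[float], batch_token_counts: Sequence[int]) -> float:
    """Token-weighted perplexity from per-batch mean cross-entropy losses (natural log)."""
    if not batch_losses or len(batch_losses) != len(batch_token_counts):
        raise ValueError("need one token count per batch loss, and at least one batch")
    total_tokens = sum(batch_token_counts)
    # Undo each batch's averaging so batches with more tokens weigh more.
    total_nll = sum(loss * n for loss, n in zip(batch_losses, batch_token_counts))
    return math.exp(total_nll / total_tokens)
```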
6. Save the Fine-tuned Model:
- Store the optimized model weights and configuration in the designated `output_dir`.
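Saving could be as simple as the sketch below (the function name is hypothetical). For a PEFT-wrapped model, `save_pretrained` writes only the adapter weights and `adapter_config.json`, not the full 3B base model, so the base model has to be loaded separately before the adapter is attached.

```python
def save_finetuned(model, tokenizer, output_dir: str) -> str:
    """Write the (PEFT) model and tokenizer files to output_dir and return the path."""
    # On a PeftModel this saves only the adapter weights and adapter_config.json.
    model.save_pretrained(output_dir)
    # Saving the tokenizer alongside keeps the output directory self-contained.
    tokenizer.save_pretrained(output_dir)
    return output_dir
```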
7. Model Sharing (Optional):
- Create a model card documenting the fine-tuning process and model specifications.
- Share the finetunedPHP_starcoder2 model on the Hugging Face Hub for broader accessibility and collaboration.
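Sharing can go through the `push_to_hub` method that both PEFT models and Transformers tokenizers provide, as sketched below. It assumes you are already authenticated with the Hub (for example via `huggingface-cli login`); the function name and the repo id in the usage comment are hypothetical placeholders.

```python
def share_on_hub(model, tokenizer, repo_id: str, private: bool = False) -> None:
    """Upload the adapter weights/config and the tokenizer files to a Hub model repo."""
    model.push_to_hub(repo_id, private=private)
    tokenizer.push_to_hub(repo_id, private=private)


# Example (hypothetical repo id):
# share_on_hub(model, tokenizer, "your-username/finetunedPHP_starcoder2")
```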
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 0
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
- mixed_precision_training: Native AMP
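The list above maps onto `transformers.TrainingArguments` fields as sketched below, together with the learning-rate curve it implies: linear warmup to 2e-4 over 100 steps, then cosine decay to zero at step 1000 (the same shape as Transformers' `get_cosine_schedule_with_warmup` with its default half cycle). `bf16=True` is an assumption, since "Native AMP" could equally mean `fp16=True`; leaving `optim` at the Trainer default (AdamW) is likewise an assumption about what the "Adam" line describes. The reported total batch size of 4 equals 1 sequence per device × 4 accumulation steps on a single device.

```python
import math

# Hyperparameters from the list above, keyed by transformers.TrainingArguments field names.
TRAINING_KWARGS = {
    "learning_rate": 2e-4,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 8,
    "seed": 0,
    "gradient_accumulation_steps": 4,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 100,
    "max_steps": 1000,
    "bf16": True,          # assumption: "Native AMP" may instead have been fp16=True
    "report_to": "wandb",  # live monitoring, as described in the evaluation step
}


def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Sequences per optimizer step: per-device batch x accumulation steps x devices."""
    return (kwargs["per_device_train_batch_size"]
            * kwargs["gradient_accumulation_steps"] * num_devices)


def lr_at_step(step: int, peak_lr: float = 2e-4,
               warmup_steps: int = 100, total_steps: int = 1000) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Usage with Transformers installed (output_dir is a hypothetical path):
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="finetunedPHP_starcoder2", **TRAINING_KWARGS)
```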
Training results
Training results and performance metrics are available in this model repository.
Framework versions
- PEFT 0.8.2
- Transformers 4.40.0.dev0
- Pytorch 2.2.1+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2