---
license: bigcode-openrail-m
library_name: peft
tags:
  - trl
  - sft
  - generated_from_trainer
base_model: bigcode/starcoder2-3b
model-index:
  - name: finetunedPHP_starcoder2
    results: []
datasets:
  - bigcode/the-stack-smol
language:
  - en
---

# finetunedPHP_starcoder2

This model is a fine-tuned version of [bigcode/starcoder2-3b](https://huggingface.co/bigcode/starcoder2-3b) on the PHP subset of [bigcode/the-stack-smol](https://huggingface.co/datasets/bigcode/the-stack-smol).

## Model description

The finetunedPHP_starcoder2 model is based on the starcoder2-3b architecture, fine-tuned specifically on PHP code from the-stack-smol dataset. It is intended for code generation tasks related to PHP programming.

## Intended uses & limitations

The finetunedPHP_starcoder2 model is suitable for generating PHP code snippets, including code completion and syntax suggestions. However, it may struggle with complex or domain-specific code, and users should verify the generated code for correctness and security.
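
As an illustration of intended use, the adapter can be loaded on top of the base model with PEFT and prompted for a PHP completion. This is a minimal sketch, assuming this repository hosts the adapter weights under the id shown below:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "bigcode/starcoder2-3b"
adapter_id = "CodeHima/finetunedPHP_starcoder2"  # assumed id of this repository

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Ask for a PHP completion.
prompt = "<?php\n// Return the factorial of $n\nfunction factorial($n) {\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```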

## Training and evaluation data

The model was trained on PHP code samples from the `data/php` subset of the-stack-smol dataset, a small sample of The Stack (permissively licensed source code from public repositories).

## Training procedure

1. Data and Model Preparation:

   - Load the bigcode/the-stack-smol dataset from the Hugging Face Hub.
   - Extract the PHP subset (`data/php`) for training.
   - Use the starcoder2-3b model, pre-trained on a diverse range of programming languages including PHP, from the Hugging Face Hub.
   - Configure the model with 4-bit quantization for efficient computation (see the loading sketch below).
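
A minimal sketch of this preparation step, assuming the `datasets`, `transformers`, and `bitsandbytes` libraries; the specific 4-bit settings below (NF4, fp16 compute) are illustrative assumptions rather than the recorded configuration:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the PHP subset of the-stack-smol from the Hugging Face Hub.
dataset = load_dataset("bigcode/the-stack-smol", data_dir="data/php", split="train")

# 4-bit quantization keeps the 3B model small enough for a single GPU.
# (NF4 + fp16 compute is an assumed, typical bitsandbytes setup.)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder2-3b")
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder2-3b",
    quantization_config=bnb_config,
    device_map="auto",
)
```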

2. Data Processing:

   - Tokenize the PHP code snippets using the model's tokenizer.
   - Clean the code by removing comments and normalizing indentation.
   - Prepare input examples suited to the model's architecture and training objective.
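
A rough sketch of this preprocessing; the comment-stripping rules and the 1024-token maximum length are illustrative assumptions, not the exact cleaning applied for this model:

```python
import re

MAX_LENGTH = 1024  # assumed maximum sequence length for training examples

def clean_php(code: str) -> str:
    """Very rough cleanup: drop comments and normalize indentation."""
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)  # block comments
    code = re.sub(r"//.*|#.*", "", code)                    # line comments
    lines = [line.rstrip().replace("\t", "    ") for line in code.splitlines()]
    return "\n".join(line for line in lines if line.strip())

def tokenize_fn(batch):
    # "content" is the source-code column in the-stack-smol.
    texts = [clean_php(c) for c in batch["content"]]
    return tokenizer(texts, truncation=True, max_length=MAX_LENGTH)

tokenized_dataset = dataset.map(
    tokenize_fn, batched=True, remove_columns=dataset.column_names
)
```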

3. Training Configuration:

   - Initialize a Trainer object for fine-tuning, leveraging the Transformers library (see the configuration sketch below).
   - Define training parameters, including:
     - Learning rate, optimizer, and scheduler settings.
     - Gradient accumulation steps to balance memory usage.
     - Loss function, typically cross-entropy for language modeling.
     - Metrics for evaluating model performance.
   - Specify GPU utilization for accelerated training.
   - Handle potential distributed training across multiple processes.
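
A configuration sketch under stated assumptions: TRL's `SFTTrainer` is used (as suggested by the `trl`/`sft` tags) and the LoRA adapter settings are illustrative, since the actual values are not recorded in this card. `SFTTrainer` tokenizes internally, so it is given the raw `content` column rather than the pre-tokenized examples above; the `TrainingArguments` mirror the hyperparameters listed later in this card:

```python
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

# Illustrative LoRA settings -- assumed, not taken from the original run.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

training_args = TrainingArguments(
    output_dir="finetunedPHP_starcoder2",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # effective train batch size of 4
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=0,
    fp16=True,                       # native AMP (fp16 assumed; bf16 also possible)
    logging_steps=10,
    report_to="wandb",               # live metric monitoring (see step 5)
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=lora_config,
    dataset_text_field="content",
    max_seq_length=1024,
    tokenizer=tokenizer,
)
```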

4. Model Training:

   - Commence training for the specified number of steps.
   - Iterate through batches of preprocessed PHP code examples.
   - Feed examples into the model and compute predictions.
   - Calculate loss based on predicted and actual outputs.
   - Update model weights by backpropagating gradients through the network.
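
With the trainer configured, this whole loop is a single call; the commented pseudocode only illustrates what `Trainer.train()` does internally at each step:

```python
# The Trainer/SFTTrainer API wraps the loop described above:
trainer.train()

# Conceptually, each training step does roughly the following
# (simplified; the real implementation adds AMP scaling, clipping, logging, ...):
#
#   for step, batch in enumerate(dataloader):
#       outputs = model(**batch)            # forward pass
#       loss = outputs.loss                 # cross-entropy vs. actual tokens
#       (loss / accum_steps).backward()     # backpropagate gradients
#       if (step + 1) % accum_steps == 0:
#           optimizer.step()                # update weights
#           lr_scheduler.step()
#           optimizer.zero_grad()
```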

5. Evaluation (Optional):

   - Periodically assess the model's performance on a validation set.
   - Measure key metrics such as code-completion accuracy or perplexity.
   - Monitor training progress and adjust hyperparameters if necessary.
   - Use Weights & Biases (wandb) for live metric monitoring (see the sketch below).
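
A sketch of this optional step, assuming the trainer was constructed with a small held-out `eval_dataset`; perplexity is derived from the evaluation loss:

```python
import math

# Assumes eval_dataset=... was passed to the trainer at construction time.
eval_metrics = trainer.evaluate()

# Perplexity is the exponential of the average cross-entropy loss.
perplexity = math.exp(eval_metrics["eval_loss"])
print(f"eval_loss={eval_metrics['eval_loss']:.4f}  perplexity={perplexity:.2f}")

# With report_to="wandb" in TrainingArguments, the training loss curve and
# these evaluation metrics are also streamed to Weights & Biases.
```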

6. Save the Fine-tuned Model:

   - Store the optimized model weights and configuration in the designated output_dir.
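
Because training used a PEFT adapter, what lands in `output_dir` is the LoRA adapter weights and configuration rather than full model weights; a minimal sketch:

```python
output_dir = "finetunedPHP_starcoder2"

trainer.save_model(output_dir)         # writes the PEFT adapter weights and config
tokenizer.save_pretrained(output_dir)  # keep the tokenizer alongside the adapter
```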

7. Model Sharing (Optional):

   - Create a model card documenting the fine-tuning process and model specifications.
   - Share the finetunedPHP_starcoder2 model on the Hugging Face Hub for broader accessibility and collaboration.
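
Publishing can be done directly from the trained objects; the repo id below mirrors this model's name (an assumption) and requires authentication via `huggingface-cli login`:

```python
repo_id = "CodeHima/finetunedPHP_starcoder2"  # assumed to be this repository

model.push_to_hub(repo_id)       # uploads the PEFT adapter
tokenizer.push_to_hub(repo_id)   # uploads the tokenizer files
```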

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 0
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
- mixed_precision_training: Native AMP

### Training results

Training results and performance metrics are available in this repository.

### Framework versions

- PEFT 0.8.2
- Transformers 4.40.0.dev0
- Pytorch 2.2.1+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2