---
license: bigcode-openrail-m
base_model: bigcode/starcoderbase-7b
tags:
- generated_from_trainer
model-index:
- name: starcoderbase7b_2048_context_length_lr_0.0005
  results: []
---


# starcoderbase7b_2048_context_length_lr_0.0005

This model is a fine-tuned version of [bigcode/starcoderbase-7b](https://huggingface.co/bigcode/starcoderbase-7b) on an unknown dataset.
It achieves the following results on the evaluation set at the final training step (step 2000):
- Loss: 2.0501
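
For intuition, the evaluation loss can be converted to a perplexity. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card. The helper name `perplexity` is illustrative.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


# Final evaluation loss reported in this card.
final_eval_loss = 2.0501

# Assumption: the loss is a mean per-token cross-entropy in nats.
final_ppl = perplexity(final_eval_loss)
print(f"perplexity at final step: {final_ppl:.2f}")
```

Under that assumption, a loss near 2.05 corresponds to a perplexity a little under 8.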

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 30
- training_steps: 2000
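
The derived values in the list above follow from the per-device settings, and the learning rate at any step follows from the scheduler settings. The sketch below reproduces that arithmetic in plain Python. The function `cosine_lr` is a hypothetical helper, and it assumes the common shape for a `cosine` schedule: linear warmup from zero to the peak rate, then a half-cosine decay to zero at the last step. The card does not confirm scheduler details beyond the type and warmup length.

```python
import math

# Values taken from the hyperparameter list.
per_device_train_batch_size = 2
per_device_eval_batch_size = 2
num_devices = 4
gradient_accumulation_steps = 2
training_steps = 2000

# Sequences consumed per optimizer step across all devices.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
# Evaluation does not accumulate gradients, so only the devices multiply.
total_eval_batch_size = per_device_eval_batch_size * num_devices
# Sequences consumed over the whole run.
total_train_samples = total_train_batch_size * training_steps


def cosine_lr(step, peak_lr=0.0005, warmup_steps=30, total_steps=2000):
    """Assumed schedule: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size, total_train_samples)
for step in (0, 15, 30, 1015, 2000):
    print(step, f"{cosine_lr(step):.6f}")
```

With these settings the run processes 16 sequences per optimizer step, and the rate peaks at step 30 before decaying for the remaining 1970 steps.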

### Training results

Evaluation ran every 25 steps. Validation loss stays near 0.6 through step 550, jumps to 11.4657 at step 575, and recovers only to about 2.0 by the end of training.

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.6244        | 0.0125 | 25   | 0.5402          |
| 1.0172        | 0.025  | 50   | 1.4486          |
| 0.9991        | 0.0375 | 75   | 1.0535          |
| 0.715         | 0.05   | 100  | 1.6262          |
| 0.6957        | 0.0625 | 125  | 0.6796          |
| 0.5182        | 0.075  | 150  | 0.6086          |
| 0.497         | 0.0875 | 175  | 0.5938          |
| 0.4611        | 0.1    | 200  | 0.6104          |
| 0.4046        | 0.1125 | 225  | 0.5857          |
| 0.3753        | 0.125  | 250  | 0.6633          |
| 0.3517        | 0.1375 | 275  | 0.6479          |
| 0.2758        | 0.15   | 300  | 0.5788          |
| 0.2928        | 0.1625 | 325  | 0.6429          |
| 0.2669        | 0.175  | 350  | 0.5874          |
| 0.2608        | 0.1875 | 375  | 0.5497          |
| 0.2049        | 0.2    | 400  | 0.6268          |
| 0.2006        | 0.2125 | 425  | 0.6265          |
| 0.197         | 0.225  | 450  | 0.6236          |
| 0.177         | 0.2375 | 475  | 0.6124          |
| 0.1774        | 0.25   | 500  | 0.6231          |
| 0.1509        | 0.2625 | 525  | 0.5864          |
| 0.1389        | 0.275  | 550  | 0.6161          |
| 0.8679        | 0.2875 | 575  | 11.4657         |
| 6.5575        | 0.3    | 600  | 6.4917          |
| 6.0031        | 0.3125 | 625  | 5.5229          |
| 5.1391        | 0.325  | 650  | 5.2191          |
| 4.4917        | 0.3375 | 675  | 4.6562          |
| 3.9199        | 0.35   | 700  | 4.2153          |
| 3.855         | 0.3625 | 725  | 4.0902          |
| 3.5441        | 0.375  | 750  | 4.0601          |
| 3.3835        | 0.3875 | 775  | 3.8844          |
| 3.1663        | 0.4    | 800  | 3.8223          |
| 2.9285        | 0.4125 | 825  | 3.4541          |
| 3.0088        | 0.425  | 850  | 3.5302          |
| 2.9083        | 0.4375 | 875  | 3.3347          |
| 2.8438        | 0.45   | 900  | 3.3962          |
| 2.663         | 0.4625 | 925  | 3.0955          |
| 2.5084        | 0.475  | 950  | 3.0454          |
| 2.5818        | 0.4875 | 975  | 3.0131          |
| 2.4068        | 0.5    | 1000 | 3.0179          |
| 2.3994        | 0.5125 | 1025 | 2.8273          |
| 2.1942        | 0.525  | 1050 | 2.7333          |
| 2.1041        | 0.5375 | 1075 | 2.6163          |
| 2.0861        | 0.55   | 1100 | 2.6006          |
| 1.9868        | 0.5625 | 1125 | 2.5482          |
| 1.9496        | 0.575  | 1150 | 2.6079          |
| 1.8099        | 0.5875 | 1175 | 2.3777          |
| 1.6454        | 0.6    | 1200 | 2.2547          |
| 1.6484        | 0.6125 | 1225 | 2.3254          |
| 1.5729        | 0.625  | 1250 | 2.2835          |
| 1.5635        | 0.6375 | 1275 | 2.2167          |
| 1.3961        | 0.65   | 1300 | 2.2751          |
| 1.3495        | 0.6625 | 1325 | 2.1755          |
| 1.3524        | 0.675  | 1350 | 2.1377          |
| 1.3116        | 0.6875 | 1375 | 2.1407          |
| 1.282         | 0.7    | 1400 | 2.0955          |
| 1.114         | 0.7125 | 1425 | 2.0334          |
| 1.0985        | 0.725  | 1450 | 2.0133          |
| 1.1216        | 0.7375 | 1475 | 2.0139          |
| 1.0544        | 0.75   | 1500 | 2.0464          |
| 1.0221        | 0.7625 | 1525 | 1.9984          |
| 0.9368        | 0.775  | 1550 | 2.0069          |
| 0.8973        | 0.7875 | 1575 | 1.9595          |
| 0.9332        | 0.8    | 1600 | 1.9372          |
| 0.9227        | 0.8125 | 1625 | 1.9910          |
| 0.8507        | 0.825  | 1650 | 2.0251          |
| 0.8242        | 0.8375 | 1675 | 1.9892          |
| 0.7571        | 0.85   | 1700 | 2.0327          |
| 0.7519        | 0.8625 | 1725 | 1.9949          |
| 0.7209        | 0.875  | 1750 | 2.0050          |
| 0.7315        | 0.8875 | 1775 | 2.0076          |
| 0.77          | 0.9    | 1800 | 2.0315          |
| 0.7719        | 0.9125 | 1825 | 2.0241          |
| 0.681         | 0.925  | 1850 | 2.0440          |
| 0.7371        | 0.9375 | 1875 | 2.0380          |
| 0.6823        | 0.95   | 1900 | 2.0392          |
| 0.6891        | 0.9625 | 1925 | 2.0563          |
| 0.7266        | 0.975  | 1950 | 2.0511          |
| 0.6888        | 0.9875 | 1975 | 2.0501          |
| 0.6663        | 1.0    | 2000 | 2.0501          |
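
The table can be scanned programmatically to find where validation loss jumps and which logged step was best on either side of the jump. The sketch below uses a small excerpt of rows copied from the table. The names `first_spike` and `SPIKE_RATIO`, and the 5x threshold, are illustrative choices and not part of the training setup.

```python
# Excerpt of (step, validation_loss) rows copied from the table.
eval_log = [
    (25, 0.5402),
    (375, 0.5497),
    (550, 0.6161),
    (575, 11.4657),
    (600, 6.4917),
    (1600, 1.9372),
    (2000, 2.0501),
]

# Hypothetical threshold: flag a row whose loss exceeds 5x the previous row.
SPIKE_RATIO = 5.0


def first_spike(log, ratio=SPIKE_RATIO):
    """Return the first step whose loss exceeds ratio * previous loss."""
    for (_, prev_loss), (step, loss) in zip(log, log[1:]):
        if loss > ratio * prev_loss:
            return step
    return None


spike_step = first_spike(eval_log)

if spike_step is not None:
    # Lowest validation loss before and from the spike onward.
    best_before = min(
        (row for row in eval_log if row[0] < spike_step), key=lambda r: r[1]
    )
    best_after = min(
        (row for row in eval_log if row[0] >= spike_step), key=lambda r: r[1]
    )
    print("spike at step", spike_step)
    print("best before spike:", best_before)
    print("best after spike:", best_after)
```

On this excerpt the jump is flagged at step 575. The lowest validation loss before it is 0.5402 at step 25, and the lowest from the jump onward is 1.9372 at step 1600.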


### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0a0+07cecf4168.nv24.05
- Datasets 2.20.0
- Tokenizers 0.19.1