---
library_name: peft
base_model: bigcode/starcoderbase-3b
license: bigcode-openrail-m
---

# Model Card for Model ID

First pass at finetuning `bigcode/starcoderbase-3b` on the Elixir language subset of `bigcode/the-stack-dedup`

## Model Details

### Model Description

- **Developed by:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

- **Repository:** [arpieb/peft-lora-starcoderbase-3b-personal-copilot-elixir](https://huggingface.co/arpieb/peft-lora-starcoderbase-3b-personal-copilot-elixir)

## Uses

### Direct Use

[More Information Needed]

### Downstream Use [optional]

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

[bigcode/the-stack-dedup](https://huggingface.co/datasets/bigcode/the-stack-dedup)

### Training Procedure

Based on the finetuning workflow detailed in [Personal Copilot: Train Your Own Coding Assistant](https://huggingface.co/blog/personal-copilot), specifically the training code found under `personal_copilot/training` in the repo [pacman100/DHS-LLM-Workshop](https://github.com/pacman100/DHS-LLM-Workshop).
Script used to train the model:

```bash
python train.py \
    --model_path "bigcode/starcoderbase-3b" \
    --dataset_name "bigcode/the-stack-dedup" \
    --subset "data/elixir" \
    --data_column "content" \
    --split "train" \
    --seq_length 2048 \
    --max_steps 2000 \
    --batch_size 4 \
    --gradient_accumulation_steps 4 \
    --learning_rate 5e-4 \
    --lr_scheduler_type "cosine" \
    --weight_decay 0.01 \
    --num_warmup_steps 30 \
    --eval_freq 100 \
    --save_freq 100 \
    --log_freq 25 \
    --num_workers 4 \
    --bf16 \
    --no_fp16 \
    --output_dir "peft-lora-starcoderbase-3b-personal-copilot-rtx4090-elixir" \
    --push_to_hub "false" \
    --fim_rate 0.5 \
    --fim_spm_rate 0.5 \
    --use_flash_attn \
    --use_peft_lora \
    --lora_r 32 \
    --lora_alpha 64 \
    --lora_dropout 0.0 \
    --lora_target_modules "c_proj,c_attn,q_attn,c_fc,c_proj" \
    --use_4bit_qunatization \
    --use_nested_quant \
    --bnb_4bit_compute_dtype "bfloat16"
```

#### Preprocessing

N/A

#### Training Hyperparameters

- **Training regime:** [More Information Needed]

#### Speeds, Sizes, Times [optional]

[More Information Needed]

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

[More Information Needed]

#### Factors

[More Information Needed]

#### Metrics

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

[More Information Needed]

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

NOTE the RTX-4090 is not available in the above estimator; will update once there is data available.
- **Hardware Type:** NVIDIA GeForce RTX 4090
- **Hours used:** 5h 20m 2s
- **Cloud Provider:** Local rig
- **Compute Region:** N/A
- **Carbon Emitted:** N/A

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

#### Hardware

Local DL rig with the following configuration:

- NVIDIA GeForce RTX 4090
- Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
- 128GB RAM

#### Software

[More Information Needed]

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

## Training procedure

The following `bitsandbytes` quantization config was used during training:

- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16

### Framework versions

- PEFT 0.6.2.dev0
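
### Rough training budget

As a sanity check on the flags passed to `train.py` and the reported wall-clock time, the sketch below works out the effective batch size, an estimate of tokens processed, the LoRA scaling factor, and the average throughput. It rests on assumptions this card does not state: a single GPU, `--max_steps` counted in optimizer steps, every sequence packed to the full `--seq_length`, and the usual LoRA scaling of `lora_alpha / lora_r`. The function name `training_summary` is made up for illustration.

```python
# Back-of-the-envelope numbers derived from the training flags and the
# reported "Hours used". Assumptions (not stated in the card): one GPU,
# max_steps counted in optimizer steps, sequences packed to seq_length.

SEQ_LENGTH = 2048          # --seq_length
MAX_STEPS = 2000           # --max_steps
BATCH_SIZE = 4             # --batch_size (per device)
GRAD_ACCUM_STEPS = 4       # --gradient_accumulation_steps
LORA_R = 32                # --lora_r
LORA_ALPHA = 64            # --lora_alpha
WALL_CLOCK = (5, 20, 2)    # "Hours used": 5h 20m 2s


def training_summary() -> dict:
    """Return derived quantities for the run described in this card."""
    hours, minutes, seconds = WALL_CLOCK
    wall_seconds = hours * 3600 + minutes * 60 + seconds

    # Sequences consumed per optimizer step on a single device.
    sequences_per_step = BATCH_SIZE * GRAD_ACCUM_STEPS
    tokens_per_step = sequences_per_step * SEQ_LENGTH
    total_tokens = tokens_per_step * MAX_STEPS

    return {
        "sequences_per_step": sequences_per_step,
        "tokens_per_step": tokens_per_step,
        "total_tokens": total_tokens,
        # Standard LoRA scaling applied to the low-rank update.
        "lora_scale": LORA_ALPHA / LORA_R,
        "wall_seconds": wall_seconds,
        # Averages include evaluation and checkpointing overhead.
        "seconds_per_step": wall_seconds / MAX_STEPS,
        "tokens_per_second": total_tokens / wall_seconds,
    }


if __name__ == "__main__":
    for name, value in training_summary().items():
        print(f"{name}: {value}")
```

Because the wall-clock figure also covers the periodic evaluation and checkpoint saves (`--eval_freq 100`, `--save_freq 100`), the per-step and per-second averages understate pure training throughput.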