---
license: llama2
language:
- en
base_model:
- meta-llama/CodeLlama-13b-hf
pipeline_tag: text-generation
tags:
- code
- gguf
- llama.cpp
- llmstudio
---
# Model Card for TestGen-Dart v0.2 (GGUF Version)

This model card provides information about **TestGen-Dart v0.2 (GGUF Version)**, a fine-tuned version of Meta's Code Llama 13B model, optimized for generating unit tests in Dart for mobile applications. This GGUF-quantized model is designed to run efficiently with frameworks like **LLMStudio** and **llama.cpp**, enabling deployment on resource-constrained hardware while maintaining robust performance.

---

## Model Details

### Model Description

**TestGen-Dart v0.2** is a fine-tuned version of Code Llama 13B, specifically adapted for generating unit test cases for Dart code. The GGUF quantization enables its use on lightweight, consumer-grade systems without significant performance loss.

- **Developed by:** Jacob Hoffmann, Demian Frister (Karlsruhe Institute of Technology - KIT, AIFB-BIS)  
- **Funded by:** Helmholtz Association's Initiative and Networking Fund, with compute time on the HAICORE@FZJ partition  
- **Shared by:** Jacob Hoffmann, Demian Frister  
- **Model type:** Fine-tuned Code Llama 13B for test generation in Dart  
- **Language(s):** English  
- **License:** LLaMA 2 Community License  
- **Finetuned from model:** Meta's Code Llama 13B  

### Model Sources

- **Repository:** [GitHub Repository](https://github.com/example/repo) (placeholder)  
- **Paper:** ["Generating Software Tests for Mobile Applications Using Fine-Tuned Large Language Models"](https://doi.org/10.1145/3644032.3644454) (published in AST '24)  
- **Demo:** Coming soon  

---

## Uses

### Direct Use

The model can be used in a zero-shot setting with **llama.cpp** or **LLMStudio** to generate unit tests in Dart. Provide the class code as input, and the model outputs structured unit tests using Dart's `test` package.
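
For example, using the llama.cpp build described below, the class under test can be appended to a short instruction and passed in as a prompt file. This is a minimal sketch: `calculator.dart` and the prompt wording are illustrative, and `-f`/`-n` are standard llama.cpp flags, not settings prescribed by this card.

```bash
# Build a prompt file: a one-line instruction followed by the Dart class under test.
printf 'Generate unit tests in Dart for the following class:\n' > prompt.txt
cat calculator.dart >> prompt.txt

# -f reads the prompt from a file; -n caps the number of generated tokens.
./main -m CodeLlama-13B-TestGen-Dart_v0.2.gguf -f prompt.txt -n 512
```

The model is expected to respond with a test file built on `package:test`, typically one `test(...)` case per behavior of the class.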

### Downstream Use

This model is suitable for integration into developer tools, IDE extensions, or continuous integration pipelines to automate test generation for Dart-based applications.
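
As a rough illustration of such a pipeline (everything here is an assumption rather than part of this card: the directory layout, the prompt wording, and the cleanup step, since llama.cpp echoes the prompt on stdout), a CI job might generate missing tests and then run the suite:

```bash
#!/usr/bin/env bash
# Hypothetical CI step: generate a test file for every Dart source that lacks one.
set -euo pipefail
MODEL=models/testgen-dart-v0.2/CodeLlama-13B-TestGen-Dart_v0.2.gguf

for src in lib/*.dart; do
  out="test/$(basename "${src%.dart}")_test.dart"
  [ -f "$out" ] && continue   # keep hand-written tests
  { printf 'Generate unit tests in Dart for the following class:\n'
    cat "$src"; } > prompt.txt
  # In practice the raw output needs cleanup (prompt echo, stray text)
  # before it is a valid Dart file.
  ./main -m "$MODEL" -f prompt.txt -n 512 > "$out"
done

dart test   # run the generated suite
```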

### Out-of-Scope Use

- Do not use this model for tasks unrelated to Dart test generation.  
- Do not use this model or its outputs to improve any other large language model (excluding Llama 2 and its derivatives), as prohibited by the LLaMA 2 Community License.  
- Malicious use, such as deliberately generating misleading or harmful test code, is prohibited.

---

## Running the GGUF Model with llama.cpp

To use this GGUF-quantized model with llama.cpp:

1. Clone the llama.cpp repository and build the binaries:
   ```bash
   git clone https://github.com/ggerganov/llama.cpp
   cd llama.cpp
   make
   ```
2. Place the GGUF file in the models directory:
   ```bash
   mkdir -p models/testgen-dart-v0.2
   mv /path/to/CodeLlama-13B-TestGen-Dart_v0.2.gguf models/testgen-dart-v0.2/
   ```
3. Run the model (the `$'…'` quoting makes `\n` a real newline in the prompt, and `-n 512` raises the default generation cap so longer test files are not truncated):
   ```bash
   ./main -m models/testgen-dart-v0.2/CodeLlama-13B-TestGen-Dart_v0.2.gguf -n 512 \
     --prompt $'Generate unit tests in Dart for the following class:\nclass Calculator { int add(int a, int b) { return a + b; } }'
   ```
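
Code generation is sensitive to sampling settings. The flags below are standard llama.cpp options, reusing the prompt file from the Direct Use example; the values are illustrative starting points, not tuned settings from the paper:

```bash
# -c sets the context window, -n caps generation length, and a low --temp
# makes the emitted tests more deterministic.
./main -m models/testgen-dart-v0.2/CodeLlama-13B-TestGen-Dart_v0.2.gguf \
  -f prompt.txt -c 4096 -n 512 --temp 0.2 --repeat-penalty 1.1
```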
   
---

## Training Details

### Training Data

The fine-tuning dataset consists of **16,252 Dart code-test pairs** extracted from open-source GitHub repositories using Google BigQuery. The data was subjected to quality filtering and deduplication to ensure high relevance and consistency.

### Training Procedure

- **Fine-tuning Approach:** Supervised Fine-Tuning (SFT) with QLoRA for memory efficiency.  
- **Hardware:** Training was conducted on a single NVIDIA A100 GPU.  
- **Optimization:** Flash Attention 2 was utilized for enhanced performance.  
- **Duration:** The training process ran for up to 32 hours.

### Training Hyperparameters

- **Mixed Precision:** FP16  
- **Optimizer:** AdamW  
- **Learning Rate:** 5e-5  
- **Epochs:** 3  

### Environmental Impact

- **Hardware Type:** NVIDIA A100 GPU  
- **Hours Used:** 32 hours  
- **Carbon Emitted:** 13.099 kgCO2eq  

---

## Evaluation

### Testing Data, Factors & Metrics

- **Testing Data:** A subset of **42 Dart files** from the training dataset, evaluated in a zero-shot setting.  
- **Factors:** Syntax correctness, functional correctness.  
- **Metrics:** pass@1, syntax error rate, functional correctness rate.
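
For reference, and assuming the standard definition of pass@k (the convention from the Codex evaluation literature; the paper may define it differently), the metric estimates the probability that at least one of k sampled generations passes all checks, given n samples per problem of which c pass:

$$
\text{pass@}k = \mathbb{E}_{\text{problems}}\!\left[1 - \frac{\binom{n-c}{k}}{\binom{n}{k}}\right]
$$

For k = 1 this reduces to the average fraction of samples that pass.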

### Results

- **Syntax Correctness:** 76% improvement over the base model.  
- **Functional Correctness:** 16.67% improvement over the base model.  

---

## Citation

If you use this model in your research, please cite:

**BibTeX:**
```bibtex
@inproceedings{hoffmann2024testgen,
  title={Generating Software Tests for Mobile Applications Using Fine-Tuned Large Language Models},
  author={Hoffmann, Jacob and Frister, Demian},
  booktitle={Proceedings of the 5th ACM/IEEE International Conference on Automation of Software Test (AST 2024)},
  year={2024},
  doi={10.1145/3644032.3644454}
}
```

---

## Model Card Contact

- **Jacob Hoffmann**: [jacob.hoffmann@partner.kit.edu](mailto:jacob.hoffmann@partner.kit.edu)  
- **Demian Frister**: [demian.frister@kit.edu](mailto:demian.frister@kit.edu)