---
language: en
license: apache-2.0
---
# LoNAS Adapter Card: lonas-llama-7b-commonsense-adapter
A super-adapter network fine-tuned on LLaMA-7B with a unified commonsense reasoning dataset using LoNAS.
## Paper Abstract
Large Language Models (LLMs) continue to grow, reaching hundreds of billions of parameters and making it challenging for Deep Learning practitioners with resource-constrained systems to use them, e.g., fine-tuning these models for a downstream task of their interest. Adapters, such as low-rank adapters (LoRA), have been proposed to reduce the number of trainable parameters in a model, reducing memory requirements and enabling smaller systems to fine-tune these models. Orthogonal to this work, Neural Architecture Search (NAS) has been used to discover compressed and more efficient architectures without sacrificing performance compared to similar base models. This paper introduces a novel approach, LoNAS, to use NAS on language models by exploring a search space of elastic low-rank adapters while reducing memory and compute requirements of full-scale NAS, resulting in high-performing compressed models obtained from weight-sharing super-networks. Compared to models fine-tuned with LoRA, these models contain fewer total parameters, reducing the inference time with only minor decreases in accuracy and, in some cases, even improving accuracy. We discuss the limitations of LoNAS and share observations for the research community regarding its generalization capabilities, which have motivated our follow-up work.
## Model Details
### Note
We provide only the adapter, not a copy of the base [yahma/llama-7b-hf](https://huggingface.co/yahma/llama-7b-hf) model. Any use of this adapter requires a separate download of the base model.
### Information
- **Adapter name:** lonas-llama-7b-commonsense-adapter
- **Base model:** [LLaMA-7B](https://huggingface.co/yahma/llama-7b-hf)
- **Domain:** Commonsense
- **Subnetwork version:** Super-network
- **NNCF Configuration:** [nncf_lonas_llama_7b.json](https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/LoNAS/nncf_config/unified_commonsense/nncf_lonas_llama_7b.json)
### Adapter Configuration
- **LoRA rank:** 32
- **LoRA alpha:** 64
- **LoRA target modules:** q_proj, k_proj, v_proj, up_proj, gate_proj, down_proj
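For a sense of scale, the configuration above can be turned into an adapter parameter count. The following is a rough sketch and is not taken from the LoNAS code. It assumes the standard LoRA formulation (two low-rank matrices per target module, with the update scaled by alpha / rank), the published LLaMA-7B dimensions (hidden size 4096, MLP size 11008, 32 layers), and that rank 32 is the maximal rank of the super-network. All names in the snippet are illustrative.

```python
# Assumed LLaMA-7B dimensions; they are not stated in this card.
HIDDEN_SIZE = 4096
INTERMEDIATE_SIZE = 11008
NUM_LAYERS = 32

# Values from the "Adapter Configuration" list.
LORA_RANK = 32
LORA_ALPHA = 64

# (in_features, out_features) of each targeted projection in one decoder layer.
TARGET_MODULES = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "k_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "v_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "up_proj": (HIDDEN_SIZE, INTERMEDIATE_SIZE),
    "gate_proj": (HIDDEN_SIZE, INTERMEDIATE_SIZE),
    "down_proj": (INTERMEDIATE_SIZE, HIDDEN_SIZE),
}


def lora_scaling(alpha: int, rank: int) -> float:
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / rank


def lora_params_per_module(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in A (rank x in_features) plus B (out_features x rank)."""
    return rank * (in_features + out_features)


def total_lora_params(rank: int = LORA_RANK) -> int:
    """Adapter parameters summed over all target modules and all layers."""
    per_layer = sum(
        lora_params_per_module(n_in, n_out, rank)
        for n_in, n_out in TARGET_MODULES.values()
    )
    return per_layer * NUM_LAYERS


if __name__ == "__main__":
    print(f"scaling = {lora_scaling(LORA_ALPHA, LORA_RANK)}")
    print(f"adapter parameters at rank {LORA_RANK} = {total_lora_params():,}")
```

Under these assumptions the adapters add roughly 71.6M parameters at full rank; sub-networks that activate smaller ranks would carry fewer.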
### Training Hyperparameters
- **Batch size:** 16
- **Learning rate:** 3e-4
- **Epochs:** 6
### Training Data
Unified commonsense reasoning dataset: [commonsense_15k.json](https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/ft-training_set/commonsense_15k.json).
### Evaluation Data
[BoolQ](https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/dataset/boolq/test.json), [PIQA](https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/dataset/piqa/test.json), [SIQA](https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/dataset/social_i_qa/test.json), [HellaSwag](https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/dataset/hellaswag/test.json), [WinoGrande](https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/dataset/winogrande/test.json), [ARC-e](https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/dataset/ARC-Easy/test.json), [ARC-c](https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/dataset/ARC-Challenge/test.json), [OBQA](https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/dataset/openbookqa/test.json).
## How to use
Refer to [https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/LoNAS#evaluation](https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/LoNAS#evaluation):
```bash
CUDA_VISIBLE_DEVICES=${DEVICES} python run_commonsense.py \
    --dataset_path None \
    --model_name_or_path yahma/llama-7b-hf \
    --lora \
    --lora_weights lonas-llama-7b-commonsense \
    --nncf_config nncf_config/unified_commonsense/nncf_lonas_llama_7b.json \
    --do_test \
    --output_dir lonas-llama-7b-commonsense/results
```
## Evaluation Results
Results of the heuristic sub-network discovered from the super-network:
| Method | Total Params. | TFLOPs | BoolQ | PIQA | SIQA | HellaSwag | WinoG | ARC-e | ARC-c | OBQA | Average |
|-------------|----------------|-----------|-------|------|------|-----------|-------|-------|-------|------|----------------|
| LoRA | 6.7B | 1.7 | 62.6 | 75.3 | 67.9 | 52.9 | 58.6 | 79.2 | 58.3 | 71.2 | **65.8** |
| **LoNAS** | **5.6B** | **1.4** | 62.9 | 73.0 | 68.7 | 51.4 | 63.9 | 72.3 | 58.5 | 71.0 | 65.2 |
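The Average column and the size of the savings can be checked directly from the table. The sketch below is illustrative only: it copies the values from the two rows above, and it assumes the averages are unweighted means over the eight benchmarks rounded half-up to one decimal place, which the card does not state explicitly. `Decimal` is used so the rounding is exact.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-benchmark scores copied from the table, in column order:
# BoolQ, PIQA, SIQA, HellaSwag, WinoG, ARC-e, ARC-c, OBQA.
LORA_SCORES = [Decimal(s) for s in
               ("62.6", "75.3", "67.9", "52.9", "58.6", "79.2", "58.3", "71.2")]
LONAS_SCORES = [Decimal(s) for s in
                ("62.9", "73.0", "68.7", "51.4", "63.9", "72.3", "58.5", "71.0")]

ONE_DECIMAL = Decimal("0.1")


def mean_score(scores):
    """Unweighted mean, rounded half-up to one decimal (assumed convention)."""
    mean = sum(scores) / len(scores)
    return mean.quantize(ONE_DECIMAL, rounding=ROUND_HALF_UP)


def reduction_pct(baseline, value):
    """Relative reduction of `value` with respect to `baseline`, in percent."""
    pct = (baseline - value) / baseline * 100
    return pct.quantize(ONE_DECIMAL, rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print("LoRA average: ", mean_score(LORA_SCORES))
    print("LoNAS average:", mean_score(LONAS_SCORES))
    print("Parameter reduction (%):", reduction_pct(Decimal("6.7"), Decimal("5.6")))
    print("TFLOPs reduction (%):   ", reduction_pct(Decimal("1.7"), Decimal("1.4")))
```

With the table's figures, the LoNAS sub-network has about 16% fewer total parameters and about 18% fewer TFLOPs than the LoRA baseline, for a 0.6-point lower average.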
## Model Sources
- **Repository:** [https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/LoNAS](https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/LoNAS)
- **Paper:** [LoNAS: Elastic Low-Rank Adapters for Efficient Large Language Models](https://arxiv.org/abs/2404.10934)
## Ethical Considerations
Intel is committed to respecting human rights and avoiding causing or contributing to adverse impacts on human rights. See [Intel’s Global Human Rights Principles](https://www.intel.com/content/dam/www/central-libraries/us/en/documents/policy-human-rights.pdf). Intel’s products and software are intended only to be used in applications that do not cause or contribute to adverse impacts on human rights.
| Ethical Considerations | Description |
| ----------- | ----------- |
| Data | The adapter was trained using the commonsense_15k data mixture as described above. |
| Human life | The model is not intended to inform decisions central to human life or flourishing. |
| Mitigations | No additional risk mitigation strategies were considered during model development. |
| Risks and harms | This model has not been assessed for harm or biases, and should not be used for sensitive applications where it may cause harm. |
| Use cases | - |
## Citation
```bibtex
@inproceedings{munoz2024lonas,
title={LoNAS: Elastic Low-Rank Adapters for Efficient Large Language Models},
author={J. Pablo Muñoz and Jinjie Yuan and Yi Zheng and Nilesh Jain},
booktitle={The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation},
year={2024},
url={https://aclanthology.org/2024.lrec-main.940/}
}
```
## License
Apache-2.0