---
base_model: meta-llama/Llama-2-7b-hf
library_name: peft
license: apache-2.0
language:
- en
---

# Model Info

This model applies LLM2Vec to Llama-2. Only the PEFT adapter is distributed.

LLM2Vec is trained on two tasks, MNTP and SimCSE; this repository contains the adapter obtained by applying SimCSE after MNTP.

For the MNTP adapter, please refer to [this link](https://huggingface.co/uzabase/LLM2Vec-Llama-2-7b-hf-wikipedia-jp-mntp).

## Model Details

### Model Description

- **Model type:** PEFT
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)

### Model Sources

- **Repository:** https://github.com/McGill-NLP/llm2vec
- **Paper:** https://arxiv.org/abs/2404.05961

# Usage

- Please see the [original LLM2Vec repo](https://huggingface.co/McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp-unsup-simcse#usage); a minimal loading sketch is also shown below.
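
The snippet below is a minimal sketch adapted from the usage shown in the original LLM2Vec model cards, not an official recipe for this repository. The MNTP repository id is the one linked in "Model Info" above; `<this-repo-id>` is a placeholder for this repository, and it is assumed that the MNTP repository ships its own config/tokenizer files (as the original LLM2Vec releases do).

```python
# Minimal sketch, adapted from the original LLM2Vec usage linked above.
# Assumptions: the MNTP adapter id is the one linked in "Model Info", and
# "<this-repo-id>" is a placeholder for this SimCSE adapter repository.
import torch
from llm2vec import LLM2Vec
from peft import PeftModel
from transformers import AutoConfig, AutoModel, AutoTokenizer

mntp_repo = "uzabase/LLM2Vec-Llama-2-7b-hf-wikipedia-jp-mntp"
simcse_repo = "<this-repo-id>"  # replace with this repository's id

tokenizer = AutoTokenizer.from_pretrained(mntp_repo)
config = AutoConfig.from_pretrained(mntp_repo, trust_remote_code=True)
model = AutoModel.from_pretrained(
    mntp_repo,
    config=config,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="cuda" if torch.cuda.is_available() else "cpu",
)

# Apply the MNTP adapter first, merge it, then apply this SimCSE adapter on top.
model = PeftModel.from_pretrained(model, mntp_repo)
model = model.merge_and_unload()
model = PeftModel.from_pretrained(model, simcse_repo)

# Wrap with LLM2Vec; mean pooling matches the training settings reported below.
l2v = LLM2Vec(model, tokenizer, pooling_mode="mean", max_length=512)

embeddings = l2v.encode(["This is a sample sentence.", "Another sentence."])
print(embeddings.shape)
```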

# Benchmark

- The tables below are summaries; see [this blog post](https://tech.uzabase.com/entry/2024/09/30/114245) for details.

## MTEB (Japanese)

| | Classification | Clustering | PairClassification | Reranking | BitextMining | Retrieval | STS | AVG |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **Llama2-Llm2vec-eng (this repo)** | 0.527 | 0.258 | 0.501 | 0.217 | 0.275 | 0.296 | 0.765 | 0.408 |
| Llama2-Llm2vec-jpn | 0.570 | 0.365 | 0.510 | 0.349 | 0.470 | 0.417 | 0.795 | 0.498 |
| Swallow-Llm2vec-jpn | 0.621 | 0.391 | 0.510 | 0.475 | 0.475 | 0.491 | 0.832 | 0.523 |

## MTEB (English)

| | Classification | Clustering | PairClassification | Reranking | Retrieval | STS | AVG |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **Llama2-Llm2vec-eng (this repo)** | 0.709 | 0.386 | 0.780 | 0.588 | 0.329 | 0.723 | 0.586 |
| Llama2-Llm2vec-jpn | 0.722 | 0.428 | 0.785 | 0.594 | 0.371 | 0.717 | 0.603 |
| Swallow-Llm2vec-jpn | 0.695 | 0.385 | 0.751 | 0.576 | 0.318 | 0.710 | 0.572 |

# Training Details

## Training Data

- [Corpus for SimCSE from Wikipedia](https://github.com/McGill-NLP/llm2vec?tab=readme-ov-file#unsupervised-contrastive-training-simcse)

## Training Hyperparameters

The SimCSE training settings are listed below; a sketch of how they could be collected into an llm2vec-style config file follows the list.

- simcse_dropout: 0.3
- bidirectional: true
- pooling_mode: "mean"
- remove_unused_columns: false
- learning_rate: 3e-5
- loss_scale: 20
- batch_size: 256
- gradient_accumulation_steps: 1
- max_seq_length: 128
- lora_r: 16
- torch_dtype: "bfloat16"
- attn_implementation: "flash_attention_2"
- seed: 42
- bf16: true
- gradient_checkpointing: true
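
To make the list easier to reuse, here is a minimal sketch that collects these values into a single JSON config in the style of llm2vec's SimCSE train configs. The model, adapter, corpus, and output fields are placeholders (not taken from this card), and the exact field names expected by the upstream training script may differ.

```python
# Sketch: collect the hyperparameters above into one JSON config,
# in the style of llm2vec's SimCSE train configs. Key names follow the
# list above; the model / adapter / corpus / output fields are placeholders.
import json

simcse_config = {
    # what to train (placeholders, not taken from this card)
    "model_name_or_path": "meta-llama/Llama-2-7b-hf",
    "peft_model_name_or_path": "uzabase/LLM2Vec-Llama-2-7b-hf-wikipedia-jp-mntp",
    "dataset_file_path": "data/wikipedia_jp_simcse.txt",  # hypothetical corpus path
    "output_dir": "output/simcse/llama2-7b-wikipedia-jp",  # hypothetical
    # hyperparameters reported in this card
    "simcse_dropout": 0.3,
    "bidirectional": True,
    "pooling_mode": "mean",
    "remove_unused_columns": False,
    "learning_rate": 3e-5,
    "loss_scale": 20,
    "batch_size": 256,
    "gradient_accumulation_steps": 1,
    "max_seq_length": 128,
    "lora_r": 16,
    "torch_dtype": "bfloat16",
    "attn_implementation": "flash_attention_2",
    "seed": 42,
    "bf16": True,
    "gradient_checkpointing": True,
}

with open("simcse_llama2_wikipedia_jp.json", "w") as f:
    json.dump(simcse_config, f, indent=2)

# The resulting JSON could then be passed to llm2vec's SimCSE training script
# (experiments/run_simcse.py in the upstream repository).
```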

## Accelerator Settings

- deepspeed_config:
  - gradient_accumulation_steps: 1
  - gradient_clipping: 1.0
  - offload_optimizer_device: nvme
  - offload_optimizer_nvme_path: /nvme
  - zero3_save_16bit_model: true
  - zero_stage: 2
- distributed_type: DEEPSPEED
- downcast_bf16: 'no'
- dynamo_config:
  - dynamo_backend: INDUCTOR
  - dynamo_mode: default
  - dynamo_use_dynamic: true
  - dynamo_use_fullgraph: true
- enable_cpu_affinity: false
- machine_rank: 0
- main_training_function: main
- mixed_precision: bf16
- num_machines: 1
- num_processes: 2
- rdzv_backend: static
- same_network: true
- use_cpu: false
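
For reproducibility, the same settings can be written out as an `accelerate` config file. The sketch below only reuses the values listed above; the nesting follows the standard `accelerate` config format, and the launch command in the trailing comment is a placeholder.

```python
# Sketch: write the accelerate configuration above to a config file.
# Only the values listed in this card are used; requires PyYAML.
import yaml

accelerate_config = {
    "deepspeed_config": {
        "gradient_accumulation_steps": 1,
        "gradient_clipping": 1.0,
        "offload_optimizer_device": "nvme",
        "offload_optimizer_nvme_path": "/nvme",
        "zero3_save_16bit_model": True,
        "zero_stage": 2,
    },
    "distributed_type": "DEEPSPEED",
    "downcast_bf16": "no",
    "dynamo_config": {
        "dynamo_backend": "INDUCTOR",
        "dynamo_mode": "default",
        "dynamo_use_dynamic": True,
        "dynamo_use_fullgraph": True,
    },
    "enable_cpu_affinity": False,
    "machine_rank": 0,
    "main_training_function": "main",
    "mixed_precision": "bf16",
    "num_machines": 1,
    "num_processes": 2,
    "rdzv_backend": "static",
    "same_network": True,
    "use_cpu": False,
}

with open("accelerate_config.yaml", "w") as f:
    yaml.safe_dump(accelerate_config, f, sort_keys=False)

# Hypothetical launch command (script and config names are placeholders):
#   accelerate launch --config_file accelerate_config.yaml \
#       experiments/run_simcse.py simcse_llama2_wikipedia_jp.json
```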

## Framework versions

- Python: 3.12.3
- PEFT: 0.11.1
- Sentence Transformers: 3.0.1
- Transformers: 4.41.0
- PyTorch: 2.3.0
- Accelerate: 0.30.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1
- MTEB: 1.13.0