---
base_model: meta-llama/Llama-2-7b-hf
library_name: peft
license: apache-2.0
language:
- en
---
# Model Info
This model applies LLM2Vec to Llama-2; only the PEFT adapter is distributed.
LLM2Vec is trained in two stages, MNTP and unsupervised SimCSE, and this repository contains the adapter obtained by applying SimCSE after MNTP.
For the MNTP adapter, please refer to [this link](https://huggingface.co/uzabase/LLM2Vec-Llama-2-7b-hf-wikipedia-jp-mntp).
## Model Details
### Model Description
- **Model type:** PEFT
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)
### Model Sources
- **Repository:** https://github.com/McGill-NLP/llm2vec
- **Paper:** https://arxiv.org/abs/2404.05961
# Usage
- Please see the [original LLM2Vec repository](https://huggingface.co/McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp-unsup-simcse#usage) for detailed usage; a minimal loading sketch is shown below.
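As a sketch only, the snippet below follows the loading pattern from the original LLM2Vec usage, adapted to this model family: the MNTP adapter (linked above) is merged into the base model first, then this SimCSE adapter is applied on top. The SimCSE repository id is a placeholder; replace it with this repository's id, and adjust dtype/device to your environment.

```python
import torch
from llm2vec import LLM2Vec
from peft import PeftModel
from transformers import AutoConfig, AutoModel, AutoTokenizer

# MNTP adapter for this model family (see the link above).
mntp_repo = "uzabase/LLM2Vec-Llama-2-7b-hf-wikipedia-jp-mntp"
# Placeholder: replace with the repository id of this SimCSE adapter.
simcse_repo = "<this-repo-id>"

tokenizer = AutoTokenizer.from_pretrained(mntp_repo)
config = AutoConfig.from_pretrained(mntp_repo, trust_remote_code=True)
model = AutoModel.from_pretrained(
    mntp_repo,
    trust_remote_code=True,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="cuda" if torch.cuda.is_available() else "cpu",
)

# Apply the MNTP adapter and merge it into the base weights.
model = PeftModel.from_pretrained(model, mntp_repo)
model = model.merge_and_unload()

# Load the unsupervised SimCSE adapter (this repository) on top.
model = PeftModel.from_pretrained(model, simcse_repo)

# Wrap as a text encoder with mean pooling, as used during training.
l2v = LLM2Vec(model, tokenizer, pooling_mode="mean", max_length=512)

embeddings = l2v.encode(["LLM2Vec turns decoder-only LLMs into text encoders."])
print(embeddings.shape)
```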
# Benchmark
- The following tables are summaries; details are [here](https://tech.uzabase.com/entry/2024/09/30/114245).
## MTEB (Japanese)
| | Classification | Clustering | PairClassification | Reranking | BitextMining | Retrieval | STS | AVG |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| **Llama2-Llm2vec-eng (This repo)** | 0.527 | 0.258 | 0.501 | 0.217 | 0.275 | 0.296 | 0.765 | 0.408 |
| Llama2-Llm2vec-jpn | 0.570 | 0.365 | 0.510 | 0.349 | 0.470 | 0.417 | 0.795 | 0.498 |
| Swallow-Llm2vec-jpn | 0.621 | 0.391 | 0.510 | 0.475 | 0.475 | 0.491 | 0.832 | 0.523 |
## MTEB(English)
| | Classification | Clustering | PairClassification | Reranking | Retrieval | STS | AVG |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| **Llama2-Llm2vec-eng (this repo)** | 0.709 | 0.386 | 0.780 | 0.588 | 0.329| 0.723 | 0.586 |
| Llama2-Llm2vec-jpn | 0.722 | 0.428 | 0.785 | 0.594 | 0.371 | 0.717 | 0.603 |
| Swallow-Llm2vec-jpn | 0.695 | 0.385 | 0.751 | 0.576 | 0.318 | 0.710 | 0.572 |
# Training Details
## Training Data
- [Corpus for SimCSE from Wikipedia](https://github.com/McGill-NLP/llm2vec?tab=readme-ov-file#unsupervised-contrastive-training-simcse)
## Training Hyperparameters
The unsupervised SimCSE stage used the following settings; a brief sketch of the objective they configure follows the list.
- simcse_dropout: 0.3
- bidirectional: true
- pooling_mode: "mean"
- remove_unused_columns: false
- learning_rate: 3e-5
- loss_scale: 20
- batch_size: 256
- gradient_accumulation_steps: 1
- max_seq_length: 128
- lora_r: 16
- torch_dtype: "bfloat16"
- attn_implementation: "flash_attention_2"
- seed: 42
- bf16: true
- gradient_checkpointing: true
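For orientation only (this is not the training code used here), the following is a minimal sketch of the unsupervised SimCSE objective that `simcse_dropout` and `loss_scale` configure: each sentence is encoded twice with dropout active so the two views form a positive pair, other sentences in the batch serve as in-batch negatives, and `loss_scale` (20 above) is assumed to scale the cosine similarities before the cross-entropy.

```python
import torch
import torch.nn.functional as F

def unsup_simcse_loss(encode, sentences, loss_scale=20.0):
    """Unsupervised SimCSE: two dropout-perturbed views of the same
    sentence are positives; all other in-batch sentences are negatives.

    `encode` is any callable mapping a list of sentences to a
    (batch, dim) tensor with dropout enabled (i.e. model in train mode).
    """
    # Two forward passes; dropout (e.g. simcse_dropout=0.3) makes them differ.
    z1 = F.normalize(encode(sentences), dim=-1)  # (B, D)
    z2 = F.normalize(encode(sentences), dim=-1)  # (B, D)

    # Scaled cosine similarity between every pair in the batch.
    sim = loss_scale * (z1 @ z2.T)               # (B, B)

    # Diagonal entries are the positive pairs.
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)
```

In the actual setup, `encode` corresponds to the LLM2Vec encoder (mean pooling over bidirectional attention, as configured above) run in training mode so that dropout is active.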
## Accelerator Settings
- deepspeed_config:
  - gradient_accumulation_steps: 1
  - gradient_clipping: 1.0
  - offload_optimizer_device: nvme
  - offload_optimizer_nvme_path: /nvme
  - zero3_save_16bit_model: true
  - zero_stage: 2
- distributed_type: DEEPSPEED
- downcast_bf16: 'no'
- dynamo_config:
  - dynamo_backend: INDUCTOR
  - dynamo_mode: default
  - dynamo_use_dynamic: true
  - dynamo_use_fullgraph: true
- enable_cpu_affinity: false
- machine_rank: 0
- main_training_function: main
- mixed_precision: bf16
- num_machines: 1
- num_processes: 2
- rdzv_backend: static
- same_network: true
- use_cpu: false
## Framework versions
- Python: 3.12.3
- PEFT: 0.11.1
- Sentence Transformers: 3.0.1
- Transformers: 4.41.0
- PyTorch: 2.3.0
- Accelerate: 0.30.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1
- MTEB: 1.13.0