---
base_model: meta-llama/Llama-2-7b-hf
library_name: peft
license: apache-2.0
language:
- en
---

# Model Info

This model applies LLM2Vec to Llama-2. Only the PEFT adapter is distributed.

LLM2Vec is trained on two tasks, MNTP and SimCSE; this repository contains the adapter obtained by applying SimCSE after MNTP.

For the MNTP adapter, please refer to [this link](https://huggingface.co/uzabase/LLM2Vec-Llama-2-7b-hf-wikipedia-jp-mntp).

## Model Details

### Model Description

- **Model type:** PEFT
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)

### Model Sources

- **Repository:** https://github.com/McGill-NLP/llm2vec
- **Paper:** https://arxiv.org/abs/2404.05961

# Usage

- Please see the [original LLM2Vec repo](https://huggingface.co/McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp-unsup-simcse#usage); a minimal loading sketch is also shown below.
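
The snippet below is a minimal sketch adapted from the usage shown in the original LLM2Vec model cards, not an official recipe for this repository. The MNTP repository id is the one linked in "Model Info" above; `<this-repo-id>` is a placeholder for this repository, and it is assumed that the MNTP repository ships its own config/tokenizer files (as the original LLM2Vec releases do).

```python
# Minimal sketch, adapted from the original LLM2Vec usage linked above.
# Assumptions: the MNTP adapter id is the one linked in "Model Info", and
# "<this-repo-id>" is a placeholder for this SimCSE adapter repository.
import torch
from llm2vec import LLM2Vec
from peft import PeftModel
from transformers import AutoConfig, AutoModel, AutoTokenizer

mntp_repo = "uzabase/LLM2Vec-Llama-2-7b-hf-wikipedia-jp-mntp"
simcse_repo = "<this-repo-id>"  # replace with this repository's id

tokenizer = AutoTokenizer.from_pretrained(mntp_repo)
config = AutoConfig.from_pretrained(mntp_repo, trust_remote_code=True)
model = AutoModel.from_pretrained(
    mntp_repo,
    config=config,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="cuda" if torch.cuda.is_available() else "cpu",
)

# Apply the MNTP adapter first, merge it, then apply this SimCSE adapter on top.
model = PeftModel.from_pretrained(model, mntp_repo)
model = model.merge_and_unload()
model = PeftModel.from_pretrained(model, simcse_repo)

# Wrap with LLM2Vec; mean pooling matches the training settings reported below.
l2v = LLM2Vec(model, tokenizer, pooling_mode="mean", max_length=512)

embeddings = l2v.encode(["This is a sample sentence.", "Another sentence."])
print(embeddings.shape)
```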

# Benchmark

- The tables below are summaries; see [this blog post](https://tech.uzabase.com/entry/2024/09/30/114245) for details.

## MTEB (Japanese)

| | Classification | Clustering | PairClassification | Reranking | BitextMining | Retrieval | STS | AVG |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **Llama2-Llm2vec-eng (this repo)** | 0.527 | 0.258 | 0.501 | 0.217 | 0.275 | 0.296 | 0.765 | 0.408 |
| Llama2-Llm2vec-jpn | 0.570 | 0.365 | 0.510 | 0.349 | 0.470 | 0.417 | 0.795 | 0.498 |
| Swallow-Llm2vec-jpn | 0.621 | 0.391 | 0.510 | 0.475 | 0.475 | 0.491 | 0.832 | 0.523 |

## MTEB (English)

| | Classification | Clustering | PairClassification | Reranking | Retrieval | STS | AVG |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **Llama2-Llm2vec-eng (this repo)** | 0.709 | 0.386 | 0.780 | 0.588 | 0.329 | 0.723 | 0.586 |
| Llama2-Llm2vec-jpn | 0.722 | 0.428 | 0.785 | 0.594 | 0.371 | 0.717 | 0.603 |
| Swallow-Llm2vec-jpn | 0.695 | 0.385 | 0.751 | 0.576 | 0.318 | 0.710 | 0.572 |

# Training Details

## Training Data

- [Corpus for SimCSE from Wikipedia](https://github.com/McGill-NLP/llm2vec?tab=readme-ov-file#unsupervised-contrastive-training-simcse)

## Training Hyperparameters

The SimCSE training settings are listed below; a sketch of how they could be collected into an llm2vec-style config file follows the list.

- simcse_dropout: 0.3
- bidirectional: true
- pooling_mode: "mean"
- remove_unused_columns: false
- learning_rate: 3e-5
- loss_scale: 20
- batch_size: 256
- gradient_accumulation_steps: 1
- max_seq_length: 128
- lora_r: 16
- torch_dtype: "bfloat16"
- attn_implementation: "flash_attention_2"
- seed: 42
- bf16: true
- gradient_checkpointing: true
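
To make the list easier to reuse, here is a minimal sketch that collects these values into a single JSON config in the style of llm2vec's SimCSE train configs. The model, adapter, corpus, and output fields are placeholders (not taken from this card), and the exact field names expected by the upstream training script may differ.

```python
# Sketch: collect the hyperparameters above into one JSON config,
# in the style of llm2vec's SimCSE train configs. Key names follow the
# list above; the model / adapter / corpus / output fields are placeholders.
import json

simcse_config = {
    # what to train (placeholders, not taken from this card)
    "model_name_or_path": "meta-llama/Llama-2-7b-hf",
    "peft_model_name_or_path": "uzabase/LLM2Vec-Llama-2-7b-hf-wikipedia-jp-mntp",
    "dataset_file_path": "data/wikipedia_jp_simcse.txt",  # hypothetical corpus path
    "output_dir": "output/simcse/llama2-7b-wikipedia-jp",  # hypothetical
    # hyperparameters reported in this card
    "simcse_dropout": 0.3,
    "bidirectional": True,
    "pooling_mode": "mean",
    "remove_unused_columns": False,
    "learning_rate": 3e-5,
    "loss_scale": 20,
    "batch_size": 256,
    "gradient_accumulation_steps": 1,
    "max_seq_length": 128,
    "lora_r": 16,
    "torch_dtype": "bfloat16",
    "attn_implementation": "flash_attention_2",
    "seed": 42,
    "bf16": True,
    "gradient_checkpointing": True,
}

with open("simcse_llama2_wikipedia_jp.json", "w") as f:
    json.dump(simcse_config, f, indent=2)

# The resulting JSON could then be passed to llm2vec's SimCSE training script
# (experiments/run_simcse.py in the upstream repository).
```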

## Accelerator Settings

- deepspeed_config:
  - gradient_accumulation_steps: 1
  - gradient_clipping: 1.0
  - offload_optimizer_device: nvme
  - offload_optimizer_nvme_path: /nvme
  - zero3_save_16bit_model: true
  - zero_stage: 2
- distributed_type: DEEPSPEED
- downcast_bf16: 'no'
- dynamo_config:
  - dynamo_backend: INDUCTOR
  - dynamo_mode: default
  - dynamo_use_dynamic: true
  - dynamo_use_fullgraph: true
- enable_cpu_affinity: false
- machine_rank: 0
- main_training_function: main
- mixed_precision: bf16
- num_machines: 1
- num_processes: 2
- rdzv_backend: static
- same_network: true
- use_cpu: false
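
For reproducibility, the same settings can be written out as an `accelerate` config file. The sketch below only reuses the values listed above; the nesting follows the standard `accelerate` config format, and the launch command in the trailing comment is a placeholder.

```python
# Sketch: write the accelerate configuration above to a config file.
# Only the values listed in this card are used; requires PyYAML.
import yaml

accelerate_config = {
    "deepspeed_config": {
        "gradient_accumulation_steps": 1,
        "gradient_clipping": 1.0,
        "offload_optimizer_device": "nvme",
        "offload_optimizer_nvme_path": "/nvme",
        "zero3_save_16bit_model": True,
        "zero_stage": 2,
    },
    "distributed_type": "DEEPSPEED",
    "downcast_bf16": "no",
    "dynamo_config": {
        "dynamo_backend": "INDUCTOR",
        "dynamo_mode": "default",
        "dynamo_use_dynamic": True,
        "dynamo_use_fullgraph": True,
    },
    "enable_cpu_affinity": False,
    "machine_rank": 0,
    "main_training_function": "main",
    "mixed_precision": "bf16",
    "num_machines": 1,
    "num_processes": 2,
    "rdzv_backend": "static",
    "same_network": True,
    "use_cpu": False,
}

with open("accelerate_config.yaml", "w") as f:
    yaml.safe_dump(accelerate_config, f, sort_keys=False)

# Hypothetical launch command (script and config names are placeholders):
#   accelerate launch --config_file accelerate_config.yaml \
#       experiments/run_simcse.py simcse_llama2_wikipedia_jp.json
```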

## Framework versions

- Python: 3.12.3
- PEFT: 0.11.1
- Sentence Transformers: 3.0.1
- Transformers: 4.41.0
- PyTorch: 2.3.0
- Accelerate: 0.30.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1
- MTEB: 1.13.0