---
license: mit
language:
- ja
pipeline_tag: sentence-similarity
---
This model was created by merging [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) and [stabilityai/japanese-stablelm-base-gamma-7b](https://huggingface.co/stabilityai/japanese-stablelm-base-gamma-7b).
See the [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) page for model usage.
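As a quick reference, here is a minimal sketch of extracting sentence embeddings in the e5-mistral-7b-instruct style (last-token pooling over the final hidden state, then L2 normalization). The model id below is a placeholder for this merged model, and the linked page remains the authoritative recipe (task instructions, appending the EOS token, etc.).
```
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "path/to/this-merged-model"  # placeholder: replace with this repository's id or a local path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
if tokenizer.pad_token is None:  # safeguard in case no pad_token is configured
    tokenizer.pad_token = tokenizer.eos_token

def last_token_pool(last_hidden_states, attention_mask):
    # e5-mistral-style pooling: use the hidden state of the last non-padding token.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    sequence_lengths = attention_mask.sum(dim=1) - 1
    batch = torch.arange(last_hidden_states.shape[0], device=last_hidden_states.device)
    return last_hidden_states[batch, sequence_lengths]

# Example query/passage pair ("How tall is Mt. Fuji?" / "Mt. Fuji is 3,776 m high.")
texts = [
    "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: 富士山の高さは？",
    "富士山の標高は3,776メートルです。",
]
inputs = tokenizer(texts, max_length=512, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
embeddings = last_token_pool(outputs.last_hidden_state, inputs["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings @ embeddings.T)  # cosine similarities
```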
The steps to merge are as follows.
1. Load intfloat/e5-mistral-7b-instruct as a "MistralForCausalLM" class and save it with save_pretrained as is.
Because e5-mistral-7b-instruct is published as a "MistralModel" class, it could not be merged as "MistralForCausalLM" as is.
In my environment, I had to load the model onto the CPU, not the GPU, or I would get an error.
```
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "intfloat/e5-mistral-7b-instruct"
# Re-save the checkpoint in the MistralForCausalLM format so mergekit can merge
# it with the causal-LM base model. Load on the CPU; in my environment
# device_map="auto" caused an error.
model = AutoModelForCausalLM.from_pretrained(model_id)  # , device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
model.save_pretrained("./e5-mistral-7b-instruct_with_lm_head")
```
2. Merge using [mergekit](https://github.com/cg123/mergekit) with the following yaml configuration (merge_config.yaml).
```
models:
  - model: stabilityai/japanese-stablelm-base-gamma-7b
  - model: ./e5-mistral-7b-instruct_with_lm_head
base_model: stabilityai/japanese-stablelm-base-gamma-7b
parameters:
  t:
    - filter: self_attn
      value: [0.75, 0.25]
    - filter: mlp
      value: [0.75, 0.25]
    - value: 0.5 # fallback for rest of tensors
merge_method: slerp
dtype: float16
```
I tried the "linear", "slerp", and "task_arithmetic" merge methods, and this setting seemed to work best.
The "t" parameters were chosen so that layers closer to the input lean more on japanese-stablelm-base-gamma-7b, to take advantage of its Japanese word understanding, while layers closer to the output lean more on e5-mistral-7b-instruct, to produce good embeddings.
As for the "ties" method, I could not find any density and weight parameters that worked properly.
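Assuming mergekit is installed, the merge itself can then be run with its mergekit-yaml entry point (exact command and flags may vary between mergekit versions), for example:
```
pip install mergekit
mergekit-yaml merge_config.yaml ./merged_model
```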
3. Copy the pad_token-related settings from the e5-mistral-7b-instruct repository into the merged model; the files concerned are listed below, and a short sketch of this step follows the list.
* config.json
* tokenizer.json
* tokenizer.model
* tokenizer_config.json
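A rough sketch of this step, using huggingface_hub to fetch the files; the merged output directory name is an assumption (matching the mergekit-yaml example above), and the exact config fields the author carried over are not spelled out beyond the pad_token-related ones.
```
import json
import shutil
from huggingface_hub import hf_hub_download

src_repo = "intfloat/e5-mistral-7b-instruct"
merged_dir = "./merged_model"  # assumption: wherever the mergekit output was written

# Copy the tokenizer files as-is.
for fname in ["tokenizer.json", "tokenizer.model", "tokenizer_config.json"]:
    shutil.copy(hf_hub_download(repo_id=src_repo, filename=fname), f"{merged_dir}/{fname}")

# Carry the pad_token_id over into the merged model's config.json, if present.
with open(hf_hub_download(repo_id=src_repo, filename="config.json")) as f:
    src_config = json.load(f)
with open(f"{merged_dir}/config.json") as f:
    merged_config = json.load(f)
if "pad_token_id" in src_config:
    merged_config["pad_token_id"] = src_config["pad_token_id"]
with open(f"{merged_dir}/config.json", "w") as f:
    json.dump(merged_config, f, indent=2)
```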