---
license: mit
language:
- ja
pipeline_tag: sentence-similarity
---

This model was created by merging [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) and [stabilityai/japanese-stablelm-base-gamma-7b](https://huggingface.co/stabilityai/japanese-stablelm-base-gamma-7b).

See the [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) page for model usage.
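For reference, here is a minimal sketch of how embeddings can be computed with the merged model, following the instruction format and last-token pooling described on the e5-mistral-7b-instruct card. The model path, task description, and example sentences below are placeholders rather than part of this repository, and the sketch assumes the pad_token settings from step 3 are in place.

```
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_path = "path/to/merged-model"  # placeholder: wherever the merged model is stored
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModel.from_pretrained(model_path)
model.eval()

def last_token_pool(last_hidden_states, attention_mask):
    # Each text is represented by the hidden state of its last non-padding token.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    sequence_lengths = attention_mask.sum(dim=1) - 1
    batch_size = last_hidden_states.shape[0]
    return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]

task = "Given a web search query, retrieve relevant passages that answer the query"
texts = [
    f"Instruct: {task}\nQuery: 日本の首都はどこですか?",  # queries get the instruction prefix
    "日本の首都は東京です。",  # passages are used as-is
]

# Append EOS so the last token summarizes the whole input, then pad the batch.
batch = tokenizer(
    [t + tokenizer.eos_token for t in texts],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)

with torch.no_grad():
    outputs = model(**batch)

embeddings = last_token_pool(outputs.last_hidden_state, batch["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)
print((embeddings[0] @ embeddings[1]).item())  # cosine similarity between query and passage
```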

The steps to merge are as follows.

1. Load intfloat/e5-mistral-7b-instruct as a "MistralForCausalLM" model and save it with save_pretrained as is.

Because e5-mistral-7b-instruct is published with the "MistralModel" class, it could not be merged with "MistralForCausalLM" models as is.
In my environment, I had to load the model on the CPU rather than the GPU, or I would get an error.

```
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "intfloat/e5-mistral-7b-instruct"
# Load on the CPU; passing device_map="auto" caused an error in my environment.
model = AutoModelForCausalLM.from_pretrained(model_id)  # , device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Saving through MistralForCausalLM gives the checkpoint the LM head layout expected by the merge.
model.save_pretrained("./e5-mistral-7b-instruct_with_lm_head")
```
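As an optional sanity check (not part of the original steps), the saved directory can be reloaded to confirm it now carries the CausalLM architecture:

```
from transformers import AutoModelForCausalLM

reloaded = AutoModelForCausalLM.from_pretrained("./e5-mistral-7b-instruct_with_lm_head")
print(type(reloaded).__name__)  # expected: MistralForCausalLM
```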

2. Merge using [mergekit](https://github.com/cg123/mergekit) with the following YAML configuration.

merge_config.yaml
```
models:
  - model: stabilityai/japanese-stablelm-base-gamma-7b
  - model: ./e5-mistral-7b-instruct_with_lm_head
base_model: stabilityai/japanese-stablelm-base-gamma-7b
parameters:
  t:
    - filter: self_attn
      value: [0.75, 0.25]
    - filter: mlp
      value: [0.75, 0.25]
    - value: 0.5 # fallback for rest of tensors

merge_method: slerp
dtype: float16
```
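With this file saved as merge_config.yaml, the merge itself can be run with mergekit's command-line entry point, for example `mergekit-yaml merge_config.yaml ./merged-model`, where `./merged-model` is a placeholder for your output directory (the exact flags available depend on the installed mergekit version).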

I tried the "linear", "slerp", and "task_arithmetic" merge methods, and this configuration seemed to work best.
The "t" values were chosen so that layers closer to the input rely more on japanese-stablelm-base-gamma-7b, to take advantage of its Japanese word understanding,
while layers closer to the output rely more on e5-mistral-7b-instruct, to produce good embeddings.
As for the "ties" method, I could not find any density and weight parameters that worked properly.
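For reference, the following is a simplified sketch of what the slerp merge method does for one pair of weight tensors at a given interpolation value t (t = 0 keeps the base model's tensor, t = 1 keeps the other model's); mergekit's actual implementation differs in details such as dtype handling and how t is scheduled across layers.

```
import numpy as np

def slerp(t, w0, w1, eps=1e-8):
    # Spherical linear interpolation between two weight tensors (simplified sketch).
    v0 = w0.flatten() / (np.linalg.norm(w0) + eps)
    v1 = w1.flatten() / (np.linalg.norm(w1) + eps)
    dot = float(np.clip(np.dot(v0, v1), -1.0, 1.0))
    if abs(dot) > 0.9995:
        # Nearly parallel tensors: fall back to plain linear interpolation.
        return (1.0 - t) * w0 + t * w1
    omega = np.arccos(dot)  # angle between the two weight directions
    so = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / so) * w0 + (np.sin(t * omega) / so) * w1
```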

3. Copy the settings related to pad_token from the e5-mistral-7b-instruct repository (one way to do this is sketched after the file list).

* config.json
* tokenizer.json
* tokenizer.model
* tokenizer_config.json
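As an illustration only (not necessarily the exact procedure used), the tokenizer can be re-saved from the e5-mistral-7b-instruct repository into the merged model directory and the pad_token_id written into its config.json; `./merged-model` is again a placeholder for the merge output directory.

```
from transformers import AutoConfig, AutoTokenizer

merged_dir = "./merged-model"  # placeholder: mergekit output directory

# Re-save the e5-mistral tokenizer into the merged model directory (writes tokenizer_config.json,
# tokenizer.json and, depending on the tokenizer class, tokenizer.model).
tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-mistral-7b-instruct")
tokenizer.save_pretrained(merged_dir)

# Carry the pad_token setting over into the merged config.json.
config = AutoConfig.from_pretrained(merged_dir)
config.pad_token_id = tokenizer.pad_token_id
config.save_pretrained(merged_dir)
```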