Merging LoRA adapters into base weights
Hi, I am looking to merge the "text-matching" LoRA adapters into the base weights so that I can use the model as a plain `XLMRoberta` and make it play nice with `optimum`.
Some context on what I am trying to accomplish and what I have tried:
I want to serve the jina-embeddings-v3 model using infinity to leverage async batching, as I have with other models (e5, mpnet, ...). When using the `torch` backend, infinity attempts to load the model as a BetterTransformer but fails with the following error:
```
2025-01-16 16:51:58,806 infinity_emb ERROR: BetterTransformer is not available for model: <class 'transformers_modules.jinaai.xlm-roberta-flash-implementation.2b6bc3f30750b3a9648fe9b63448c09920efe9be.modeling_lora.XLMRobertaLoRA'> The transformation of the model XLMRobertaLoRA to BetterTransformer failed while it should not. Please fill a bug report or open a PR to support this model at https://github.com/huggingface/optimum/. Continue without bettertransformer modeling code.
```
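For reference, this is roughly how I am standing the model up in infinity. I am writing this from memory, so the exact `EngineArgs` fields may differ between `infinity_emb` versions:

```python
# Rough reproduction of my serving setup; field names are from memory and
# may not match your infinity_emb version exactly.
import asyncio
from infinity_emb import AsyncEmbeddingEngine, EngineArgs

engine = AsyncEmbeddingEngine.from_args(
    EngineArgs(model_name_or_path="jinaai/jina-embeddings-v3", engine="torch")
)

async def main():
    async with engine:
        embeddings, usage = await engine.embed(sentences=["hello world"])

asyncio.run(main())
```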
Similar issues occur when trying to use the `onnx` backend, as this model requires a `task_id` input which gets dropped when the ONNX inputs are prepared:
```
...
File /opt/conda/lib/python3.9/site-packages/optimum/onnxruntime/modeling_ort.py:936, in ORTModel._prepare_onnx_inputs(self, use_torch, **inputs)
    934 # converts pytorch inputs into numpy inputs for onnx
    935 for input_name in self.input_names.keys():
--> 936     onnx_inputs[input_name] = inputs.pop(input_name)
    938 if use_torch:
    939     onnx_inputs[input_name] = onnx_inputs[input_name].numpy(force=True)

KeyError: 'task_id'
```
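For context, with the custom modeling code the adapter is selected per call via a `task` argument (going by the model card), which, as far as I can tell, is what surfaces as the `task_id` input after export:

```python
# Documented usage with the custom modeling code (per the model card);
# the task string selects which LoRA adapter is applied.
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-embeddings-v3", trust_remote_code=True)
embeddings = model.encode(["Follow the white rabbit."], task="text-matching")
```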
Hence, I want to merge the "text-matching" LoRA adapters into the base model, converting it so that it is supported by `optimum` or `onnx` for optimized inference. I am running into issues merging with `peft`, as the model does not have an `adapter_config.json` file.
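In the absence of that file, I have also tried folding the adapters in by hand, along these lines. This is only a sketch: the `lora_A`/`lora_B` key layout, the index of the "text-matching" adapter, and the alpha/r scaling below are all guesses on my part rather than values confirmed against `modeling_lora.py`:

```python
# Hand-merge sketch. Everything marked "assumed" is a guess on my part,
# since the repo ships no adapter_config.json.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-embeddings-v3", trust_remote_code=True)
state = model.state_dict()

TASK_IDX = 4    # assumed position of "text-matching" among the stacked adapters
SCALING = 1.0   # assumed lora_alpha / r; should really come from the model config

merged = {}
for name, weight in state.items():
    if "lora_" in name:
        continue  # folded into the matching base weight below
    lora_a = lora_b = None
    if name.endswith(".weight"):
        # assumed key layout: "<prefix>.weight" plus "<prefix>.lora_A" / "<prefix>.lora_B"
        lora_a = state.get(name[: -len(".weight")] + ".lora_A")
        lora_b = state.get(name[: -len(".weight")] + ".lora_B")
    if lora_a is not None and lora_b is not None:
        # classic LoRA merge: W' = W + (alpha / r) * B @ A, one task slice only
        delta = SCALING * (lora_b[TASK_IDX] @ lora_a[TASK_IDX])
        merged[name] = weight + delta.to(weight.dtype)
    else:
        merged[name] = weight

torch.save(merged, "jina-v3-text-matching-merged.bin")
```

The plan would then be to load the merged state dict into a plain `XLMRobertaModel` so that `optimum` and `onnx` never see the custom `XLMRobertaLoRA` class, but without the real key layout and scaling I cannot verify the merged weights are correct.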
Any advice on how to move forward, or on a path of least resistance I have not yet thought of to get this model to play nice with `infinity`, would be much appreciated. Thanks a lot!