xiaobu-embedding

Model: fine-tuned from the GTE model [1] with multi-task training.
Data: chit-chat Query-Query pairs, knowledge-oriented Query-Doc pairs, and the open-source BGE Query-Doc data [2]; positives were cleaned and medium-difficulty negatives were mined (see the sketch below); about 6M pairs in total (quality matters more than quantity).
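The negative-mining step is not released with the model; the following is a minimal sketch of one way to mine medium-difficulty negatives with sentence-transformers. The toy corpus, the rank window (30-100), and the sample size are assumptions for illustration, not the actual recipe behind xiaobu-embedding.

# Minimal sketch (not the released pipeline): rank candidate passages for each
# query, skip the top ranks (likely positives or overly hard negatives) and the
# long tail (too easy), and sample negatives from a middle rank band.
import random
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('lier007/xiaobu-embedding')

queries = ["如何办理信用卡", "今天天气怎么样"]
corpus = [f"候选文档-{i}" for i in range(1000)]  # stand-in for a large passage pool

q_emb = model.encode(queries, normalize_embeddings=True, convert_to_tensor=True)
c_emb = model.encode(corpus, normalize_embeddings=True, convert_to_tensor=True)

scores = util.cos_sim(q_emb, c_emb)  # query-passage cosine similarity matrix
for qi, query in enumerate(queries):
    ranked = scores[qi].argsort(descending=True).tolist()
    band = ranked[30:100]  # middle-difficulty rank window (assumed values)
    negatives = [corpus[i] for i in random.sample(band, k=5)]
    print(query, negatives)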

Usage (Sentence-Transformers)

pip install -U sentence-transformers

Similarity computation:

from sentence_transformers import SentenceTransformer

sentences_1 = ["样例数据-1", "样例数据-2"]
sentences_2 = ["样例数据-3", "样例数据-4"]
model = SentenceTransformer('lier007/xiaobu-embedding')
# normalize_embeddings=True makes the dot product below equal to cosine similarity
embeddings_1 = model.encode(sentences_1, normalize_embeddings=True)
embeddings_2 = model.encode(sentences_2, normalize_embeddings=True)
similarity = embeddings_1 @ embeddings_2.T
print(similarity)

Evaluation

See the BGE Chinese C-MTEB evaluation [2].
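For a quick local check, here is a minimal sketch using the community mteb package on two Chinese tasks; the task selection and output folder are illustrative, and the official C-MTEB scripts in FlagEmbedding [2] remain the reference setup.

# Minimal sketch: evaluate on a couple of Chinese C-MTEB tasks via the mteb package.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('lier007/xiaobu-embedding')
evaluation = MTEB(tasks=["TNews", "LCQMC"])  # example classification / STS tasks
evaluation.run(model, output_folder="results/xiaobu-embedding")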

Finetune

See the BGE fine-tuning module [2].
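The BGE fine-tuning module expects training data as JSON lines, each with a query, a list of positive passages, and a list of negative passages. Below is a minimal sketch that writes a toy file in that format; the file name and contents are illustrative only.

# Minimal sketch: write toy training data in the JSONL format used by the
# FlagEmbedding fine-tuning module [2] ("query" / "pos" / "neg" fields).
import json

examples = [
    {
        "query": "如何办理信用卡",
        "pos": ["信用卡申请流程介绍"],
        "neg": ["气象预报方法", "火车票退票规则"],
    },
]

with open("toy_finetune_data.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")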

Reference

  1. https://huggingface.co/thenlper/gte-large-zh
  2. https://github.com/FlagOpen/FlagEmbedding