upskyy/ko-reranker

ko-reranker is a model obtained by fine-tuning BAAI/bge-reranker-large on Korean data.

Usage

Using FlagEmbedding

pip install -U FlagEmbedding

Get relevance scores (higher scores indicate more relevance):

from FlagEmbedding import FlagReranker


reranker = FlagReranker('upskyy/ko-reranker', use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation

score = reranker.compute_score(['query', 'passage'])
print(score) # -1.861328125

# Map the scores into the 0-1 range by setting normalize=True, which applies a sigmoid function to the score
score = reranker.compute_score(['query', 'passage'], normalize=True)
print(score) # 0.13454832326359276

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
print(scores) # [-7.37109375, 8.5390625]

# The same normalize=True option works for a batch of pairs
scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']], normalize=True)
print(scores) # [0.0006287840192903181, 0.9998043646624727]
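The `normalize=True` mapping mentioned in the comments above is a plain sigmoid of the raw score. A minimal sketch in pure Python, independent of FlagEmbedding (the helper name `sigmoid` is illustrative and not part of the library), applied to the raw scores printed above:

```python
import math


def sigmoid(score: float) -> float:
    """Map a raw relevance score (logit) into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


# Raw scores copied from the example outputs in this section.
raw_scores = [-1.861328125, -7.37109375, 8.5390625]
normalized = [sigmoid(s) for s in raw_scores]

for raw, norm in zip(raw_scores, normalized):
    print(f"{raw:>14} -> {norm:.6f}")
```

The resulting values should agree with the normalized outputs listed above up to floating-point rounding. A normalized score is easier to threshold, but the ordering of passages is the same as with raw scores because the sigmoid is monotonic.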

Using Sentence-Transformers

pip install -U sentence-transformers

Compute pairwise similarity between two sets of sentences (higher values indicate more similarity):

from sentence_transformers import SentenceTransformer


sentences_1 = ["경제 전문가가 금리 인하에 대한 예측을 하고 있다.", "주식 시장에서 한 투자자가 주식을 매수한다."]
sentences_2 = ["한 투자자가 비트코인을 매수한다.", "금융 거래소에서 새로운 디지털 자산이 상장된다."]

model = SentenceTransformer('upskyy/ko-reranker')

embeddings_1 = model.encode(sentences_1, normalize_embeddings=True)
embeddings_2 = model.encode(sentences_2, normalize_embeddings=True)
similarity = embeddings_1 @ embeddings_2.T

print(similarity)
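Because a reranker scores a query and a passage jointly rather than embedding each sentence separately, the `CrossEncoder` class in Sentence-Transformers may be a closer fit for pair scoring. The following is only a sketch: it assumes that the `upskyy/ko-reranker` checkpoint loads through `CrossEncoder`, which this card does not confirm, and the helpers `build_pairs` and `score_pairs` as well as the choice of query are hypothetical.

```python
from typing import List, Tuple


def build_pairs(query: str, passages: List[str]) -> List[Tuple[str, str]]:
    """Pair one query with every candidate passage."""
    return [(query, passage) for passage in passages]


def score_pairs(pairs: List[Tuple[str, str]], model_name: str = "upskyy/ko-reranker"):
    """Score (query, passage) pairs jointly; higher means more relevant.

    Assumption: the checkpoint is loadable as a cross-encoder.
    """
    # Imported lazily so build_pairs stays usable without the library installed.
    from sentence_transformers import CrossEncoder

    model = CrossEncoder(model_name, max_length=512)
    return model.predict(pairs)


query = "한 투자자가 비트코인을 매수한다."
passages = [
    "주식 시장에서 한 투자자가 주식을 매수한다.",
    "경제 전문가가 금리 인하에 대한 예측을 하고 있다.",
]
pairs = build_pairs(query, passages)
print(pairs)

# Uncomment to score the pairs (downloads the model on first use):
# print(score_pairs(pairs))
```

`predict` returns one score per pair, in the same order as the input pairs.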

Using Hugging Face Transformers

Get relevance scores (higher scores indicate more relevance):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer


tokenizer = AutoTokenizer.from_pretrained('upskyy/ko-reranker')
model = AutoModelForSequenceClassification.from_pretrained('upskyy/ko-reranker')
model.eval()

pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]

with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
    scores = model(**inputs, return_dict=True).logits.view(-1, ).float()
    print(scores)
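In a retrieval pipeline, scores like these are typically used to reorder candidate passages. A minimal sketch, assuming the scores have already been computed by one of the methods on this card; the helper `rank_passages` and its `top_k` parameter are hypothetical, and the hard-coded scores are the FlagEmbedding outputs quoted earlier for the same two pairs:

```python
from typing import List, Sequence, Tuple


def rank_passages(
    passages: Sequence[str], scores: Sequence[float], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return up to top_k (passage, score) tuples, most relevant first."""
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


passages = [
    "hi",
    "The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear "
    "or simply panda, is a bear species endemic to China.",
]
# Plain floats; a torch tensor of logits can be converted with scores.tolist().
scores = [-7.37109375, 8.5390625]

for passage, score in rank_passages(passages, scores):
    print(f"{score:8.3f}  {passage[:60]}")
```

Sorting on the raw logits is sufficient for ranking; normalization only matters when a fixed relevance threshold is needed.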

Citation

@misc{bge_embedding,
      title={C-Pack: Packaged Resources To Advance General Chinese Embedding}, 
      author={Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff},
      year={2023},
      eprint={2309.07597},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

License

FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.

Reference
