Update README.md
README.md
CHANGED
@@ -12,6 +12,7 @@ license: gemma
This is a reranker for Polish based on [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) and further fine-tuned on a large dataset of text pairs:
- We utilised [RankNet loss](https://icml.cc/Conferences/2015/wp-content/uploads/2015/06/icml_ranking.pdf) and trained the model on the same data as [sdadas/polish-reranker-roberta-v2](https://huggingface.co/sdadas/polish-reranker-roberta-v2) (a minimal loss sketch follows this list)
+ - After the training, we merged the original and fine-tuned weights to create the final checkpoint (see the merging sketch after this list)
- [BAAI/bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight) was used as the teacher model for distillation
- We used a custom implementation of XLM-RoBERTa with support for Flash Attention 2. If you want to use these features, load the model with the arguments `trust_remote_code=True` and `attn_implementation="flash_attention_2"` (a loading example follows this list). This is especially important for this model, since [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) supports long contexts of 8192 tokens. For inputs of this length, inference can be up to 400% faster with Flash Attention than with the original model.
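
RankNet is a pairwise objective: for each (positive, negative) passage pair, the reranker is pushed to score the positive higher. A minimal PyTorch sketch of that loss, for illustration only (the actual pair sampling, teacher-score ordering, and training loop are not described in this card):

```python
import torch
import torch.nn.functional as F

def ranknet_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    """Pairwise RankNet loss: -log(sigmoid(s_pos - s_neg)).

    `pos_scores` and `neg_scores` are reranker logits for passages that should be
    ranked higher and lower, respectively. softplus(-x) equals -log(sigmoid(x)),
    which is numerically stable for large score gaps.
    """
    return F.softplus(neg_scores - pos_scores).mean()

# Toy example: three (positive, negative) score pairs from a cross-encoder.
pos = torch.tensor([2.3, 0.7, 1.1])
neg = torch.tensor([0.1, 0.9, -0.5])
print(ranknet_loss(pos, neg))
```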
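
The card does not specify how the original and fine-tuned weights were merged. One common recipe is linear interpolation of matching parameter tensors; a rough sketch under that assumption, with hypothetical checkpoint paths and an assumed interpolation weight `alpha`:

```python
import torch

# Hypothetical local paths; both checkpoints must share parameter names and shapes.
base_state = torch.load("bge-reranker-v2-m3/pytorch_model.bin", map_location="cpu")
tuned_state = torch.load("finetuned-reranker/pytorch_model.bin", map_location="cpu")

alpha = 0.5  # assumed interpolation weight; the actual merging recipe is not given in the card
merged_state = {
    name: (1.0 - alpha) * base_state[name].float() + alpha * tuned_state[name].float()
    for name in tuned_state
}
torch.save(merged_state, "merged-reranker/pytorch_model.bin")
```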
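
A loading example using the arguments mentioned above. The repository id and example texts are placeholders, and Flash Attention 2 additionally requires the `flash-attn` package, a supported GPU, and half-precision weights:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "sdadas/polish-reranker-bge-v2"  # placeholder: use the actual repository id of this model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    trust_remote_code=True,                   # enables the custom XLM-RoBERTa implementation
    attn_implementation="flash_attention_2",  # requires flash-attn and a compatible GPU
    torch_dtype=torch.bfloat16,
).to("cuda").eval()

query = "Jak dbać o zdrowie?"
passages = [
    "Regularna aktywność fizyczna i zbilansowana dieta pomagają zachować zdrowie.",
    "Stolicą Polski jest Warszawa.",
]

# Cross-encoder rerankers score (query, passage) pairs jointly.
pairs = [[query, passage] for passage in passages]
inputs = tokenizer(pairs, padding=True, truncation=True, max_length=8192, return_tensors="pt").to("cuda")
with torch.no_grad():
    scores = model(**inputs).logits.squeeze(-1)
print(scores)  # higher score = more relevant passage
```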