sdadas committed on
Commit
186b706
1 Parent(s): 59f364e

Update README.md

Files changed (1)
  1. README.md +1 -0
README.md CHANGED
@@ -12,6 +12,7 @@ license: gemma
 
  This is a reranker for Polish based on [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) and further fine-tuned on large dataset of text pairs:
  - We utilised [RankNet loss](https://icml.cc/Conferences/2015/wp-content/uploads/2015/06/icml_ranking.pdf) and trained the model on the same data as [sdadas/polish-reranker-roberta-v2](https://huggingface.co/sdadas/polish-reranker-roberta-v2)
+ - After the training, we merged the original and fine-tuned weights to create the final checkpoint
  - [BAAI/bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight) was used as the teacher model for distillation
  - We used a custom implementation of XLM-RoBERTa with support for Flash Attention 2. If you want to use these features, load the model with the arguments `trust_remote_code=True` and `attn_implementation="flash_attention_2"`. This is especially important for this model, since [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) supports long contexts of 8192 tokens. For such input length, the inference can be up to 400% faster with Flash Attention in comparison to the original model.
 
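
The bullet added in this commit says the original and fine-tuned weights were merged, but it does not state the merging method or the mixing ratio. As a rough illustration only, the sketch below shows one common way such a merge can be done: linear interpolation of matching parameters. The function `merge_weights`, the coefficient `alpha`, and the toy parameter names are all hypothetical and are not taken from the README; plain Python lists stand in for tensors so the example runs without any dependencies.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def merge_weights(original: StateDict, fine_tuned: StateDict, alpha: float = 0.5) -> StateDict:
    """Linearly interpolate two checkpoints that share names and shapes.

    `alpha` is an assumed mixing coefficient: 0.0 keeps the original weights,
    1.0 keeps the fine-tuned weights. The value used by the authors is unknown.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if original.keys() != fine_tuned.keys():
        raise ValueError("checkpoints must share the same parameter names")

    merged: StateDict = {}
    for name, base in original.items():
        tuned = fine_tuned[name]
        if len(base) != len(tuned):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted average of the two versions of this parameter.
        merged[name] = [(1.0 - alpha) * b + alpha * t for b, t in zip(base, tuned)]
    return merged


# Hypothetical two-parameter "model" before and after fine-tuning.
original = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
fine_tuned = {"layer.weight": [1.0, 4.0], "layer.bias": [3.0]}

merged = merge_weights(original, fine_tuned, alpha=0.5)
print(merged)  # {'layer.weight': [0.5, 3.0], 'layer.bias': [2.0]}
```

With real checkpoints the same loop would run over the tensors of two state dictionaries with identical keys; whether the authors used equal weighting or a different scheme is not stated in the commit.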