Commit 62f1366 by sdadas
Parent: 186b706

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -12,8 +12,8 @@ license: gemma
 
 This is a reranker for Polish based on [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) and further fine-tuned on a large dataset of text pairs:
 - We utilised [RankNet loss](https://icml.cc/Conferences/2015/wp-content/uploads/2015/06/icml_ranking.pdf) and trained the model on the same data as [sdadas/polish-reranker-roberta-v2](https://huggingface.co/sdadas/polish-reranker-roberta-v2)
- - After the training, we merged the original and fine-tuned weights to create the final checkpoint
 - [BAAI/bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight) was used as the teacher model for distillation
+ - After the training, we merged the original and fine-tuned weights to create the final checkpoint
 - We used a custom implementation of XLM-RoBERTa with support for Flash Attention 2. If you want to use these features, load the model with the arguments `trust_remote_code=True` and `attn_implementation="flash_attention_2"`. This is especially important for this model, since [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) supports long contexts of 8192 tokens. For such input lengths, inference can be up to 400% faster with Flash Attention compared to the original model.
 
 In most cases, the use of [sdadas/polish-reranker-roberta-v2](https://huggingface.co/sdadas/polish-reranker-roberta-v2) is preferred to this model as it achieves better results for Polish. The main advantage of this model is its context length, so it can perform better on datasets with long documents.
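
For readers unfamiliar with the objective mentioned in the first bullet, here is a minimal sketch of a pairwise RankNet loss in PyTorch. The function name, its signature, and the assumption that candidates arrive already ordered from most to least relevant are illustrative; this is not the authors' training code.

```python
import torch
import torch.nn.functional as F

def ranknet_loss(scores: torch.Tensor) -> torch.Tensor:
    """Pairwise RankNet loss for one query (hypothetical helper).

    `scores` holds the reranker's scores for candidate passages already
    ordered from most to least relevant, so every earlier item should
    outrank every later one.
    """
    # s_i - s_j for every pair; diff[i, j] = scores[i] - scores[j]
    diff = scores.unsqueeze(1) - scores.unsqueeze(0)
    # keep only pairs (i, j) where i is ranked above j
    mask = torch.triu(torch.ones_like(diff, dtype=torch.bool), diagonal=1)
    # RankNet cross-entropy: -log(sigmoid(s_i - s_j)) == softplus(-(s_i - s_j))
    return F.softplus(-diff[mask]).mean()

# toy example: three candidates, the first should score highest
print(ranknet_loss(torch.tensor([2.0, 0.5, -1.0])))
```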
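
As a usage illustration for the Flash Attention bullet above, here is a minimal sketch of loading the model and scoring query-passage pairs with Hugging Face `transformers`. The model id placeholder, the Polish example pairs, and the bf16/CUDA setup are assumptions; Flash Attention 2 additionally requires a compatible GPU and the `flash-attn` package installed.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder: substitute the id of this repository.
model_id = "<this-model-id>"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    trust_remote_code=True,                   # loads the custom XLM-RoBERTa code
    attn_implementation="flash_attention_2",  # needs a supported GPU and flash-attn
    torch_dtype=torch.bfloat16,               # Flash Attention requires fp16/bf16
).to("cuda").eval()

# Hypothetical Polish query-passage pairs; each pair is scored independently.
queries = ["Jak dożyć 100 lat?", "Jak dożyć 100 lat?"]
passages = [
    "Trzeba zdrowo się odżywiać i uprawiać sport.",
    "Warszawa jest stolicą Polski.",
]

with torch.no_grad():
    inputs = tokenizer(
        queries, passages,
        padding=True, truncation=True, max_length=8192, return_tensors="pt",
    ).to("cuda")
    scores = model(**inputs).logits.view(-1).float()

print(scores.tolist())  # higher score = more relevant passage for the query
```

Setting `max_length=8192` is what exercises the long-context support that makes Flash Attention worthwhile for this model.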