Update README.md

README.md CHANGED

@@ -15,6 +15,8 @@ This is an encoder model from **Tochka AI** based on the **RoPEBert** architecture
 
 The model source code is available in the file [modeling_rope_bert.py](https://huggingface.co/Tochka-AI/ruRoPEBert-classic-base-512/blob/main/modeling_rope_bert.py)
 
+The model is trained on contexts up to 512 tokens in length, but it can be used on larger contexts. For better quality, use the version of this model with extended context: Tochka-AI/ruRoPEBert-classic-base-2k
+
 ### Usage
 
 Important: to load the model correctly, you must enable loading of code from the model repository with `trust_remote_code=True`; this will download the modeling_rope_bert.py script and load the weights into the correct architecture.
@@ -60,6 +62,20 @@ To load the model with trainable classification head on top (change `num_labels`
 model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True, attn_implementation='sdpa', num_labels=4)
 ```
 
+#### With RoPE scaling
+
+Allowed types for RoPE scaling are `linear` and `dynamic`. To extend the model's context window, change the tokenizer's max length and pass the `rope_scaling` parameter.
+
+If you want to scale your model's context by 2x:
+```python
+tokenizer.model_max_length = 1024
+model = RoPEBertForMaskedLM.from_pretrained(model_name,
+                                            attn_implementation='sdpa',
+                                            max_position_embeddings=1024,
+                                            rope_scaling={'type': 'dynamic', 'factor': 2.0}
+                                            )  # 2.0 for x2 scaling, 4.0 for x4, etc.
+```
+
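For intuition about what the two scaling types in the added section do, here is a minimal pure-Python sketch of the standard RoPE angle computation. This is an illustration of the general technique under common assumptions, not the repo's actual modeling_rope_bert.py code; the `dynamic` branch follows the widely used NTK-style base rescaling.

```python
def rope_angles(position, dim=64, base=10000.0,
                scaling=None, factor=1.0, orig_max=512, seq_len=512):
    """Rotary angles for one token position under optional RoPE scaling.

    scaling=None      -> vanilla RoPE
    scaling='linear'  -> position indices are compressed by `factor`
    scaling='dynamic' -> NTK-style: the frequency base grows once the
                         actual sequence length exceeds the trained one
    """
    if scaling == 'linear':
        position = position / factor
    elif scaling == 'dynamic' and seq_len > orig_max:
        # NTK base rescaling (assumed here; the repo's code may differ in details)
        base = base * ((factor * seq_len / orig_max) - (factor - 1)) ** (dim / (dim - 2))
    inv_freq = [base ** (-2.0 * i / dim) for i in range(dim // 2)]
    return [position * f for f in inv_freq]
```

With `factor=2.0`, linear scaling at position 1000 reproduces the vanilla angles of position 500, which is how the model can address a 1024-token window it was never trained on; dynamic scaling leaves sequences up to 512 tokens untouched and only stretches the frequencies once the input actually exceeds the trained length.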
 P.S. Don't forget to specify the dtype and device you need to use resources efficiently.
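The P.S. above can be made concrete with back-of-the-envelope arithmetic (the parameter count below is illustrative, not this model's exact size): weight memory scales with bytes per parameter, so half precision halves the footprint of float32.

```python
# Bytes per parameter for common dtypes.
BYTES_PER_PARAM = {'float32': 4, 'float16': 2, 'bfloat16': 2}

def weight_memory_mib(num_params, dtype):
    """Approximate memory for the model weights alone, in MiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**20

print(weight_memory_mib(100_000_000, 'float32'))  # ≈ 381.5 MiB
print(weight_memory_mib(100_000_000, 'float16'))  # ≈ 190.7 MiB
```

In practice you select this at load time, e.g. `torch_dtype=torch.float16` in `from_pretrained`, and place the model explicitly with `.to('cuda')`.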
@@ -73,4 +89,4 @@ Evaluation of this model on encodechka benchmark:
 
 ### Authors
 - Sergei Bratchikov (Tochka AI Team, [HF](https://huggingface.co/hivaze), [GitHub](https://huggingface.co/hivaze))
-- Maxim Afanasiev (Tochka AI Team)
+- Maxim Afanasiev (Tochka AI Team, [HF](https://huggingface.co/mrapplexz), [GitHub](https://github.com/mrapplexz))