Tochka-AI committed
Commit 09a1e5f (1 parent: b02839a)

Update README.md

Files changed (1): README.md (+17, -1)
README.md CHANGED
@@ -15,6 +15,8 @@ This is an encoder model from **Tochka AI** based on the **RoPEBert** architecture
 
 The model source code is available in the file [modeling_rope_bert.py](https://huggingface.co/Tochka-AI/ruRoPEBert-classic-base-512/blob/main/modeling_rope_bert.py)
 
+The model is trained on contexts up to 512 tokens in length, but it can also be used on longer contexts. For better quality, use the version of this model with an extended context: Tochka-AI/ruRoPEBert-classic-base-2k.
+
 ### Usage
 
 Important: to load the model correctly, you must allow execution of code from the model repository by passing `trust_remote_code=True`; this will download the modeling_rope_bert.py script and load the weights into the correct architecture.
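For illustration, a minimal loading sketch for the paragraph above (the repository id is assumed from the modeling_rope_bert.py link earlier in this hunk, and `attn_implementation='sdpa'` mirrors the classification example below; the README's full usage section remains the authoritative reference):

```python
from transformers import AutoTokenizer, AutoModel

model_name = 'Tochka-AI/ruRoPEBert-classic-base-512'  # repository id assumed from the link above

tokenizer = AutoTokenizer.from_pretrained(model_name)
# trust_remote_code=True downloads modeling_rope_bert.py and builds the RoPEBert architecture
model = AutoModel.from_pretrained(model_name, trust_remote_code=True, attn_implementation='sdpa')
```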
@@ -60,6 +62,20 @@ To load the model with trainable classification head on top (change `num_labels`
 model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True, attn_implementation='sdpa', num_labels=4)
 ```
 
+#### With RoPE scaling
+
+The allowed RoPE scaling types are `linear` and `dynamic`. To extend the model's context window, change the tokenizer's maximum length and pass the `rope_scaling` parameter.
+
+For example, to scale the model's context by 2x:
+```python
+tokenizer.model_max_length = 1024
+model = RoPEBertForMaskedLM.from_pretrained(model_name,
+    attn_implementation='sdpa',
+    max_position_embeddings=1024,
+    rope_scaling={'type': 'dynamic', 'factor': 2.0}
+)  # 2.0 for 2x scaling, 4.0 for 4x, etc.
+```
+
 P.S. Don't forget to specify the dtype and device you need in order to use resources efficiently.
 
 
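As a rough follow-up to the RoPE scaling snippet above, here is one way the scaled model could be run on a longer input (a sketch only: `long_text` is a placeholder, and it assumes the model follows standard Hugging Face output conventions; see the repository's usage section for the recommended way to obtain embeddings):

```python
import torch

long_text = "..."  # placeholder for a document longer than the original 512-token window

# Tokenize up to the new 1024-token limit set via tokenizer.model_max_length above
inputs = tokenizer(long_text, truncation=True, max_length=1024, return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Last-layer hidden states, one vector per token; pool them (e.g. mean) if a single embedding is needed
token_embeddings = outputs.hidden_states[-1]
```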
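And a minimal sketch for the P.S. about dtype and device (bfloat16 and CUDA are example choices only; pick whatever matches your hardware):

```python
import torch
from transformers import AutoModel

# Example choices only: adjust dtype and device to your hardware
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    attn_implementation='sdpa',
    torch_dtype=torch.bfloat16,
).to('cuda')
```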
@@ -73,4 +89,4 @@ Evaluation of this model on encodechka benchmark:
 
 ### Authors
 - Sergei Bratchikov (Tochka AI Team, [HF](https://huggingface.co/hivaze), [GitHub](https://huggingface.co/hivaze))
-- Maxim Afanasiev (Tochka AI Team)
+- Maxim Afanasiev (Tochka AI Team, [HF](https://huggingface.co/mrapplexz), [GitHub](https://github.com/mrapplexz))