Update README.md

README.md CHANGED

@@ -15,6 +15,8 @@ This is an encoder model from **Tochka AI** based on the **RoPEBert** architecture
 
 The model source code is available in the file [modeling_rope_bert.py](https://huggingface.co/Tochka-AI/ruRoPEBert-classic-base-512/blob/main/modeling_rope_bert.py)
 
+The model is trained on contexts up to 512 tokens in length, but it can be used on larger contexts. For better quality, use the version of this model with extended context: Tochka-AI/ruRoPEBert-classic-base-2k
+
 ### Usage
 
 Important: to load the model correctly, you must enable loading of code from the model repository with `trust_remote_code=True`; this will download the modeling_rope_bert.py script and load the weights into the correct architecture.
@@ -60,6 +62,20 @@ To load the model with trainable classification head on top (change `num_labels`
 model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True, attn_implementation='sdpa', num_labels=4)
 ```
 
+#### With RoPE scaling
+
+Allowed types for RoPE scaling are `linear` and `dynamic`. To extend the model's context window, change the tokenizer's max length and pass the `rope_scaling` parameter.
+
+If you want to scale your model's context by 2x:
+```python
+tokenizer.model_max_length = 1024
+model = RoPEBertForMaskedLM.from_pretrained(model_name,
+                                            attn_implementation='sdpa',
+                                            max_position_embeddings=1024,
+                                            rope_scaling={'type': 'dynamic', 'factor': 2.0}
+                                            )  # 2.0 for x2 scaling, 4.0 for x4, etc.
+```
+
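For intuition about what the two scaling types in the added section do, here is a minimal pure-Python sketch of the standard RoPE angle computation. This is an illustration of the general technique under common assumptions, not the repo's actual modeling_rope_bert.py code; the `dynamic` branch follows the widely used NTK-style base rescaling.

```python
def rope_angles(position, dim=64, base=10000.0,
                scaling=None, factor=1.0, orig_max=512, seq_len=512):
    """Rotary angles for one token position under optional RoPE scaling.

    scaling=None      -> vanilla RoPE
    scaling='linear'  -> position indices are compressed by `factor`
    scaling='dynamic' -> NTK-style: the frequency base grows once the
                         actual sequence length exceeds the trained one
    """
    if scaling == 'linear':
        position = position / factor
    elif scaling == 'dynamic' and seq_len > orig_max:
        # NTK base rescaling (assumed here; the repo's code may differ in details)
        base = base * ((factor * seq_len / orig_max) - (factor - 1)) ** (dim / (dim - 2))
    inv_freq = [base ** (-2.0 * i / dim) for i in range(dim // 2)]
    return [position * f for f in inv_freq]
```

With `factor=2.0`, linear scaling at position 1000 reproduces the vanilla angles of position 500, which is how the model can address a 1024-token window it was never trained on; dynamic scaling leaves sequences up to 512 tokens untouched and only stretches the frequencies once the input actually exceeds the trained length.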
 P.S. Don't forget to specify the dtype and device you need to use resources efficiently.
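The P.S. above can be made concrete with back-of-the-envelope arithmetic (the parameter count below is illustrative, not this model's exact size): weight memory scales with bytes per parameter, so half precision halves the footprint of float32.

```python
# Bytes per parameter for common dtypes.
BYTES_PER_PARAM = {'float32': 4, 'float16': 2, 'bfloat16': 2}

def weight_memory_mib(num_params, dtype):
    """Approximate memory for the model weights alone, in MiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**20

print(weight_memory_mib(100_000_000, 'float32'))  # ≈ 381.5 MiB
print(weight_memory_mib(100_000_000, 'float16'))  # ≈ 190.7 MiB
```

In practice you select this at load time, e.g. `torch_dtype=torch.float16` in `from_pretrained`, and place the model explicitly with `.to('cuda')`.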
@@ -73,4 +89,4 @@ Evaluation of this model on encodechka benchmark:
 
 ### Authors
 - Sergei Bratchikov (Tochka AI Team, [HF](https://huggingface.co/hivaze), [GitHub](https://huggingface.co/hivaze))
-- Maxim Afanasiev (Tochka AI Team)
+- Maxim Afanasiev (Tochka AI Team, [HF](https://huggingface.co/mrapplexz), [GitHub](https://github.com/mrapplexz))