Update README.md

README.md

#### Getting embeddings

The correct pooler (`mean`) is already **built into the model architecture**: it averages token embeddings according to the attention mask. You can also select the `first_token_transform` pooler type, which performs a learnable linear transformation on the first token.

To change the built-in pooler implementation, use the `pooler_type` parameter of `AutoModel.from_pretrained`.
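For illustration, here is a minimal sketch of getting pooled embeddings with the built-in pooler. It is not the exact snippet from this model card: the checkpoint id, the example sentences, and the tokenizer arguments are assumptions, and the pooled vectors are read from `pooler_output` as in standard `transformers` output classes.

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_name = 'Tochka-AI/ruRoPEBert-classic-base-512'  # assumed Hub id; adjust if needed
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    attn_implementation='sdpa',
    # pooler_type='first_token_transform',  # optional: override the default mean pooler
)

texts = ['Привет, мир!', 'Сегодня хорошая погода.']  # illustrative examples
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')

with torch.inference_mode():
    pooled_output = model(**inputs).pooler_output  # shape: (batch_size, hidden_size)
```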
In addition, you can calculate cosine similarities between the texts in a batch using normalization and matrix multiplication:

```python
import torch.nn.functional as F

# Normalize each pooled embedding to unit length, then take pairwise dot products
F.normalize(pooled_output, dim=1) @ F.normalize(pooled_output, dim=1).T
```
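The result is a square `(batch_size, batch_size)` matrix whose entry `(i, j)` is the cosine similarity between texts `i` and `j`, with ones on the diagonal.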

#### With RoPE scaling

Allowed types for RoPE scaling are `linear` and `dynamic`. To extend the model's context window, you need to increase the tokenizer's max length and add the `rope_scaling` parameter.

If you want to scale your model's context by 2x:
```python
tokenizer.model_max_length = 1024
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    attn_implementation='sdpa',
    max_position_embeddings=1024,
    rope_scaling={'type': 'dynamic', 'factor': 2.0},  # 2.0 for x2 scaling, 4.0 for x4, etc.
)
```
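The `linear` scaling type mentioned above should be configured the same way, e.g. `rope_scaling={'type': 'linear', 'factor': 2.0}`.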

P.S. Don't forget to specify the dtype and device you need, so that resources are used efficiently.
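As a rough sketch of what that could look like (the `bfloat16` dtype and the `'cuda'` device are illustrative choices, not requirements of the model, and the Hub id is assumed as above):

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    'Tochka-AI/ruRoPEBert-classic-base-512',  # assumed Hub id, as above
    trust_remote_code=True,
    attn_implementation='sdpa',
    torch_dtype=torch.bfloat16,  # load weights in half precision
).to('cuda')                     # move the model to the GPU
```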

### Metrics

Evaluation of this model on the encodechka benchmark:

| Model name | STS | PI | NLI | SA | TI | IA | IC | ICX | NE1 | NE2 | Avg S (no NE) | Avg S+W (with NE) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **ruRoPEBert-classic-base-512** | 0.695 | 0.605 | 0.396 | 0.794 | 0.975 | 0.797 | 0.769 | 0.386 | 0.410 | 0.609 | 0.677 | 0.630 |