Tochka-AI committed
Commit ce7fce4
1 Parent(s): cdac354

Update README.md

Files changed (1)
  1. README.md +10 -8
README.md CHANGED
@@ -38,7 +38,7 @@ model = AutoModel.from_pretrained(model_name, trust_remote_code=True, attn_imple
 
 #### Getting embeddings
 
-The correct pooler (`mean`) is already **built into the model architecture**, which averages embeddings based on the attention mask. You can also select the pooler type (`first_token_transform`), which performs a learnable linear transformation on the first token
+The correct pooler (`mean`) is already **built into the model architecture**, which averages embeddings based on the attention mask. You can also select the pooler type (`first_token_transform`), which performs a learnable linear transformation on the first token.
 
 To change built-in pooler implementation use `pooler_type` parameter in `AutoModel.from_pretrained` function
 
@@ -49,6 +49,7 @@ with torch.inference_mode():
 ```
 
 In addition, you can calculate cosine similarities between texts in batch using normalization and matrix multiplication:
+
 ```python
 import torch.nn.functional as F
 F.normalize(pooled_output, dim=1) @ F.normalize(pooled_output, dim=1).T
@@ -64,24 +65,25 @@ model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_rem
 
 #### With RoPE scaling
 
-Allowed types for RoPE scaling are: `linear` and `dynamic`. To extend the model's context window you need to change tokenizer max length and add rope_scaling parameter.
+Allowed types for RoPE scaling are: `linear` and `dynamic`. To extend the model's context window you need to change tokenizer max length and add `rope_scaling` parameter.
 
 If you want to scale your model context by 2x:
 ```python
 tokenizer.model_max_length = 1024
-model = RoPEBertForMaskedLM.from_pretrained(model_name,
-                                            attn_implementation='sdpa',
-                                            max_position_embeddings=1024,
-                                            rope_scaling={'type': 'dynamic','factor': 2.0}
-                                            ) # 2.0 for x2 scaling, 4.0 for x4, etc..
+model = AutoModel.from_pretrained(model_name,
+                                  trust_remote_code=True,
+                                  attn_implementation='sdpa',
+                                  max_position_embeddings=1024,
+                                  rope_scaling={'type': 'dynamic','factor': 2.0}
+                                  ) # 2.0 for x2 scaling, 4.0 for x4, etc..
 ```
 
 P.S. Don't forget to specify the dtype and device you need to use resources efficiently.
 
-
 ### Metrics
 
 Evaluation of this model on encodechka benchmark:
+
 | Model name | STS | PI | NLI | SA | TI | IA | IC | ICX | NE1 | NE2 | Avg S (no NE) | Avg S+W (with NE) |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **ruRoPEBert-classic-base-512** | 0.695 | 0.605 | 0.396 | 0.794 | 0.975 | 0.797 | 0.769 | 0.386 | 0.410 | 0.609 | 0.677 | 0.630 |
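The hunks above show the "Getting embeddings" section only in fragments (the closing fence of its code block and the `with torch.inference_mode():` context line). For reference, a minimal end-to-end sketch of that workflow, assuming the Hub repo id `Tochka-AI/ruRoPEBert-classic-base-512` and assuming the remote-code model returns the pooled embedding as `pooler_output`; neither detail is shown in this diff.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Assumed repo id: the org and model name are taken from this page, not from the diff itself.
model_name = 'Tochka-AI/ruRoPEBert-classic-base-512'

tokenizer = AutoTokenizer.from_pretrained(model_name)
# The mean pooler is built into the model; pass pooler_type='first_token_transform'
# to switch to the learnable first-token pooler instead.
model = AutoModel.from_pretrained(model_name, trust_remote_code=True, attn_implementation='sdpa')

texts = ['Привет, чем занят?', 'Здравствуйте, чем вы занимаетесь?']
batch = tokenizer(texts, padding=True, return_tensors='pt')

with torch.inference_mode():
    # Assumption: the remote-code model exposes the pooled embedding as `pooler_output`.
    pooled_output = model(**batch).pooler_output  # shape: (batch_size, hidden_size)
```

The `pooled_output` produced this way is what the cosine-similarity snippet in the second hunk operates on.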
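The added RoPE-scaling snippet notes that a `factor` of 2.0 doubles the context and 4.0 quadruples it. A sketch of the 4x case follows, assuming the same repo id as above and the 512-token base context implied by the model name, and folding in the "P.S." about choosing a dtype and device.

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_name = 'Tochka-AI/ruRoPEBert-classic-base-512'  # assumed repo id; base context of 512 tokens
device = 'cuda' if torch.cuda.is_available() else 'cpu'

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.model_max_length = 2048  # 4 * 512

model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    attn_implementation='sdpa',
    max_position_embeddings=2048,                     # 4 * 512
    rope_scaling={'type': 'dynamic', 'factor': 4.0},  # 4.0 => x4 context extension
    torch_dtype=torch.bfloat16,                       # pick an efficient dtype, as the P.S. suggests
).to(device)                                          # and move to the device you intend to use
```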