Upload README.md
## acge model
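![logo](./img/logo.png)

The acge model was developed by the technical team at [合合信息 (INTSIG)](https://www.intsig.com/), whose public technology trial platform is [TextIn](https://www.textin.com/). INTSIG is an industry-leading artificial intelligence and big data technology company. Through its core technologies in intelligent text recognition and commercial big data, its consumer (C-end) and business (B-end) products, and its industry solutions, it provides innovative digital and intelligent services to enterprises and individual users worldwide.

For technical discussions, contact [yanhui]([email protected]). For business cooperation, contact [simon]([email protected]). You can also [click the image](https://huggingface.co/aspire/acge_text_embedding/img/wx.jpg) and scan the QR code to join our WeChat community.

acge is a general-purpose text embedding model with variable-length output vectors. It uses [Matryoshka Representation Learning](https://arxiv.org/abs/2205.13147), as illustrated below: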
![matryoshka-small](./img/matryoshka-small.gif)
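In practice, "variable-length" means you can keep only the first `dim` coordinates of an embedding and re-normalize them. The sketch below shows that mechanic, assuming random NumPy vectors as a stand-in for real acge output. `truncate_and_normalize` is a hypothetical helper name used only for illustration. Because the data is random, the shorter prefixes here carry no special meaning. MRL training is what makes the leading dimensions of a real embedding informative on their own.

```python
import numpy as np


def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates of each row, then L2-normalize each row."""
    if not 0 < dim <= embeddings.shape[-1]:
        raise ValueError(f"dim must be in 1..{embeddings.shape[-1]}, got {dim}")
    truncated = embeddings[..., :dim]
    norms = np.linalg.norm(truncated, axis=-1, keepdims=True)
    # Guard against division by zero for all-zero rows
    return truncated / np.maximum(norms, 1e-12)


# Hypothetical stand-in for real acge output: 2 random embeddings of dimension 8
rng = np.random.default_rng(0)
full = rng.normal(size=(2, 8))

# Compare the same pair of embeddings at several Matryoshka sizes
for dim in (8, 4, 2):
    small = truncate_and_normalize(full, dim)
    cosine = float(small[0] @ small[1])  # unit vectors: dot product equals cosine similarity
    print(f"dim={dim} shape={small.shape} cosine={cosine:.4f}")
```

With a real MRL-trained model, you choose `dim` to trade a small amount of accuracy for lower storage and faster similarity search.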
How to use the model with the sentence-transformers library and select a different dimension:
```python
from sklearn.preprocessing import normalize
from sentence_transformers import SentenceTransformer

# Sample input sentences ("数据1" / "数据2" mean "data 1" / "data 2")
sentences = ["数据1", "数据2"]
model = SentenceTransformer('acge_text_embedding')
# Encode without normalization so the vectors can be normalized after truncation
embeddings = model.encode(sentences, normalize_embeddings=False)
matryoshka_dim = 1024
embeddings = embeddings[..., :matryoshka_dim] # Shrink the embedding dimensions
# L2-normalize each row of the truncated embeddings
embeddings = normalize(embeddings, norm="l2", axis=1)
print(embeddings.shape)
# => (2, 1024)
```
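The `normalize(embeddings, norm="l2", axis=1)` step matters because slicing shortens vectors. Even a unit-length embedding has a prefix with norm below 1, so after truncation a plain dot product no longer equals cosine similarity. The NumPy sketch below illustrates this. It assumes random unit vectors as a stand-in for real acge output, and the sizes 16 and 4 are arbitrary. It then performs the same row-wise L2 normalization without scikit-learn.

```python
import numpy as np

# Hypothetical stand-in for model.encode(sentences, normalize_embeddings=True):
# two random unit-length vectors of dimension 16 (real acge vectors are larger).
rng = np.random.default_rng(42)
full = rng.normal(size=(2, 16))
full /= np.linalg.norm(full, axis=1, keepdims=True)

matryoshka_dim = 4
truncated = full[..., :matryoshka_dim]  # shrink the embedding dimensions

# A prefix of a unit vector is generally shorter than 1, so a raw dot product
# between truncated rows is not a cosine similarity.
print(np.linalg.norm(truncated, axis=1))

# Row-wise L2 normalization; for nonzero rows this matches
# sklearn.preprocessing.normalize(truncated, norm="l2", axis=1).
renormalized = truncated / np.linalg.norm(truncated, axis=1, keepdims=True)
print(np.linalg.norm(renormalized, axis=1))  # 1.0 per row, up to float rounding

cosine = float(renormalized[0] @ renormalized[1])
print(f"cosine similarity at {matryoshka_dim} dims: {cosine:.4f}")
```

This is also why the README example encodes with `normalize_embeddings=False` and normalizes only after slicing. Normalizing before truncation would have to be repeated anyway.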