thenlper/gte-base

thenlper committed on Aug 8, 2023

Commit 2b1a85a • 1 Parent(s): 5b7a05d

Update README.md

Files changed (1)

README.md +16 -1

@@ -2606,7 +2606,7 @@ license: mit
 # gte-base
-General Text Embeddings (GTE) model.
+General Text Embeddings (GTE) model. [Towards General Text Embeddings with Multi-stage Contrastive Learning](https://arxiv.org/abs/2308.03281)
 The GTE models are trained by Alibaba DAMO Academy. They are mainly based on the BERT framework and currently offer three different sizes of models, including [GTE-large](https://huggingface.co/thenlper/gte-large), [GTE-base](https://huggingface.co/thenlper/gte-base), and [GTE-small](https://huggingface.co/thenlper/gte-small). The GTE models are trained on a large-scale corpus of relevance text pairs, covering a wide range of domains and scenarios. This enables the GTE models to be applied to various downstream tasks of text embeddings, including **information retrieval**, **semantic textual similarity**, **text reranking**, etc.
@@ -2684,3 +2684,18 @@ print(cos_sim(embeddings[0], embeddings[1]))
 ### Limitation
 This model exclusively caters to English texts, and any lengthy texts will be truncated to a maximum of 512 tokens.
+### Citation
+If you find our paper or models helpful, please consider citing them as follows:
+```
+@misc{li2023general,
+      title={Towards General Text Embeddings with Multi-stage Contrastive Learning},
+      author={Zehan Li and Xin Zhang and Yanzhao Zhang and Dingkun Long and Pengjun Xie and Meishan Zhang},
+      year={2023},
+      eprint={2308.03281},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
+```