Update embeddingModel/README.md

embeddingModel/README.md (+1, -49)
@@ -2634,57 +2634,9 @@ We compared the performance of the GTE models with other popular text embedding
 | [sentence-t5-base](https://huggingface.co/sentence-transformers/sentence-t5-base) | 0.22 | 768 | 512 | 55.27 | 40.21 | 85.18 | 53.09 | 33.63 | 81.14 | 31.39 | 69.81 |
 
 
-##
 
-Code example
 
-```python
-import torch.nn.functional as F
-from torch import Tensor
-from transformers import AutoTokenizer, AutoModel
-
-def average_pool(last_hidden_states: Tensor,
-                 attention_mask: Tensor) -> Tensor:
-    last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
-    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]
-
-input_texts = [
-    "what is the capital of China?",
-    "how to implement quick sort in python?",
-    "Beijing",
-    "sorting algorithms"
-]
-
-tokenizer = AutoTokenizer.from_pretrained("thenlper/gte-base")
-model = AutoModel.from_pretrained("thenlper/gte-base")
-
-# Tokenize the input texts
-batch_dict = tokenizer(input_texts, max_length=512, padding=True, truncation=True, return_tensors='pt')
-
-outputs = model(**batch_dict)
-embeddings = average_pool(outputs.last_hidden_state, batch_dict['attention_mask'])
-
-# (Optionally) normalize embeddings
-embeddings = F.normalize(embeddings, p=2, dim=1)
-scores = (embeddings[:1] @ embeddings[1:].T) * 100
-print(scores.tolist())
-```
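Since the embeddings are L2-normalized just before scoring, the `embeddings[:1] @ embeddings[1:].T` line in the removed snippet computes cosine similarity scaled by 100, so the query should score highest against "Beijing". A minimal sketch of that equivalence, using random stand-in tensors rather than real model outputs (shapes and variable names here are mine, not from the model card):

```python
import torch
import torch.nn.functional as F

# Stand-in embeddings (batch of 4, GTE-base width 768); in the removed
# snippet these would come from average_pool over the last hidden state.
embeddings = torch.randn(4, 768)

# Route 1: normalize, then score with a matrix product, as the card does.
normed = F.normalize(embeddings, p=2, dim=1)
scores = (normed[:1] @ normed[1:].T) * 100  # shape (1, 3)

# Route 2: cosine similarity computed directly gives the same numbers.
cosine = F.cosine_similarity(embeddings[:1], embeddings[1:], dim=1) * 100  # shape (3,)
print(torch.allclose(scores.squeeze(0), cosine, atol=1e-5))  # True
```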
-
-Use with sentence-transformers:
-```python
-from sentence_transformers import SentenceTransformer
-from sentence_transformers.util import cos_sim
-
-sentences = ['That is a happy person', 'That is a very happy person']
-
-model = SentenceTransformer('thenlper/gte-base')
-embeddings = model.encode(sentences)
-print(cos_sim(embeddings[0], embeddings[1]))
-```
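As an aside on the sentence-transformers path above: `encode` also accepts a `normalize_embeddings=True` flag that returns unit-length vectors, after which a plain dot product reproduces `cos_sim`. A short sketch under that assumption, reusing the same checkpoint (the query/passage split is mine, not from the card):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('thenlper/gte-base')

queries = ['what is the capital of China?']
passages = ['Beijing', 'sorting algorithms']

# normalize_embeddings=True L2-normalizes the outputs, so the dot
# product below already equals the cosine similarity.
q = model.encode(queries, normalize_embeddings=True)
p = model.encode(passages, normalize_embeddings=True)
print(q @ p.T)  # the matching passage scores highest
```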
-
-### Limitation
-
-This model exclusively caters to English texts, and any lengthy texts will be truncated to a maximum of 512 tokens.
+##
 
 ### Citation
 
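The removed Limitation paragraph names two practical constraints: English-only input, and truncation at 512 tokens. A minimal pre-flight check for the length limit; the `fits_context` helper is hypothetical, only the tokenizer and the 512 threshold come from the card:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("thenlper/gte-base")

def fits_context(text: str, max_length: int = 512) -> bool:
    # Count tokens the way the model will see them, special tokens included.
    return len(tokenizer(text)["input_ids"]) <= max_length

print(fits_context("what is the capital of China?"))  # True
print(fits_context("word " * 600))                    # False: would be truncated
```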