---
license: mit
language:
- en
- ar
- fr
- de
- pt
- it
- es
- zh
- ja
- ko
pipeline_tag: feature-extraction
tags:
- sentiment-analysis
- text-classification
- generic
- sentiment-classification
- multilingual
---
## Model
Base version of e5-multilingual fine-tuned on an annotated subset of mC4 (multilingual C4). The model provides generic embeddings for sentiment analysis. The embeddings can be used out of the box or fine-tuned on specific datasets.
Blog post: https://www.numind.ai/blog/creating-task-specific-foundation-models-with-gpt-4
## Usage
Below is an example that encodes a text and computes its embedding.
```python
import torch
from transformers import AutoTokenizer, AutoModel

model = AutoModel.from_pretrained("Numind/e5-multilingual-sentiment_analysis")
tokenizer = AutoTokenizer.from_pretrained("Numind/e5-multilingual-sentiment_analysis")

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)

size = 256  # fixed sequence length (in tokens) after padding/truncation
text = "This movie is amazing"

# Tokenize into a (1, size) tensor of input ids
encoding = tokenizer(
    text,
    truncation=True,
    padding="max_length",
    max_length=size,
    return_tensors="pt",
)

# Forward pass without gradient tracking; keep the last layer's hidden states
with torch.no_grad():
    emb = model(
        encoding.input_ids.to(device),
        output_hidden_states=True,
    ).hidden_states[-1].cpu()

# Average over the sequence dimension to get one vector per text
embText = torch.mean(emb, dim=1)
```
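The example above averages over every position of the padded sequence, padding tokens included. If an embedding that does not depend on the padding length is preferred, one common alternative is a mean weighted by the attention mask. The sketch below is an assumption, not necessarily the pooling used when this model was fine-tuned, so its output will generally differ from `embText`. The helper name `masked_mean_pool` and the toy tensors are hypothetical.

```python
import torch


def masked_mean_pool(hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors over non-padding positions only.

    hidden: (batch, seq_len, dim) hidden states.
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding.
    """
    mask = attention_mask.unsqueeze(-1).to(hidden.dtype)  # (batch, seq_len, 1)
    summed = (hidden * mask).sum(dim=1)                    # sum of real-token vectors
    counts = mask.sum(dim=1).clamp(min=1.0)                # avoid division by zero
    return summed / counts


# Toy input with made-up values: the third position is padding and is ignored.
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
attention_mask = torch.tensor([[1, 1, 0]])
pooled = masked_mean_pool(hidden, attention_mask)
```

With the real model, `hidden` would be `hidden_states[-1]` and `attention_mask` would be the `attention_mask` field returned by the tokenizer. Passing that mask to the model call as well keeps padding tokens out of the attention computation.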