---
library_name: transformers
tags: []
---

## AfriSenti Yoruba Sentiment Regressor Description

Takes a text as input and predicts a sentiment score between -1 (Negative) and 1 (Positive), with 0 being Neutral.

Regression Value Description:

| Value | Sentiment |
|--|--|
| -1 | Negative |
| 0 | Neutral |
| 1 | Positive |

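Because the model returns a continuous score rather than a class, it can be handy to map that score back to the three sentiments in the table. The sketch below is one possible way to do this. The function name `score_to_label` and the `threshold` of 0.33 are illustrative assumptions, not values published with the model, and the clipping step assumes that a regression head can in principle return values slightly outside the [-1, 1] range.

```python
def score_to_label(score: float, threshold: float = 0.33) -> str:
    """Map a regression score to a coarse sentiment label.

    `threshold` is a hypothetical cut-off chosen for illustration;
    it is not a value provided by the model authors.
    """
    # Clip to the documented [-1, 1] range in case the raw output overshoots.
    clipped = max(-1.0, min(1.0, score))
    if clipped <= -threshold:
        return "Negative"
    if clipped >= threshold:
        return "Positive"
    return "Neutral"


# Example scores, including one outside the documented range.
for s in (-0.9, 0.1, 1.4):
    print(s, score_to_label(s))
```

A suitable threshold depends on the application and is best tuned on held-out labelled data.
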
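## How to Get Started with the Model

Use the code below to get started with the model. It reads `test.csv` (with `tweet` and `label` columns), runs batched inference, and writes the results to `predictions.csv`.
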
```python
import math

import pandas as pd
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

BATCH_SIZE = 32
BASE_MODEL = 'HausaNLP/afrisenti-yor-regression'

# Expects a CSV file with 'tweet' (text) and 'label' columns.
ds = pd.read_csv('test.csv')

device = 'cuda' if torch.cuda.is_available() else 'cpu'

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForSequenceClassification.from_pretrained(BASE_MODEL)
model.to(device)  # keep the model on the same device as the inputs
model.eval()      # disable dropout for inference

nb_batches = math.ceil(len(ds) / BATCH_SIZE)
y_preds = []

for i in range(nb_batches):
    # The tokenizer expects a list of strings, not a pandas Series.
    input_texts = ds["tweet"].iloc[i * BATCH_SIZE: (i + 1) * BATCH_SIZE].tolist()
    encoded = tokenizer(input_texts, truncation=True, padding="max_length", max_length=256, return_tensors="pt").to(device)
    with torch.no_grad():  # gradients are not needed for prediction
        y_preds += model(**encoded).logits.reshape(-1).tolist()

df = pd.DataFrame({"Text": ds["tweet"], "Label": ds["label"], "Prediction": y_preds})
df.to_csv('predictions.csv', index=False)
```
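
Once `predictions.csv` has been written, the `Label` and `Prediction` columns can be compared. The following is a minimal sketch of two common regression metrics, mean absolute error and Pearson correlation, in plain Python. It assumes that the labels are numeric and on the same -1 to 1 scale as the predictions, which may not hold for every dataset. The helper names and the toy values are hypothetical and are not results from this model.

```python
import math


def mean_absolute_error(labels, preds):
    """Average absolute difference between labels and predictions."""
    return sum(abs(l - p) for l, p in zip(labels, preds)) / len(labels)


def pearson_r(labels, preds):
    """Pearson correlation coefficient between labels and predictions."""
    n = len(labels)
    mean_l = sum(labels) / n
    mean_p = sum(preds) / n
    cov = sum((l - mean_l) * (p - mean_p) for l, p in zip(labels, preds))
    var_l = sum((l - mean_l) ** 2 for l in labels)
    var_p = sum((p - mean_p) ** 2 for p in preds)
    return cov / math.sqrt(var_l * var_p)


# Toy values standing in for the Label and Prediction columns.
labels = [-1.0, 0.0, 1.0, 1.0]
preds = [-0.8, 0.1, 0.7, 0.9]
print(f"MAE: {mean_absolute_error(labels, preds):.3f}")
print(f"Pearson r: {pearson_r(labels, preds):.3f}")
```

With real data, the two lists could be taken from the saved file, for example with `pd.read_csv('predictions.csv')` followed by `.tolist()` on each column.
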